Read XML file with Nokogiri

fairyliquid · Sep 9, 2010

Hi

This has been bugging me all day...
I'm a Ruby newbie and I need some help..
I'm trying to read an XML file to retrieve 2 values.

Below is an example of the xml.

<?xml version="1.0" encoding="utf-16"?>
<resources applicationName="Testapp" author="Joe Bloggs">
<resource type="assembly" name="reader.dll" version="1.0.0.1"/>
<resource type="sql" name="script.sql" version="1.0.0.0"/>
<resource type="file" name="text.txt" version="2.0.0.3"/>
<resource type="assembly" name="writer.dll" version="1.12.0.765"/>
</resources>

I want to retrieve the "name" and "version" for each "resource" that is of type="assembly" using Nokogiri.

I need to use Nokogiri as it is able to read UTF16 files and can handle & etc...

Can someone help me please? I don't have the code to hand that I wrote today, but can update this request tomorrow with my attempts so far.

I find the XML thing hard.

My Ruby version is 1.9.1 P429

Thanks in advance.

fairyliquid · Sep 10, 2010

As promised here is the code I wrote so far;

require 'nokogiri'
filename = 'Test.xml'
file_content = File.read(filename)
xmldoc = Nokogiri::XML(file_content)
puts "Root attribute : " + xmldoc.root.attributes["applicationName"]

puts "Type : " + xmldoc.elements.elements[1].attributes["resource/type"]

I can get the "applicationName" attribute, but cannot bring back the "type" attribute.

feherke · Sep 10, 2010

Hi

Like this ?

Ruby:

require 'nokogiri'
xml = Nokogiri::XML File.open 'fairyliquid.xml'
xml.xpath('//resource').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } )" }

Code:

 - reader.dll ( 1.0.0.1 )
 - script.sql ( 1.0.0.0 )
 - text.txt ( 2.0.0.3 )
 - writer.dll ( 1.12.0.765 )

fairyliquid said:
I need to use Nokogiri as it is able to read UTF16 files and can handle & etc...

Well, I had a problem with the encoding, so I had to change the sample file to UTF-8 to make it work. Regarding the ampersand ( & ) and character entities in general, every XML library should treat them the same way.

Feherke.

http://free.rootshell.be/~feherke/

fairyliquid · Sep 10, 2010

Thanks for your quick response!

The XML sample I gave does work with your code, however I missed out a node. The xml to parse should look like this:

<?xml version="1.0" encoding="utf-16"?>
<resourcespec applicationName="Testapp" author="Joe Bloggs">
<resources>
<resource type="assembly" name="reader.dll" version="1.0.0.1"/>
<resource type="sql" name="script.sql" version="1.0.0.0"/>
<resource type="file" name="text.txt" version="2.0.0.3"/>
<resource type="assembly" name="writer.dll" version="1.12.0.765"/>
</resources>
</resourcespec>

So taking your code I tried to extend it to cover the new node. I still want the same values but keep getting an error - "Can't convert nil into String"

xmldoc.xpath('//resourcespec/resources/resource').each { |node| puts " - #{ node['type'] } ( #{ node['name'] } )" }

Also, within the above line how do you insert a case statement? I want to see if the "type" is an "assembly" to get the "name".
This is probably really basic stuff, but I'm struggling with the formatting of the code and the countless {}s and []s.

If it was .Net I'd be fine.

Thanks

fairyliquid · Sep 10, 2010

It's ok I sorted it out.

Just placed the do and end in the right place. So now it's full steam ahead. Thanks again!

xmldoc.xpath('//resourcespec/resources/resource').each do |node|
  puts " - #{ node['type'] } ( #{ node['name'] } )"
end

feherke · Sep 10, 2010

Hi

fairyliquid said:
The XML sample I gave does work with your code, however I missed out a node. (...) So taking your code I tried to extend it to cover the new node.

No need to change anything. In XPath, "//" means anywhere in the structure. So unless you want to exclude resource nodes found somewhere else (that is, not children of a resources node), you do not have to modify anything. Anyway, your modified code works for me.

fairyliquid said:
Also, within the above line how do you insert a case statement? I want to see if the "type" is an "assembly" to get the "name".

So far that sounds like a simple if. That filtering can be done in XPath:

Code:

xml.xpath('//resource[@type="assembly"]').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } )" }

But of course, an if in the Ruby code can also do it:

Code:

xml.xpath('//resource').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } ) #{ node['type'] }" if node['type'] == 'assembly' }

If you switch the block to do ... end syntax, it is simpler to see where to add more code:

Code:

xml.xpath('//resource').each do |node|
  if node['type'] == 'assembly'
    puts " - #{ node['name'] } ( #{ node['version'] } ) #{ node['type'] }"
  end
end

Then changing to case is a snap:

Code:

xml.xpath('//resource').each do |node|
  case node['type']
    when 'assembly'
      puts " - #{ node['name'] } ( #{ node['version'] } ) #{ node['type'] }"
    when 'sql'
      puts ' - Handling sql resources is on TODO list'
    else
      puts " - Unhandled resource type : #{ node['type'] }"
  end
end

Feherke.

feherke · Sep 10, 2010

Hi

Ah, you are too fast to be helped by me. ;-) Glad to hear you solved it.

Feherke.

fairyliquid · Sep 10, 2010

My issue is not resolved

I omitted to say I have a namespace which has been generated by BizTalk. The sample xml file did not show this as I didn't think it would be an issue.

If the namespace is removed from the file parsing works correctly.

So my next question is how do you remove the offending namespace?

The namespace offender is:
xmlns="http://schemas.microsoft.com/BizTalk/ApplicationDeployment/ResourceSpec/2004/12"

FYI the xml is now:

<?xml version="1.0" encoding="utf-16"?>
<resourcespec xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" ApplicationName="Mercury.Syndication2" xmlns="http://schemas.microsoft.com/BizTalk/ApplicationDeployment/ResourceSpec/2004/12" applicationName="Testapp" author="Joe Bloggs">
<resources>
<resource type="assembly" name="reader.dll" version="1.0.0.1"/>
<resource type="sql" name="script.sql" version="1.0.0.0"/>
<resource type="file" name="text.txt" version="2.0.0.3"/>
<resource type="assembly" name="writer.dll" version="1.12.0.765"/>
</resources>
</resourcespec>

feherke · Sep 10, 2010

Hi

Code:

xml.xpath('//biz:resource', 'biz' => 'http://schemas.microsoft.com/BizTalk/ApplicationDeployment/ResourceSpec/2004/12').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } )" }

Feherke.

fairyliquid · Sep 10, 2010

Yeeehhaaaaaa!!!

Thank you for your speedy replies.
This is definitely the fix.
I'll know about the namespace thing in future.
Simple but clever!
