Read XML file with Nokogiri

Status
Not open for further replies.

fairyliquid

Programmer
Sep 9, 2010
Hi

This has been bugging me all day...
I'm a Ruby newbie and I need some help..
I'm trying to read an XML file to retrieve 2 values.

Below is an example of the xml.

<?xml version="1.0" encoding="utf-16"?>
<resources applicationName="Testapp" author="Joe Bloggs">
<resource type="assembly" name="reader.dll" version="1.0.0.1"/>
<resource type="sql" name="script.sql" version="1.0.0.0"/>
<resource type="file" name="text.txt" version="2.0.0.3"/>
<resource type="assembly" name="writer.dll" version="1.12.0.765"/>
</resources>

I want to retrieve the "name" and "version" for each "resource" that is of type="assembly" using Nokogiri.

I need to use Nokogiri as it is able to read UTF16 files and can handle & etc...

Can someone help me please? I don't have the code to hand that I wrote today, but can update this request tomorrow with my attempts so far.

I find the XML thing hard.

My Ruby version is 1.9.1 P429

Thanks in advance.
 
As promised, here is the code I have written so far:

filename = 'Test.xml'
file_content = File.read(filename)
xmldoc = Nokogiri::XML(file_content)
puts "Root attribute : " + xmldoc.root.attributes["applicationName"]


puts "Type : " + xmldoc.elements.elements[1].attributes["resource/type"]

I can get the "applicationName" attribute, but cannot bring back the "type" attribute.
 
Hi

Like this?
Ruby:
require 'nokogiri'
xml = Nokogiri::XML(File.open('fairyliquid.xml'))
xml.xpath('//resource').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } )" }
Code:
 - reader.dll ( 1.0.0.1 )
 - script.sql ( 1.0.0.0 )
 - text.txt ( 2.0.0.3 )
 - writer.dll ( 1.12.0.765 )
fairyliquid said:
I need to use Nokogiri as it is able to read UTF16 files and can handle & etc...
Well, I had a problem with the encoding, so I had to change it to UTF-8 in the sample file to make it work. As for the ampersand ( & ) and character entities in general, every XML library should treat them the same way.


Feherke.
 
Thanks for your quick response!

The XML sample I gave does work with your code, however I missed out a node. The xml to parse should look like this:

<?xml version="1.0" encoding="utf-16"?>
<resourcespec applicationName="Testapp" author="Joe Bloggs">
<resources>
<resource type="assembly" name="reader.dll" version="1.0.0.1"/>
<resource type="sql" name="script.sql" version="1.0.0.0"/>
<resource type="file" name="text.txt" version="2.0.0.3"/>
<resource type="assembly" name="writer.dll" version="1.12.0.765"/>
</resources>
</resourcespec>

So taking your code I tried to extend it to cover the new node. I still want the same values but keep getting an error - "Can't convert nil into String"

xmldoc.xpath('//resourcespec/resources/resource').each { |node| puts " - #{ node['type'] } ( #{ node['name'] } )" }

Also, within the above line how do you insert a case statement? I want to see if the "type" is an "assembly" to get the "name".
This is probably really basic stuff, but I'm struggling with the formatting of the code and the countless {}s and []s.

If it was .Net I'd be fine.

Thanks
 
It's ok I sorted it out.

Just placed the do and end in the right place. So now it's full steam ahead. Thanks again! :)

xmldoc.xpath('//resourcespec/resources/resource').each do |node|
  puts " - #{ node['type'] } ( #{ node['name'] } )"
end
 
Hi

fairyliquid said:
The XML sample I gave does work with your code, however I missed out a node. (...) So taking your code I tried to extend it to cover the new node.
No need to change anything. In XPath, "//" means anywhere in the structure. So unless you want to exclude resource nodes found somewhere else (that is, not children of a resources node), you don't have to modify anything. Anyway, your modified code works for me.
fairyliquid said:
Also, within the above line how do you insert a case statement? I want to see if the "type" is an "assembly" to get the "name".
So far that sounds like a simple if. That can be filtered out in XPath:
Code:
xml.xpath('//resource[@type="assembly"]').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } )" }
But of course, an if in the Ruby code can also do it:
Code:
xml.xpath('//resource').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } ) #{ node['type'] }" if node['type'] == 'assembly' }
If you change the code block's syntax, it will be simpler to see where to add more code:
Code:
xml.xpath('//resource').each do |node|
  if node['type'] == 'assembly'
    puts " - #{ node['name'] } ( #{ node['version'] } ) #{ node['type'] }"
  end
end
Then changing to case is a snap:
Code:
xml.xpath('//resource').each do |node|
  case node['type']
    when 'assembly'
      puts " - #{ node['name'] } ( #{ node['version'] } ) #{ node['type'] }"
    when 'sql'
      puts ' - Handling sql resources is on TODO list'
    else
      puts " - Unhandled resource type : #{ node['type'] }"
  end
end

Feherke.
 
My issue is not resolved :(

I omitted to say I have a namespace which has been generated by BizTalk. The sample xml file did not show this as I didn't think it would be an issue.

If the namespace is removed from the file parsing works correctly.

So my next question is how do you remove the offending namespace?

The namespace offender is:
xmlns="
FYI the xml is now:

<?xml version="1.0" encoding="utf-16"?>
<resourcespec xmlns:xsi=" xmlns:xsd=" ApplicationName="Mercury.Syndication2" xmlns=" applicationName="Testapp" author="Joe Bloggs">
<resources>
<resource type="assembly" name="reader.dll" version="1.0.0.1"/>
<resource type="sql" name="script.sql" version="1.0.0.0"/>
<resource type="file" name="text.txt" version="2.0.0.3"/>
<resource type="assembly" name="writer.dll" version="1.12.0.765"/>
</resources>
</resourcespec>
 
Hi

Code:
xml.xpath('//biz:resource', 'biz' => 'http://schemas.microsoft.com/BizTalk/ApplicationDeployment/ResourceSpec/2004/12').each { |node| puts " - #{ node['name'] } ( #{ node['version'] } )" }

Feherke.
 
Yeeehhaaaaaa!!!

Thank you for your speedy replies.
This is definitely the fix.
I'll know about the namespace thing in future.
Simple but clever!
:)

 