Parsing XML with XML::Twig help

Status
Not open for further replies.

GZPM

IS-IT--Management
Jan 20, 2006
18
US
Hi,

Please see the code snippet below and advise on what I may be doing wrong. I am trying to print only one instance of an EAN or ISBN per record. Each XML record lists one or more ISBN or EAN identifiers, but I want my output to list (in this order) the first EAN found and, if no EAN exists, the first ISBN found. Just one identifier per record.

Here's the code:

Code:
#!/usr/bin/perl

use strict;
use XML::Twig;
use diagnostics;


open OUTPUT, ">test.txt" or die "Cannot open file for write: $!";
our @files = 'test_data.xml';	

foreach my $file (@files) {
    my $t = XML::Twig->new(
        twig_roots => {
            'product/productidentifier/b244'    => \&print_identifier_text,
            'Product/ProductIdentifier/IDValue' => \&print_identifier_text,
            'Product/ISBN'                      => \&print_identifier_text,
            'Product/EAN13'                     => \&print_identifier_text,
            'product/b004'                      => \&print_identifier_text,
            'product/b005'                      => \&print_identifier_text,
        },
        keep_encoding => 1,
    ) or die "Cannot find: $!";
    $t->parsefile($file);
    print OUTPUT "<%END%>\n\n";
}
  
  
  sub print_identifier_text {
  	
  my ($t,$elt) = @_;
  	
  my $ean = $elt->text if $elt->text =~ /^\d{13}$/; 
  my $isbn = $elt->text if $elt->text =~ /^\d{9}[0-9|x]$/i;
  	  	
  if ($ean and $isbn){
  	print "<%END%>\n\n<%EAN%>" . $ean . "\n";
  	}
  	else {
  	print "<%END%>\n\n<%ISBN%>" . $isbn . "\n";
  	}
  	$t->purge;
  }
close OUTPUT;

And here's a sample XML record. I would only want to see the first instance of the EAN (9783540028239) in my output, nothing else.

<product>
<a001>978-3-540-02823</a001>
<a002>03</a002>
<productidentifier>
<b221>02</b221>
<b244>3540028234</b244>
</productidentifier>
<productidentifier>
<b221>03</b221>
<b244>9783540028239</b244>
</productidentifier>
<productidentifier>
<b221>15</b221>
<b244>9783540028239</b244>
</productidentifier>
<productidentifier>
<b221>01</b221>
<b233>Publishers Order No</b233>
<b244>10171625</b244>
</productidentifier>
<productidentifier>
<b221>06</b221>
<b244>10.1007/b16890</b244>
</productidentifier>
<b012>BB</b012>
<b028>Hebräisches und Aramäisches Handwörterbuch über das Alte Testament</b028>
<title>
<b202>01</b202>
<b203>Hebräisches und Aramäisches Handwörterbuch über das Alte Testament</b203>
</title>
<b057>17</b057>
<b217>7</b217>
<b058>17. Aufl. 1915. Unveränd. Neudruck</b058>
<language>
<b253>01</b253>
<b252>ger</b252>
</language>
<language>
<b253>01</b253>
<b252>ine</b252>
</language>
<b254>XIX</b254>
<b255>1013</b255>
<b064>SCI000000</b064>
<subject>
<b067>10</b067>
<b069>SCI000000</b069>
</subject>
<subject>
<b067>20</b067>
<b070>Handwörterbuch</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Hebräisch /Wörterbücher, Fachausdrücke</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Hebräisches Handwörterbuch</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Aramäer /Wörterbücher, Fachausdrücke</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Altes Testament /Lexikon, Wörterbuch</b070>
</subject>
<publisher>
<b291>01</b291>
<b241>02</b241>
<b243>SPVB</b243>
<b081>Springer</b081>
</publisher>
<b003>19620101</b003>
<b087>1915</b087>
<measure>
<c093>08</c093>
<c094>1980</c094>
<c095>gr</c095>
</measure>
<measure>
<c093>01</c093>
<c094>242</c094>
<c095>mm</c095>
</measure>
<measure>
<c093>02</c093>
<c094>170</c094>
<c095>mm</c095>
</measure>
<supplydetail>
<j137>Springer</j137>
<j292>01</j292>
<j141>IP</j141>
<j260>00</j260>
<j142>19620101</j142>
<price>
<j148>01</j148>
<j262>Recomm. price</j262>
<discountcoded>
<j363>02</j363>
<j378>Product discount group</j378>
<j364>DGNY2</j364>
</discountcoded>
<j266>02</j266>
<j151>55.95</j151>
<j152>USD</j152>
<j161>20040207</j161>
</price>
<price>
<j148>03</j148>
<j262>Retail price</j262>
<discountcoded>
<j363>02</j363>
<j378>Product discount group</j378>
<j364>DGNY2</j364>
</discountcoded>
<j266>02</j266>
<j151>65.37</j151>
<j152>EUR</j152>
<b251>DE</b251>
<j161>20020101</j161>
</price>
</supplydetail>
</product>

Thanks and hope to hear from the community!
 
[0] I'm not sure how this should blend with the other functionality in your script, but since the EAN and ISBN values come from multiple productidentifier elements, you can collect them in arrays and hold off printing until the right moment. The trouble is that Twig, which processes the document SAX-style for efficiency, cannot in a sense look back at elements it has already handled.

[1] The main.
[tt]
#...etc etc
our @files = 'test_data.xml';
[blue]my @a_ean;
my @a_isbn;[/blue]
foreach my $file(@files){
[blue]#empty both arrays before each file
@a_ean=();
@a_isbn=();[/blue]
my $t= XML::Twig->new(
#etc etc...
) or die "Cannot find: $!";
$t->parsefile($file);
[blue]#the timing and format of this output may need adjusting to blend with the handling of other data
if(@a_ean) {
print shift(@a_ean);
} elsif (@a_isbn) {
print shift(@a_isbn);
} else {
print "no ean or isbn identifier is found.";
}
#add other functionality as needed[/blue]

print OUTPUT "<%END%>\n\n";
}
#etc etc...
close OUTPUT;
[/tt]
[2] The processing sub.
[tt]
sub print_identifier_text {
my ($t,$elt) = @_;
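my $text = $elt->text;
#avoid "my ... if ...": perlsyn documents its behavior as undefined
my $ean = $text =~ /^\d{13}$/ ? $text : undef;
#[0-9x] rather than [0-9|x], which would also accept a literal "|"
my $isbn = $text =~ /^\d{9}[0-9x]$/i ? $text : undef;
[blue]#printing is deferred - how and when to print is to be considered separately[/blue]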
#if ($ean and $isbn){
[blue]if ($ean){
#print "<%END%>\n\n<%EAN%>" . $ean . "\n";
push(@a_ean,$ean);
} elsif ($isbn) {
#print "<%END%>\n\n<%ISBN%>" . $isbn . "\n";
push(@a_isbn,$isbn);
}[/blue]
$t->purge;
}[/tt]
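[3] An alternative, offered only as a rough sketch: if one identifier per record is wanted rather than one per file, it may be simpler to put the handler on the whole product element, so that all of its productidentifier children are available when the handler fires. The helper names pick_identifier and identifiers_for are made up for this illustration, the ONIXmessage wrapper element is an assumption about the real file, and only the short-tag productidentifier/b244 path from the sample record is covered.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not part of the original script): given the identifier
# strings of one record in document order, return the first 13-digit EAN,
# otherwise the first 10-character ISBN, otherwise an empty list.
sub pick_identifier {
    my @values = @_;
    my ($ean)  = grep { /^\d{13}$/ } @values;
    my ($isbn) = grep { /^\d{9}[0-9x]$/i } @values;
    return ( EAN  => $ean )  if defined $ean;
    return ( ISBN => $isbn ) if defined $isbn;
    return;
}

# Hypothetical helper: parse an XML string and return one output line
# per <product> record.
sub identifiers_for {
    my ($xml) = @_;
    my @lines;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called at each </product>, once all its children are parsed.
            product => sub {
                my ( $t, $product ) = @_;
                my @values = map { $_->first_child_text('b244') }
                    $product->children('productidentifier');
                my ( $type, $value ) = pick_identifier(@values);
                push @lines,
                    defined $type ? '<%' . $type . '%>' . $value : '<%NONE%>';
                $t->purge;    # release the record that was just handled
            },
        },
    );
    $twig->parse($xml);
    return @lines;
}

# Trimmed-down test data reusing identifier values from the sample record;
# the <ONIXmessage> wrapper is assumed.
my $xml = <<'XML';
<ONIXmessage>
  <product>
    <productidentifier><b221>02</b221><b244>3540028234</b244></productidentifier>
    <productidentifier><b221>03</b221><b244>9783540028239</b244></productidentifier>
    <productidentifier><b221>15</b221><b244>9783540028239</b244></productidentifier>
  </product>
  <product>
    <productidentifier><b221>02</b221><b244>3540028234</b244></productidentifier>
  </product>
</ONIXmessage>
XML

print "$_\n<%END%>\n\n" for identifiers_for($xml);
```

With this test data the first record should yield the EAN line and the second, which has no EAN, should fall back to the ISBN line. For a real file, $twig->parsefile($file) would take the place of $twig->parse($xml), and the print would go to the OUTPUT filehandle.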
 