Hi,
Please see the code snippet below and advise on what I may be doing wrong. I am trying to print only one instance of EAN or ISBN per record. Each XML record lists one or more ISBN or EAN identifiers, but I want my output to list the first EAN found or, if no EAN exists, the first ISBN found. Just one identifier per record.
here's the code:
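#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use XML::Twig;

# Lexical filehandle with 3-argument open
open my $out, '>', 'test.txt' or die "Cannot open file for write: $!";

my @files = ('test_data.xml');

foreach my $file (@files) {
    my $t = XML::Twig->new(
        # Only these elements are built; the rest of each file is skipped.
        twig_roots => {
            'product/productidentifier/b244'    => \&print_identifier_text,  # ONIX short tags
            'Product/ProductIdentifier/IDValue' => \&print_identifier_text,  # ONIX reference tags
            'Product/ISBN'                      => \&print_identifier_text,
            'Product/EAN13'                     => \&print_identifier_text,
            'product/b004'                      => \&print_identifier_text,
            'product/b005'                      => \&print_identifier_text,
        },
        # keep_encoding is an option of new() itself, so it goes
        # outside the twig_roots hash.
        keep_encoding => 1,
    );
    $t->parsefile($file);
    print {$out} "<%END%>\n\n";
}

# Called once for EACH matching identifier element, not once per <product>,
# so it cannot know whether the same record already produced an EAN.
sub print_identifier_text {
    my ($t, $elt) = @_;
    my $text = $elt->text;

    # 'my $x = ... if ...' has undefined behaviour, so declare first.
    my ($ean, $isbn);
    $ean  = $text if $text =~ /^\d{13}$/;
    $isbn = $text if $text =~ /^\d{9}[\dX]$/i;    # ISBN-10: 9 digits + digit or X

    # A single value cannot match both patterns, so test them separately.
    if (defined $ean) {
        print {$out} "<%END%>\n\n<%EAN%>$ean\n";
    }
    elsif (defined $isbn) {
        print {$out} "<%END%>\n\n<%ISBN%>$isbn\n";
    }
    $t->purge;
}

close $out;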
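The handler in the code above runs once for every matching identifier element, so it never sees the other identifiers of the same <product>, and a single value can never pass both the 13-digit test and the ISBN-10 test. Picking "first EAN, otherwise first ISBN" needs all the values of one record at once. A minimal, dependency-free sketch of that rule, assuming the values have already been collected per record in document order (pick_identifier is a hypothetical name; List::Util is a core module):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: given the identifier values of ONE record in document
# order, return ('EAN', value) for the first 13-digit value, otherwise
# ('ISBN', value) for the first ISBN-10 (9 digits plus a digit or X),
# otherwise an empty list.
sub pick_identifier {
    my @ids = @_;
    my $ean = first { /^\d{13}$/ } @ids;
    return (EAN => $ean) if defined $ean;
    my $isbn = first { /^\d{9}[\dX]$/i } @ids;
    return (ISBN => $isbn) if defined $isbn;
    return;
}

# The <b244> values of the sample record, in document order.
my ($label, $value) = pick_identifier(
    '3540028234', '9783540028239', '9783540028239', '10171625', '10.1007/b16890',
);
print "<%$label%>$value\n";    # prints <%EAN%>9783540028239
```

Matching on the shape of the value treats any 13-digit number as an EAN, including a proprietary ID that happens to be 13 digits long.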
And here's a sample XML record. In the output I only want to see the first instance of the EAN (9783540028239), nothing else.
<product>
<a001>978-3-540-02823</a001>
<a002>03</a002>
<productidentifier>
<b221>02</b221>
<b244>3540028234</b244>
</productidentifier>
<productidentifier>
<b221>03</b221>
<b244>9783540028239</b244>
</productidentifier>
<productidentifier>
<b221>15</b221>
<b244>9783540028239</b244>
</productidentifier>
<productidentifier>
<b221>01</b221>
<b233>Publishers Order No</b233>
<b244>10171625</b244>
</productidentifier>
<productidentifier>
<b221>06</b221>
<b244>10.1007/b16890</b244>
</productidentifier>
<b012>BB</b012>
<b028>Hebräisches und Aramäisches Handwörterbuch über das Alte Testament</b028>
<title>
<b202>01</b202>
<b203>Hebräisches und Aramäisches Handwörterbuch über das Alte Testament</b203>
</title>
<b057>17</b057>
<b217>7</b217>
<b058>17. Aufl. 1915. Unveränd. Neudruck</b058>
<language>
<b253>01</b253>
<b252>ger</b252>
</language>
<language>
<b253>01</b253>
<b252>ine</b252>
</language>
<b254>XIX</b254>
<b255>1013</b255>
<b064>SCI000000</b064>
<subject>
<b067>10</b067>
<b069>SCI000000</b069>
</subject>
<subject>
<b067>20</b067>
<b070>Handwörterbuch</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Hebräisch /Wörterbücher, Fachausdrücke</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Hebräisches Handwörterbuch</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Aramäer /Wörterbücher, Fachausdrücke</b070>
</subject>
<subject>
<b067>20</b067>
<b070>Altes Testament /Lexikon, Wörterbuch</b070>
</subject>
<publisher>
<b291>01</b291>
<b241>02</b241>
<b243>SPVB</b243>
<b081>Springer</b081>
</publisher>
<b003>19620101</b003>
<b087>1915</b087>
<measure>
<c093>08</c093>
<c094>1980</c094>
<c095>gr</c095>
</measure>
<measure>
<c093>01</c093>
<c094>242</c094>
<c095>mm</c095>
</measure>
<measure>
<c093>02</c093>
<c094>170</c094>
<c095>mm</c095>
</measure>
<supplydetail>
<j137>Springer</j137>
<j292>01</j292>
<j141>IP</j141>
<j260>00</j260>
<j142>19620101</j142>
<price>
<j148>01</j148>
<j262>Recomm. price</j262>
<discountcoded>
<j363>02</j363>
<j378>Product discount group</j378>
<j364>DGNY2</j364>
</discountcoded>
<j266>02</j266>
<j151>55.95</j151>
<j152>USD</j152>
<j161>20040207</j161>
</price>
<price>
<j148>03</j148>
<j262>Retail price</j262>
<discountcoded>
<j363>02</j363>
<j378>Product discount group</j378>
<j364>DGNY2</j364>
</discountcoded>
<j266>02</j266>
<j151>65.37</j151>
<j152>EUR</j152>
<b251>DE</b251>
<j161>20020101</j161>
</price>
</supplydetail>
</product>
Thanks and hope to hear from the community!
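Since every <productidentifier> in the sample record carries a type code in <b221> next to the value in <b244>, the choice can also be made once per <product> from the type code rather than from the shape of the value. The sketch below puts an XML::Twig handler on product, so all identifiers of a record are available together. It assumes ONIX 2.1 short tags and the ONIX code list 5 values 03 = GTIN-13 (EAN), 15 = ISBN-13 and 02 = ISBN-10; pick_identifiers is a hypothetical helper, and files using reference tags (Product/ProductIdentifier/ProductIDType, IDValue) would need the tag names changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Assumed ONIX code list 5 values for ProductIDType (<b221>):
# 03 = GTIN-13 (EAN), 15 = ISBN-13, 02 = ISBN-10.
my %EAN_TYPES  = map { $_ => 1 } qw(03);
my %ISBN_TYPES = map { $_ => 1 } qw(15 02);

# Hypothetical helper: parse an ONIX 2.1 short-tag string and return one
# [label, value] pair per <product>, in document order. The label is 'EAN'
# or 'ISBN'; both entries are undef when a product has neither.
sub pick_identifiers {
    my ($xml) = @_;
    my @picked;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once per complete <product>, so all of its
            # <productidentifier> children can be inspected together.
            product => sub {
                my ($t, $product) = @_;
                my ($ean, $isbn);
                for my $pid ($product->children('productidentifier')) {
                    my $type  = $pid->first_child_text('b221');
                    my $value = $pid->first_child_text('b244');
                    next unless length $value;
                    $ean  //= $value if $EAN_TYPES{$type};     # keep first EAN
                    $isbn //= $value if $ISBN_TYPES{$type};    # keep first ISBN
                }
                push @picked, defined $ean  ? [ EAN  => $ean ]
                            : defined $isbn ? [ ISBN => $isbn ]
                            :                 [ undef, undef ];
                $t->purge;    # free the record once it has been handled
            },
        },
    );
    $twig->parse($xml);
    return @picked;
}

# Usage with the sample record, trimmed to its identifiers:
my $sample = <<'XML';
<product>
<productidentifier><b221>02</b221><b244>3540028234</b244></productidentifier>
<productidentifier><b221>03</b221><b244>9783540028239</b244></productidentifier>
<productidentifier><b221>15</b221><b244>9783540028239</b244></productidentifier>
<productidentifier><b221>01</b221><b244>10171625</b244></productidentifier>
</product>
XML

for my $pair (pick_identifiers($sample)) {
    my ($label, $value) = @$pair;
    print "<%$label%>$value\n" if defined $label;
}
```

Because the decision is taken in the product handler, the <%END%> marker (or any other per-record output) can be printed there as well, exactly once per record. For large files, the purge call keeps memory use flat by discarding each record after it has been processed.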