Process an XML File with AWK

Status
Not open for further replies.

IMAUser

Technical User
May 28, 2003
121
CH
Hi,
I have an XML file, a section of which is shown below:
<component>
<refentity>
<name>ABITIBI-CONSOLIDATED INC.</name>
<red>003CB6</red>
</refentity>
<bond>
<name>ABY 8.55 01Aug10</name>
<issuer>
<prospname>Abitibi</prospname>
<pairred>003CB6AA3</pairred>
<ispreferred>true</ispreferred>
</issuer>
<type>Bond</type>
<isconvert>false</isconvert>
</bond>
<red>003CB6AA3</red>
</component>
<component>
<refentity>
<name>Argentine Republic</name>
<red>PP7D7E</red>
</refentity>
<bond>
<name>ARGENT 8.28 31Dec33 Sink</name>
<type>Bond</type>
<isconvert>false</isconvert>
</bond>
</component>

The above are two <component> records. In the first one, the <bond> tag contains an <issuer> tag; in the second there is no <issuer>.

I need to write a script that writes all the components and bonds which have an issuer to one file, and all the components and bonds which do not have an issuer to another file.

Any help is very much appreciated.

Thanks
 
Thanks for that, but it complains:
awk: syntax error near line 3
awk: illegal statement near line 3

If you can tell me what each bit is doing there, I may be able to debug it.

Thanks again.
 
Hi

I have gawk 3.1.1 and it works fine there. It also works with the --traditional option, so in theory it should be compatible with other awk versions.

Try putting the whole filename expression in parentheses:
Code:
awk '
/<component>/,/<\/component>/ { s=s ORS $0 }
/<\/component>/ { print s > ((s~/<issuer>/?"with":"without") "_issuer"); s="" }
' inputfile.xml

All it does is collect the lines found between the component marks in a variable; when the end of the section is reached, it prints that value to a file.
Code:
/<component>/,/<\/component>/ {   # for lines between the start and end marks
  s=s ORS $0                      # add the current line to the variable
}
/<\/component>/ {                 # the end mark is reached
  # print the variable's value to the file *_issuer, where the prefix "with" or
  # "without" is determined by the presence of an issuer mark in the value of s
  print s > ((s~/<issuer>/?"with":"without") "_issuer")
  s=""                            # reset the accumulated value
}
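
If your awk still reports a syntax error, one thing that might be worth trying is a variant that avoids the two constructs most likely to trip up an older awk: the ?: operator and an expression after the > redirection. The sketch below is only an illustration under assumptions: the sample data is a cut-down copy of the two records from the question, it is written to inputfile.xml in the current directory, and the variable names inside and out are made up for this example.

```shell
#!/bin/sh
# Illustrative demo only: a cut-down sample is written to the current directory.
cat > inputfile.xml <<'EOF'
<component>
<bond>
<name>ABY 8.55 01Aug10</name>
<issuer>
<prospname>Abitibi</prospname>
</issuer>
</bond>
</component>
<component>
<bond>
<name>ARGENT 8.28 31Dec33 Sink</name>
</bond>
</component>
EOF

awk '
/<component>/ { inside = 1; s = "" }   # start of a record: begin collecting
inside        { s = s $0 "\n" }        # append each line of the current record
/<\/component>/ {
  # pick the output file with a plain if/else instead of ?:
  if (s ~ /<issuer>/) out = "with_issuer"
  else                out = "without_issuer"
  printf "%s", s > out                 # the redirection target is a simple variable
  inside = 0
}
' inputfile.xml

# Count the records that landed in each output file.
grep -c "<component>" with_issuer without_issuer
```

With this sample, each of the two output files should end up holding exactly one component. Unlike the version above, the record text here ends with a newline instead of starting with one, so the output files have no leading blank line.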

Feherke.
 