how to add xml end tag

yosiasz · Feb 6, 2011

Greetings

When using awk to split a huge XML file, I am losing an end tag that I am filtering on. How do I avoid losing that tag, or tack it back on afterwards? Interestingly, the root tag is intact.

Below is the awk script I am using.
Total newbie here; I just started with awk today.
Thanks

#$ cat split_bigfile.awk

BEGIN { new_chunk = 1 ; size = 100000 }

NR == 1 { header = $0 ; next }
NR == 2 { header = header ORS $0 ; footer = "</" substr($1,2) ">" ; next }

$0 !~ footer {
    if (new_chunk) {
        outfile = "chnk_association" sprintf("%07d", num) ".xml"
        print header > outfile
        new_chunk = 0
    }
    print > outfile
}

/<\/Association>/ {
    num = int(count++/size)
    if (num > prev_num) {
        print footer > outfile
        new_chunk = 1
    }
    prev_num = num
}

END { if (!new_chunk) print footer > outfile }

feherke · Feb 6, 2011

Hi

Please post some sample input too, so we can see your script in action.

Feherke.

http://free.rootshell.be/~feherke/

yosiasz · Feb 7, 2011

Hi

Appreciate your help. Here is what my XML looks like:

<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<Associations>
<Association>
---
---
---
</Association>
<Association>
---
---
---
</Association>
</Associations>

The first extracted XML file is missing the end tag:

<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<Associations>
<Association>
---
---
---
</Association>
<Association>
---
---
---
</Association>

The last XML file is missing the start tag:
<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<Association>
---
---
---
</Association>
<Association>
---
---
---
</Association>
</Associations>

This makes sense, since the script is telling it to do so. How can I fix this so that the XML structure stays intact?

Thanks!!

p5wizard · Feb 7, 2011

Based on what little you've shown of the file, I'd guess you need to put 3 lines in the header, and a modified 3rd line in the footer.

Try this and see if you can make sense of it (I did not test it).

Code:

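BEGIN { new_chunk = 1 ; size = 100000 }

# The first 3 lines of the posted sample (XML declaration, <Root ...>, <Associations>)
# are repeated at the top of every chunk
NR == 1 { header = $0 ; next }
NR == 2 { header = header ORS $0 ; next }
# Strip both angle brackets, so that "<Associations>" gives "</Associations>" and not "</Associations>>"
NR == 3 { header = header ORS $0 ; tag = $1 ; gsub(/[<>]/, "", tag) ; footer = "</" tag ">" ; next }

# Copy every line except the original end tag, starting a new file when needed
$0 !~ footer {
    if (new_chunk) {
        outfile = "chnk_association" sprintf("%07d", num) ".xml"
        print header > outfile
        new_chunk = 0
    }
    print > outfile
}

# Count records; when the chunk number goes up, end the current file with the footer
/<\/Association>/ {
    num = int(count++/size)
    if (num > prev_num) {
        print footer > outfile
        new_chunk = 1
    }
    prev_num = num
}

# End the last, partially filled chunk
END { if (!new_chunk) print footer > outfile }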

HTH,

p5wizard
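
The sample posted in this thread stops at </Associations>, but a well-formed file would presumably also close <Root> on a final line. The sketch below assumes that it does, and writes both end tags at the bottom of every chunk. It is only an illustration, not a tested drop-in for the real data: the helper names closing_tag and finish are made up here, the record bodies ("--- 1 ---") are placeholders, and the chunk size is passed in with -v so the demo can use size=2. It also counts with ++count % size, so every full chunk holds exactly size records, and it calls close() on each finished file so a long run does not keep every chunk open.

```shell
#!/bin/sh
# Sketch only. Assumes the layout of the sample in this thread plus a closing
# </Root> line at the end (an assumption; the posted sample does not show one).
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Tiny stand-in for the real file: three records with placeholder bodies.
cat > sample.xml <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Associations>
<Association>
--- 1 ---
</Association>
<Association>
--- 2 ---
</Association>
<Association>
--- 3 ---
</Association>
</Associations>
</Root>
EOF

cat > split_bigfile.awk <<'EOF'
# Build an end tag from the first word of a start-tag line.
# Works for "<Root" (attributes follow) and for "<Associations>" (no attributes).
function closing_tag(word) {
    gsub(/[<>]/, "", word)
    return "</" word ">"
}

# Write both wrapper end tags, then close the chunk.
function finish() {
    print list_end > outfile
    print root_end > outfile
    close(outfile)
    outfile = ""
}

BEGIN { if (size == "") size = 100000 }     # override with: awk -v size=N

NR == 1 { header = $0 ; next }                                          # XML declaration
NR == 2 { header = header ORS $0 ; root_end = closing_tag($1) ; next }  # <Root ...>
NR == 3 { header = header ORS $0 ; list_end = closing_tag($1) ; next }  # <Associations>

# Drop the original wrapper end tags; finish() writes fresh copies per chunk.
index($0, list_end) || index($0, root_end) { next }

# Every other line is copied, opening a new chunk (with header) when needed.
{
    if (outfile == "") {
        outfile = sprintf("chnk_association%07d.xml", num++)
        print header > outfile
    }
    print > outfile
}

# After each complete record, close the chunk once it holds `size` records.
/<\/Association>/ { if (++count % size == 0) finish() }

# Close a last, partially filled chunk.
END { if (outfile != "") finish() }
EOF

awk -v size=2 -f split_bigfile.awk sample.xml

for f in chnk_association*.xml; do
    echo "== $f"
    cat "$f"
done
```

With size=2 and three records, this should leave two files: chnk_association0000000.xml with two records and chnk_association0000001.xml with one, each starting with the three header lines and ending with </Associations> followed by </Root>. If the real file has no </Root> line, or the wrapper tags sit on different lines than in the sample, the NR == 2 and NR == 3 rules would need adjusting.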
