
how to add xml end tag

Status
Not open for further replies.

yosiasz

Programmer
Feb 5, 2011
US
Greetings

When using awk to split a huge XML file, I am losing an end tag that I am filtering on. How do I avoid losing that tag, or tack it back on? Interestingly, the root tag is intact.

Below is the awk script I am using.
Total newbie, I just started awk today.
Thanks

#$ cat split_bigfile.awk

BEGIN { new_chunk = 1 ; size = 100000 }

NR == 1 { header = $0 ; next }
NR == 2 { header = header ORS $0 ; footer = "</" substr($1,2) ">" ; next }

$0 !~ footer {
    if (new_chunk) {
        outfile = "chnk_association" sprintf("%07d", num) ".xml"
        print header > outfile
        new_chunk = 0
    }
    print > outfile
}

/<\/Association>/ {
    num = int(count++/size)
    if (num > prev_num) {
        print footer > outfile
        new_chunk = 1
    }
    prev_num = num
}

END { if (!new_chunk) print footer > outfile }
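To see which closing tag the script above actually builds, its footer expression can be tried on its own. A small sketch, assuming the second input line looks like `<Root xmlns:xsi="..." xmlns:xsd="...">` (the real namespace values are not shown in the post, so `a` and `b` stand in for them); `footer_for` is a hypothetical helper name:

```shell
#!/bin/sh
# footer_for LINE (hypothetical helper): print the closing tag that the
# expression "</" substr($1,2) ">" builds from LINE.
footer_for() {
  printf '%s\n' "$1" | awk '{ print "</" substr($1, 2) ">" }'
}

footer_for '<Root xmlns:xsi="a" xmlns:xsd="b">'   # prints </Root>
footer_for '<Associations>'                        # prints </Associations>>
```

The first call gives `</Root>`, so both the `$0 !~ footer` filter and the printed footer deal with `</Root>` only. The `<Associations>` and `</Associations>` lines are treated as ordinary content, which is why the opening tag lands only in the first chunk and the closing tag only in the last one. The second call shows that on a bare `<Associations>` line, where `$1` still contains the `>`, the same expression yields a doubled `>>`.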
 
Hi

I appreciate your help. Here is what my XML looks like:

<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:xsi="..." xmlns:xsd="...">
<Associations>
<Association>
---
---
---
</Association>
<Association>
---
---
---
</Association>
</Associations>

The first extracted XML file is missing the end tag:

<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:xsi="..." xmlns:xsd="...">
<Associations>
<Association>
---
---
---
</Association>
<Association>
---
---
---
</Association>

The last XML file is missing the start tag:

<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:xsi="..." xmlns:xsd="...">
<Association>
---
---
---
</Association>
<Association>
---
---
---
</Association>
</Associations>

This makes sense, since that is what the script tells it to do. How can I fix this so that the XML structure stays intact?

Thanks!!
 
Based on what little you've shown of the file, I'd guess you need to put 3 lines in the header, and build the footer from a modified 3rd line.

Try this and see if you can make sense of it (I did not test it):
Code:
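BEGIN { new_chunk = 1 ; size = 100000 }

# The first 3 lines (XML declaration, <Root ...>, <Associations>) form the header
NR == 1 { header = $0 ; next }
NR == 2 { header = header ORS $0 ; next }
# The footer closes the 3rd-line element; stripping "<" and ">" from the tag
# makes a bare "<Associations>" give "</Associations>", not "</Associations>>"
NR == 3 { header = header ORS $0 ; tag = $1 ; gsub(/[<>]/, "", tag) ; footer = "</" tag ">" ; next }

# Copy every other line into the current chunk, opening a new chunk if needed
$0 !~ footer {
    if (new_chunk) {
        outfile = "chnk_association" sprintf("%07d", num) ".xml"
        print header > outfile
        new_chunk = 0
    }
    print > outfile
}

# After every "size" </Association> elements, close the chunk with the footer
/<\/Association>/ {
    num = int(count++/size)
    if (num > prev_num) {
        print footer > outfile
        new_chunk = 1
    }
    prev_num = num
}

END { if (!new_chunk) print footer > outfile }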
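This still leaves `</Root>` unhandled if, as the `<Root ...>` opening line suggests, the real file ends with `</Root>` after `</Associations>`: only the last chunk would get that line, and before the footer rather than after it. Below is a minimal sketch, not tested against the real data, that writes both closing tags to every chunk. It wraps the awk program in a hypothetical shell function `split_chunks`, assumes the wrapper tags sit alone on their lines as in the posted sample, and derives tag names by stripping `<` and `>` so that attribute-free tags like `<Associations>` work too. It also ends a chunk right after its `size`-th `</Association>`; with the `int(count++/size)` test the switch only happens at the following closing tag, so the first chunk gets `size + 1` elements.

```shell
#!/bin/sh
# split_chunks SIZE FILE (hypothetical helper): split FILE into
# chnk_association0000000.xml, chnk_association0000001.xml, ...
# with SIZE <Association> elements each. Assumes lines 1-3 are the XML
# declaration, <Root ...> and <Associations>, and that </Associations>
# and </Root> each sit alone on a line near the end of the file.
split_chunks() {
  awk -v size="$1" '
  # "<Root" or "<Associations>" -> "</Root>" or "</Associations>"
  function close_tag(first,    t) {
    t = first
    gsub(/[<>]/, "", t)
    return "</" t ">"
  }

  BEGIN { new_chunk = 1 }

  NR == 1 { header = $0; next }
  NR == 2 { header = header ORS $0; root_end = close_tag($1); next }
  NR == 3 {
    header = header ORS $0
    list_end = close_tag($1)
    footer = list_end ORS root_end    # every chunk closes both elements
    next
  }

  # Skip the original closing lines; each chunk gets its own footer.
  NF == 1 && ($1 == list_end || $1 == root_end) { next }

  {
    if (new_chunk) {
      outfile = sprintf("chnk_association%07d.xml", num++)
      print header > outfile
      new_chunk = 0
    }
    print > outfile
  }

  # Close the chunk right after its SIZE-th </Association>.
  /<\/Association>/ && ++count % size == 0 {
    print footer > outfile
    close(outfile)
    new_chunk = 1
  }

  END { if (!new_chunk) { print footer > outfile; close(outfile) } }
  ' "$2"
}
```

For the real file this would be called as `split_chunks 100000 big.xml` (the input name is a placeholder). Calling `close()` on each finished chunk may also help avoid the open-file limit some awk implementations have when there are many chunks.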


HTH,

p5wizard
 