Saving xml-files as txt-files

Guest_imported · Mar 31, 2002

Hi,

I've written a script that strips all the HTML tags from an XML file.

BEGIN {
RS=""
}
{
if($0~/^.*<text>/)
gsub(/^.*<text>/, "", $0);
if($0~/<\/text>.*$/)
gsub(/<\/text>.*$/, "", $0);
gsub(/<\/?p>/, "", $0);
gsub(/\&quot\;/, "", $0);
gsub(/[ ][ ]+/, " ", $0);
print $0
}

What should I add to my script so that each XML file is automatically saved as a text file? For example, news135.xml and news653.xml should become news135.txt and news653.txt.
I have hundreds of these XML files, so I was thinking of using a wildcard on the command line (gawk -f script.awk *.xml). Each input file should be saved as a separate output file, but I don't know how to do that with gawk.

Can someone help me with this?

Febri

aigles · Apr 1, 2002

BEGIN {
RS=""
}
#
# First record of current file
# Close previous save file, and set new save file
#
FNR == 1 {
if (SaveFile != "")
close(SaveFile) ;
SaveFile = FILENAME ;
sub(/\.[^.]*$/, ".txt", SaveFile) ;
}
#
# Strip off all HTML tags
# and write to save file
#
{
if($0~/^.*<text>/)
gsub(/^.*<text>/, "", $0);
if($0~/<\/text>.*$/)
gsub(/<\/text>.*$/, "", $0);
gsub(/<\/?p>/, "", $0);
gsub(/&quot;/, "", $0);
gsub(/[ ][ ]+/, " ", $0);
print $0 > SaveFile ;
}
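To try this out safely before running it on hundreds of files, something like the following shell sketch might help. It is only an illustration: the scratch directory, the two one-line sample files and their contents are made up for the test, and it calls plain awk (gawk should behave the same way here). The output extension is assumed to be .txt, as asked in the question.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny sample inputs (hypothetical content, one line each).
printf '%s\n' '<doc><text><p>First  &quot;story&quot;</p></text></doc>' > news135.xml
printf '%s\n' '<doc><text><p>Second story</p></text></doc>' > news653.xml

# Same stripping rules as above, writing one output file per input file.
cat > script.awk <<'EOF'
BEGIN { RS = "" }
# First record of each input file: close the previous output file
# and derive the new output name from FILENAME.
FNR == 1 {
    if (SaveFile != "") close(SaveFile)
    SaveFile = FILENAME
    sub(/\.[^.]*$/, ".txt", SaveFile)
}
{
    gsub(/^.*<text>/, "", $0)
    gsub(/<\/text>.*$/, "", $0)
    gsub(/<\/?p>/, "", $0)
    gsub(/&quot;/, "", $0)
    gsub(/[ ][ ]+/, " ", $0)
    print $0 > SaveFile
}
EOF

# The shell expands the wildcard; awk sees each file name in FILENAME.
awk -f script.awk *.xml

# Show the results: prints "First story" then "Second story".
cat news135.txt news653.txt
```

Calling close() on the previous output file matters with hundreds of inputs, because many awk implementations limit how many files can be open at once.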
Jean Pierre.
