
Saving xml-files as txt-files


Guest_imported

Hi,

I've written a script that strips the HTML tags from an XML file.

BEGIN {
    RS = ""
}
{
    if ($0 ~ /^.*<text>/)
        gsub(/^.*<text>/, "", $0);
    if ($0 ~ /<\/text>.*$/)
        gsub(/<\/text>.*$/, "", $0);
    gsub(/<\/?p>/, "", $0);
    gsub(/\&quot\;/, "", $0);
    gsub(/[ ][ ]+/, " ", $0);
    print $0
}


What should I add to my script so that the output for each XML file is automatically saved as a text file? For example, news135.xml and news653.xml should produce news135.txt and news653.txt.
The problem is that I've got hundreds of these XML files, so I was thinking of using a wildcard on the command line (gawk -f script.awk *.xml). The output for each input file should go to a separate file, but I don't know how to do that with gawk.

Can someone help me with this?

Febri
 

BEGIN {
    RS = ""    # paragraph mode: records are separated by blank lines
}
#
# First record of the current file:
# close the previous save file and set the new save file
#
FNR == 1 {
    if (SaveFile != "") close(SaveFile);
    SaveFile = FILENAME;
    # Replace the last extension (e.g. ".xml") with ".txt"
    sub(/\.[^.]*$/, ".txt", SaveFile);
}
#
# Strip off all HTML tags
# and write to the save file
#
{
    if ($0 ~ /^.*<text>/)
        gsub(/^.*<text>/, "", $0);
    if ($0 ~ /<\/text>.*$/)
        gsub(/<\/text>.*$/, "", $0);
    gsub(/<\/?p>/, "", $0);
    gsub(/&quot;/, "", $0);    # drop literal &quot; entities
    gsub(/[ ][ ]+/, " ", $0);
    print $0 > SaveFile;
}
Jean Pierre.
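
For anyone who wants to try the idea before running it on hundreds of files, here is a minimal sketch that works in a scratch directory. The two sample files and their contents are made up for the demo, the variable name out is arbitrary, and the awk program is a condensed version of the approach above using the .txt extension the question asks for. It is an illustration under those assumptions, not a tested drop-in for the real data.

```shell
# Work in a scratch directory so nothing real is overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two tiny sample inputs (invented for this demo).
printf '<doc><text><p>Hello  world</p></text></doc>\n' > news135.xml
printf '<doc><text><p>Second &quot;file&quot;</p></text></doc>\n' > news653.xml

awk '
BEGIN { RS = "" }                      # paragraph mode, as in the thread
FNR == 1 {                             # first record of each input file
    if (out != "") close(out)          # finish the previous output file
    out = FILENAME
    sub(/\.[^.]*$/, ".txt", out)       # news135.xml -> news135.txt
}
{
    gsub(/^.*<text>/, "")              # drop everything up to <text>
    gsub(/<\/text>.*$/, "")            # drop </text> and what follows
    gsub(/<\/?p>/, "")                 # drop <p> and </p>
    gsub(/&quot;/, "")                 # drop literal &quot; entities
    gsub(/[ ][ ]+/, " ")               # squeeze runs of spaces
    print > out
}' *.xml

cat news135.txt news653.txt
```

With these sample inputs, the final cat should show "Hello world" followed by "Second file". One thing to watch: an input name with no extension would leave the output name identical to the input name, so the redirect would overwrite the input. Matching only *.xml on the command line avoids that.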
 