Search and Replace

Status
Not open for further replies.

roneo

Programmer
Feb 6, 2002
4
US
I am trying to write a Unix shell script command that cleans up XML files by removing the whitespace (including line breaks) between nodes. For example:
<PERSON>
<NAME>James</NAME>
</PERSON>

The command has to produce:
<PERSON><NAME>James</NAME></PERSON>

Does anyone have any clues?
Thanks
 
The tr command may be useful, but I'm sure awk and sed can do it better.

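Some examples:

To remove all end-of-line characters:
tr -d '\012' < infile > outfile

To remove all spaces:
tr -d ' ' < infile > outfile

Please note that the < and > redirections are required, because tr reads only from standard input and writes only to standard output.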

See the man pages for details.
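
Since sed was mentioned, here is a minimal sketch of a tr plus sed pipeline for the example in the question. The function name strip_between_tags is hypothetical, and the sketch assumes it is acceptable for the whole document to end up on one line and for line breaks inside text content to turn into single spaces.

```shell
# strip_between_tags FILE
# Hypothetical helper: prints FILE with the whitespace between
# adjacent tags removed. Assumes the whole file may be joined
# into a single line.
strip_between_tags() {
    # Turn every newline into a space, then add one final newline
    # so that sed receives a single complete line.
    { tr '\n' ' ' < "$1"; echo; } |
        sed -e 's/>[[:space:]]*</></g' \
            -e 's/[[:space:]]*$//'
}
```

Calling strip_between_tags on the three-line PERSON example should give it back on a single line. Spaces inside text such as "James Brown" are kept, whereas tr -d ' ' on its own would delete them as well.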
 
Hi Roneo,

The following awk script eliminates the spaces between nodes the way you want (I hope).

awk -f CleanUp.awk XMLfile

----- XMLfile -----
Your <B>identification</B> :
<IDENT>

<NAME>James Brown</NAME>
No more informations
</IDENT>
-------------------


----- Result -----
Your <B>identification</B> :
<IDENT><NAME>James Brown</NAME>
No more informations
</IDENT>
------------------


----- CleanUp.awk -----


# Suppress spaces between consecutive nodes
# on the same line

{
    gsub(">[ \t]*<", "><", $0)
}

# After a line ending with a node,
# memorize an empty line and go to the next line

AfterNode && /^[ \t]*$/ {
    Memo[++MemoCnt] = $0
    next
}

# After a line ending with a node,
# merge that previous line with the current one
# if the current line starts with a node

AfterNode && /^[ \t]*</ {
    gsub("^[ \t]*<", "<", $0)
    $0 = Node $0
    MemoCnt = 0
    AfterNode = 0
}


# After a line ending with a node,
# print all memorized lines if the
# current line does not start with a node

AfterNode && /^[ \t]*[^<]/ {
    for (im = 1; im <= MemoCnt; im++)
        print Memo[im]
    MemoCnt = 0
    AfterNode = 0
}

# When a line ends with a node,
# memorize it (it will be printed later)
# and go to the next line

/>[ \t]*$/ {
    AfterNode = 1
    MemoCnt = 0
    Memo[++MemoCnt] = $0
    gsub(">[ \t]*$", ">", $0)
    Node = $0
    next
}

# No special case, print the current line

{
    print $0
}

# End of file: if we are after a line ending
# with a node, print all memorized lines

END {
    if (AfterNode) {
        for (im = 1; im <= MemoCnt; im++)
            print Memo[im]
    }
}

-----------------------
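
The original question mentions several XML files, so here is a minimal sketch of a wrapper that runs the script over a list of files and rewrites each one in place. The function name clean_xml_files and the temporary file naming are hypothetical; the sketch assumes the script is saved as CleanUp.awk and that overwriting the originals is acceptable.

```shell
# clean_xml_files SCRIPT FILE...
# Hypothetical helper: runs the awk SCRIPT on each FILE and
# replaces the file only when awk succeeds.
clean_xml_files() {
    script=$1
    shift
    for f in "$@"; do
        tmp="$f.tmp.$$"
        if awk -f "$script" "$f" > "$tmp"; then
            mv "$tmp" "$f"
        else
            # Leave the original untouched if awk reports an error.
            rm -f "$tmp"
            echo "awk failed on $f; file left unchanged" >&2
            return 1
        fi
    done
}
```

It could be called as clean_xml_files CleanUp.awk *.xml. Writing to a temporary file first matters because redirecting awk's output straight onto its own input file would truncate the file before awk reads it.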
Jean Pierre.
 
Wow, thanks man! I hope you didn't just write all of that yourself. I appreciate your help and will test it out.
laters
 