Deleting an XML block from an XML file

Status
Not open for further replies.

amkipnis

Programmer
Apr 15, 2003
21
US
Hi, I am trying to write a sed script that excludes a 10-line XML segment based on a given pattern found in it. All 10-line segments are identically structured.
Here is an example of the first 3 records of the master XML file printers.conf:
<Printer CL_010002>
Info pa010002
DeviceURI socket://pa010002:port
State Idle
Accepting Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
</Printer>
<Printer CL_010003>
Info pa010003
DeviceURI socket://pa010003:port
State Idle
Accepting Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
</Printer>
<Printer CL_010013>
Info pa010013
DeviceURI socket://pa010013:port
State Idle
Accepting Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
</Printer>
......
Here is my attempt at a sed script that searches for a particular XML segment and removes it based on a 6-digit numeric pattern (010013, for example). I am having trouble creating a delta file of the XML segments excluded from the master file.

My approach in the script below is as follows. First, I determine the start and end line numbers of the XML block to be deleted. However, as I loop through the 6-digit patterns read from an explicitly defined file (*.dat), I need to keep the unmatched 10-line XML blocks in their original position and sequence, while the 10-line blocks that match are excluded from the resulting XML file. Would you please help? Thanks in advance!
============================================================
#!/usr/bin/ksh
CUPSDir=/path/to/file/
CUPS_TABLE=${CUPSDir}printers.conf
#cat -n *xml|egrep '<Printer CL|</Printer>' => extract number of lines
#1 cat -n printers.xml|egrep '<Printer CL|</Printer>'|grep 414401|nawk '{print $1}' => extract the number of start block
if [[ -f ${CUPSDir}printer.conf.update ]]
then
rm ${CUPSDir}printer.conf.update
fi
if [[ -f ${CUPSDir}printer.conf.read ]]
then
rm ${CUPSDir}printer.conf.read
fi
while read office
do
#sblock=0
#eblock=0
#echo locating office $office
#sblock=`cat -n $CUPS_TABLE|egrep '<Printer CL|</Printer>'|grep 414401|nawk '{print $1}'`

sblock=`cat -n $CUPS_TABLE|egrep '<Printer CL|</Printer>'|grep $office|nawk '{print $1}'`
if [[ -n "$sblock" ]]
then

let eblock=sblock+9
#echo start block is $sblock
#echo end block is $eblock

#2. sed -n 13,22p *xml =>extract the segment of needed branch

#This command will extract only the xml segments which need to be removed from the master file $CUPS_TABLE
sed -n "$sblock","$eblock"p $CUPS_TABLE >> ${CUPSDir}printer.conf.read

sed -n "$sblock","$eblock"!p $CUPS_TABLE > ${CUPSDir}printer.conf.read
#sed "$sblock","$eblock"d $CUPS_TABLE > ${CUPSDir}printer.conf.$office


#3. sed 's!13,22d' printers.xml => delete XML segment of closed office
#This command will remove only the xml segment based on the last office read in the list ${CUPSDir}printers_to_purge.dat
sed -e "$sblock","$eblock"d $CUPS_TABLE > ${CUPSDir}printer.conf.update

#This command will append duplicate xml segments which need to be removed from the master file $CUPS_TABLE
sed -e "$sblock","$eblock"d $CUPS_TABLE >> ${CUPSDir}printer.conf.update
#===========================================================

else
echo ERROR - can not locate office $office in $CUPS_TABLE
fi
done<${CUPSDir}printers_to_purge.dat

exit
====================================================
Here is the content of ${CUPSDir}printers_to_purge.dat:
027105
028102
211300
211707
211719
211721
211725
211726
211760
211761
211762
211785
211814
211816
211817
211828
211831
211875
211876
211880
212151
213200
217700
219300
219802
321704
321712
321720
322105
540026
541704
541707
541715
547201
 
I'd use awk.

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Hi PHV, would you please give me some guidance, or at least a starting point, for the "awk" solution? Do I need to post this question in the "awk" forum?

In any event, if I try the following command, the resulting file lists only the unique lines, whereas I need to retain the whole 10-line block for each XML section:

nawk '
FNR==NR{t[$1]=$2;next}
!($1 in t) || t[$1]!=$2
' ${CUPSDir}printers.conf ${CUPSDir}printer.conf.read > ${CUPSDir}printer.conf.update

I appreciate your input! Thanks.
 
Something like this perhaps:

Code:
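awk '
        # First file (purge list): remember each office number.
        FNR==NR { t[$1]=1; next }
        /<Printer CL_/ {
                num=$0
                sub("^<Printer CL_","",num)
                sub(">$","",num)
                if (num in t) {
                        # Skip the rest of the block; stop at end of file if </Printer> is missing.
                        do { if ((getline) <= 0) exit } while ($0 !~ /<\/Printer>/)
                } else {
                        print
                }
                next
        }
        { print }
' ${CUPSDir}printers_to_purge.dat ${CUPSDir}printers.conf > ${CUPSDir}printers.conf.update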
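
The same filtering can also be written without getline, using a flag that is switched on at a matching <Printer CL_...> line and off again at </Printer>. This is only a sketch for trying the idea out: the scratch directory, the shortened three-line records, and the header line in the sample file are made up for the check and are not taken from the real printers.conf.

```shell
#!/bin/sh
# Scratch directory and sample contents are hypothetical, for this check only.
work=$(mktemp -d)

cat > "$work/printers.conf" <<'EOF'
# hypothetical header line outside any <Printer> block
<Printer CL_010002>
Info pa010002
</Printer>
<Printer CL_010013>
Info pa010013
</Printer>
EOF

# Purge list: one 6-digit office number per line.
echo 010013 > "$work/printers_to_purge.dat"

awk '
        # First file (purge list): remember each office number.
        FNR==NR { purge[$1]=1; next }
        # Start of a block: extract the number and decide whether to skip it.
        /^<Printer CL_/ {
                num=$0
                sub(/^<Printer CL_/,"",num)
                sub(/>$/,"",num)
                skip = (num in purge)
        }
        # Print every line that is not inside a block being skipped.
        !skip { print }
        # End of a block: stop skipping (the closing line itself was handled above).
        /^<\/Printer>/ { skip=0 }
' "$work/printers_to_purge.dat" "$work/printers.conf" > "$work/printers.conf.update"

cat "$work/printers.conf.update"
```

With this sample, the header line and the CL_010002 block are kept and the CL_010013 block is dropped. Because the purge list is recognised by FNR==NR, the list must not be empty, otherwise printers.conf itself is read as the list.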

Or another, shorter way:

Code:
awk -F '[ <>_]+' '
        FNR==NR { t[$1]=1; next }
        /<Printer/ { num=$4 }
        /<Printer/,/<\/Printer>/ { if (!(num in t)) { print } }
' ${CUPSDir}printers_to_purge.dat ${CUPSDir}printers.conf > ${CUPSDir}printers.conf.update
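
Since the original question also asked for a delta file of the removed segments, the range-pattern version can be extended to write both files in one pass. The following is a sketch under assumptions: the variable name removed, the scratch directory, and the shortened records are illustrative, and the file names only mirror the ones used in the question's script.

```shell
#!/bin/sh
# Scratch directory and sample contents are hypothetical, for this check only.
work=$(mktemp -d)

cat > "$work/printers.conf" <<'EOF'
<Printer CL_010002>
Info pa010002
</Printer>
<Printer CL_010003>
Info pa010003
</Printer>
<Printer CL_010013>
Info pa010013
</Printer>
EOF

# Purge list: one 6-digit office number per line.
echo 010013 > "$work/printers_to_purge.dat"

awk -F '[ <>_]+' -v removed="$work/printer.conf.read" '
        # First file (purge list): remember each office number.
        FNR==NR { t[$1]=1; next }
        # With this field separator the office number is the fourth field.
        /<Printer/ { num=$4 }
        # Inside a block: purged blocks go to the delta file, the rest to stdout.
        /<Printer/,/<\/Printer>/ {
                if (num in t) { print > removed } else { print }
                next
        }
        # Lines outside any block are passed through unchanged.
        { print }
' "$work/printers_to_purge.dat" "$work/printers.conf" > "$work/printers.conf.update"

echo "kept:";    cat "$work/printers.conf.update"
echo "removed:"; cat "$work/printer.conf.read"
```

Here the kept blocks stay in their original order in printers.conf.update, and the delta file holds only the CL_010013 block. The delta file is created only when at least one block is actually removed.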


Annihilannic.
 