Deleting XML block from the XML file

amkipnis · Dec 20, 2007

Hi, I am trying to write a sed script that removes a 10-line XML segment when it matches a given pattern. All 10-line segments are identically structured.
Here is an example of the first 3 records of the master XML file printers.conf:
<Printer CL_010002>
Info pa010002
DeviceURI socket://pa010002:port
State Idle
Accepting Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
</Printer>
<Printer CL_010003>
Info pa010003
DeviceURI socket://pa010003:port
State Idle
Accepting Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
</Printer>
<Printer CL_010013>
Info pa010013
DeviceURI socket://pa010013:port
State Idle
Accepting Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
</Printer>
......
Here is my attempt at a sed script that searches for a particular XML segment and removes it based on a 6-digit numeric pattern (for example, 010013). I am having a problem creating a delta file from the XML segments excluded from the master file. My approach in the script below is as follows. First, I determine the start and end line numbers of the XML block to be deleted. However, as I loop through the 6-digit patterns read from an explicitly defined file (*.dat), I need to keep the unmatched 10-line XML blocks in their original position and sequence, while the 10-line XML blocks with a match should be excluded from the resulting XML file. Would you please help? Thanks in advance!
============================================================
#!/usr/bin/ksh
CUPSDir=/path/to/file/
CUPS_TABLE=${CUPSDir}printers.conf
#cat -n *xml|egrep '<Printer CL|</Printer>' => extract number of lines
#1 cat -n printers.xml|egrep '<Printer CL|</Printer>'|grep 414401|nawk '{print $1}' => extract the number of start block
if [[ -f ${CUPSDir}printer.conf.update ]]
then
rm ${CUPSDir}printer.conf.update
fi
if [[ -f ${CUPSDir}printer.conf.read ]]
then
rm ${CUPSDir}printer.conf.read
fi
while read office
do
#sblock=0
#eblock=0
#echo locating office $office
#sblock=`cat -n $CUPS_TABLE|egrep '<Printer CL|</Printer>'|grep 414401|nawk '{print $1}'`

sblock=`cat -n $CUPS_TABLE|egrep '<Printer CL|</Printer>'|grep $office|nawk '{print $1}'`
if [[ -n "$sblock" ]]
then

let eblock=sblock+9
#echo start block is $sblock
#echo end block is $eblock

#2. sed -n 13,22p *xml =>extract the segment of needed branch

#This command will extract only the xml segments which need to be removed from the master file $CUPS_TABLE
sed -n "$sblock","$eblock"p $CUPS_TABLE >> ${CUPSDir}printer.conf.read

sed -n "$sblock","$eblock"!p $CUPS_TABLE > ${CUPSDir}printer.conf.read
#sed "$sblock","$eblock"d $CUPS_TABLE > ${CUPSDir}printer.conf.$office

#3. sed 's!13,22d' printers.xml => delete XML segment of closed office
#This command will remove only the xml segment based on the last office read in the list ${CUPSDir}printers_to_purge.dat
sed -e "$sblock","$eblock"d $CUPS_TABLE > ${CUPSDir}printer.conf.update

#This command will append duplicate xml segments which need to be removed from the master file $CUPS_TABLE
sed -e "$sblock","$eblock"d $CUPS_TABLE >> ${CUPSDir}printer.conf.update
#===========================================================

else
echo ERROR - can not locate office $office in $CUPS_TABLE
fi
done<${CUPSDir}printers_to_purge.dat

exit
====================================================
Here is the content of ${CUPSDir}printers_to_purge.dat:
027105
028102
211300
211707
211719
211721
211725
211726
211760
211761
211762
211785
211814
211816
211817
211828
211831
211875
211876
211880
212151
213200
217700
219300
219802
321704
321712
321720
322105
540026
541704
541707
541715
547201

PHV · Dec 20, 2007

I'd use awk.

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

amkipnis · Dec 20, 2007

Hi PHV, would you please give me some guidance, or at least a starting point, for the "awk" solution? Do I need to post this question in the "awk" forum?

In any event, if I try the following command, the resulting file lists only the unique lines, whereas I need to retain the whole 10-line block for each XML section:

nawk '
FNR==NR{t[$1]=$2;next}
!($1 in t) || t[$1]!=$2
' ${CUPSDir}printers.conf ${CUPSDir}printer.conf.read > ${CUPSDir}printer.conf.update

I appreciate your input! Thanks.

Annihilannic · Dec 20, 2007

Something like this perhaps:

Code:

awk '
        FNR==NR { t[$1]=1; next }
        /<Printer CL_/ {
                num=$0
                sub("^<Printer CL_","",num)
                sub(">$","",num)
                if (num in t) {
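                        # Skip lines up to the closing tag; the > 0 test stops at end of file if a block is unterminated
                        while ((getline) > 0 && $0 !~ /<\/Printer>/) { }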
                } else {
                        print
                }
                next
        }
        { print }
' ${CUPSDir}printers_to_purge.dat ${CUPSDir}printers.conf > ${CUPSDir}printers.conf.update

Or another shorter way:

Code:

awk -F '[ <>_]+' '
        FNR==NR { t[$1]=1; next }
        /<Printer/ { num=$4 }
        /<Printer/,/<\/Printer>/ { if (!(num in t)) { print } }
' ${CUPSDir}printers_to_purge.dat ${CUPSDir}printers.conf > ${CUPSDir}printers.conf.update
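
Neither version above writes the removed blocks anywhere. If the delta file mentioned in the question is also wanted, something along these lines might produce both files in one pass. This is only a sketch: `split_printers` and its four arguments are made-up names, and it assumes every opening tag looks like `<Printer CL_NNNNNN>` at the start of a line. On Solaris, `nawk` would probably be needed in place of `awk`.

```shell
#!/bin/sh
# Sketch only: split_printers is a hypothetical helper, not from the thread.
# Usage: split_printers LIST MASTER KEPT REMOVED
split_printers() {
    list=$1      # one office number per line
    master=$2    # printers.conf-style master file
    kept=$3      # output: master minus the purged blocks
    removed=$4   # output: the purged blocks only (the delta file)

    # Create both outputs up front so they exist even if nothing is written.
    : > "$kept"
    : > "$removed"

    awk -v kept="$kept" -v removed="$removed" '
        # First file operand: remember every office number to purge.
        FILENAME == ARGV[1] { if (NF) purge[$1] = 1; next }

        # Opening tag: pull out the text after "CL_" and choose the output file.
        /^<Printer CL_/ {
            num = $0
            sub(/^<Printer CL_/, "", num)
            sub(/>.*$/, "", num)
            out = (num in purge) ? removed : kept
            inblock = 1
        }

        # Lines outside any block always go to the kept file.
        { print > (inblock ? out : kept) }

        # Closing tag ends the current block (after it has been printed).
        /^<\/Printer>/ { inblock = 0 }
    ' "$list" "$master"
}
```

Called as `split_printers "${CUPSDir}printers_to_purge.dat" "${CUPSDir}printers.conf" "${CUPSDir}printer.conf.update" "${CUPSDir}printer.conf.read"`, it should leave the surviving blocks in their original order in the first output file and the purged ones in the second. The output files must not be the master file itself.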

Annihilannic.
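
Since the question asked for sed: the line-number arithmetic can probably be avoided with sed's regular-expression address ranges, which delete everything from a matching opening tag through the next closing tag. The following is a sketch under assumptions, not a tested production script: `purge_blocks` is a made-up function name, the list is assumed to hold one 6-digit number per line with no trailing whitespace, and the tags are assumed to sit alone on their lines exactly as in the sample.

```shell
#!/bin/sh
# Sketch only: purge_blocks is a hypothetical helper, not from the thread.
# Usage: purge_blocks LIST MASTER   (result goes to standard output)
purge_blocks() {
    list=$1      # one 6-digit office number per line
    master=$2    # printers.conf-style master file
    script=${TMPDIR:-/tmp}/purge.$$.sed

    # Turn each 6-digit line into a range-delete command, e.g. 010013 becomes
    #   /^<Printer CL_010013>$/,/^<\/Printer>$/d
    # Lines that are not exactly six digits are ignored.
    sed -n 's|^\([0-9]\{6\}\)$|/^<Printer CL_\1>$/,/^<\\/Printer>$/d|p' \
        "$list" > "$script"

    # Apply all generated commands to the master file in a single pass,
    # so unmatched blocks keep their original order.
    sed -f "$script" "$master"
    rc=$?

    rm -f "$script"
    return $rc
}
```

For example, `purge_blocks "${CUPSDir}printers_to_purge.dat" "${CUPSDir}printers.conf" > "${CUPSDir}printer.conf.update"` would write the reduced file. Because every deletion happens in one sed run, there is no loop in which a later iteration overwrites the result of an earlier one.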
