Best method for updating variables in an xml file

epatton · Jan 18, 2007

I have a pretty big (306 line) xml file that represents information (metadata) about geological data collected by a government agency. The data in this xml is ugly, and hard to parse. However, there are really only about 20 different fields in this file that change from one geological survey to the next. Every dataset collected in every survey needs to have an accompanying xml file created which describes its metadata.

I would like to be able to write some kind of shell/sed/grep/awk script that only changes the data in these 20 or so special fields with information that the user provides interactively, either through shell variables passed to the program or interactive question-and-answer, and outputs the rest of the lines in the file unchanged. I'm not sure if awk is the best approach, or if I need some kind of awk/sed/shell hybrid script. Here's a snippet of the format of the xml file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<metadata>
<idinfo>
<citation>
<citeinfo>
<origin>John Doe</origin>
<pubdate>Unpublished Material</pubdate>
<title>Coloured, Shaded-Relief Image of Multibeam Bathymetry of Tilt Cove, Newfoundland, Canada</title>
<geoform Sync="TRUE">remote-sensing image</geoform>
<serinfo></serinfo>
<pubinfo></pubinfo>
<ftname Sync="TRUE">TiltCove_2_ave_fill_shade_comb.tif</ftname></citeinfo>
</citation>
<descript>
<timeperd>
<timeinfo>
<rngdates>
<begdate>20010627</begdate>
<enddate>20010708</enddate>
</rngdates>
</timeinfo>
<current>ground condition</current>
</timeperd>
<status>
<progress>Complete</progress>
<update>None planned</update>
</status>
<spdom>
<bounding>
<westbc>-59.680215</westbc>
<eastbc>-59.616513</eastbc>
<northbc>44.013563</northbc>
<southbc>43.950450</southbc>
</bounding>

I know it looks like gibberish, but the key point here is that only a select few of these xml tags need to change from one dataset to the next (i.e., tags that represent time, dates, persons, location names, etc.)

Can anyone think of a general approach for automating the production of these xml files (given a finished one as a template), allowing for the modification of key xml tags in every one?

Mike042 · Jan 19, 2007

I have been working on something similar for a while. I have been trying to create html files from information in basic ascii text files. I chose to use Perl, but the scripting language is not so important. This is the method I used.
- create a 'template' file in html format
- create the text file (this can be edited later & the Perl script re-run to produce a new version of the html file)
- write a Perl script to 'make' the new page(s), as follows:

  1. Open the text file and read the data into variables and arrays (also prompt the user for information).
  2. Open an output file in a temporary location (you don't want to overwrite the existing html file yet).
  3. Open the 'template' file and read it line by line in a loop.
  4. Write each line to the output file, either as is or modified by the data in the variables and arrays.
  5. At the end of the loop, close all files.
  6. Overwrite the existing html file with the new one.

- save a copy of the existing html file
- test the script
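
The steps above could be sketched in plain shell rather than Perl, roughly as follows. This is only an illustration under some assumptions: the function name make_page, the two prompts, the ###author### and ###title### placeholder markers, and the .bak backup name are all made up for the example, and only the first occurrence of each marker on a line is replaced.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): fill a template with values
# typed in by the user.  Usage: make_page template.xml output.xml
make_page() {
    template=$1
    output=$2

    # Prompt for the values; prompts go to stderr so stdout stays clean.
    printf 'Author: ' >&2
    IFS= read -r author
    printf 'Title: ' >&2
    IFS= read -r title

    # Write to a temporary file next to the output, so the existing
    # output file is not touched until the new one is complete.
    tmp="$output.tmp.$$"

    # Read the template line by line; write each line as is, or with
    # the (assumed) placeholder swapped for the value that was entered.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'###author###'*)
                line=${line%%'###author###'*}$author${line#*'###author###'} ;;
        esac
        case $line in
            *'###title###'*)
                line=${line%%'###title###'*}$title${line#*'###title###'} ;;
        esac
        printf '%s\n' "$line"
    done < "$template" > "$tmp"

    # Save a copy of the existing output, then put the new file in place.
    if [ -f "$output" ]; then
        cp "$output" "$output.bak"
    fi
    mv "$tmp" "$output"
}
```

After sourcing this, something like `make_page template.xml survey01.xml` would ask the two questions and then write the file. The values are spliced in with parameter expansion rather than sed, so characters such as & or / in an answer are copied literally.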

I hope that helps, to get you started.

Mike

epatton · Jan 19, 2007

Thanks, Mike! This is a great start - exactly what I needed. I have a question for you though: how did you decide whether a line should be copied from the template to the output unchanged, or filled in from the variables/arrays?

Thanks very much,

~ Eric.

Mike042 · Jan 19, 2007

Hi Eric,

In the template there would be lines like (example taken from your post but modified):

epatton said:
<?xml version="1.0" encoding="ISO-8859-1"?>
<metadata>
<idinfo>
<citation>
<citeinfo>
<origin>###author###</origin>
<pubdate>Unpublished Material</pubdate>
<title>###title###</title>

The first few lines get copied as is, because they don't meet any search criteria, as follows:

if the line contains ###author### then replace that string with data from a variable/array before writing it to the output file
else-if the line contains ###title### then replace that string with data from a variable/array before writing it to the output file
else write the line as is to the output file
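
As a rough illustration of that if / else-if / else test, here is how it might look in awk wrapped in a small shell function. The function name fill_two_fields and the AUTHOR and TITLE environment variables are assumptions made up for this sketch, not something from the posts above.

```shell
#!/bin/sh
# Hypothetical helper: fill_two_fields TEMPLATE AUTHOR TITLE
# Copies TEMPLATE to stdout, replacing the two placeholders.
fill_two_fields() {
    # Pass the values through the environment so awk takes them
    # literally (awk -v would interpret backslash escapes).
    AUTHOR=$2 TITLE=$3 awk '
        # Replace the first occurrence of token in the current line,
        # splicing by position so "&" in the value stays literal.
        function swap(token, value,    pos) {
            pos = index($0, token)
            $0 = substr($0, 1, pos - 1) value substr($0, pos + length(token))
        }
        {
            if (index($0, "###author###")) {
                swap("###author###", ENVIRON["AUTHOR"])
            } else if (index($0, "###title###")) {
                swap("###title###", ENVIRON["TITLE"])
            }
            # Otherwise the line is left exactly as it was read.
            print
        }
    ' "$1"
}
```

It would be called along the lines of `fill_two_fields template.xml 'John Doe' 'Some title' > new.xml`. Because of the else-if, a line that held both placeholders would only get the first one filled in.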

I hope that makes sense.

Mike

Annihilannic · Jan 19, 2007

Or, rather than lots of ifs and else-ifs, you could just pull out the string between the #s and use that as the index into the array.

Annihilannic.

epatton · Jan 19, 2007

Thanks Mike, I understand now.

Annihilannic, could you elaborate a bit more? Thanks!

~ Eric.

Mike042 · Jan 19, 2007

Hi Eric,

Glad to hear it and thanks very much for the 'star'.

Regards.

Mike

Annihilannic · Jan 19, 2007

By example:

template.xml input file:

Code:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <metadata>
            <idinfo>
                    <citation>
                            <citeinfo>
                                    <origin>###author###</origin>
                                    <pubdate>Unpublished Material</pubdate>
                                    <title>###title###</title>

variables input file:

Code:

author:Arthur Conan Doyle
title:The Resident Patient

updatexml script:

Code:

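awk -F: '
        # First file (variables): store each name:value pair, keeping
        # everything after the first colon so values may contain colons.
        NR==FNR { vars[$1]=substr($0,length($1)+2) ; next }
        # Template lines containing a ###name### token (one per line).
        /###[^#]+###/ {
                token=gensub(/.*(###[^#]+###).*/,"\\1",1)
                tokenname=gensub(/.*###([^#]+)###.*/,"\\1",1)
                value=(tokenname in vars) ? vars[tokenname] : "undefined"
                # Splice by position: sub() would treat "&" in the value
                # as "the matched text".
                pos=index($0,token)
                $0=substr($0,1,pos-1) value substr($0,pos+length(token))
        }
        { print }
' variables template.xml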

updatexml output:

Code:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <metadata>
            <idinfo>
                    <citation>
                            <citeinfo>
                                    <origin>Arthur Conan Doyle</origin>
                                    <pubdate>Unpublished Material</pubdate>
                                    <title>The Resident Patient</title>

Note that this solution depends on the gensub() function, which is a GNU awk (gawk) extension rather than part of POSIX awk, so it may not work with other awk implementations.
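
If gawk is not available, the same idea can probably be expressed with the standard match() and substr() functions instead. The following is only a sketch under that assumption; the function name updatexml_portable is made up. It also loops, so more than one ###name### token on a line gets replaced.

```shell
#!/bin/sh
# Hypothetical helper: updatexml_portable VARIABLES TEMPLATE
# Writes the filled-in template to stdout using only POSIX awk features.
updatexml_portable() {
    awk '
        # First file: name:value pairs, split at the first colon only.
        NR == FNR {
            p = index($0, ":")
            if (p > 0) vars[substr($0, 1, p - 1)] = substr($0, p + 1)
            next
        }
        # Second file: replace every ###name### token on each line.
        {
            out = ""
            while (match($0, /###[^#]+###/)) {
                # The name sits between the two runs of three # characters.
                name = substr($0, RSTART + 3, RLENGTH - 6)
                value = (name in vars) ? vars[name] : "undefined"
                # Move the text before the token, plus the value, to out
                # and keep scanning the remainder of the line.
                out = out substr($0, 1, RSTART - 1) value
                $0 = substr($0, RSTART + RLENGTH)
            }
            print out $0
        }
    ' "$1" "$2"
}
```

Usage would be `updatexml_portable variables template.xml > new.xml`. Like the original, it relies on NR==FNR to spot the first file, so an empty variables file would not be handled correctly.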

Annihilannic.
