I have a fairly large (306-line) XML file that holds metadata about geological data collected by a government agency. The XML is ugly and hard to parse. However, only about 20 fields in the file change from one geological survey to the next. Every dataset collected in every survey needs an accompanying XML file that describes its metadata.
I would like to write a shell/sed/grep/awk script that replaces the contents of only these 20 or so fields with values the user supplies, either as shell variables passed to the program or through interactive question-and-answer, and that outputs every other line of the file unchanged. I'm not sure whether awk alone is the best approach or whether I need some kind of awk/sed/shell hybrid script. Here's a snippet of the XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<metadata>
<idinfo>
<citation>
<citeinfo>
<origin>John Doe</origin>
<pubdate>Unpublished Material</pubdate>
<title>Coloured, Shaded-Relief Image of Multibeam Bathymetry of Tilt Cove, Newfoundland, Canada</title>
<geoform Sync="TRUE">remote-sensing image</geoform>
<serinfo></serinfo>
<pubinfo></pubinfo>
<ftname Sync="TRUE">TiltCove_2_ave_fill_shade_comb.tif</ftname></citeinfo>
</citation>
<descript>
<timeperd>
<timeinfo>
<rngdates>
<begdate>20010627</begdate>
<enddate>20010708</enddate>
</rngdates>
</timeinfo>
<current>ground condition</current>
</timeperd>
<status>
<progress>Complete</progress>
<update>None planned</update>
</status>
<spdom>
<bounding>
<westbc>-59.680215</westbc>
<eastbc>-59.616513</eastbc>
<northbc>44.013563</northbc>
<southbc>43.950450</southbc>
</bounding>
I know it looks like gibberish, but the key point is that only a select few of these XML tags need to change from one dataset to the next (e.g., tags that hold times, dates, persons, and location names).
Can anyone think of a general approach for automating the production of these XML files, given a finished one as a template, that allows the key tags to be modified in each new file?
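To make the goal concrete, here is a minimal sketch of the kind of template filling described above. It is written in Python with the standard-library `xml.etree.ElementTree` module rather than sed/awk, on the assumption that an XML parser is an acceptable substitute for line-based text tools. The function name `fill_template`, the shortened `TEMPLATE` string, and the replacement values are all hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the real 306-line template (illustrative only).
TEMPLATE = """<metadata>
<idinfo>
<citation>
<citeinfo>
<origin>John Doe</origin>
<pubdate>Unpublished Material</pubdate>
</citeinfo>
</citation>
<timeperd>
<timeinfo>
<rngdates>
<begdate>20010627</begdate>
<enddate>20010708</enddate>
</rngdates>
</timeinfo>
</timeperd>
</idinfo>
</metadata>"""


def fill_template(template_xml, updates):
    """Return template_xml with the text of selected elements replaced.

    `updates` maps an ElementTree path (e.g. ".//origin") to the new text.
    Only the first element matching each path is changed; everything else
    in the tree is left as parsed.
    """
    root = ET.fromstring(template_xml)
    for path, value in updates.items():
        element = root.find(path)
        if element is None:
            # Fail loudly instead of silently skipping a misspelled tag.
            raise KeyError(f"no element matches {path!r}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values; in practice these could come from input() or argv.
    new_values = {".//origin": "Jane Roe", ".//begdate": "20020501"}
    print(fill_template(TEMPLATE, new_values))
```

For real files, `ET.parse(path)` and `tree.write(out_path, encoding="ISO-8859-1", xml_declaration=True)` would take the place of the string functions. One caveat with this approach: the output is equivalent XML but not byte-for-byte identical to the template. For example, an empty `<serinfo></serinfo>` is written back as `<serinfo />` by default.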