
cygwin gawk and XML files

Status
Not open for further replies.

madasafish

Technical User
Jul 18, 2006
Code:
<configuration>MediacastCarousel_abc_dynamic<variable>^M^M
        <name>Cdf.PollingInterval</name>^M^M
        <value>60</value>^M^M
        <type>string</type>^M^M
        <grouplevel>3</grouplevel>^M^M
        <permission>0</permission>^M^M
<groupname>ngetv1</groupname>^M^M
</variable>^M^M
<variable>^M^M
        <name>Cdf.Url</name>^M^M
        <value>http://dcabwww.mh.abc.co.uk/bridge/shared/cdfs/bridge_shared_langley.cdf</value>^M^M
        <type>string</type>^M^M
        <grouplevel>3</grouplevel>^M^M
        <permission>0</permission>^M^M
<groupname>ngetv1</groupname>^M^M
</variable>^M^M
<variable>^M^M
        <name>PhysicalNetworkAddress</name>^M^M
        <value>65535:7501:64</value>^M^M
        <type>string</type>^M^M
        <grouplevel>3</grouplevel>^M^M
        <permission>0</permission>^M^M
<groupname>ngetv1</groupname>^M^M
</variable>^M^M

......

The <variable> stanza repeats with different name, value, type, grouplevel and permission values
until we reach......
</variable>
</configuration>

Then the pattern repeats with a new configuration:
<configuration>MediacastCarousel_cht_dynamic<variable>^M^M

......etc....

</variable>
</configuration>

What I want to achieve is the following...

Name,Pol_Int,PNA,URL
abc_dynamic,60,65535:7501:64,http://dcabwww.mh.abc.co.uk/bridge/shared/cdfs/bridge_shared_langley.cdf
cht_dynamic,etc,etc,etc

Currently I am using MS Excel to convert the XML file to CSV format. Even this does not give exactly the result I want, and the output needs some massaging after the conversion.

I have also looked at Expat and xgawk; I did manage to achieve some output with xgawk, but not the desired output.

Any help or better ideas appreciated,

Thanks in advance,

Madasafish
 
Try this:

Code:
awk -F '[<>]' -v OFS=',' '
        function printrec() {
                print name,v["Cdf.PollingInterval"],v["PhysicalNetworkAddress"],v["Cdf.Url"]
                delete v
        }
        $2 == "name" { n=$3 }
        $2 == "value" { v[n]=$3 }
        $2 == "configuration" {
                if (name!="") printrec()
                name=$3
                sub(/MediacastCarousel_/,"",name)
        }
        END { printrec() }
' inputfile

This assumes that none of your actual data values contain "<" or ">".


Annihilannic
tgmlify - code syntax highlighting for your tek-tips posts
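For readers who also want the header row from the question, here is a minimal sketch of the same script wrapped in a shell function. The name csv_from_config is hypothetical, and the BEGIN header plus the guard in END (so input with no <configuration> line prints only the header instead of a stray ",,," record) are additions, not part of the reply above. It keeps the same assumptions: one tag per line and no "<" or ">" inside values. Because -F '[<>]' splits on either bracket, $2 is the tag name and $3 is the text between the tags; for lines shaped like the sample, the ^M^M carriage returns fall into a later field and never reach the output.

```shell
# csv_from_config: hypothetical wrapper around the awk script in the reply.
# Assumes one tag per line and no "<" or ">" inside values.
csv_from_config() {
    awk -F '[<>]' -v OFS=',' '
        function printrec() {
            print name, v["Cdf.PollingInterval"], v["PhysicalNetworkAddress"], v["Cdf.Url"]
            delete v
        }
        BEGIN { print "Name", "Pol_Int", "PNA", "URL" }   # header row from the question
        $2 == "name"  { n = $3 }                           # remember the current variable name
        $2 == "value" { v[n] = $3 }                        # store its value under that name
        $2 == "configuration" {
            if (name != "") printrec()                     # flush the previous configuration
            name = $3
            sub(/MediacastCarousel_/, "", name)
        }
        END { if (name != "") printrec() }                 # flush the last one, if any
    ' "$@"
}
```

Usage, with the file name from the reply: csv_from_config inputfile > output.csv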
 
Thank you, Annihilannic,

As always, your code works first time.

I never knew you could have more than one delimiter, as in awk -F '[<>]'.
I learned something new :)
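To see what that separator does, the small demo below feeds one sample line through awk -F '[<>]' and prints every field. The value '[<>]' is a regular-expression field separator that matches a single "<" or ">", so each bracket ends a field. The helper name split_fields is hypothetical, and the sub() call that trims the trailing ^M^M (carriage returns) is only there to keep the printout readable; modifying $0 makes awk re-split the line.

```shell
# split_fields: hypothetical helper that prints each field produced by -F '[<>]'
split_fields() {
    awk -F '[<>]' '{
        sub(/\r+$/, "")    # trim trailing CRs (^M); changing $0 makes awk re-split it
        for (i = 1; i <= NF; i++)
            printf "$%d=[%s]\n", i, $i
    }'
}

printf '        <name>Cdf.PollingInterval</name>\r\r\n' | split_fields
# $1=[        ]
# $2=[name]
# $3=[Cdf.PollingInterval]
# $4=[/name]
# $5=[]
```

So on every tag line, $2 is the tag name and $3 the text inside it, which is exactly what the reply's rules test.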

I am trying to introduce another variable, which I did not mention before, but I am struggling to get it to work.

My code is shown here:
Code:
$2 == "value" { v[n]=$3
                if ($3 ~ ":");split($3,a,/:/);v[pid]=a[2]
                }
Once again, thank you for your assistance.

Madasafish
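As far as I can tell, two things stop the snippet above from working as intended. The semicolon after if ($3 ~ ":") ends the if statement with an empty body, so split() and the assignment run for every value line. And v[pid] uses pid as an unset awk variable, so the array key is the empty string rather than "pid". Matching on ":" would also catch the Cdf.Url value, since "http:" contains a colon. Below is a minimal sketch of a corrected rule that tests the variable name instead; pid_demo is a hypothetical wrapper that only prints the extracted part. In the full script you would add v["pid"] to the print statement in printrec().

```shell
# pid_demo: hypothetical harness around the corrected "value" rule
pid_demo() {
    awk -F '[<>]' '
        $2 == "name"  { n = $3 }
        $2 == "value" {
            v[n] = $3
            # Braces (not a semicolon) make the split conditional, and the
            # key is the quoted string "pid", not an unset variable.
            if (n == "PhysicalNetworkAddress") {
                split($3, a, /:/)
                v["pid"] = a[2]
            }
        }
        END { print v["pid"] }
    ' "$@"
}

printf '<name>PhysicalNetworkAddress</name>\n<value>65535:7501:64</value>\n' | pid_demo
# prints 7501
```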
 
I got the desired result by changing the print statement:
Code:
print name,v["PhysicalNetworkAddress"],substr(v["PhysicalNetworkAddress"],7,4)

In my case, there are consistently 4 digits in that position, so the fixed-position substr statement works.

Out of pure curiosity, how would you handle it if there were not consistently 4 digits and the width between the colons varied?

Thanks again, Annihilannic,

Madasafish


 
I would split it up into an array by colons and use the second element.

Code:
split(v["PhysicalNetworkAddress"],a,/:/)
print a[2]


Annihilannic
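To illustrate the varying-width case, the sketch below runs both approaches over two addresses: one shaped like the sample (65535:7501:64) and a made-up one (1:23:456) whose parts have different widths. The function name compare_pid and the second address are hypothetical. substr($0, 7, 4) only gives the middle part while the first part is exactly five characters and the middle part four, whereas split() returns the pieces between the colons whatever their length.

```shell
# compare_pid: hypothetical demo contrasting fixed-position substr() with split()
compare_pid() {
    awk '{
        fixed = substr($0, 7, 4)        # assumes a 5-char first part and a 4-char middle part
        n = split($0, a, /:/)           # a[1], a[2], ... are the colon-separated parts
        mid = (n >= 2) ? a[2] : ""      # second part, if there is one
        print $0 " substr=" fixed " split=" mid
    }'
}

printf '65535:7501:64\n1:23:456\n' | compare_pid
# 65535:7501:64 substr=7501 split=7501
# 1:23:456 substr=56 split=23
```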
 