Creating a table from an xml file

midge25 · Apr 13, 2004

Hi,

I have a large xml file which looks as follows:

<title id="T6352">
<en>Power Distribution</en>
<es>Distribución de alimentación</es>
<el>??a??µ? ?s????</el>
<cs>Usporádání elektrického napájení</cs>
<de>Stromverteilung</de>
<fi>Virranjako</fi>
<fr>Distribution d’alimentation</fr>
<hu>Áramelosztás</hu>
<it>Distribuzione di alimentazione elettrica</it>
<ja>???????????????</ja>
<nl>Spanningsverdeling</nl>
<no>Strømfordeling</no>
<pl>Dystrybucja zasilania</pl>
<ru>????????????? ?????????????? ???????</ru>
<tr>Güç Dagitimi</tr>
<sv>Spänningsfördelning</sv>
<pt>Distribuição de corrente</pt>
<da>Strømfordeling</da>
</title>

I wish to extract the data and output it to a table format as follows:

Title en es el cs de fi … sv pt da

I’m quite new to awk and, from what I’ve read, I would have to use associative arrays.

CaKiwi · Apr 13, 2004

Try this. You will need to add the rest of your tags to the print statement and may want to improve the formatting by using a printf instead of a print.

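/<title/{
  # Opening tag: strip everything up to the first quote and the trailing ">,
  # leaving only the id value as the row title.
  sub(/^[^"]*"/,"")
  sub(/">$/,"")
  title = $0
  next
}
/<\/title/{
  # Closing tag: print the collected translations for this title.
  print title,a["en"],a["es"],a["el"],a["cs"] # add remaining tags
  next
}
{
  # Any other line: the two-letter tag name is the array index,
  # the text between the tags is the value.
  ix = substr($0,2,2)
  sub(/^<[^>]*>/,"")
  sub(/<.*>$/,"")
  a[ix] = $0
}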
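
As a rough sketch of the printf suggestion, here is the same script with fixed-width columns. The column widths, the file name titles.xml, the function name format_titles, and the choice of three tags are all assumptions made for illustration, so adjust them to your data.

```shell
# Hypothetical sample file, trimmed to three language tags.
cat > titles.xml <<'EOF'
<title id="T6352">
<en>Power Distribution</en>
<de>Stromverteilung</de>
<fi>Virranjako</fi>
</title>
EOF

# format_titles FILE: print one left-aligned row per <title> element.
format_titles() {
awk '
/<title/ {                  # opening tag: keep only the id value
    sub(/^[^"]*"/, "")
    sub(/">$/, "")
    title = $0
    next
}
/<\/title/ {                # closing tag: print one formatted row
    # Widths 8/20/20 are guesses; the last column is left unpadded.
    printf "%-8s %-20s %-20s %s\n", title, a["en"], a["de"], a["fi"]
    split("", a)            # forget this record before the next one
    next
}
/^</ {                      # any other tag line: store text by tag name
    ix = substr($0, 2, 2)
    sub(/^<[^>]*>/, "")
    sub(/<.*>$/, "")
    a[ix] = $0
}
' "$1"
}

format_titles titles.xml
```

The split("", a) call clears the array after each row, so a tag missing from one title does not inherit the text of the previous one.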

CaKiwi

midge25 · Apr 13, 2004

That works great. There is a slight problem, though: the tags are not always the same. They can vary, for example:

<es>
<nl>
<fe>
<cr>

etc.

Can I change the script above so that it searches for all tags, stores them in an array, and then prints them out as I wish?

CaKiwi · Apr 13, 2004

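If you don't care what order they print in, try this (untested):

/<\/title/{
  # Use an explicit format so a "%" in the data is not treated as a directive.
  printf "%s", title
  # Print the stored text for every tag seen; the order is unspecified.
  for (tag in a) printf " %s", a[tag]
  print ""
  next
}

Otherwise, save the tags in another array as you find them:

b[++n] = ix

and use

for (i=1;i<=n;i++) printf " %s", a[b[i]]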

Post back if this is not clear.
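
For what it is worth, here is one way the second-array idea might be fleshed out when the tags differ from title to title. It is only a sketch: it buffers every row and prints the table in the END block, so the header can list every tag in first-seen order. The names varying.xml, tags_in_order, seen, cell and rows are assumptions and do not come from the script above, and tab-separated output is an arbitrary choice.

```shell
# Hypothetical sample file: the two titles do not share the same tags.
cat > varying.xml <<'EOF'
<title id="T1">
<en>Power</en>
<de>Strom</de>
</title>
<title id="T2">
<en>Fuse</en>
<nl>Zekering</nl>
</title>
EOF

# tags_in_order FILE: print a tab-separated table, columns in first-seen order.
tags_in_order() {
awk '
/^<title/ {                       # start of a record: remember its id
    sub(/^[^"]*"/, ""); sub(/">$/, "")
    title[++rows] = $0
    next
}
/^<\/title/ { next }              # end of a record: nothing to do yet
/^</ {
    ix = substr($0, 2, 2)
    sub(/^<[^>]*>/, ""); sub(/<.*>$/, "")
    if (!(ix in seen)) {          # first time this tag appears
        seen[ix] = 1
        b[++n] = ix               # b[] keeps the tags in arrival order
    }
    cell[rows, ix] = $0           # text for this row and tag
}
END {
    printf "Title"
    for (i = 1; i <= n; i++) printf "\t%s", b[i]
    print ""
    for (r = 1; r <= rows; r++) {
        printf "%s", title[r]
        # A tag missing from this title yields an empty cell.
        for (i = 1; i <= n; i++) printf "\t%s", cell[r, b[i]]
        print ""
    }
}
' "$1"
}

tags_in_order varying.xml
```

Because everything is held in memory until END, this suits files that fit comfortably in RAM; a very large file would want a two-pass approach instead.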

CaKiwi

futurelet · Apr 14, 2004

Code:

BEGIN{
  # Read the file twice to determine width of
  # each column.
  ARGV[ARGC]=ARGV[ARGC-1] ; ARGC++
  rememberwidth( "title", length("title") )
}

# At beginning of 2nd pass, print header.
NR!=FNR && 1==FNR {printrow(1)}

/^<title / {
  gsub( /^[^"]+"|"[^"]$/, "" )
  if ( NR==FNR )
    rememberwidth( "title", length($0) )
  else
  { title = $0
    split( "", text ) ## Erase array.
  }
  next
}

/^<\/title/ {
  if ( NR!=FNR )
    printrow()
  next
}

/^</ {
  tag = substr($0,2,2)
  gsub( /^[^>]+>|<[^<]+$/, "" )
  if ( NR!=FNR )
    text[ tag ] = $0
  else
    rememberwidth( tag, length($0) )
}

function rememberwidth( s, n )
{ if ( n > width[s] )
    width[s] = n
}

function printrow( header,      row,fmt,i)
{ fmt = "%-" width["title"] "s "
  row=sprintf(fmt,(header ? "title" : title) )
  for ( i in width )
    if ( i != "title" )
    { fmt = "%-" width[i] "s "
      row=row sprintf(fmt,
        (header ? i : text[i]))
    }
  sub( / +$/,"",row )
  print row
}
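
The two-pass trick in the BEGIN block may be easier to follow in isolation. The sketch below appends the input file to ARGV a second time, measures the longest line while NR==FNR (first pass), and pads every line to that width on the second pass. The file name words.txt and the function name pad_to_longest are made up for the example.

```shell
# Hypothetical sample file with lines of different lengths.
printf 'en\nPower Distribution\nfi\n' > words.txt

# pad_to_longest FILE: print each line in brackets, padded to the longest one.
pad_to_longest() {
awk '
BEGIN {
    # Queue the last file argument again so awk reads it twice.
    ARGV[ARGC] = ARGV[ARGC-1]; ARGC++
}
NR == FNR {                       # first pass: only measure
    if (length($0) > w) w = length($0)
    next
}
{                                 # second pass: print with the final width
    fmt = "[%-" w "s]\n"
    printf fmt, $0
}
' "$1"
}

pad_to_longest words.txt
```

Saved to a file, the full script above would be run the same way, for example awk -f table.awk titles.xml, where both names are placeholders.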
