Creating a table from an xml file

Status
Not open for further replies.

midge25

Programmer
Jul 19, 2002
9
Hi,

I have a large XML file that looks like this:

<title id="T6352">
<en>Power Distribution</en>
<es>Distribución de alimentación</es>
<el>Διανομή ισχύος</el>
<cs>Uspořádání elektrického napájení</cs>
<de>Stromverteilung</de>
<fi>Virranjako</fi>
<fr>Distribution d’alimentation</fr>
<hu>Áramelosztás</hu>
<it>Distribuzione di alimentazione elettrica</it>
<ja>???????????????</ja>
<nl>Spanningsverdeling</nl>
<no>Strømfordeling</no>
<pl>Dystrybucja zasilania</pl>
<ru>????????????? ?????????????? ???????</ru>
<tr>Güç Dağıtımı</tr>
<sv>Spänningsfördelning</sv>
<pt>Distribuição de corrente</pt>
<da>Strømfordeling</da>
</title>


I wish to extract the data and output it to a table format as follows:

Title en es el cs de fi …….. sv pt da

I'm quite new to awk and, from what I've read, I think I will need to use associative arrays.
 
Try this. You will need to add the rest of your tags to the print statement and may want to improve the formatting by using a printf instead of a print.

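/<title/{
# keep only the id value, e.g. T6352
sub(/^[^"]*"/,"")
sub(/">$/,"")
title = $0
split("", a) # erase values left over from the previous title
next
}
/<\/title/{
print title,a["en"],a["es"],a["el"],a["cs"] # add remaining tags
next
}
{
ix = substr($0,2,2) # two-letter tag name
sub(/^<[^>]*>/,"") # strip the opening tag
sub(/<.*>$/,"") # strip the closing tag
a[ix] = $0
}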
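
To illustrate the printf suggestion, here is a minimal sketch. It assumes one tag per line starting in column 1, as in the sample. The column widths (8 and 30) are arbitrary guesses, and the function name title_table is made up for this example.

```shell
# Hypothetical wrapper; pass the XML file name as the first argument.
title_table() {
  awk '
  /^<title/ {
    sub(/^[^"]*"/, "")      # drop everything up to the opening quote
    sub(/">$/, "")          # drop the closing quote and bracket
    title = $0
    split("", a)            # forget the previous title
    next
  }
  /^<\/title/ {
    # Widths are assumptions; add the remaining tags in the same way.
    printf "%-8s %-30s %-30s\n", title, a["en"], a["es"]
    next
  }
  /^</ {
    ix = substr($0, 2, 2)   # two-letter language tag
    sub(/^<[^>]*>/, "")     # strip the opening tag
    sub(/<.*>$/, "")        # strip the closing tag
    a[ix] = $0
  }
  ' "$1"
}
```

Call it as title_table titles.xml (the file name is assumed). Note that some awks count bytes rather than characters when padding, so accented text may not line up exactly.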

CaKiwi
 
That works great. There is a slight problem: the tags are not always the same. They can vary, for example:

<es>
<nl>
<fe>
<cr>

etc.

Can I change the script above so that it finds all the tags, stores them in an array, and then prints them out as I wish?
 
If you don't care what order they print in, try this (untested):

/<\/title/{
printf "%s", title
for (tag in a) printf " %s", a[tag]
print ""
next
}

Otherwise save each tag in another array the first time you find it (put this just before the a[ix] = $0 line)

if (!(ix in seen)) { seen[ix] = 1; b[++n] = ix }

and use

for (i=1;i<=n;i++) printf " %s", a[b[i]]
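
Putting the ordered-tags idea together, here is a rough sketch. It is not the snippet above verbatim: because a new tag can turn up late in the file, it buffers every row and prints the whole table in an END block, with tabs between columns. The names title_rows, seen, id and cell are made up for this example, and it again assumes one tag per line starting in column 1.

```shell
# Hypothetical wrapper; pass the XML file name as the first argument.
title_rows() {
  awk '
  /^<title/ {
    sub(/^[^"]*"/, ""); sub(/">$/, "")
    id[++rows] = $0                  # title id for this row
    next
  }
  /^<\/title/ { next }
  /^</ {
    ix = substr($0, 2, 2)            # two-letter tag name
    sub(/^<[^>]*>/, ""); sub(/<.*>$/, "")
    if (!(ix in seen)) {             # first time this tag appears
      seen[ix] = 1
      b[++n] = ix                    # remember the order of discovery
    }
    cell[rows, ix] = $0
  }
  END {
    # Header row: the tags in the order they were first seen.
    printf "%s", "title"
    for (i = 1; i <= n; i++) printf "\t%s", b[i]
    print ""
    # One row per title; a tag missing from a title gives an empty cell.
    for (r = 1; r <= rows; r++) {
      printf "%s", id[r]
      for (i = 1; i <= n; i++) printf "\t%s", cell[r, b[i]]
      print ""
    }
  }
  ' "$1"
}
```

Buffering costs memory in proportion to the file size, so for a very large file reading the input twice may be the better trade.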

Post back if this is not clear.


CaKiwi
 
Code:
BEGIN{
  # Read the file twice to determine width of
  # each column.
  ARGV[ARGC]=ARGV[ARGC-1] ; ARGC++
  rememberwidth( "title", length("title") )
}

# At beginning of 2nd pass, print header.
NR!=FNR && 1==FNR {printrow(1)}

/^<title / {
  gsub( /^[^"]+"|"[^"]$/, "" )
  if ( NR==FNR )
    rememberwidth( "title", length($0) )
  else
  { title = $0
    split( "", text ) ## Erase array.
  }
  next
}

/^<\/title/ {
  if ( NR!=FNR )
    printrow()
  next
}

/^</ {
  tag = substr($0,2,2)
  gsub( /^[^>]+>|<[^<]+$/, "" )
  if ( NR!=FNR )
    text[ tag ] = $0
  else
  { # Column must fit both the tag name (header) and the text.
    rememberwidth( tag, length(tag) )
    rememberwidth( tag, length($0) )
  }
}

function rememberwidth( s, n )
{ if ( n > width[s] )
    width[s] = n
}

function printrow( header,      row,fmt,i)
{ fmt = "%-" width["title"] "s "
  row=sprintf(fmt,(header ? "title" : title) )
  for ( i in width )
    if ( i != "title" )
    { fmt = "%-" width[i] "s "
      row=row sprintf(fmt,
        (header ? i : text[i]))
    }
  sub( / +$/,"",row )
  print row
}
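
The two-pass trick in the BEGIN block can be tried on its own. This is a small sketch, with the made-up helper name two_pass_demo, that just labels each line with the pass it was read in. NR==FNR holds only while the first copy of the file is being read.

```shell
# Hypothetical helper: show which pass each input line belongs to.
two_pass_demo() {
  awk '
  BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }   # queue the last file again
  { printf "%s %s\n", (NR == FNR ? "pass1" : "pass2"), $0 }
  ' "$1"
}
```

Because the script appends the file name to ARGV itself, the file only needs to be named once on the command line.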
 