Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

processing an xml file with awk 2

Status
Not open for further replies.

daula

Programmer
May 29, 2006
38
0
0
US
hello everyone;
i'm very new in awk and as i was reading on i wondered if awk be the solution to my problem.

first a sample of my xml file:

-- xml --
<dts:dtservices xmlns:dts="uri:dts_namespace">
<dts:tally_output>
<dts:tally dts:name="test_lrfs">
<dts:tally dts:name="cState">
<dts:tally dts:name="dc_range">
<dts:value>00-06 </dts:value>
<dts:count>1</dts:count>
</dts:tally>
<dts:tally dts:name="dc_range">
<dts:value>07-12 </dts:value>
<dts:count>2</dts:count>
</dts:tally>
<dts:tally dts:name="dc_range">
<dts:value>13-24 </dts:value>
<dts:count>3</dts:count>
</dts:tally>
<dts:value>AK</dts:value>
<dts:count>6</dts:count>
</dts:tally>
<dts:tally dts:name="cState">
<dts:tally dts:name="dc_range">
<dts:value>07-12 </dts:value>
<dts:count>5</dts:count>
</dts:tally>
<dts:tally dts:name="dc_range">
<dts:value>13-24 </dts:value>
<dts:count>6</dts:count>
</dts:tally>
<dts:value>AL</dts:value>
<dts:count>11</dts:count>
</dts:tally>
<dts:tally dts:name="cState">
<dts:tally dts:name="dc_range">
<dts:value>00-06 </dts:value>
<dts:count>3</dts:count>
</dts:tally>
<dts:tally dts:name="dc_range">
<dts:value>07-12 </dts:value>
<dts:count>2</dts:count>
</dts:tally>
<dts:tally dts:name="dc_range">
<dts:value>13-24 </dts:value>
<dts:count>1</dts:count>
</dts:tally>
<dts:value>AR</dts:value>
<dts:count>6</dts:count>
</dts:tally>
</dts:tally>
</dts:tally_output>
</dts:dtservices>
---end---

my plan is to extract data from this file ordered by "cState" producing seven coma separated columns.

col 1 = cState
col 2 = the count for 00-06 dc_range
col 3 = % (value from col 2 divided by sum of all 00-06s multiplied by a 100)
col 4 = the count for 07-12 dc_range + value from col 2
col 5 = % (value from col 4 divided by sum of all 00-06s and 07-12s multiplied by a 100)
col 6 = the count for 13-24 dc_range + value from col 4
col 7 = % (value from col 6 divided by sum of all 00-06s, 07-12s and 13-24 multiplied by a 100).
Finally, list the totals at the end of the file.

Therefore, based on the above xml, the output should look something like:

AK,1,11.11,3,23.08,6,26.09
AL,0,00.00,5,38.46,11,47.83
AR,3,33.33,5,38.46,6,26.09
Total: 9,13,23

is this possible to achieve???

btw, i have tried to process the xml via xslt transform but is very slow.

Thanks in advance for any input and assistance.

daula

 
Something like this (unix syntax)?
awk '
BEGIN{FS=" *<[^>]+>";f="%s,%d,%04.2f,%d,%04.2f,%d,%04.2f\n"}
/:name="cState">/{++r;next}
/>00-06/{c=2;next}
/>07-12/{c=4;next}
/>13-24/{c=6;next}
/:value>/{a[r,1]=$2;c=0;next}
/:count>/{a[r,c]=$2;t[c]+=$2}
END{
t[4]+=t[2];t[6]+=t[4]
for(i=1;i<=r;++i){
a[i,4]+=a[i,2];a[i,6]+=a[i,4]
printf f,a[i,1],0+a[i,2],100*a[i,2]/t[2],a[i,4],100*a[i,4]/t[4],a[i,6],100*a[i,6]/t[6]
}
printf "%s %d,%d,%d\n","Total:",t[2],t[4],t[6]
}
' /path/to/input.xml > output

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Did something like this here

thread822-1233368

Mike

"A foolproof method for sculpting an elephant: first, get a huge block of marble, then you chip away everything that doesn't look like an elephant."

 
That was a great script PH... I appreciate your imput and everone else's.
However, the listed xml was only a sample look-alike of the original xml file... the layout is pretty much the same but it can grow or shrink. That is, can contain fifty, ten or any number of "cState"s.
As well, each state can have one, two or three "dc_range"s as you might have noticed AL don't have a 00-06 dc_range.
Finally, the "count"s also varies.

In that regard, I'm looking for something generic.

I apologize for not mentioning that on my original post.

Thank you again.

daula
 
I thought the awk program I gave you handled any number of "cState"s and one, two or three "dc_range"s from 00-06,07-12 and 13-24 ...
 
Please, disregard everything else on my previous post except the first line.

Your script served my need satisfactorily.

Thanks again PH... you're my hero right now!

daula
 
Yes it does. i had used the wrong file on my first run... sorry about that.
it works like a charm!
Thanks again PH... like i said, you are my hero!

Oh! one more thing, what if i want to add four more cols... i.e tow more "dc_range"s (25-36 and 37-48) and their percentages. Applying the same concept.

Thus, the output:
AK,1,25.00,3,23.08,6,26.09,6,26.09,6,26.09
AL,0,0.00,5,38.46,11,47.83,11,47.83,11,47.83
AR,3,75.00,5,38.46,6,26.09,6,26.09,6,26.09
Total: 4,13,23,23,23
 
Typed, not tested:
BEGIN{FS=" *<[^>]+>";f="%s,%d,%04.2f,%d,%04.2f,%d,%04.2f[!],%d,%04.2f,%d,%04.2f[/!]\n"}
/:name="cState">/{++r;next}
/>00-06/{c=2;next}
/>07-12/{c=4;next}
/>13-24/{c=6;next}[!]
/>25-36/{c=8;next}
/>37-48/{c=10;next}[/!]
/:value>/{a[r,1]=$2;c=0;next}
/:count>/{a[r,c]=$2;t[c]+=$2}
END{
t[4]+=t[2];t[6]+=t[4][!];t[8]+=t[6];t[10]+=t[8][/!]
for(i=1;i<=r;++i){
a[i,4]+=a[i,2];a[i,6]+=a[i,4][!];a[i,8]+=a[i,6];a[i,10]+=a[i,8][/!]
printf f,a[i,1],0+a[i,2],100*a[i,2]/t[2],a[i,4],100*a[i,4]/t[4],a[i,6],100*a[i,6]/t[6][!],a[i,8],100*a[i,8]/t[8],a[i,10],100*a[i,10]/t[10][/!]
}
printf "%s %d,%d,%d[!],%d,%d[/!]\n","Total:",t[2],t[4],t[6][!],t[8],t[10][/!]
}

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
works like a charm.
thank you very much.

daula
 
Hi PHV;
I've another request...
I'm, trying to list all 'count's that happened in the last 13 months.
NOTE: 'value' is in the form of year and month... i.e YYYYMM (e.g 200602 == Feb. 2006)

The last col of the output should contain the sum for each row.
As well, the last row should contain the sum for each col.

please, see the xml snip at the bottom of this listing.

you assistance is highly appreciated.

sample output:
---------------
AK,1,4,8,6,1,4,2,6,1,4,10,3,2,52
AL,24,19,17,13,7,10,10,25,8,8,21,7,5,174
AR,9,8,20,14,13,4,4,14,12,4,13,10,4,129
AZ,78,52,93,96,92,51,51,67,49,32,72,45,41,819
CA,...
CO,...
.
.
.
TOTAL: 122,83,.,.,.,...,1174

[tally.xml]
<?xml version="1.0" encoding="utf-8" ?>
<dps:services xmlns:dps="uri:dps_namespace">
<dps:tally_output>

<dps:tally dps:name="test_tally">
<dps:tally dps:name="cState">
<dps:tally dps:name="Ccurrent_date">
<dps:value>200602</dps:value> <== YYYYMM year and month
<dps:count>1</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200601</dps:value>
<dps:count>4</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200512</dps:value>
<dps:count>8</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200511</dps:value>
<dps:count>6</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200510</dps:value>
<dps:count>1</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200509</dps:value>
<dps:count>4</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200508</dps:value>
<dps:count>2</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200507</dps:value>
<dps:count>6</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200506</dps:value>
<dps:count>1</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200505</dps:value>
<dps:count>4</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200504</dps:value>
<dps:count>10</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200503</dps:value>
<dps:count>3</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200502</dps:value>
<dps:count>2</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200501</dps:value>
<dps:count>7</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200412</dps:value>
<dps:count>3</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200411</dps:value>
<dps:count>4</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200410</dps:value>
<dps:count>6</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200407</dps:value>
<dps:count>6</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200405</dps:value>
<dps:count>1</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200404</dps:value>
<dps:count>6</dps:count>
</dps:tally>
<dps:tally dps:name="Ccurrent_date">
<dps:value>200403</dps:value>
<dps:count>1</dps:count>
</dps:tally>
<dps:value>AK</dps:value>
<dps:count>86</dps:count>
</dps:tally>
<dps:tally dps:name="cState">
.
.
.
</dps:tally>
</dps:tally>
</dps:tally_output>
</dps:services>
 
Have you tried to modify PHV's script to do that? It should be quite easy? Without looking into it too deeply it sounds like you just need to add extra fields to the printf statements (and the f= format specifier).

Annihilannic.
 
thanks,
actually i did.
i added some more fields and had to change a few calculations and it works.
sorry, for not lettin ya'll know promptly.

i appreciate your response.

daula
 
hi everyone;
i need some more help...
the layout of my xml has changed thus, the requirements as well.
i have built on PVH's script and came up with something like the one listed at the bottom.
this script is supposed to transform an xml file (snip listed bellow)...
1.creates 12 comma separated cols
(Ccurrent_date,000-000,000-005,005-010,010-015,015-020,020-025,025-050,050-075,075-100,100-150,150-200)

2.the cols cummulate from right to left... for instance, the first block of 'Ccurrent_date' with value 200602 have the following data:
000-005: 305, 005-010: 478, 010-015: 707, 015-020: 219, 020-025: 941, 025-050: 378, 050-075: 89, 075-100: 5

from the above block the result looks like:
200602,3122,2817,2339,1632,1413,472,94,5,0,0,0

3. then, after the 4th block of Ccurrent_date repeat print a summary row which is same as 4th row (first if statement)

4. after the 7th block print another summary row wich is the output of 7th block - 4th block (second if statement)

5. in the third if statement is where i landed into problem.
beginning from the 8th block of Ccurrent_date, the would be outputed after every 6 block. the calculation should be cumulative count from the current iteration minus the 6th iteration ago.
the output generated by this if statement has a wrong summary row (in bold):
7-12,27835,26099,22833,17501,15504,9098,2423,110,0,0,0
200501,37107,34843,30455,23238,20537,12041,3186,146,0,0,0
200412,38847,36445,31834,24286,21460,12534,3344,150,0,0,0
200411,39808,37360,32633,24900,21989,12866,3431,152,0,0,0
200410,42411,39737,34647,26332,23268,13415,3564,157,0,0,0
200409,43141,40416,35224,26764,23665,13595,3612,158,0,0,0
200408,43991,41206,35868,27251,24084,13820,3675,161,0,0,0

13-18,16156,15107,13035,9750,8580,4722,1252,51,0,0,0
200407,47184,44151,38340,28964,25629,14505,3834,167,0,0,0
200406,47645,44578,38709,29233,25862,14649,3879,169,0,0,0
200405,48474,45356,39376,29730,26297,14928,3950,170,0,0,0
200404,50828,47562,41256,31039,27500,15501,4100,177,0,0,0
200403,53175,49760,43108,32370,28710,16044,4238,181,0,0,0

instead the two summary (bold) line should have been:
7-12,13590,12843,11399,8843,7926,4602,1093,39,0,0,0
13-18,9887,9189,7883,5853,5185,2674,726,25,0,0,0

any help to fix this error is appreciated.
thanks in advance.

Code:
<?xml version="1.0" encoding="utf-8" ?>
<dps:services xmlns:dps="uri:namespace">
<dps:tally_output>

<dps:tally dps:name="test_tally">
  <dps:tally dps:name="Ccurrent_date">
    <dps:tally dps:name="currdollar">
      <dps:value>000-005</dps:value>
      <dps:count>305</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>005-010</dps:value>
      <dps:count>478</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>010-015</dps:value>
      <dps:count>707</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>015-020</dps:value>
      <dps:count>219</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>020-025</dps:value>
      <dps:count>941</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>025-050</dps:value>
      <dps:count>378</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>050-075</dps:value>
      <dps:count>89</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>075-100</dps:value>
      <dps:count>5</dps:count>
    </dps:tally>
    <dps:value>200602</dps:value>
    <dps:count>3122</dps:count>
  </dps:tally>
  <dps:tally dps:name="Ccurrent_date">
    <dps:tally dps:name="currdollar">
      <dps:value>000-005</dps:value>
      <dps:count>146</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>005-010</dps:value>
      <dps:count>284</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>010-015</dps:value>
      <dps:count>449</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>015-020</dps:value>
      <dps:count>205</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>020-025</dps:value>
      <dps:count>548</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>025-050</dps:value>
      <dps:count>723</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>050-075</dps:value>
      <dps:count>358</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>075-100</dps:value>
      <dps:count>26</dps:count>
    </dps:tally>
    <dps:value>200601</dps:value>
    <dps:count>2739</dps:count>
  </dps:tally>
  <dps:tally dps:name="Ccurrent_date">
    <dps:tally dps:name="currdollar">
      <dps:value>000-005</dps:value>
      <dps:count>168</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>005-010</dps:value>
      <dps:count>407</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>010-015</dps:value>
      <dps:count>636</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>015-020</dps:value>
      <dps:count>312</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>020-025</dps:value>
      <dps:count>530</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>025-050</dps:value>
      <dps:count>1070</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>050-075</dps:value>
      <dps:count>441</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:value>075-100</dps:value>
      <dps:count>25</dps:count>
    </dps:tally>
    <dps:value>200512</dps:value>
    <dps:count>3589</dps:count>
  </dps:tally>
  .
  .
  .
</dps:tally>

</dps:tally_output>
</dps:services>

------------------------

awk '
BEGIN{FS=" *<[^>]+>";f="%s,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n"}
/:name="Ccurrent_date">/{++r;next}
/>000-000/{c=2;next}
/>000-005/{c=4;next}
/>005-010/{c=6;next}
/>010-015/{c=8;next}
/>015-020/{c=10;next}
/>020-025/{c=12;next}
/>025-050/{c=14;next}
/>050-075/{c=16;next}
/>075-100/{c=18;next}
/>100-150/{c=20;next}
/>150-200/{c=22;next}

/:value>/{a[r,1]=$2;c=0;next}
/:count>/{a[r,c]=$2;t[c]+=$2}
END{

 j=0;
 dat=""

 col1=0;col2=0;col3=0;col4=0;col5=0;col6=0;col7=0;col8=0;col9=0;col10=0;col11=0

 for(i=1;i<=r;++i)
 {
    j=i-1
    dat=a[i,1]
    a[i,2]=a[i,4]+a[i,6]+a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col1
    a[i,4]=a[i,6]+a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col2
    a[i,6]=a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col3
    a[i,8]=a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col4
    a[i,10]=a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col5
    a[i,12]=a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col6
    a[i,14]=a[i,16]+a[i,18]+a[i,20]+a[i,22]+col7
    a[i,16]=a[i,18]+a[i,20]+a[i,22]+col8
    a[i,18]=a[i,20]+a[i,22]+col9
    a[i,20]=a[i,22]+col10
    a[i,22]+=col11

    printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    col1=a[i,2];col2=a[i,4];col3=a[i,6];col4=a[i,8];col5=a[i,10];col6=a[i,12];col7=a[i,14];col8=a[i,16];col9=a[i,18];col10=a[i,20];col11=a[i,22]
    if(i==4)
    {
       dat="0-3"
       printf "%s\n", ""
       printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    }
  if(i==7)
    {
       dat="4-6"
       k=4

       a[i,2]-=a[k,2]
       a[i,4]-=a[k,4]
       a[i,6]-=a[k,6]
       a[i,8]-=a[k,8]
       a[i,10]-=a[k,10]
       a[i,12]-=a[k,12]
       a[i,14]-=a[k,14]
       a[i,16]-=a[k,16]
       a[i,18]-=a[k,18]
       a[i,20]-=a[k,20]
       a[i,22]-=a[k,22]
       printf "%s\n", ""
       printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    }
[b]
    if(i>7 && (j%6)==0) # every other 6th row
    {
       k=i-6
       dat=sprintf("%d-%d",k,i-1)

       a[i,2]-=a[k,2]
       a[i,4]-=a[k,4]
       a[i,6]-=a[k,6]
       a[i,8]-=a[k,8]
       a[i,10]-=a[k,10]
       a[i,12]-=a[k,12]
       a[i,14]-=a[k,14]
       a[i,16]-=a[k,16]
       a[i,18]-=a[k,18]
       a[i,20]-=a[k,20]
       a[i,22]-=a[k,22]
       printf "%s\n", ""
       printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    }
[/b]
 }
}
' input_xml_file
 
hello again;
i managed to fix the problem/ error that i mentioned on my previous post...
i'm getting the right output but the script looks kinda messy.
can someone please help me in cleaning up/ refactoring the script.

thanks again

daula

Code:
awk '
BEGIN{FS=" *<[^>]+>";f="%s,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n"}
/:name="Ccurrent_date">/{++r;next}
/>000-000/{c=2;next}
/>000-005/{c=4;next}
/>005-010/{c=6;next}
/>010-015/{c=8;next}
/>015-020/{c=10;next}
/>020-025/{c=12;next}
/>025-050/{c=14;next}
/>050-075/{c=16;next}
/>075-100/{c=18;next}
/>100-150/{c=20;next}
/>150-200/{c=22;next}

/:value>/{a[r,1]=$2;c=0;next}
/:count>/{a[r,c]=$2;t[c]+=$2}
END{

 j=0;
 dat=""

 col1=0;col2=0;col3=0;col4=0;col5=0;col6=0;col7=0;col8=0;col9=0;col10=0;col11=0

 for(i=1;i<=r;++i)
 {
    j=i-1
    dat=a[i,1]
    a[i,2]=a[i,4]+a[i,6]+a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col1
    a[i,4]=a[i,6]+a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col2
    a[i,6]=a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col3
    a[i,8]=a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col4
    a[i,10]=a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col5
    a[i,12]=a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col6
    a[i,14]=a[i,16]+a[i,18]+a[i,20]+a[i,22]+col7
    a[i,16]=a[i,18]+a[i,20]+a[i,22]+col8
    a[i,18]=a[i,20]+a[i,22]+col9
    a[i,20]=a[i,22]+col10
    a[i,22]+=col11

    printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    col1=a[i,2];col2=a[i,4];col3=a[i,6];col4=a[i,8];col5=a[i,10];col6=a[i,12];col7=a[i,14];col8=a[i,16];col9=a[i,18];col10=a[i,20];col11=a[i,22]
    
    if(i==4)
    {
       dat="0-3"
       printf "%s\n", ""
       printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    }

    if(i==7)
    {
       dat="4-6"
       k=4
       t1=a[i,2];t2=a[i,4];t3=a[i,6];t4=a[i,8];t5=a[i,10];t6=a[i,12];t7=a[i,14];t8=a[i,16];t9=a[i,18];t10=a[i,20];t11=a[i,22]
       t1-=a[k,2]
       t2-=a[k,4]
       t3-=a[k,6]
       t4-=a[k,8]
       t5-=a[k,10]
       t6-=a[k,12]
       t7-=a[k,14]
       t8-=a[k,16]
       t9-=a[k,18]
       t10-=a[k,20]
       t11-=a[k,22]
       printf f,dat,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11
    }

    if(i>7&& (j%6)==0)
    {
       k=i-6
       dat=sprintf("%d-%d",k,i-1)
       t1=a[i,2];t2=a[i,4];t3=a[i,6];t4=a[i,8];t5=a[i,10];t6=a[i,12];t7=a[i,14];t8=a[i,16];t9=a[i,18];t10=a[i,20];t11=a[i,22]
       t1-=a[k,2]
       t2-=a[k,4]
       t3-=a[k,6]
       t4-=a[k,8]
       t5-=a[k,10]
       t6-=a[k,12]
       t7-=a[k,14]
       t8-=a[k,16]
       t9-=a[k,18]
       t10-=a[k,20]
       t11-=a[k,22]
       printf f,dat,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11
    }
 }
}
' xml_file
 
Your code looks pretty neat actually.

Personally I would use an array col[1], col[2], etc instead of col1, col2, etc. Similarly for t1, t3, etc.

Then you could replace code like:

Code:
col1=a[i,2];col2=a[i,4];col3=a[i,6];col4=a[i,8];col5=a[i,10];col6=a[i,12];col7=a[i,14];col8=a[i,16];col9=a[i,18];col10=a[i,20];col11=a[i,22]

with something like:

Code:
for (m=1; m<=11; m++) { col[m]=a[i,m*2) }

Annihilannic.
 
thanks for your contribution.

daula
 
hello there;
once more, the requirements has changed due to the change in xml layout.
i'm wondering how i can modify my previously listed awk script to transform the bellow listed xml.

as you may notice the depth of the xml has grown.
however, the script should still generate 12 comma separated cols (Ccurrent_date,000-000,000-005,005-010,010-015,015-020,020-025,025-050,050-075,075-100,100-150,150-200)

2a.the cols cummulate from right to left... BUT this time use the 'Gender's 'Count' for every 'currdollar'.

2b. list M(ales) first, F(emales) sencond and unkowns (anything else other than M and F) third.

2c. all unknowns within a 'currdollar' to be added together and produce just one instance of U(nkown).


Sample output:
--------------
MALE
200602,515,458,376,267,233,102,18,0,0,0,0
200601,.,.,.,.,.,.,.
200512,.,.,.,.,.,.,.
.
.
FEMALE
200602,2198,1978,1648,1158,998,315,64,5,0,0,0
200601,.,.,.,.,.,.,.
200512,.,.,.,.,.,.,.
.
.
UKNOWN
200602,409,381,315,207,182,55,12,0,0,0,0
200601,.,.,.,.,.,.,.
200512,.,.,.,.,.,.,.
.
.

any help is highly appreciated
thanks in advance

daula

Code:
<?xml version="1.0" encoding="utf-8" ?>
<dps:services xmlns:dps="uri:dts_namespace">>
<dps:tally_output>

<dps:tally dps:name="test_lrfs">
  <dps:tally dps:name="Ccurrent_date">
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>U</dps:value>
        <dps:count>4</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>57</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>I</dps:value>
        <dps:count>5</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>220</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>C</dps:value>
        <dps:count>1</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>A</dps:value>
        <dps:count>18</dps:count>
      </dps:tally>
      <dps:value>000-005</dps:value>
      <dps:count>305</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>U</dps:value>
        <dps:count>18</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>82</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>I</dps:value>
        <dps:count>14</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>330</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>C</dps:value>
        <dps:count>1</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>A</dps:value>
        <dps:count>33</dps:count>
      </dps:tally>
      <dps:value>005-010</dps:value>
      <dps:count>478</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>U</dps:value>
        <dps:count>20</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>109</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>I</dps:value>
        <dps:count>25</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>490</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>C</dps:value>
        <dps:count>6</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>B</dps:value>
        <dps:count>1</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>A</dps:value>
        <dps:count>56</dps:count>
      </dps:tally>
      <dps:value>010-015</dps:value>
      <dps:count>707</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>U</dps:value>
        <dps:count>8</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>34</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>I</dps:value>
        <dps:count>3</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>160</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>C</dps:value>
        <dps:count>1</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>A</dps:value>
        <dps:count>13</dps:count>
      </dps:tally>
      <dps:value>015-020</dps:value>
      <dps:count>219</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>U</dps:value>
        <dps:count>30</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>131</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>I</dps:value>
        <dps:count>17</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>683</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>C</dps:value>
        <dps:count>6</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>A</dps:value>
        <dps:count>74</dps:count>
      </dps:tally>
      <dps:value>020-025</dps:value>
      <dps:count>941</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>U</dps:value>
        <dps:count>12</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>84</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>I</dps:value>
        <dps:count>5</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>251</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>C</dps:value>
        <dps:count>4</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>B</dps:value>
        <dps:count>1</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>A</dps:value>
        <dps:count>20</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>7</dps:value>
        <dps:count>1</dps:count>
      </dps:tally>
      <dps:value>025-050</dps:value>
      <dps:count>378</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>U</dps:value>
        <dps:count>2</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>18</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>I</dps:value>
        <dps:count>1</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>59</dps:count>
      </dps:tally>
      <dps:tally dps:name="Gender">
        <dps:value>A</dps:value>
        <dps:count>9</dps:count>
      </dps:tally>
      <dps:value>050-075</dps:value>
      <dps:count>89</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>F</dps:value>
        <dps:count>5</dps:count>
      </dps:tally>
      <dps:value>075-100</dps:value>
      <dps:count>5</dps:count>
    </dps:tally>
    <dps:value>200602</dps:value>
    <dps:count>3122</dps:count>
  </dps:tally>
  .
  .
  .

</dps:tally>
</dps:tally_output>
</dps:dtservices>
 
hello everyone;
i thought i refine my earlier question.
i need help in modifying the script listed at the bottom such that;
display M(ales) based on 'Ccurrent_date' and 'currdollar'.

the coma separated cols remains the same i.e (Ccurrent_date,000-000,000-005,005-010,010-015,015-020,020-025,025-050,050-075,075-100,100-150,150-200)

then, for-each 'Ccurrent_date' based on 'currdollar', get the 'Count' for the M 'Value' under 150-200 and display that Count on 150-200 col, get the M's Count under 100-150 then add to the previously result and display the result on the 100-150 col and so on. in other words, the result cummulate from right to left (from 150-200 to 000-000).
000-000= sum of (000-000,000-005,005-010,010-015,015-020,020-025,025-050,050-075,075-100,100-150 and 150-200)
000-005= sum of (000-005,005-010,010-015,015-020,020-025,025-050,050-075,075-100,100-150 and 150-200)
005-010= sum of (005-010,010-015,015-020,020-025,025-050,050-075,075-100,100-150 and 150-200)
010-015= sum of (010-015,015-020,020-025,025-050,050-075,075-100,100-150 and 150-200)
etc, etc

any advice, hint, assistance are all appreciated.
thanks in advance.

daula
Code:
Sample output:
--------------
MALE
200602,515,458,376,267,233,102,18,0,0,0,0
200601,1105,1011,861,658,586,363,115,7,0,0,0
200512,.,.,.,.,.,.,

***snip tally.xml***
--------------------
<?xml version="1.0" encoding="utf-8" ?>
<dps:services xmlns:dps="uri:dps_namespace">>
<dps:tally_output>

<dps:tally dps:name="test_lrfs">
  <dps:tally dps:name="Ccurrent_date">
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>57</dps:count>
      </dps:tally>
      <dps:value>000-005</dps:value>
      <dps:count>57</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>82</dps:count>
      </dps:tally>
      <dps:value>005-010</dps:value>
      <dps:count>82</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>109</dps:count>
      </dps:tally>
      <dps:value>010-015</dps:value>
      <dps:count>109</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>34</dps:count>
      </dps:tally>
      <dps:value>015-020</dps:value>
      <dps:count>34</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>131</dps:count>
      </dps:tally>
      <dps:value>020-025</dps:value>
      <dps:count>131</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>84</dps:count>
      </dps:tally>
      <dps:value>025-050</dps:value>
      <dps:count>84</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>18</dps:count>
      </dps:tally>
      <dps:value>050-075</dps:value>
      <dps:count>18</dps:count>
    </dps:tally>
    <dps:value>200602</dps:value>
    <dps:count>3122</dps:count>
  </dps:tally>
  <dps:tally dps:name="Ccurrent_date">
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>37</dps:count>
      </dps:tally>
      <dps:value>000-005</dps:value>
      <dps:count>146</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>68</dps:count>
      </dps:tally>
      <dps:value>005-010</dps:value>
      <dps:count>284</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>94</dps:count>
      </dps:tally>
      <dps:value>010-015</dps:value>
      <dps:count>449</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>38</dps:count>
      </dps:tally>
      <dps:value>015-020</dps:value>
      <dps:count>205</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>92</dps:count>
      </dps:tally>
      <dps:value>020-025</dps:value>
      <dps:count>548</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>164</dps:count>
      </dps:tally>
      <dps:value>025-050</dps:value>
      <dps:count>723</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>90</dps:count>
      </dps:tally>
      <dps:value>050-075</dps:value>
      <dps:count>358</dps:count>
    </dps:tally>
    <dps:tally dps:name="currdollar">
      <dps:tally dps:name="Gender">
        <dps:value>M</dps:value>
        <dps:count>7</dps:count>
      </dps:tally>
      <dps:value>075-100</dps:value>
      <dps:count>26</dps:count>
    </dps:tally>
    <dps:value>200601</dps:value>
    <dps:count>2739</dps:count>
  </dps:tally>
  .
  .
  .

</dps:tally>
</dps:tally_output>
</dps:services>

***tally.sh***
--------------
awk '
BEGIN{FS=" *<[^>]+>";f="%s,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n"}
/:name="Ccurrent_date">/{++r;next}
/>000-000/{c=2;next}
/>000-005/{c=4;next}
/>005-010/{c=6;next}
/>010-015/{c=8;next}
/>015-020/{c=10;next}
/>020-025/{c=12;next}
/>025-050/{c=14;next}
/>050-075/{c=16;next}
/>075-100/{c=18;next}
/>100-150/{c=20;next}
/>150-200/{c=22;next}

/:value>/{a[r,1]=$2;c=0;next}
/:count>/{a[r,c]=$2;t[c]+=$2}
END{

 j=0;
 dat=""

 for(i=1;i<=r;++i)
 {
    j=i-1
    dat=a[i,1]
    a[i,2]=a[i,4]+a[i,6]+a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col1
    a[i,4]=a[i,6]+a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col2
    a[i,6]=a[i,8]+a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col3
    a[i,8]=a[i,10]+a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col4
    a[i,10]=a[i,12]+a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col5
    a[i,12]=a[i,14]+a[i,16]+a[i,18]+a[i,20]+a[i,22]+col6
    a[i,14]=a[i,16]+a[i,18]+a[i,20]+a[i,22]+col7
    a[i,16]=a[i,18]+a[i,20]+a[i,22]+col8
    a[i,18]=a[i,20]+a[i,22]+col9
    a[i,20]=a[i,22]+col10
    a[i,22]+=col11

    printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    for (m=1; m<=11; m++) { col[m]=a[i,m*2) }
    if(i==4)
    {
       dat="0-3"
       printf "%s\n", ""
       printf f,dat,0+a[i,2],a[i,4],a[i,6],a[i,8],a[i,10],a[i,12],a[i,14],a[i,16],a[i,18],a[i,20],a[i,22]
    }

    if(i==7)
    {
       dat="4-6"
       k=4
       for (m=1; m<=11; m++) { t[m]=a[i,m*2) }
       t[1]-=a[k,2]
       t[2]-=a[k,4]
       t[3]-=a[k,6]
       t[4]-=a[k,8]
       t[5]-=a[k,10]
       t[6]-=a[k,12]
       t[7]-=a[k,14]
       t[8]-=a[k,16]
       t[9]-=a[k,18]
       t[10]-=a[k,20]
       t[11]-=a[k,22]
       printf f,dat,t[1],t[2],t[3],t[4],t[5],t[6],t[7],t[8],t[9],t[10],t[11]
    }

    if(i>7&& (j%6)==0)
    {
       k=i-6
       dat=sprintf("%d-%d",k,i-1)
       for (m=1; m<=11; m++) { t[m]=a[i,m*2) }
       t1-=a[k,2]
       t2-=a[k,4]
       t3-=a[k,6]
       t4-=a[k,8]
       t5-=a[k,10]
       t6-=a[k,12]
       t7-=a[k,14]
       t8-=a[k,16]
       t9-=a[k,18]
       t10-=a[k,20]
       t11-=a[k,22]
       printf f,dat,t[1],t[2],t[3],t[4],t[5],t[6],t[7],t[8],t[9],t[10],t[11]
    }
 }
}
' tally.xml
 
hello;
anyone here?? i believe there're some wonderful people out here who can help with my previous post.

looking forward...
thanks

giga
 
Hi daula7,

I think the problem is that you aren't really looking for tek tips so much as someone to do the job for you... I think you have enough information about how to do what you want, all that's needed now is to go through the tedious procedure of getting it right. Have you tried changing it to these new requirements? Are your changes not working? Is there any specific part that you need help with?

Annihilannic.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top