multiple line records

pb0y · Aug 8, 2002

Hi,

I want to return the numbers of all the records, between each TABLENAME and END pair, that have a match for X in field 2.

The layout is as follows:

:TABLENAME
:RECORD 1
; Field2 Field3 Field 4
01, X, 99, 4321
02, X, 99, 4322
03, X, 99, 4323
04, X, 99, 4324
05, X, 99, 4325
06, X, 99, 4326
07, X, 99, 4327
08, X, 99, 4328
09, X, 99, 4329
10, X, 99, 4330
11, X, 99, 4331
12, X, 99, 4332
13, X, 99, 4333
14, X, 99, 4334
15, X, 99, 4335
16, X, 99, 4336
17, X, 99, 4337
18, X, 99, 4338
19, X, 99, 4339
20, X, 99, 4340
21, X, 99, 4341
22, X, 99, 4342
23, X, 99, 4343
24, X, 99, 4344
:RECORD 2
:END

On the output I should get a list of record numbers:

Record 1,2,3,8,10, 30 etc

Here is what I have been playing with:

nawk -v X="$1" '
BEGIN {
FS = "[ \t]*,[ \t]*";   # data lines are comma-separated
VAR = "VAR"
}

/:TABLENAME/,/:END/ {
if ($1 ~ /^:RECORD/) { split($1, r, " "); VAR = r[2]; next }   # remember record number
if (X == $2) { printf ("%s\n", VAR) }
}'

Thanks.

CaKiwi · Aug 9, 2002

It looks like I'm not the only person who doesn't understand what you want. Field 2 always contains an X, so if you pass in an X you will select every record, and if you pass in anything other than an X you will select none. What am I missing?

CaKiwi

marsd · Aug 9, 2002

X is evidently a passed variable (looking at the code given).

Something like:

while (NR > findnr(":Tablename") && NR < findnr(":END")) {
    if ($2 == X) {
        all = length(all) < 1 ? NR : all "," NR
    }
}
END {
    print all
}' filename

That's my guess at least, who knows.
I'll leave it to vgersh and CaKiwi to implement findnr().
Too much for me.
Have a good weekend.
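marsd's outline leaves findnr() undefined. As a rough sketch only, the "between :TABLENAME and :END" restriction can be expressed with an awk range pattern, which removes the need for findnr() altogether. This keeps marsd's assumption that the goal is the line numbers (NR) of matching lines, joined into one comma-separated string; the shell function name matchlines is made up for illustration, and field 2 is assumed to be the second comma-separated value on a line.

```shell
# Hypothetical helper: matchlines VALUE FILE
# Prints, on one line, the line numbers (NR) of every line between
# :TABLENAME and :END whose second comma-separated field equals VALUE.
matchlines() {
    awk -v X="$1" '
    BEGIN { FS = "[ \t]*,[ \t]*" }           # split on commas, eating nearby blanks
    /^:TABLENAME/, /^:END/ {                 # range pattern stands in for findnr()
        if ($2 == X)
            all = (all == "" ? NR : all "," NR)   # append to "n1,n2,..."
    }
    END { print all }
    ' "$2"
}
```

Run against Grant's mydata file further down the thread, `matchlines 13 mydata` prints `7,9,11,12,25,27`: line numbers, not record numbers, which is why the thread moves on to tracking the current :RECORD.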

pb0y · Aug 12, 2002

Hi CaKiwi and marsd,

Sorry for not being clear. What I am trying to do is this:

If any line below :RECORD 1 has X in field 2 (X being a number held in a shell variable), then print :RECORD 1, not the actual line with the X on it. If X appears in field 2 of a record, it will appear on every line until the next :RECORD. So basically I need to treat all the lines from one :RECORD to the next as a single record. I've tried setting RS to ":" to accomplish this, but I'm having problems with it.

Thanks for your help.
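Setting RS to ":" can work, as long as RS is actually assigned (RS = ":") and each chunk is then split back into lines. A minimal sketch, assuming ":" never appears inside the data lines and that field 2 is the second comma-separated value on each line; findrec is a hypothetical helper name, not something from the thread.

```shell
# Hypothetical helper: findrec VALUE FILE
# Treats each ":"-delimited chunk as one record, so "RECORD n" and all of
# its data lines arrive together in $0. Assumes ":" never occurs inside
# the data lines themselves.
findrec() {
    awk -v val="$1" '
    BEGIN {
        RS = ":"      # every ":" starts a new record
        FS = "\n"     # each line of the chunk becomes one field
    }
    $1 == "TABLENAME" { intable = 1; next }
    $1 == "END"       { intable = 0; next }
    intable && $1 ~ /^RECORD / {
        split($1, hdr, " ")                 # hdr[2] is the record number
        for (i = 2; i <= NF; i++) {         # $2..$NF are the data lines
            if ($i == "" || $i ~ /^;/)      # skip blank and comment lines
                continue
            split($i, f, ",")
            v = f[2]
            gsub(/[ \t]/, "", v)            # strip blanks around field 2
            if (v == val) {                 # one matching line is enough
                print hdr[2]
                break
            }
        }
    }
    ' "$2"
}
```

With Grant's sample mydata further down, `findrec 13 mydata` prints 1 and 3, one per line. Because each record is handled as a unit, a record number is printed once per :RECORD block rather than once per matching line.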

Grant · Aug 14, 2002

==================
I created the following bogus data file based on my understanding of the problem. I added junk at the top and bottom of the file, just in case that was a possibility.
==================
% cat mydata
aaa
bbb
ccc
:TABLENAME
:RECORD 1
; Field2 Field3 Field 4
01, 13, 99, 4321
02, 12, 99, 4322
03, 13, 99, 4323
04, 14, 99, 4324
05, 13, 99, 4325
06, 13, 99, 4326
:RECORD 2
; Field2 Field3 Field 4
01, 22, 99, 4321
02, 32, 99, 4322
03, 33, 99, 4323
04, 34, 99, 4324
05, 43, 99, 4325
06, 43, 99, 4326
:RECORD 3
; Field2 Field3 Field 4
01, 11, 99, 4571
02, 22, 99, 4572
03, 13, 99, 4573
04, 14, 99, 4574
05, 13, 99, 4555
06, 16, 99, 4546
:END
ddd
eee
fff

==================
This is the script. It expects the variable 'val' to be passed from the command line.
==================
% cat printrec.awk
#!/usr/bin/awk -f
BEGIN{
    FS="[ \t,]+";    # split on runs of blanks and/or commas
}

/:TABLENAME/,/:END/{
    if ($1==":RECORD")
    { RecNo=$2; }    # remember the current record number
    if ( (NF==4) && (val == $2) ){ print RecNo; }
}

==================
The syntax to run it is: printrec.awk val=<value> <data file>.

Here is an example of running it to find the number 13 in the field 2 position.
==================
% printrec.awk val=13 mydata
1
1
1
1
3
3

==================
I hope this is what you wanted.

Grant

Grant · Aug 14, 2002

==================
This is an addendum to my previous message above.

It may very well be that each record number should appear only once in the output. My previous solution (above) did not do that, so I am offering the following alternative:
==================
% cat printrec.2.awk
#!/usr/bin/awk -f
BEGIN{
    FS="[ \t,]+";
    MaxRecNo=0;
}

/:TABLENAME/,/:END/{
    if ($1==":RECORD")
    { RecNo=$2; MaxRecNo=$2; }
    if ( (NF==4) && (val == $2) ){ RecArr[RecNo]=1; }   # mark record as matched
}

END{
    for (i=1; i<=MaxRecNo; i++)    # print matched records in numeric order
    {
        if ( i in RecArr ){ print i }
    }
}

==================
Based on the sample data from my previous message, here are the results of running the command with 'val=13':
==================
% printrec.2.awk val=13 mydata
1
3
==================

Grant.

pb0y · Aug 15, 2002

Thanks Grant!

I made a slight modification (NF>=3), since there could be 3 or 4 fields in the record. There is one problem: sometimes the file gets messed up and empty records are added to it. When the last record is one of these, the array example you provided does not work. How can I ignore this record?

:TABLENAME
:RECORD 1
; Field2 Field3 Field 4
01, 13, 99, 4321
02, 12, 99, 4322
03, 13, 99, 4323
04, 14, 99, 4324
05, 13, 99, 4325
06, 13, 99, 4326
:RECORD 2
; Field2 Field3 Field 4
01, 22, 99, 4321
02, 32, 99, 4322
03, 33, 99, 4323
04, 34, 99, 4324
05, 43, 99, 4325
06, 43, 99, 4326
:RECORD 3
; Field2 Field3 Field 4
01, 11, 99, 4571
02, 22, 99, 4572
03, 13, 99, 4573
04, 14, 99, 4574
05, 13, 99, 4555
06, 16, 99, 4546
:RECORD 1
01, 0, 0,
:END

Grant · Aug 16, 2002

--------------------------
I am not 100% sure exactly how the file might get messed up, so I will make some assumptions. In your sample data, RECORD 1 occurs a second time at the end of the table. I am assuming 1) that a record should never be repeated and 2) that if this happens, it will be at the end of the valid data.

If these assumptions are correct, then we can just exit as soon as we encounter a repeat of a previous record.

Here is the code that implements that solution:
--------------------------
#!/usr/bin/awk -f
BEGIN{
    FS="[ \t,]+";
    MaxRecNo=0;
}

/:TABLENAME/,/:END/{
    if ($1==":RECORD")
    {
        if ( $2 in RecArr )    # record number already stored in RecArr
        {
            exit;
        }
        RecNo=$2;
        MaxRecNo=$2;
    }
    if ( (NF>3) && (val == $2) ){ RecArr[RecNo]=1; }
}

END{
    for (i=1; i<=MaxRecNo; i++)
    {
        if ( i in RecArr ){ print i; }
    }
}

--------------------------
If my assumptions are incorrect, please provide as much info as you can about how it gets messed up.
--------------------------
Hope this does the trick.
Grant.

pb0y · Aug 16, 2002

Grant,

There are multiple instances of ":RECORD 1" (specifically record number 1, not some other RecNo in the table).

As a result, the exit statement ends the program before all the records are returned if it finds a ":RECORD 1" before the end of the table.

I believe this is because the database generates this as a default empty record when there is no data. But I've noticed that only the last record screws up the array; if I manually delete this last record, the logic works.

I changed "exit" to "next", and it returns all of the correct records if X exists in "RECORD 1". If X does not exist in "RECORD 1", it doesn't return any records. What I think I need is a way to eliminate the duplicate record numbers before putting them into the array.

Thanks for your help.
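One way to drop duplicates without relying on a highest-record-number variable is to print a record number the first time one of its lines matches and remember that it has been printed, using the common awk idiom `!seen[key]++`. A sketch, assuming the comma-separated layout above and that output in order of first appearance (rather than sorted) is acceptable; uniqrec is a hypothetical helper name.

```shell
# Hypothetical helper: uniqrec VALUE FILE
# Prints each :RECORD number once, the first time one of its lines has
# VALUE in field 2. No MaxRecNo is kept, so a stray trailing ":RECORD 1"
# cannot cut the output short. Numbers appear in order of first match.
uniqrec() {
    awk -v val="$1" '
    BEGIN { FS = "[ \t]*,[ \t]*" }          # comma-separated data lines
    /^:TABLENAME/, /^:END/ {
        if ($0 ~ /^:RECORD[ \t]/) {         # header line: remember its number
            split($0, hdr, " ")
            rec = hdr[2]
            next
        }
        # print rec only if it has not been printed before
        if (NF >= 3 && $2 == val && !seen[rec]++)
            print rec
    }
    ' "$2"
}
```

On the sample with the extra trailing ":RECORD 1", `uniqrec 13 file` prints 1 and 3; a repeated record number that matches again is simply not printed a second time.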

Grant · Aug 16, 2002

pb0y,

I think I see the problem. The main trouble is with the 'MaxRecNo' variable. The previous scripts assumed that the second field of the ':RECORD _' lines would always increase and never decrease. In other words, the lines would be in this order:

:RECORD 1
...
:RECORD 2
...
:RECORD 3
...
:RECORD 4
...

In the above case, by the time the END{} block was reached, MaxRecNo would be 4.

If instead the numbers suddenly decreased, that would not necessarily be a problem, as long as the last :RECORD line still carried the highest number, as in:

:RECORD 1
...
:RECORD 2
...
:RECORD 3
...
:RECORD 1
...
:RECORD 4
...

In this case, MaxRecNo would be 4, which doesn't present a problem in the END{} block.

But if the last :RECORD line is a :RECORD 1, we definitely have a problem. For example:

:RECORD 1
...
:RECORD 2
...
:RECORD 3
...
:RECORD 4
...
:RECORD 1
...

In this situation, MaxRecNo would be 1. This would cause a problem in the 'for' loop (see below) in the END{} block, because it would only loop from 1 to 1, then exit.

----------> for (i=1; i<=MaxRecNo; i++) <----------

Anyway, my solution is to change the way MaxRecNo is assigned its values. Here is the new code:
-------------------------------
#!/usr/bin/awk -f
BEGIN{
    FS="[ \t,]+";
    MaxRecNo=0;
}

/:TABLENAME/,/:END/{
    if ($1==":RECORD")
    {
        RecNo=$2;
        if(MaxRecNo<$2){ MaxRecNo=$2; }    # keep the highest record number seen
    }
    if ( (NF==4) && (val == $2) ){ RecArr[RecNo]+=1; }
}

END{
    for (i=1; i<=MaxRecNo; i++)
    {
        if ( i in RecArr ){ print i }
    }
}
-------------------------------
Have a good weekend!
Grant.

pb0y · Aug 19, 2002

Grant,

New logic works great! Thanks for all your help.

Tek-Tips is the largest IT community on the Internet today!
