Selecting Lines Using Array

tbohon · Dec 22, 2004

I have two files to process (using Korn Shell). One, which I've constructed, has a series of strings (e.g., X1234, X1335, X2226). The second file - very large - has embedded within it separate lines containing these same values along with some other 'stuff' that I also need. There is a guaranteed (yeah, right!) match, i.e., every X string will always match at least one (maybe more) record in the larger file.

Question is how do I use the key value file (the one with the Xnnnn values) and make one pass through the larger file, extracting matching records?

I've tried a couple of different methods without success and a co-worker thought that awk could do it - although he wasn't sure how ...

Thanks in advance for any thoughts on this.

Tom

mikevh · Dec 22, 2004

Code:

NR == FNR {
    # Read patterns file into an array
    patterns[++i] = $0
}
{
    # Test line in second file against each pattern until match found
    found = 0
    for (j=1; j<=i && !found; j++)
        found = ($0 ~ patterns[j])
    if (found) 
        # Do something with line
        print $0
}

HTH

mikevh · Dec 22, 2004

Whoops, need a next at end of that first block (see the marked line):

Code:

NR == FNR {
    # Read patterns file into an array
    patterns[++i] = $0
    next    # added: skip the second block while reading the patterns file
}

futurelet · Dec 22, 2004

We need to know the format of the lines in the second file; show a few lines so we can determine where the Xnnnn value will be found in the line.

Here's a start.

File "matching.awk":

Code:

NR == FNR { array[ $0 ] = 1 ; next }

$1 in array

Run with

Code:

awk -f matching.awk file1 file2 >outfile

If Xnnnn is the second field in file2, change "$1" to "$2", etc.

Let me know whether or not this helps.

If you have nawk, use it instead of awk because on some systems awk is very old and lacks many useful features. For an introduction to Awk, see faq271-5564.
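For illustration, the lookup above can be tried on a small made-up sample. This is only a sketch: the file contents and the temporary directory are hypothetical and are not taken from the real data, and it assumes the Xnnnn value is the first whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical sample data; the real layout of file2 is not known.
tmp=$(mktemp -d)
printf 'X1234\nX2226\n' > "$tmp/file1"
printf 'X1234 alpha\nX9999 beta\nX2226 gamma\nX1234 delta\n' > "$tmp/file2"

# While reading file1, remember each key as an array index.
# While reading file2, print lines whose first field is one of those keys.
result=$(awk 'NR == FNR { array[$0] = 1; next } $1 in array' "$tmp/file1" "$tmp/file2")
echo "$result"

rm -rf "$tmp"
```

Because the keys are array indexes, each line of file2 costs one lookup however many keys there are, and a key that matches several lines prints all of them.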

tbohon · Dec 23, 2004

I can't share the actual data file due to HIPAA regulations - it contains actual (live) patient data from our hospital in HL7 format. Sorry, I know it makes it more difficult to help but I'm completely out of $10,000 checks and would like to spend my declining years outside of jail if possible ...

The line I'm interested in, if you know anything about HL7, is the MSH (Message Header) which is a pipe-delimited line. The MSH segment (line) is the first line of each HL7 message and the fields I need are MSH 3 and MSH 7.

Thanks in advance!

Tom

"My mind is like a steel whatchamacallit ...

futurelet · Dec 23, 2004

We don't need the actual data, just examples that show exactly how that data is arranged.

For example, if someone was asking us for help with a file that contained

[tt]
"John Q. Doe","123-45-6789"
$45,543.22
[/tt]
he would show us
[tt]
"foo bar","333-33-3333"
$9,999.99
[/tt]

futurelet · Dec 23, 2004

> the fields I need are MSH 3 and MSH 7.

If the fields beyond $7 in the line aren't important, show something like this:
[tt]
X1111|%99|22.22|foo|bar|AZ-BB|1234|...
[/tt]

Code:

# Pipe-delimited fields in 2nd file.
BEGIN { FS="|" }

NR == FNR { array[ $0 ] = 1 ; next }

# We're in 2nd file.  If not all lines are
# records of the type we're looking at, we
# need a way to spot the good lines.
# As a guess, I'm grabbing lines that
# don't start with whitespace.
/^[^ \t]/ {
  if ( $3 in array )
    print $3, $7
}

PHV · Dec 28, 2004

Why not simply this ?
fgrep -f /path/to/keyfile /path/to/bigfile
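A small sketch of this on made-up data (the file names and contents below are hypothetical). grep -F is the POSIX spelling of fgrep. Note that it treats each key as a fixed string and matches it anywhere in the line, so a key such as X123 would also select a line containing X1234.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the key file and the big file.
tmp=$(mktemp -d)
printf 'X1234\nX2226\n' > "$tmp/keyfile"
printf 'a|X1234|b\nc|X9999|d\ne|X2226|f\n' > "$tmp/bigfile"

# -F: keys are fixed strings, not regular expressions
# -f: read the keys from a file, one per line
result=$(grep -F -f "$tmp/keyfile" "$tmp/bigfile")
echo "$result"

rm -rf "$tmp"
```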

Hope This Helps, PH.

tbohon · Dec 28, 2004

OK - I'm a little dense but I understand about showing you the lines. Sorry ...

The data lines in the larger file are of the form:

12/28/2004 00:01:22|his_in|12282004|CLxxxxxxx|MSH| ... 2.2b^MEVN| ... ^MPID|1| ... etc.

Note that I need to pull out the entire line/record - all the way to the cr/lf - since each HL7 segment is separated by the ^M character above.

Looked at man fgrep and, due to the length of the lines and paragraphs, I'm not sure it will work. Plan to give it a try, however ... and appreciate the suggestion.
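One possible way to pull out whole records in a single pass, sketched on made-up data. The sample lines are hypothetical and much shorter than real HL7 records, and the sketch assumes the key appears as a complete pipe-delimited field somewhere in the line; since the field position isn't stated, it checks every field.

```shell
#!/bin/sh
# Hypothetical sample data; real records are far longer.
tmp=$(mktemp -d)
printf 'X1234\nX2226\n' > "$tmp/keyfile"
printf '%s\n' \
  'a|b|X1234|MSH|rest' \
  'a|b|X9999|MSH|rest' \
  'a|b|X2226|MSH|rest' > "$tmp/bigfile"

result=$(awk -F'|' '
  NR == FNR { keys[$0] = 1; next }       # first file: load the keys
  {
    for (i = 1; i <= NF; i++)            # second file: check every field
      if ($i in keys) { print; next }    # print the whole record once
  }' "$tmp/keyfile" "$tmp/bigfile")
echo "$result"

rm -rf "$tmp"
```

If the key's field number turns out to be fixed, the loop can be replaced by a single test on that field.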

futurelet · Dec 29, 2004

marsd · Jan 2, 2005

Sounds like you need a pretty common solution for this group.
"I need to compare the records derived from file1 against certain fields in file2 where the FS = n and then do action foo."

Code:

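# Run as: awk -F'|' -v filename=keyfile -v n=3 -f script.awk bigfile
# (filename and n must be supplied; the values shown are examples only)
BEGIN {
      while ( (getline < filename) > 0) {
               array[a++] = $0
      }
      close(filename)
}

      {
        # array is filled from index 0, so scan 0 .. a-1
        for (i=0 ; i < a ; i++) {
            if (array[i] == $n || array[i] == $(n + 2)) {
               do_foo($0)
            }
        }
        print
}

# Placeholder for "action foo"; fill in the real per-match action.
function do_foo(rec) {
}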

HTH
