Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!


Selecting Lines Using Array

Status
Not open for further replies.

tbohon

Programmer
Apr 20, 2000
293
US
I have two files to process (using Korn Shell). One, which I've constructed, has a series of strings (e.g., X1234, X1335, X2226, etc.). The second file - very large - has embedded within it separate lines containing these same values along with some other 'stuff' that I also need. There is a guaranteed (yeah, right!) match, i.e., every X string will always match at least one record (maybe more) in the larger file.

The question is: how do I use the key file (the one with the Xnnnn values) to make one pass through the larger file, extracting the matching records?

I've tried a couple of different methods without success and a co-worker thought that awk could do it - although he wasn't sure how ... :)

Thanks in advance for any thoughts on this.

Tom
 
Code:
NR == FNR {
    # Read patterns file into an array
    patterns[++i] = $0
}
{
    # Test line in second file against each pattern until match found
    found = 0
    for (j=1; j<=i && !found; j++)
        found = ($0 ~ patterns[j])
    if (found) 
        # Do something with line
        print $0
}
HTH
 
Whoops, the first block needs a next at its end (the added line below):
Code:
NR == FNR {
    # Read patterns file into an array
    patterns[++i] = $0
    next    # added: don't also test pattern-file lines against the patterns
}
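For anyone who wants to try the corrected pattern-loop script end to end, here is a minimal sketch that wraps it in a POSIX shell script. The file names and sample records are made up for illustration. Note that each key is used as a regular expression that can match anywhere in the line, so a key such as X123 would also select a line containing X1234, and every line of the big file is compared against every key.

```shell
#!/bin/sh
# Sketch of the pattern-loop approach with the "next" in place.
# keys.txt and big.txt are hypothetical sample files, not real data.
tmp=$(mktemp -d)
printf '%s\n' X1234 X2226 > "$tmp/keys.txt"
printf '%s\n' 'a|X1234|one' 'b|X9999|two' 'c|X2226|three' > "$tmp/big.txt"

result=$(awk '
NR == FNR {
    # First file: store each line as a pattern
    patterns[++i] = $0
    next            # do not test pattern lines against themselves
}
{
    # Second file: try each pattern until one matches
    found = 0
    for (j = 1; j <= i && !found; j++)
        found = ($0 ~ patterns[j])
    if (found)
        print $0
}' "$tmp/keys.txt" "$tmp/big.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```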
 
We need to know the format of the lines in the second file; show a few lines so we can determine where the Xnnnn value will be found in the line.

Here's a start.

File "matching.awk":
Code:
NR == FNR { array[ $0 ] = 1 ; next }

$1 in array
Run with
Code:
awk -f matching.awk file1 file2 >outfile

If Xnnnn is the second field in file2, change "$1" to "$2", etc.

Let me know whether or not this helps.
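A quick way to see what the `$1 in array` version does is to run it on a few lines. In this sketch the sample files and their contents are hypothetical. It shows that the test is an exact match on one field: X12345 is not selected just because X1234 is a key. It also does one array lookup per line instead of looping over every key, which matters when the second file is very large. If the key file has trailing spaces or carriage returns, they become part of the key and the exact match fails silently.

```shell
#!/bin/sh
# Hypothetical sample data for matching.awk
tmp=$(mktemp -d)
printf '%s\n' X1234 X2226 > "$tmp/file1"
printf '%s\n' 'X1234 first' 'X12345 not-a-key' 'X2226 second' 'X7777 other' > "$tmp/file2"

cat > "$tmp/matching.awk" <<'EOF'
# First file: remember each key as an array index
NR == FNR { array[ $0 ] = 1 ; next }

# Second file: print lines whose first field is a stored key
$1 in array
EOF

result=$(awk -f "$tmp/matching.awk" "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```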


If you have nawk, use it instead of awk because on some systems awk is very old and lacks many useful features. For an introduction to Awk, see faq271-5564.

 
I can't share the actual data file due to HIPAA regulations - it contains actual (live) patient data from our hospital in HL7 format. Sorry, I know it makes it more difficult to help, but I'm completely out of $10,000 checks and would like to spend my declining years outside of jail if possible ... :)

The line I'm interested in, if you know anything about HL7, is the MSH (Message Header) which is a pipe-delimited line. The MSH segment (line) is the first line of each HL7 message and the fields I need are MSH 3 and MSH 7.

Thanks in advance!

Tom

"My mind is like a steel whatchamacallit ...
 
We don't need the actual data, just examples that show exactly how that data is arranged.

For example, if someone were asking us for help with a file that contained

    "John Q. Doe","123-45-6789"
    $45,543.22

he would show us

    "foo bar","333-33-3333"
    $9,999.99
 
> the fields I need are MSH 3 and MSH 7.

If the fields beyond $7 in the line aren't important, show something like this:

    X1111|%99|22.22|foo|bar|AZ-BB|1234|...

Code:
# Pipe-delimited fields in 2nd file.
BEGIN { FS="|" }

NR == FNR { array[ $0 ] = 1 ; next }

# We're in 2nd file.  If not all lines are
# records of the type we're looking at, we
# need a way to spot the good lines.
# As a guess, I'm grabbing lines that
# don't start with whitespace.
/^[^ \t]/ {
  if ( $3 in array )
    print $3, $7
}
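The `$3`/`$7` choice in this script lines up neatly with HL7 numbering when the line starts with the segment name: in HL7 v2 the field separator itself is MSH-1 and MSH-2 holds the encoding characters. So on a line beginning `MSH|`, splitting on `|` makes `$1` the literal `MSH`, and `$n` corresponds to MSH-n from n = 2 onward; MSH-3 (sending application) is `$3` and MSH-7 (message date/time) is `$7`. The segment in this sketch is invented.

```shell
#!/bin/sh
# Invented MSH segment: MSH-1 is the "|" separator itself,
# MSH-2 holds the encoding characters ^~\&
line='MSH|^~\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|200412280001||ADT^A01|MSG0001|P|2.2'

# Splitting on "|" makes $1 the literal MSH, so $n matches MSH-n for n >= 2:
# $3 = MSH-3 (sending application), $7 = MSH-7 (date/time of message)
result=$(printf '%s\n' "$line" | awk -F'|' '$1 == "MSH" { print $3, $7 }')
printf '%s\n' "$result"    # prints: SENDAPP 200412280001
```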
 
Why not simply this?
fgrep -f /path/to/keyfile /path/to/bigfile
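`fgrep` behaves like `grep -F` (keys are fixed strings, not regular expressions), and `grep -F -f` is the form POSIX specifies. This sketch, with made-up files, shows the one thing to watch: a fixed string matches anywhere in the line, so the key X1234 also selects a line containing X12345. The `-w` option, which GNU and BSD grep provide but POSIX does not require, restricts matches to whole words.

```shell
#!/bin/sh
# Hypothetical key file and big file
tmp=$(mktemp -d)
printf '%s\n' X1234 X2226 > "$tmp/keyfile"
printf '%s\n' 'a|X1234|one' 'b|X12345|two' 'c|X2226|three' > "$tmp/bigfile"

# -F: keys are fixed strings (what fgrep does); -f: read them from a file
substr=$(grep -F -f "$tmp/keyfile" "$tmp/bigfile")
# -w: additionally require each key to stand as a whole word
whole=$(grep -F -w -f "$tmp/keyfile" "$tmp/bigfile")

printf '%s\n' "$substr" "---" "$whole"
rm -rf "$tmp"
```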

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
OK - I'm a little dense but I understand about showing you the lines. Sorry ...

The data lines in the larger file are of the form:

12/28/2004 00:01:22|his_in|12282004|CLxxxxxxx|MSH| ... 2.2b^MEVN| ... ^MPID|1| ... etc.

Note that I need to pull out the entire line/record - all the way to the cr/lf - since each HL7 segment is separated by the ^M character above.

Looked at man fgrep and, due to the length of the lines and paragraphs, I'm not sure it will work. Plan to give it a try, however ... and appreciate the suggestion.
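Given the layout above, the MSH segment does not start the line: four prefix fields come first, so with `|` as the separator `MSH` is `$5`. Because MSH-1 is the `|` itself and MSH-2 is `$6`, MSH-3 would be `$7` and MSH-7 would be `$11`, provided none of the prefix fields ever contains a `|`. The ^M characters are ordinary characters to awk and the record ends only at the newline, so `print` writes the whole line with all its segments. The sketch below assumes, purely for illustration, that the Xnnnn key is carried in MSH-3; the records are invented.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' X1234 > "$tmp/keys"

# Invented records in the layout shown above: four prefix fields,
# then the MSH segment, with later segments joined by carriage returns (^M)
hit=$(printf '%s\r%s\r%s' \
  '12/28/2004 00:01:22|his_in|12282004|CL0000001|MSH|^~\&|X1234|FAC|RCV|RFAC|200412280001||ADT^A01|1|P|2.2b' \
  'EVN|A01|200412280001' 'PID|1||000000')
miss=$(printf '%s\r%s' \
  '12/28/2004 00:02:10|his_in|12282004|CL0000002|MSH|^~\&|X9999|FAC|RCV|RFAC|200412280002||ADT^A01|2|P|2.2b' \
  'EVN|A01|200412280002')
printf '%s\n' "$hit" "$miss" > "$tmp/big"

# MSH is $5, so MSH-3 is $7 and MSH-7 is $11.
# Assumption for illustration only: the Xnnnn key is in MSH-3.
result=$(awk -F'|' '
NR == FNR { keys[$0] = 1; next }
$5 == "MSH" && ($7 in keys) { print }   # whole record, ^M and all
' "$tmp/keys" "$tmp/big")

# Show the kept record one segment per line
printf '%s\n' "$result" | tr '\r' '\n'
rm -rf "$tmp"
```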

"My mind is like a steel whatchamacallit ...
 
>The data lines in the larger file are of the form:

>12/28/2004 00:01:22|his_in|12282004|CLxxxxxxx|MSH| ... 2.2b^MEVN| ... ^MPID|1| ... etc.

Which field is the string you're trying to match against the first file? Or is a match in any field at all good enough?
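If a match in any field is good enough, the script can walk the fields of each line and stop at the first one that is a key. A minimal sketch with hypothetical files; comparing whole fields, rather than running a regex match on `$0`, keeps X1234 from matching X12345.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' X1234 X2226 > "$tmp/keys"
printf '%s\n' 'a|b|X1234|c' 'X2226|d|e' 'f|X12345|g' > "$tmp/big"

result=$(awk -F'|' '
NR == FNR { keys[$0] = 1; next }
{
    # Print the line once if any of its fields is exactly a key
    for (f = 1; f <= NF; f++)
        if ($f in keys) { print; break }
}' "$tmp/keys" "$tmp/big")

printf '%s\n' "$result"
rm -rf "$tmp"
```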
 
Sounds like you need a pretty common solution for this group.
"I need to compare the records derived from file1 against certain fields in file2 (field n, say) and then do action foo."
Code:
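# Usage (file names are placeholders; add -F'|' for pipe-delimited data):
#   awk -v filename=keyfile -v n=3 -f this.awk bigfile
BEGIN {
    # Load every key from the key file; indices run 0 .. a-1
    while ((getline line < filename) > 0)
        array[a++] = line
    close(filename)
}

{
    # Compare each key with field n and field n+2 of the current line
    for (i = 0; i < a; i++) {
        if (array[i] == $n || array[i] == $(n + 2)) {
            do_foo($0)
            break    # act once per line even if several keys match
        }
    }
    print            # every line is still passed through unchanged
}

# Placeholder for "action foo"; replace with the real work
function do_foo(rec) {
    print "MATCH: " rec
}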

HTH



 