Problem Indexing on a key character in a field

Status
Not open for further replies.

dougcranston

Technical User
Oct 5, 2001
326
US
I have a pipe delimited file with 20+ fields per record.

What I need to do is match a "portion" of one field against a portion of another field, and where an exception exists write the record out to a file for investigation and correction.

The fields are not fixed length, and the key character used to establish a position reference can be repeated multiple times in the same field.

And worse yet, the number of those "key characters" can vary from one field to another in the same record.

So the issue I cannot seem to address is that the two fields contain multiple hyphens, and that the last hyphen marks the start of the data I want to check for.

Ex. for simplicity, 3 fields:

Texas-Austin-Museum|300000|Texas-Austin-County-Mueseum

So what I would like to do is compare "Museum" from Field 1 to "Mueseum" from Field 3, which would fail due to the spelling, and then print the record out as an error.

index(s, t) only finds the first occurrence. For example, index($1, "-") gives me the position of the first hyphen, not the last, which is the one I would need to pass to substr() as the starting position in that field.

Essentially I have a 20+ field file, and I need to identify records where certain fields are not exactly the same. The file contains in excess of 10000 rows and currently requires manual validation 3-4 times a month. Being able to do the pattern matching mechanically would reduce the time spent trying to identify issues, save a lot of eye strain, and increase accuracy in the long run.

Any suggestions would be appreciated.

Thanks in advance.
 
Something like this should do it:

Code:
awk -F '|' '
        {
                n1=split($1,a1,"-")
                n3=split($3,a3,"-")
                if (a1[n1] != a3[n3]) {
                        print $0 " is bad"
                } else {
                        print $0 " is good"
                }
        }
' inputfile

split() returns the number of array items the input is split into, so a1[n1] refers to the last element of that array.
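
If you only want the exceptions written out to a file, as described in the question, a variation along these lines might work. It is just a sketch: the function name last_part and the file names sample.txt and exceptions.txt are made up for illustration, the second sample record is invented, and it assumes an awk that supports user-defined functions (nawk, gawk, mawk or any POSIX awk).

```shell
# Hypothetical sample data; the second record was invented so that its fields agree.
printf '%s\n' \
  'Texas-Austin-Museum|300000|Texas-Austin-County-Mueseum' \
  'Texas-Dallas-Library|120000|Texas-Dallas-County-Library' > sample.txt

awk -F '|' '
# Return the text after the last hyphen (the whole string if there is none).
function last_part(s,    parts, n) {
        n = split(s, parts, "-")
        return parts[n]
}
# A pattern with no action prints the record, so only mismatches are printed.
last_part($1) != last_part($3)
' sample.txt > exceptions.txt
```

For a real 20+ field file you would change $1 and $3 to whichever fields need to agree, and add further comparisons joined with || if more than one pair has to be checked.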

Annihilannic.
 
Annihilannic.

Thank you for the quick response. Will try that out. I had unintentionally left split() out of my consideration... No reason. Just failed to consider it.

Thanks again, and have a great day.

Dougc
 