Problem Indexing on a key character in a field

dougcranston · Mar 10, 2010

I have a pipe delimited file with 20+ fields per record.

What I need to do is compare a "portion" of one field against a portion of another field and, where they don't match, write the record out to a file for investigation and correction.

The fields are not fixed length, and the key character used to establish a position reference can be repeated multiple times in the same field.

And worse yet, the number of those "key characters" can vary from one field to another in the same record.

So the problem I cannot seem to solve is that both fields contain multiple hyphens, and the last hyphen marks the start of the data I want to compare.

For example, simplified to 3 fields:

Texas-Austin-Museum|300000|Texas-Austin-County-Mueseum

So what I would like to do is compare "Museum" from Field 1 with "Mueseum" from Field 3. That comparison would fail because of the misspelling, and I would then print the record out as an error.
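A minimal sketch of that comparison, assuming a POSIX awk: awk regular expressions match the longest possible string, so sub(/.*-/, "", s) deletes everything up to and including the last hyphen and leaves only the trailing part. The shell function name check_last_part is hypothetical, and a field that contains no hyphen is simply compared whole.

```shell
# Hypothetical helper: reads pipe-delimited records on stdin and compares
# the text after the LAST hyphen in field 1 with the same part of field 3.
check_last_part() {
    awk -F '|' '{
        # /.*-/ is greedy, so it swallows everything through the final "-"
        t1 = $1; sub(/.*-/, "", t1)
        t3 = $3; sub(/.*-/, "", t3)
        r = (t1 == t3) ? "match" : "MISMATCH"
        print t1, t3, r
    }'
}
```

Feeding it the sample record above with `printf '%s\n' 'Texas-Austin-Museum|300000|Texas-Austin-County-Mueseum' | check_last_part` prints `Museum Mueseum MISMATCH`.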

index(s, t) only finds the first occurrence: index($1, "-") gives me the position of the first hyphen, not the last one, and it is the last one I need so that I can use substr() to set the starting position in that field.
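If you want to stay with index() and substr(), one way is a small user-defined awk function that keeps calling index() on the remainder of the string until no further hyphen turns up. This is a sketch only, assuming a POSIX awk; lastindex and tail_after_last are hypothetical names, not built-ins.

```shell
# Hypothetical helper: prints the text after the last "-" in fields 1 and 3.
tail_after_last() {
    awk -F '|' '
    # lastindex(s, t): position of the last occurrence of t in s, 0 if none.
    # index() only reports the first hit, so search the rest of the string
    # after each hit. The extra parameters pos, p, off act as local variables.
    function lastindex(s, t,    pos, p, off) {
        pos = 0; off = 0
        while ((p = index(substr(s, off + 1), t)) > 0) {
            off += p      # absolute position of this hit
            pos = off
        }
        return pos
    }
    {
        k1 = lastindex($1, "-"); k3 = lastindex($3, "-")
        # substr from just past the last hyphen; with no hyphen k is 0,
        # so the whole field is returned
        print substr($1, k1 + 1), substr($3, k3 + 1)
    }'
}
```

For the sample record it prints `Museum Mueseum`, and those two values can then be compared in the same way.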

Essentially I have a 20+ field file, and I need to identify records where certain fields are not exactly the same. The file has more than 10,000 rows and currently requires manual validation 3-4 times a month. Being able to do the pattern matching mechanically would reduce the time spent hunting for issues, save a lot of eye strain, and improve accuracy in the long run.

Any suggestions would be appreciated.

Thanks in advance.

Annihilannic · Mar 10, 2010

Something like this should do it:

Code:

awk -F '|' '
        {
                # split() returns the number of pieces, so a1[n1] and a3[n3]
                # hold the text after the last hyphen in fields 1 and 3
                n1=split($1,a1,"-")
                n3=split($3,a3,"-")
                if (a1[n1] != a3[n3]) {
                        print $0 " is bad"
                } else {
                        print $0 " is good"
                }
        }
' inputfile

split() returns the number of array items the input is split into, so a1[n1] refers to the last element of that array.
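The script above reports every record as good or bad on standard output. Since the question asked for the exceptions to be written to a file, here is a variation under a few assumptions: the two field numbers and the output file name are passed in (find_exceptions and exceptions.txt are hypothetical names), and only mismatched records are written, unchanged.

```shell
# Hypothetical helper: find_exceptions FIELD1 FIELD2 OUTFILE < input
# Writes records whose fields differ after the last "-" to OUTFILE.
find_exceptions() {
    awk -F '|' -v f1="$1" -v f2="$2" -v out="$3" '
    # open (and truncate) the output file even if nothing mismatches
    BEGIN { printf "" > out }
    {
        n1 = split($f1, a1, "-")   # a1[n1] is the part after the last "-"
        n2 = split($f2, a2, "-")
        if (a1[n1] != a2[n2]) {
            print > out            # write the whole record, unchanged
            bad++
        }
    }
    END { printf "%d of %d records written to %s\n", bad + 0, NR, out }'
}
```

`find_exceptions 1 3 exceptions.txt < inputfile` compares fields 1 and 3 and prints a one-line summary. The BEGIN block opens the output file up front so that a run with no mismatches leaves an empty file instead of the previous run's results.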

Annihilannic.

dougcranston · Mar 11, 2010

Annihilannic.

Thank you for the quick response. Will try that out. I had unintentionally left split() out of consideration... No reason, I just didn't think of it.

Thanks again, and have a great day.

Dougc
