
Pattern at variable position within line 1

Status
Not open for further replies.
I looked through several pages for an example, but without success, so here's my situation. I'm using grep to locate records with the keyword "PCN". I want to pipe that result to awk to find/list/print the next field following the keyword. However, the keyword (PCN) can occur at almost any position within the record (never the first position, but that hardly matters).

Just so you know the purpose, this is scanning across a printed report. The report supplier is going to increase the size of the field following the PCN and I want to be able to detect the first time they send using the new format so that we can do some validation. The PCN is an account number, and existing accounts will continue to use their shorter number, so locating the new accounts has great value to the testers and auditors. Thanks!

==================================
adaptive uber info galaxies (bigger, better, faster, and more adept than agile big data clouds)


 
Hi

Could we see some sample input please ?

For now I am not sure whether you are talking about single-line or multi-line records. (Actually, I am not sure about the need for Awk either.)


Feherke.
feherke.ga
 
Here is some sample data (output of the grep). Obviously, real account numbers have been replaced. In this example, SCOTT J with account number FFFFFFFFFFFF has the new account number, assuming they provide the revised report the way they said they would. At this point, I'm just interested in getting the field after the PCN, which I will then throw into a file for further analysis (length comes to mind, but since they might change other things in the report format, who knows what else I'll need to check). Since some people have 0 to 3 middle initials, or two last names, I can't use positionals like $1, $2, etc. I suppose I could "throw away" everything before the PCN and then take the first field... lemme try that.

MCDSDFUZ0701anyq0000.txt: NAME RAYKIN I PCN 00000AAAAAA -04 SERVICE FROM 20150521 THRU 20150521
MCDSDFUZ0701anyq0000.txt: NAME RIVELL D F PCN 00000BBBBBB -06 SERVICE FROM 20150521 THRU 20150521
MCDSDFUZ0701anyq0000.txt: NAME RIZZO R M PCN CCCCCCCCCCC -05 SERVICE FROM 20150519 THRU 20150519
MCDSDFUZ0701anyq0000.txt: NAME RIZZO R M PCN CCCCCCCCCCC -07 SERVICE FROM 20150526 THRU 20150526
MCDSDFUZ0701anyq0000.txt: NAME RIZZO R M PCN CCCCCCCCCCC -08 SERVICE FROM 20150520 THRU 20150520
MCDSDFUZ0701anyq0000.txt: NAME SAHAR Q N PCN DDDDDDDDDDD -02 SERVICE FROM 20150515 THRU 20150515
MCDSDFUZ0701anyq0000.txt: NAME SALGUERO K Y PCN EEEEEEEEEEE -01 SERVICE FROM 20150414 THRU 20150414
MCDSDFUZ0701anyq0000.txt: NAME SCOTT J PCN FFFFFFFFFFFF-05 SERVICE FROM 20150520 THRU 20150520
MCDSDFUZ0701anyq0000.txt: NAME SINH S PCN GGGGGGGGGGG -01 SERVICE FROM 20150603 THRU 20150603
MCDSDFUZ0701anyq0000.txt: NAME STONE I PCN HHHHHHHHHHH -33 SERVICE FROM 20150429 THRU 20150429
MCDSDFUZ0701anyq0000.txt: NAME STONE I PCN HHHHHHHHHHH -33 SERVICE FROM 20150429 THRU 20150429
MCDSDFUZ0701anyq0000.txt: NAME STONE I PCN HHHHHHHHHHH -37 SERVICE FROM 20150513 THRU 20150513
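
The "throw away everything before the PCN, then take the first field" idea could be sketched in awk roughly like this. It is only an illustration under assumptions: the function name pcn_field is made up for the example, the keyword is taken to be surrounded by single spaces as in the sample, and sub() needs a POSIX-style awk (on Solaris 10 that probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
# Hypothetical helper (the name is illustrative): reads report lines on stdin
# and prints the whitespace-delimited field that follows the keyword PCN.
pcn_field() {
    awk '/ PCN / {
        sub(/^.* PCN /, "")   # drop everything up to and including "PCN "
        print $1              # $0 was re-split, so $1 is now the account field
    }'
}

# Two lines taken from the sample data.
printf '%s\n' \
    'MCDSDFUZ0701anyq0000.txt: NAME RAYKIN I PCN 00000AAAAAA -04 SERVICE FROM 20150521 THRU 20150521' \
    'MCDSDFUZ0701anyq0000.txt: NAME SCOTT J PCN FFFFFFFFFFFF-05 SERVICE FROM 20150520 THRU 20150520' |
    pcn_field
```

For these two lines it prints 00000AAAAAA and FFFFFFFFFFFF-05. The longer number comes out glued to its suffix because the sample has no space between them, so a length check would have to allow for that.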



 
Hi

Well, this is close to what I expected, and my doubt about the need for Awk still stands.

grep (the GNU implementation at least) can do quite a lot here:
Code:
master # grep 'PCN \S\{12\}' MCDSDFUZ0701anyq0000.txt
NAME SCOTT J PCN FFFFFFFFFFFF-05 SERVICE FROM 20150520 THRU 20150520

master # grep -o 'PCN \S\{12\}' MCDSDFUZ0701anyq0000.txt
PCN FFFFFFFFFFFF

master # grep -o 'PCN \S\{12\}' MCDSDFUZ0701anyq0000.txt | grep -o '\S\{12\}'
FFFFFFFFFFFF

But as you asked in the Awk forum :
Code:
master # awk '{for(i=1;i<NF;i++)if($i=="PCN"&&length($(i+1))>=12)print substr($(i+1),1,12)}' MCDSDFUZ0701anyq0000.txt
FFFFFFFFFFFF

Tested with gawk and mawk.
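
For the follow-up analysis (field length), the same loop could be spelled out with comments and made to report the line number and length. This is a rough sketch and not the tested one-liner: the function name long_pcn and the min variable are illustrative, and treating a glued "-NN" at the end of the field as a suffix to strip is an assumption based only on the sample line for SCOTT J.

```shell
# Hypothetical helper: list PCN values of at least `min` characters
# (default 12), with the input line number and the length of the value.
long_pcn() {
    awk -v min="${1:-12}" '{
        for (i = 1; i < NF; i++) {        # stop at NF-1 so $(i+1) always exists
            if ($i != "PCN") continue
            acct = $(i + 1)
            sub(/-[0-9]+$/, "", acct)     # assumption: a glued "-NN" is a suffix
            if (length(acct) >= min)
                print NR ": " acct " (" length(acct) " chars)"
        }
    }'
}

# Two lines from the sample data (without the grep filename prefix).
printf '%s\n' \
    'NAME RIZZO R M PCN CCCCCCCCCCC -05 SERVICE FROM 20150519 THRU 20150519' \
    'NAME SCOTT J PCN FFFFFFFFFFFF-05 SERVICE FROM 20150520 THRU 20150520' |
    long_pcn 12
```

Unlike substr(...,1,12), this keeps the whole value, so a number longer than 12 characters would not be truncated. Note that sub() again requires a POSIX-style awk.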


Feherke.
feherke.ga
 
Well, thanks for teaching me some things that I didn't know about grep, here in the awk forum !



 
Aha! We don't have the -o option for grep in SunOS 5.10 (22 Jun 2005). YIKES !!
We will be doing a server upgrade later this year, but for now, we don't have that grep option in our version of Solaris, so thank you feherke for the awk solution as well.
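
Where grep -o is missing, sed -n with s///p may be able to do the same extraction using only basic regular expressions. A minimal sketch, assuming fields are separated by single spaces as in the sample and that the local sed accepts the \{12\} interval (it is part of POSIX basic regular expressions); the function name pcn12 is made up for the example.

```shell
# Hypothetical helper: print the first 12 non-space characters after "PCN "
# for lines that have at least 12 of them; other lines print nothing.
pcn12() {
    sed -n 's/.* PCN \([^ ]\{12\}\).*/\1/p'
}

# Two lines from the sample data.
printf '%s\n' \
    'MCDSDFUZ0701anyq0000.txt: NAME RAYKIN I PCN 00000AAAAAA -04 SERVICE FROM 20150521 THRU 20150521' \
    'MCDSDFUZ0701anyq0000.txt: NAME SCOTT J PCN FFFFFFFFFFFF-05 SERVICE FROM 20150520 THRU 20150520' |
    pcn12
```

Only the SCOTT J line matches here, giving FFFFFFFFFFFF. Tabs between fields are not handled, since [^ ] only excludes the space character.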



 
Hi

Doh. And I was about to suggest another solution, involving a single grep call:
Code:
master # grep -oP '(?<=PCN )\S{12}' MCDSDFUZ0701anyq0000.txt
FFFFFFFFFFFF
But PCRE support (-P) is experimental even in GNU grep, and is probably not found in other implementations.

By the way, the positive look-behind assertion (?<=...) is also available in ack, though ack is probably not widely installed on Unix systems:
Code:
master # ack -o '(?<=PCN )\S{12}' MCDSDFUZ0701anyq0000.txt
FFFFFFFFFFFF


Feherke.
feherke.ga
 