Awk and Pattern Match

Status
Not open for further replies.

zen2003

MIS
Jul 21, 2003
17
US
This is my data file:
UNA:+.? 'UNB+UNOA:3+CHASE:ZZ+ADSD:ZZ+030718:0103+00000000000630'UNH+0000
0000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT+3+000
00000000326'UNZ+1+00000000000630'
UNA:+.? 'UNB+UNOA:3+ss+WER:ZZ+030718:0110+00000000000631++BTS'
UNH+00000000012689+BTS:D:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:
F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT+10+00000000012689'UNH+00000000012690+BTS:D:96A:UN'BGM+XZ8+2'DT
M+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806
000002'RFF+CR:EXM 1000736P139G'SEQ++1'GIS+1'UNT+10+00000000012690'UNH+0000000001
2691+BTS:D:96A:UN'BGM+XZ8+3'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18
-06.00.47.168473'RFF+AEK:03071806000001'RFF+CR:EXM 1000939I361H'SEQ++1'GIS+1'UNT
+10+00000000012691'UNH+00000000012692+BTS:D:96A:UN'BGM+XZ8+4'DTM+137:20030718
:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000003'RFF+CR:
FUS 1000312ASBCA'SEQ++1'GIS+1'UNT+10+00000000012692'UNH+00000000012693+BTS:D:
96A:UN'BGM+XZ8+5'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.1684
73'RFF+AEK:03071806000003'RFF+CR:FUS 1000313BTJRA'SEQ++1'GIS+1'UNT+10+0000000001
2693'UNH+00000000012694+BTS:D:96A:UN'BGM+XZ8+6'DTM+137:20030718:102'LIN+1'RFF
+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006351A1Y
AG'SEQ++1'GIS+1'UNT+10+00000000012694'UNH+00000000012695+BTS:D:96A:UN'BGM+XZ8
+7'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:030
71806000004'RFF+CR:FUS 1006352FM3CB'SEQ++1'GIS+1'UNT+10+00000000012695'UNH+00000
000012696+BTS:D:96A:UN'BGM+XZ8+8'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-
07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006353MC65A'SEQ++1'GIS+
1'UNT+10+00000000012696'UNH+00000000012697+BTS:D:96A:UN'BGM+XZ8+9'DTM+137:200
30718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RF
F+CR:FUS 1006354Q29AA'SEQ++1'GIS+1'UNT+10+00000000012697'UNH+00000000012698+BANS
TA:D:96A:UN'BGM+XZ8+10'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.4
7.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006355R175B'SEQ++1'GIS+1'UNT+10+0000
0000012698'UNH+00000000012699+BTS:D:96A:UN'BGM+XZ8+11'DTM+137:20030718:102'LI
N+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 100
6356V353U'SEQ++1'GIS+1'UNT+10+00000000012699'UNZ+11+00000000000631'

What I need to do is extract all the text that starts with UNH and ends with UNT,
so the script should return:
UNH+0000
0000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT

UNH+00000000012689+BTS:D:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:
F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT

and so on

Thanks for your help
 
If you're on Sun/Solaris, use 'nawk' instead of plain-old-broken 'awk'.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Try this:
Code:
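BEGIN {
  RE = "UNH.*"
  # Split records and fields on the string "UNT".
  # A multi-character RS works in gawk but is not portable to every awk.
  RS = FS = "UNT"
}
{
  # Print each piece from "UNH" onward, restoring the "UNT" removed by the split.
  for (i = 1; i <= NF; i++)
    if (match($i, RE))
      printf "%s%s\n", substr($i, RSTART), FS
}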

Hope This Help
PH.
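
A portability note on the script above, offered as a sketch rather than a diagnosis: POSIX leaves the behaviour of a multi-character RS unspecified, and traditional awks may use only its first character, so RS="UNT" may not split where intended outside gawk. One alternative is to split on the single-character segment terminator ('), which also keeps every record short. The sample data and the variable names (sample, segments, inside) are made up for illustration, and the sketch assumes that newlines in the file are only line wraps and that the data contains no escaped apostrophes.

```shell
# Hypothetical sample: two messages, with a line wrap inside the SEQ segment.
sample=$(mktemp)
printf '%s\n' \
  "UNA:+.? 'UNB+UNOA:3+ss+WER:ZZ'UNH+1+BTS:D:96A:UN'BGM+XZ8+1'SE" \
  "Q++1'UNT+4+1'UNH+2+BTS:D:96A:UN'BGM+XZ8+2'UNT+3+2'UNZ+2+631'" > "$sample"

# One record per segment: RS is the single character ' (portable awk).
segments=$(awk -v RS="'" '
  { gsub(/\n/, "") }        # remove line wraps that fall inside a segment
  /^UNH/ { inside = 1 }     # a message starts at its UNH segment
  inside { print }          # print every segment while inside a message
  /^UNT/ { inside = 0 }     # the UNT segment closes the message
' "$sample")

printf '%s\n' "$segments"
rm -f "$sample"
```

This prints one segment per line, from each UNH through its matching UNT, and skips the UNA, UNB and UNZ envelope segments.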
 
It does not come back with anything:

$ awk -f a5.awk contrl0.txt
+ awk -f a5.awk contrl0.txt
$

a5.awk is:
$ cat a5.awk
+ cat a5.awk
BEGIN {
  RE="UNH.*"
  RS=FS="UNT"
}
{
  for(i=1;i<= NF;i++)
    if (match($i, RE))
      printf "%s%s\n",substr($i,RSTART),FS>>"p.txt"
}
 
PHV,
it's a better idea to break the input into records at 'UNT'. It might work, but zen might also be hitting awk's record-length limits.

Zen, if PHV's suggestion doesn't fix your problem, can you get your hands on either a POSIX-compliant awk or gawk?

I'm not quite sure which awk HP-UX ships with.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Nope, I cannot get either. I think I am hitting awk's record-length limits: if I reduce the amount of text, it works fine. Any ideas on how I can work around this limit?
 
fold does not work, as the records may be of variable length.
What I am trying to do is insert a newline before 'UNH'.
This substitution works:
sed 's/UNH/*!!@UNH/' contrl0.txt

What I want to do is:
sed 's/UNH/<newline character here>UNH/' contrl0.txt
I tried
sed 's/UNH/\\nUNH/' contrl0.txt
and a few other combinations, but it does not work.

Can this be done?

Thanks

 
Try something like this:
Code:
sed 's/UNH/@UNH/g' contrl0.txt | tr '@' '\012'

Hope This Help
PH.
 
Try

sed 's/UNH/\^JUNH/g' contrl0.txt

where ^J is a literal newline, typed as Ctrl-V Ctrl-J (the g flag replaces every UNH on a line, not just the first).
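
Putting the sed and tr suggestions together, here is a rough sketch of a pipeline that returns each message from UNH up to UNT on its own line. It rests on several assumptions that are not confirmed anywhere in the thread: newlines in the file are only line wraps, '@' never occurs in the data, every message header follows a segment terminator ('), and the sed in use copes with the long joined line (older seds may not). The sample data and the variable names are hypothetical.

```shell
# Hypothetical sample: two messages, with a line wrap inside the SEQ segment.
sample=$(mktemp)
printf '%s\n' \
  "UNA:+.? 'UNB+UNOA:3+ss+WER:ZZ'UNH+1+BTS:D:96A:UN'BGM+XZ8+1'SE" \
  "Q++1'UNT+4+1'UNH+2+BTS:D:96A:UN'BGM+XZ8+2'UNT+3+2'UNZ+2+631'" > "$sample"

# 1. tr -d joins the wrapped lines into one.
# 2. sed puts an '@' marker in front of every UNH that follows a terminator.
# 3. tr turns each marker into a newline, so every message starts a line.
# 4. sed -n keeps only lines starting with UNH and cuts them right after UNT.
messages=$(tr -d '\012' < "$sample" |
  sed "s/'UNH+/'@UNH+/g" |
  tr '@' '\012' |
  sed -n "s/^\(UNH.*'UNT\)+.*/\1/p")

printf '%s\n' "$messages"
rm -f "$sample"
```

The envelope line (UNA/UNB) contains no UNH, so the last sed drops it, and anything after UNT (the count, the reference and UNZ) is trimmed from each message line.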
 