Awk and Pattern Match

zen2003 · Jul 21, 2003

This is my data file:
UNA:+.? 'UNB+UNOA:3+CHASE:ZZ+ADSD:ZZ+030718:0103+00000000000630'UNH+0000
0000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT+3+000
00000000326'UNZ+1+00000000000630'
UNA:+.? 'UNB+UNOA:3+ss+WER:ZZ+030718:0110+00000000000631++BTS'
UNH+00000000012689+BTS

:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:
F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT+10+00000000012689'UNH+00000000012690+BTS

:96A:UN'BGM+XZ8+2'DT
M+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806
000002'RFF+CR:EXM 1000736P139G'SEQ++1'GIS+1'UNT+10+00000000012690'UNH+0000000001
2691+BTS

:96A:UN'BGM+XZ8+3'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18
-06.00.47.168473'RFF+AEK:03071806000001'RFF+CR:EXM 1000939I361H'SEQ++1'GIS+1'UNT
+10+00000000012691'UNH+00000000012692+BTS

:96A:UN'BGM+XZ8+4'DTM+137:20030718
:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000003'RFF+CR:
FUS 1000312ASBCA'SEQ++1'GIS+1'UNT+10+00000000012692'UNH+00000000012693+BTS

:
96A:UN'BGM+XZ8+5'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.1684
73'RFF+AEK:03071806000003'RFF+CR:FUS 1000313BTJRA'SEQ++1'GIS+1'UNT+10+0000000001
2693'UNH+00000000012694+BTS

:96A:UN'BGM+XZ8+6'DTM+137:20030718:102'LIN+1'RFF
+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006351A1Y
AG'SEQ++1'GIS+1'UNT+10+00000000012694'UNH+00000000012695+BTS

:96A:UN'BGM+XZ8
+7'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:030
71806000004'RFF+CR:FUS 1006352FM3CB'SEQ++1'GIS+1'UNT+10+00000000012695'UNH+00000
000012696+BTS

:96A:UN'BGM+XZ8+8'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-
07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006353MC65A'SEQ++1'GIS+
1'UNT+10+00000000012696'UNH+00000000012697+BTS

:96A:UN'BGM+XZ8+9'DTM+137:200
30718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RF
F+CR:FUS 1006354Q29AA'SEQ++1'GIS+1'UNT+10+00000000012697'UNH+00000000012698+BANS
TA

:96A:UN'BGM+XZ8+10'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.4
7.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006355R175B'SEQ++1'GIS+1'UNT+10+0000
0000012698'UNH+00000000012699+BTS

:96A:UN'BGM+XZ8+11'DTM+137:20030718:102'LI
N+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 100
6356V353U'SEQ++1'GIS+1'UNT+10+00000000012699'UNZ+11+00000000000631'

What I need to do is extract every block of text that starts with UNH and ends with UNT,
so the script should return:
UNH+0000
0000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT

UNH+00000000012689+BTS

:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:
F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT

and so on

Thanks for your help

PHV · Jul 21, 2003

Try this:

Code:

awk '/^UNH.*UNT$/{print}' path/to/datafile

Hope This Help
PH.

zen2003 · Jul 21, 2003

Thanks for your help. I tried it, but it does not return anything. What am I doing wrong?
$ awk '/^UNH.*UNT$/{print}' file1.txt
$

PHV · Jul 21, 2003

How are the lines terminated in your file1.txt?

Hope This Help
PH.

zen2003 · Jul 21, 2003

It's a continuous block of text.
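
A quick way to check the line-termination question is to count the newlines in the file and see whether the anchored pattern can match at all. The sketch below uses a made-up, much shorter sample written to a temporary file (the contents and the variable names are assumptions, not the poster's data). Because `^` and `$` anchor to the start and end of a whole input line, a file that is one continuous block beginning with UNB or UNA can never match `/^UNH.*UNT$/`.

```shell
# Hypothetical stand-in for file1.txt: one block, no newline anywhere.
sample=$(mktemp)
printf "UNB+X'UNH+1'UCI+7'UNT+3'UNZ+1'" > "$sample"

lines=$(wc -l < "$sample" | tr -d ' ')     # number of newline characters
bytes=$(wc -c < "$sample" | tr -d ' ')     # total size of the block
matches=$(awk '/^UNH.*UNT$/' "$sample" | wc -l | tr -d ' ')

echo "newlines=$lines bytes=$bytes anchored_matches=$matches"
rm -f "$sample"
```

With no newlines in the file, awk sees a single record that starts with UNB, so the anchored pattern selects nothing.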

vgersh99 · Jul 21, 2003

something like that:

nawk -f zen.awk zen.txt

#--------------------------- zen.txt
BEGIN {
    RE = "UNH.*"
    FS = "UNT"
}

{
    for (i = 1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART, RLENGTH) FS;
    }
}

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

vgersh99 · Jul 21, 2003

ooops, sorry 'bout that:

nawk -f zen.awk zen.txt

#--------------------------- zen.awk
BEGIN {
    RE = "UNH.*"
    FS = "UNT"
}

{
    for (i = 1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART, RLENGTH) FS;
    }
}

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

zen2003 · Jul 21, 2003

It only returns a part of the string.

The result should have been
UNH+00000000012689+BTS

:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT

but what I get is:
UNH+00000000012689+BTS

:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:F2T

zen2003 · Jul 21, 2003

How do I write the print output to a file?
I tried print substr($i, RSTART, RLENGTH+100) >> aaa.txt

but I get an error:
syntax error The source line is 9.
The error context is
print substr($i, RSTART, RLENGTH+100) >>
awk: The statement cannot be correctly parsed.
The source line is 9.

Thanks for your help

vgersh99 · Jul 21, 2003

Given your sample input, everything is ONE continuous block of data [one line].

try this one:

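BEGIN {
    RE = "UNH.*"    # a message starts at UNH
    FS = "UNT"      # split the record at every UNT
}

{
    # print each field from its UNH to the end of the field,
    # then put back the UNT that the field split consumed
    for (i = 1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART) FS;
    }
}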

Here's what I get as output - looks fine according to your definition:

UNH+00000000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT
UNH+00000000012689+BTS

:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM1000735AT1ZD'SEQ++1'GIS+1'UNT
UNH+00000000012690+BTS

:96A:UN'BGM+XZ8+2'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM1000736P139G'SEQ++1'GIS+1'UNT
UNH+00000000012691+BTS

:96A:UN'BGM+XZ8+3'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000001'RFF+CR:EXM1000939I361H'SEQ++1'GIS+1'UNT
UNH+00000000012692+BTS

:96A:UN'BGM+XZ8+4'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000003'RFF+CR:FUS1000312ASBCA'SEQ++1'GIS+1'UNT
UNH+00000000012693+BTS

:96A:UN'BGM+XZ8+5'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000003'RFF+CR:FUS1000313BTJRA'SEQ++1'GIS+1'UNT
UNH+00000000012694+BTS

:96A:UN'BGM+XZ8+6'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006351A1YAG'SEQ++1'GIS+1'UNT
UNH+00000000012695+BTS

:96A:UN'BGM+XZ8+7'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006352FM3CB'SEQ++1'GIS+1'UNT
UNH+00000000012696+BTS

:96A:UN'BGM+XZ8+8'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006353MC65A'SEQ++1'GIS+1'UNT
UNH+00000000012697+BTS

:96A:UN'BGM+XZ8+9'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006354Q29AA'SEQ++1'GIS+1'UNT
UNH+00000000012698+BANSTA

:96A:UN'BGM+XZ8+10'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006355R175B'SEQ++1'GIS+1'UNT
UNH+00000000012699+BTS

:96A:UN'BGM+XZ8+11'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006356V353U'SEQ++1'GIS+1'UNT

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
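
To see the field-splitting idea in isolation, here is a minimal sketch on a made-up two-message sample (the sample string and the variable names are assumptions for illustration). It is a variation on the script above rather than a copy: it uses index() instead of match(), and it stops before the last field, since whatever follows the final UNT has no closing UNT of its own.

```shell
# Hypothetical two-message sample; real interchanges are much longer.
sample="UNB+X'UNH+1+CRL'UCI+7'UNT+3+1'UNH+2+BTS'BGM+1'UNT+3+2'UNZ+2+630'"

extracted=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = "UNT" }            # every UNT ends a field
{
    for (i = 1; i < NF; i++) {  # skip the last field: nothing closes it
        start = index($i, "UNH")
        if (start > 0)
            print substr($i, start) FS
    }
}')
printf '%s\n' "$extracted"
```

This prints one line per message, each running from UNH through UNT. Note that any UNT appearing inside a data element would also be treated as a separator, so this is only safe if the three letters never occur in the payload.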

vgersh99 · Jul 21, 2003

to output to a file:
print substr($i, RSTART) FS >> "/path2myOutputFile";

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
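
The earlier attempt failed because the target after >> is an awk expression: an unquoted aaa.txt is not a string, so awk cannot parse the statement. A literal file name needs double quotes, or the name can be passed in as a variable. A minimal sketch of the variable form, using a temporary file as a hypothetical output path:

```shell
out=$(mktemp)    # hypothetical output file

# "out" is an awk variable holding the file name, so it needs no quotes here.
printf 'first\nsecond\n' | awk -v out="$out" '{ print $0 >> out }'

# A second run appends to the same file because of >>.
printf 'third\n' | awk -v out="$out" '{ print $0 >> out }'

contents=$(cat "$out")
printf '%s\n' "$contents"
rm -f "$out"
```

With > instead of >>, each awk run would truncate the file when it first opens it, so only the output of the last run would remain.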

zen2003 · Jul 21, 2003

Thanks a lot. It works.

zen2003 · Jul 21, 2003

What I need to do is take each line and process it further.
Is it possible to assign the result of print substr($i, RSTART) FS to a variable and then do the further processing on that variable in a unix script?

vgersh99 · Jul 21, 2003

#!/bin/ksh
# remove the file redirection from the awk script
#
nawk -f zen.awk zen.txt | while read theLine do
echo "here is my line to process->[${theLine}]"
done

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

zen2003 · Jul 21, 2003

I get an error:
$ new1
new1: syntax error at line 5 : `done' unexpected

File new1 is:

#!/bin/ksh

nawk -f zen.awk zen.txt | while read theLine do
echo "here is my line to process->[${theLine}]"
done

file zen.awk is:

BEGIN {
    RE = "UNH.*"
    FS = "UNT"
}

{
    for (i = 1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART) FS;
    }
}

Thanks

vgersh99 · Jul 21, 2003

sorry:

#!/bin/ksh
# remove the file redirection from the awk script
#
nawk -f zen.awk zen.txt | while read theLine
do
    echo "here is my line to process->[${theLine}]"
done

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
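
As a sketch of what the per-line processing could look like, the loop below reads each extracted message and pulls out its reference number. The two messages are a made-up stand-in for the awk output, and the "processing" step (taking the second +-separated element with cut) is only an assumed example. It uses IFS= and read -r so that leading blanks and backslashes in a message survive intact.

```shell
# Hypothetical stand-in for the output of: nawk -f zen.awk zen.txt
messages=$(printf '%s\n' "UNH+1+CRL'UCI+7'UNT" "UNH+2+BTS'BGM+1'UNT")

processed=$(printf '%s\n' "$messages" | while IFS= read -r theLine
do
    ref=$(printf '%s\n' "$theLine" | cut -d+ -f2)
    printf 'ref=%s\n' "$ref"
done)
printf '%s\n' "$processed"
```

If a message contains embedded newlines, read will hand it to the loop in several pieces, so the loop body should not assume that every line is a complete UNH to UNT message.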

zen2003 · Jul 21, 2003

Thanks a lot, it works now.

zen2003 · Jul 25, 2003

vlad help!!!

Now if the line is big I get an error:
Input line UNA:+.? 'UNB+UNOA:3+ cannot be longer than 3,000 bytes.

How can I solve this?

vgersh99 · Jul 25, 2003

zen,

is that somehow related to the other thread you've started regarding the ORACLE sql? If it is, as suggested, you might be better off at the Oracle forum.

Either way, I think you're hitting the limit for the length of your shell variable. There might be ways around it, but we need to see the body of this 'while read ...' loop.

Could you post a snippet of that loop, please?

Before you do that, run your script in 'set -x' mode and see where it's failing.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

zen2003 · Jul 25, 2003

No, it's not Oracle related.

I am executing:
awk -f a4.awk contrl0.txt

a4.awk is:
BEGIN {
    RE = "UNH.*"
    FS = "UNT"
}

{
    for (i = 1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART) FS ;
    }
}
This is the error I get:
UNH+00000000000327+CONTRL:2:2:UN'UCI+17010619072003+XCXCXCXCXCX:ZZ+QWQWQ:ZZ+7'UNT
awk: Input line UNA:+.? 'UNB+UNOA:3+ cannot be longer than 3,000 bytes.
The input line number is 1. The file is contrl0.txt.
The source line number is 1.
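
The message comes from awk itself, which in this implementation refuses an input line longer than 3,000 bytes. One possible workaround, sketched here under several assumptions, is to break the block into short lines before awk sees it: tr turns every segment terminator (') into a newline, and awk then rebuilds each message from its UNH segment through its UNT segment. The sketch assumes that newlines in the file are not significant, that ' is the segment terminator as the UNA header declares, and that no escaped terminator (?') occurs in the data. The sample is made up.

```shell
# Hypothetical short sample standing in for contrl0.txt.
sample=$(mktemp)
printf "UNA:+.? 'UNB+X'UNH+1+CRL'UCI+7'UNT+3+1'UNH+2+BTS'BGM+1'UNT+3+2'UNZ+2+630'" > "$sample"

# One segment per line keeps every awk input line short.
messages=$(tr -d '\n' < "$sample" | tr "'" '\n' | awk '
/^UNH/ { inmsg = 1 }                 # a message starts at a UNH segment
inmsg {
    if ($0 ~ /^UNT/) { print "UNT"; inmsg = 0 }   # close the message
    else             { printf "%s\047", $0 }      # \047 restores the terminator
}')
printf '%s\n' "$messages"
rm -f "$sample"
```

Because the match is now anchored to the start of a segment, a stray UNH or UNT inside a data element no longer starts or ends a message.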
