Tek-Tips is the largest IT community on the Internet today!


Awk and Pattern Match

Status
Not open for further replies.

zen2003

MIS
Jul 21, 2003
17
US
This is my data file:
UNA:+.? 'UNB+UNOA:3+CHASE:ZZ+ADSD:ZZ+030718:0103+00000000000630'UNH+0000
0000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT+3+000
00000000326'UNZ+1+00000000000630'
UNA:+.? 'UNB+UNOA:3+ss+WER:ZZ+030718:0110+00000000000631++BTS'
UNH+00000000012689+BTS:D:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:
F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT+10+00000000012689'UNH+00000000012690+BTS:D:96A:UN'BGM+XZ8+2'DT
M+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806
000002'RFF+CR:EXM 1000736P139G'SEQ++1'GIS+1'UNT+10+00000000012690'UNH+0000000001
2691+BTS:D:96A:UN'BGM+XZ8+3'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18
-06.00.47.168473'RFF+AEK:03071806000001'RFF+CR:EXM 1000939I361H'SEQ++1'GIS+1'UNT
+10+00000000012691'UNH+00000000012692+BTS:D:96A:UN'BGM+XZ8+4'DTM+137:20030718
:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000003'RFF+CR:
FUS 1000312ASBCA'SEQ++1'GIS+1'UNT+10+00000000012692'UNH+00000000012693+BTS:D:
96A:UN'BGM+XZ8+5'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.1684
73'RFF+AEK:03071806000003'RFF+CR:FUS 1000313BTJRA'SEQ++1'GIS+1'UNT+10+0000000001
2693'UNH+00000000012694+BTS:D:96A:UN'BGM+XZ8+6'DTM+137:20030718:102'LIN+1'RFF
+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006351A1Y
AG'SEQ++1'GIS+1'UNT+10+00000000012694'UNH+00000000012695+BTS:D:96A:UN'BGM+XZ8
+7'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:030
71806000004'RFF+CR:FUS 1006352FM3CB'SEQ++1'GIS+1'UNT+10+00000000012695'UNH+00000
000012696+BTS:D:96A:UN'BGM+XZ8+8'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-
07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006353MC65A'SEQ++1'GIS+
1'UNT+10+00000000012696'UNH+00000000012697+BTS:D:96A:UN'BGM+XZ8+9'DTM+137:200
30718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RF
F+CR:FUS 1006354Q29AA'SEQ++1'GIS+1'UNT+10+00000000012697'UNH+00000000012698+BANS
TA:D:96A:UN'BGM+XZ8+10'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.4
7.168473'RFF+AEK:03071806000004'RFF+CR:FUS 1006355R175B'SEQ++1'GIS+1'UNT+10+0000
0000012698'UNH+00000000012699+BTS:D:96A:UN'BGM+XZ8+11'DTM+137:20030718:102'LI
N+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS 100
6356V353U'SEQ++1'GIS+1'UNT+10+00000000012699'UNZ+11+00000000000631'

What I need to do is extract all text that starts with UNH and ends with UNT, so the script should return:
UNH+0000
0000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT

UNH+00000000012689+BTS:D:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:
F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT

and so on.

Thanks for your help.
 
Try this:
Code:
awk '/^UNH.*UNT$/{print}' path/to/datafile

Hope this helps,
PH.
 
Thanks for your help. I tried it and it does not return anything. What am I doing wrong?
$ awk '/^UNH.*UNT$/{print}' file1.txt
$
 
How are the lines terminated in your file1.txt?

PH.
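A minimal sketch of why the anchored pattern prints nothing, assuming (as the sample suggests) that each record begins with UNA and ends with an apostrophe. In awk, ^ and $ anchor to the start and end of the whole record, not to wherever UNH happens to appear. The record below is a hypothetical, abbreviated stand-in for the real data.

```shell
#!/bin/sh
# Hypothetical, abbreviated record: starts with UNA and ends with an apostrophe
rec="UNA:+.? 'UNB+UNOA:3+CHASE:ZZ'UNH+0000000326+CRL:2:2:UN'UCI+7'UNT+3+000326'UNZ+1+630'"

# Anchored: the record would have to START with UNH and END with UNT, so nothing matches
anchored=$(printf '%s\n' "$rec" | awk '/^UNH.*UNT$/ { print "match" }')

# Unanchored: the record only has to CONTAIN UNH followed later by UNT
unanchored=$(printf '%s\n' "$rec" | awk '/UNH.*UNT/ { print "match" }')

echo "anchored=[$anchored] unanchored=[$unanchored]"
```

Dropping the anchors only tells you which records contain a message; pulling out the text between UNH and UNT still needs match()/substr() or a different way of splitting the input.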
 
Something like this:

nawk -f zen.awk zen.txt

#--------------------------- zen.txt
BEGIN {
    RE="UNH.*"
    FS="UNT"
}

{
    for(i=1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART, RLENGTH) FS;
    }

}


vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Oops, sorry 'bout that:

nawk -f zen.awk zen.txt

#--------------------------- zen.awk
BEGIN {
    RE="UNH.*"
    FS="UNT"
}

{
    for(i=1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART, RLENGTH) FS;
    }

}

vlad
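The zen.awk script splits each record on the string UNT (FS="UNT"), so every UNH ... UNT message ends up at the tail of some field, and match() then finds where UNH starts inside that field. A minimal sketch of that mechanism on a made-up one-line sample; the data and the inlined form of the program are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical one-line interchange holding two messages
rec="UNA:+.? 'UNB+X'UNH+1+CRL'UCI+7'UNT+3+1'UNH+2+BTS'GIS+1'UNT+10+2'UNZ+2+631'"

result=$(printf '%s\n' "$rec" | awk '
BEGIN { FS = "UNT" }                 # every occurrence of UNT ends a field
{
    for (i = 1; i <= NF; i++)        # look at each field in turn
        if (match($i, /UNH.*/))      # RSTART = where UNH begins in this field
            print substr($i, RSTART) FS   # FS re-appends the UNT that splitting removed
}')

printf '%s\n' "$result"
```

Because a multi-character FS is treated as a regular expression, a literal UNT inside segment data would also split a message; the sample data does not appear to contain one.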
 
It only returns a part of the string.

The result should have been:
UNH+00000000012689+BTS:D:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM 1000735AT1ZD'SE
Q++1'GIS+1'UNT

but what I get is:
UNH+00000000012689+BTS:D:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:F2T
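One possible cause of the truncated output, assuming the file really contains line breaks inside the data (the sample wraps mid-segment, e.g. UNH+0000 / 0000000326): awk treats each physical line as a separate record, so a message that continues on the next line is cut off. A minimal sketch that deletes the line breaks with tr before awk sees the data; the file name wrapped.txt and its contents are hypothetical. This turns the whole file into one very long record, which some older awk implementations cannot read.

```shell
#!/bin/sh
# Hypothetical input: one interchange wrapped across three physical lines
printf '%s\n' "UNA:+.? 'UNB+X'UNH+0000" "0000000326+CRL'UCI+7'UNT+3+0" "00326'UNZ+1+630'" > wrapped.txt

# Same extraction logic as zen.awk, inlined
prog='BEGIN { FS = "UNT" } { for (i = 1; i <= NF; i++) if (match($i, /UNH.*/)) print substr($i, RSTART) FS }'

# Line by line: the message is cut off at the first line break
per_line=$(awk "$prog" wrapped.txt)

# Line breaks (and any carriage returns) removed first: one logical record again
joined=$(tr -d '\r\n' < wrapped.txt | awk "$prog")

echo "per line: $per_line"
echo "joined:   $joined"
rm -f wrapped.txt
```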
 
How do I write the print output to a file?
I tried print substr($i, RSTART, RLENGTH+100) >> aaa.txt

but I get an error:
syntax error The source line is 9.
The error context is
print substr($i, RSTART, RLENGTH+100) >>
awk: The statement cannot be correctly parsed.
The source line is 9.

Thanks for your help
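A minimal sketch of output redirection in awk: the target after > or >> must be a string, either a quoted literal or a variable holding the name. With >, the file is truncated when it is first opened and later prints in the same run keep writing to it; >> never truncates. The file name aaa.txt comes from the question; the data is made up.

```shell
#!/bin/sh
rm -f aaa.txt

# File name passed in as an awk variable; > truncates once, then keeps writing
printf 'UNH+1UNT\nUNH+2UNT\n' | awk -v out=aaa.txt '{ print > out }'

# Quoted literal file name; >> appends to what is already there
printf 'UNH+3UNT\n' | awk '{ print >> "aaa.txt" }'

count=$(wc -l < aaa.txt | tr -d ' ')
cat aaa.txt
echo "lines: $count"
```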
 
Given your sample input, everything is ONE continuous block of data [one line].

Try this one:

BEGIN {
    RE="UNH.*"
    FS="UNT"
}

{
    for(i=1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART) FS;
    }

}


Here's the output I get; it looks fine according to your definition:

UNH+00000000000326+CRL:2:2:UN'UCI+47000618072003+EV1:ZZ+CH:ZZ+7'UNT
UNH+00000000012689+BTS:D:96A:UN'BGM+XZ8+1'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM1000735AT1ZD'SEQ++1'GIS+1'UNT
UNH+00000000012690+BTS:D:96A:UN'BGM+XZ8+2'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000002'RFF+CR:EXM1000736P139G'SEQ++1'GIS+1'UNT
UNH+00000000012691+BTS:D:96A:UN'BGM+XZ8+3'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000001'RFF+CR:EXM1000939I361H'SEQ++1'GIS+1'UNT
UNH+00000000012692+BTS:D:96A:UN'BGM+XZ8+4'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000003'RFF+CR:FUS1000312ASBCA'SEQ++1'GIS+1'UNT
UNH+00000000012693+BTS:D:96A:UN'BGM+XZ8+5'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000003'RFF+CR:FUS1000313BTJRA'SEQ++1'GIS+1'UNT
UNH+00000000012694+BTS:D:96A:UN'BGM+XZ8+6'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006351A1YAG'SEQ++1'GIS+1'UNT
UNH+00000000012695+BTS:D:96A:UN'BGM+XZ8+7'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006352FM3CB'SEQ++1'GIS+1'UNT
UNH+00000000012696+BTS:D:96A:UN'BGM+XZ8+8'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006353MC65A'SEQ++1'GIS+1'UNT
UNH+00000000012697+BTS:D:96A:UN'BGM+XZ8+9'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006354Q29AA'SEQ++1'GIS+1'UNT
UNH+00000000012698+BANSTA:D:96A:UN'BGM+XZ8+10'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006355R175B'SEQ++1'GIS+1'UNT
UNH+00000000012699+BTS:D:96A:UN'BGM+XZ8+11'DTM+137:20030718:102'LIN+1'RFF+XC3:F2003-07-18-06.00.47.168473'RFF+AEK:03071806000004'RFF+CR:FUS1006356V353U'SEQ++1'GIS+1'UNT


vlad
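For reference, switching from substr($i, RSTART, RLENGTH) to substr($i, RSTART) does not by itself change the result: because the pattern UNH.* runs to the end of the field, RLENGTH already reaches the last character. A small sketch on a made-up field (an assumption) showing that the two calls agree, which suggests the difference lies in how the input is split into records.

```shell
#!/bin/sh
# Hypothetical field, like $2 when a record is split on UNT
field="+3+1'UNH+2+BTS'GIS+1'"

result=$(awk -v f="$field" 'BEGIN {
    match(f, /UNH.*/)                    # sets RSTART (start) and RLENGTH (length)
    a = substr(f, RSTART, RLENGTH)       # explicit length
    b = substr(f, RSTART)                # from RSTART to the end of the string
    print RSTART, RLENGTH, (a == b ? "same" : "different")
}')

echo "$result"
```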
 
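To write the output to a file, quote the file name; awk reads an unquoted aaa.txt as a (malformed) expression rather than a file name, which is why you got the syntax error:
print substr($i, RSTART) FS >> "/path2myOutputFile";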

vlad
 
What I need to do is take each line and process it further.
Is it possible to assign the output of print substr($i, RSTART) FS to a variable and then do the further processing on that variable in a Unix shell script?
 
#!/bin/ksh
# remove the file redirection from the awk script
#
nawk -f zen.awk zen.txt | while read theLine do
    echo "here is my line to process->[${theLine}]"
done

vlad
 
I get an error:
$ new1
new1: syntax error at line 5 : `done' unexpected

File new1 is:

#!/bin/ksh

nawk -f zen.awk zen.txt | while read theLine do
    echo "here is my line to process->[${theLine}]"
done

File zen.awk is:

BEGIN {
    RE="UNH.*"
    FS="UNT"
}

{
    for(i=1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART) FS;
    }

}

Thanks
 
Sorry, the do keyword has to start its own line (or follow a semicolon):

#!/bin/ksh
# remove the file redirection from the awk script
#
nawk -f zen.awk zen.txt | while read theLine
do
    echo "here is my line to process->[${theLine}]"
done

vlad
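A minimal sketch of processing each extracted message inside the loop. It uses a hypothetical stand-in function, extract, in place of nawk -f zen.awk zen.txt, and reads with IFS= read -r so each line arrives exactly as awk printed it (no trimming of leading blanks, no backslash processing). The parameter expansions that pull out the number after UNH+ are only an illustrative processing step, not part of the original script.

```shell
#!/bin/sh
# Hypothetical stand-in for "nawk -f zen.awk zen.txt": one UNH...UNT message per line
extract() {
    printf '%s\n' "UNH+1+CRL'UCI+7'UNT" "UNH+2+BTS'GIS+1'UNT"
}

processed=$(extract | while IFS= read -r theLine
do
    # illustrative step: reference number between "UNH+" and the next "+"
    ref=${theLine#UNH+}
    ref=${ref%%+*}
    echo "message $ref: ${#theLine} characters"
done)

printf '%s\n' "$processed"
```

In ksh the last command of a pipeline runs in the current shell, but in bash and some other sh implementations the while loop runs in a subshell, so variables set inside it are lost after done; capturing the loop's output, as above, behaves the same in both.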
 
vlad help!!!

Now, if the line is big, I get an error:
Input line UNA:+.? 'UNB+UNOA:3+ cannot be longer than 3,000 bytes.

How do I solve this?
 
zen,

Is that somehow related to the other thread you started regarding Oracle SQL? If it is, as suggested, you might be better off in the Oracle forum.

Either way, I think you're hitting the length limit of your shell variable. There might be ways around it, but we need to see the 'body' of this 'while read ...' loop.

Could you post a snippet of that loop, please?

Before you do that, run your script with 'set -x' and see where it's failing.

vlad
 
No, it's not Oracle-related.

I am executing:
awk -f a4.awk contrl0.txt

a4.awk is:
BEGIN {
    RE="UNH.*"
    FS="UNT"
}

{
    for(i=1; i <= NF; i++) {
        if (match($i, RE))
            print substr($i, RSTART) FS ;
    }

}
This is the error I get:
UNH+00000000000327+CONTRL:2:2:UN'UCI+17010619072003+XCXCXCXCXCX:ZZ+QWQWQ:ZZ+7'UNT
awk: Input line UNA:+.? 'UNB+UNOA:3+ cannot be longer than 3,000 bytes.
The input line number is 1. The file is contrl0.txt.
The source line number is 1.
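One way around an input-line limit like this, sketched under stated assumptions: let tr (which is not line-oriented) delete the line breaks and then turn every segment terminator ' into a newline, so awk only ever reads one short segment per record and reassembles each message from UNH up to UNT. This assumes no single segment exceeds the limit; the assembled message is still held in an awk variable, which very old awks may also restrict. It also ignores the release character (? in the UNA string), so a released ?' inside the data would be split incorrectly. The file names sample.edi and messages.txt and the sample contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapped interchange in the style of the sample data
printf '%s\n' "UNA:+.? 'UNB+UNOA:3+CHASE:ZZ'UNH+0000" \
    "0000000326+CRL:2:2:UN'UCI+7'UNT+3+000" \
    "00000000326'UNZ+1+630'" > sample.edi

# 1. tr -d removes line breaks; 2. tr turns each segment terminator into a newline.
# Caveat: a released apostrophe (?') would be split too; it is not handled here.
tr -d '\r\n' < sample.edi | tr "'" '\n' | awk -v SQ="'" '
/^UNH/ { msg = $0; inmsg = 1; next }                        # a message starts
/^UNT/ { if (inmsg) print msg SQ "UNT"; inmsg = 0; next }   # a message ends at UNT
inmsg  { msg = msg SQ $0 }                                  # segment inside a message
' > messages.txt

cat messages.txt
```

Each output line has the same shape as the result asked for at the top of the thread (UNH ... 'UNT), and the per-line loop can then read messages.txt instead of the awk pipeline.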
 
