Tek-Tips is the largest IT community on the Internet today!

locate text string - locate different text string -write and repeat 2

Status
Not open for further replies.

keusch

Technical User
Jun 17, 2001
41
US
Can this be done using awk?
Find a text string.
The 1st search string is: Plugback
Once found, drop down two lines,
then write to a file all the lines that occur between that point and one line above the 2nd search string.
The 2nd search string is: -----
(5 dashes)
Repeat until end of file.

I used to do this in vi with a macro, but the files are getting too large and the process needs to be automated.

The results should look like this:
25035214750000 002 BRPG 9366 9400 8488
30045070230000 001 SQZD 6190 6244 5800
30045071110000 001 SQZD 5686 5714
30045071110000 002 SQZD 5750 5808

Here is a sample dataset:

25035214750000 -------------------- Production Tests ----------------------------
25035214750000 Top Base Top Base Oil Prod Test
25035214750000 Test Form Form Depth Depth Choke GORGrav Method Method
25035214750000 001 602CBNK 602CBNK 8775 8785 PERF UNDESIGNATED
25035214750000 002 353SRVR 353SRVR 9366 9400 PERF UNDESIGNATED
25035214750000 Production Shutoff ----------------------------
25035214750000 Shutoff Top BasePlugback
25035214750000 Test Type Depth Depth Depth
25035214750000 002 BRPG 9366 9400 8488
25035214750000 Production Volume ----------------------------
25035214750000 Oil Cond Gas Wtr
25035214750000 Test Amount Unit Desc Amount Unit Desc Amount Unit Desc Amount Unit Desc
25035214750000 001
25035214750000 002
25035214750000 Production Treatment ----------------------------
25035214750000 Test Top Base Volume MeasAmount T/P PSI Inj Type Nbr Agent Add
25035214750000 002 9366 9400 ACID
25035214750000 Production Perforation ----------------------------
25035214750000 Test Top Base Type Method Top Form Base Form Status Count Density Per
25035214750000 001 8775 8785 PERF 602CBNK 602CBNK
25035214750000 002 9366 9400 PERF 353SRVR 353SRVR
25035214750000
25035214750000 -------------------- Formations -----------------------------
30045070230000 Top Base Top Base Oil Prod Test
30045070230000 Test Form Form Depth Depth Choke GORGrav Method Method
30045070230000 001 602DKOT 602DKOT 6190 6244 48/64 PERF FLOWING
30045070230000 Production Shutoff ----------------------------
30045070230000 Shutoff Top BasePlugback
30045070230000 Test Type Depth Depth Depth
30045070230000 001 SQZD 6190 6244 5800
30045070230000 Production Volume ----------------------------
30045070230000 Oil Cond Gas Wtr
30045070230000 Test Amount Unit Desc Amount Unit Desc Amount Unit Desc Amount Unit Desc
30045070230000 001 250 MCFD
30045070230000 -------------------- Formations -----------------------------
30045071110000 Top Base Top Base Oil Prod Test
30045071110000 Test Form Form Depth Depth Choke GORGrav Method Method
30045071110000 001 602DKOT 602DKOT 5686 5714 PERF SWABBING
30045071110000 002 602DKOT 602DKOT 5750 5808 PERF SWABBING
30045071110000 Production Shutoff ----------------------------
30045071110000 Shutoff Top BasePlugback
30045071110000 Test Type Depth Depth Depth
30045071110000 001 SQZD 5686 5714
30045071110000 002 SQZD 5750 5808
30045071110000 Production Volume ----------------------------
30045071110000 Oil Cond Gas Wtr

CAN YOU HELP???
Thanks,
Keusch
 
Keusch:

This works, but it's not particularly nice. Turn printing on when the 4th field is BasePlugback and turn it off when the 4th field is dashes.

You'd like to be able to skip two lines after turning printing on but, unfortunately, a second next never runs: the first next ends processing of the current record, so awk never reaches the second one if you try something like this:
if($4 == "BasePlugback")
{
    p=1;
    next;
    next;
}

Once the printing flag was on, I chose to print only if the second field is numeric.

Regards,

Ed
Schaefer


# used Solaris 7 ksh and nawk
nawk ' BEGIN { p=0; }
{
    if($4 == "BasePlugback")
    {
        p=1;
        next;
    }

    if($4 ~ "----------------------------")
        p=0

    # print if flag is on and 2nd field is numeric
    if(p == 1 && $2 ~ /^[0-9]+$/)
        printf("%s\n", $0)

} ' < data.file > new.file
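
One possible way around the doubled next, offered only as a sketch and not taken from Ed's post: getline can read and discard the extra heading line before next moves on to the following record. The example assumes the layout of the sample dataset, uses a temporary file from mktemp, and prints every line between the headings and the next dashed line (it drops the numeric test on the 2nd field). The names sample and result are made up for illustration.

```shell
# Sample lines in the layout shown in the question (whitespace-separated fields assumed).
sample=$(mktemp)
cat > "$sample" <<'EOF'
25035214750000 Production Shutoff ----------------------------
25035214750000 Shutoff Top BasePlugback
25035214750000 Test Type Depth Depth Depth
25035214750000 002 BRPG 9366 9400 8488
25035214750000 Production Volume ----------------------------
25035214750000 Oil Cond Gas Wtr
EOF

result=$(awk '
# Plugback heading: getline swallows the "Test Type ..." line that follows,
# then next skips to the first data line with printing switched on.
/Plugback/ { if ((getline) > 0) p = 1; next }
# A run of five or more dashes closes the block.
/-----/    { p = 0 }
# Print everything while the flag is on.
p          { print }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Whether this is nicer than the flag plus numeric test is a matter of taste; it does rely on the heading always being exactly two lines.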
 
If the files are truly huge then these are two good sed solutions. The first is about twice as fast (in this case) as awk and the second even faster.

A.
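sed -n '
/Plugback/{
# append the heading line that follows the Plugback line
N
:dashes
# append the next line, then keep only that last line
N
s/.*\n//
# a dashed line ends the block: stop without printing it
/----------------------------/b
p
b dashes
}
' < data.file > new.file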

B.
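sed '
# delete every line outside a Plugback ... dashes block
/Plugback/,/----------------------------/!{
d
}
# delete the Plugback line together with the heading line after it
/Plugback/{
N
d
}
# delete the dashed line that closes the block
/----------------------------/d
' < data.file > new.file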

Cheers,
ND
 
This query has been posted many times before in one form or another.

In addition to the solutions offered, something like this will work.

#!/bin/sh
context() {
filename=$1
pattern=$2

nums=`grep -n $pattern $filename | sed 's/:.*//'
for all in $(echo $nums);
do
awk -v x=$all ' {
if (NR >= (x - 1) && NR <= (x + 2)) {
print $0
}
}' $filename
done
return $?
}
 
Marsd:

I'm always interested in learning something new. I understand you've created a function, context(). A small typo: the closing backtick is missing from the nums assignment.

How do you invoke it? I've tried:

context data.file "BasePlugback -----"

and it doesn't work the way I expect.

Any comment is appreciated.

Regards,


Ed
Schaefer
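
For what it's worth, here is a guess at how the function might be meant to work, not a confirmed answer from Marsd. As written it takes a file name and a single pattern, and prints a fixed window (one line before through two lines after) around each match, so a two-pattern argument such as "BasePlugback -----" would not do what the original question asks. The sketch below restores the backtick, quotes the variables so the shell does not split the pattern, and calls the function on a made-up demonstration file; the names demo and window are hypothetical.

```shell
#!/bin/sh
# Print a window of lines (one before through two after) around each match.
context() {
    filename=$1
    pattern=$2
    # Line numbers of the matching lines; "--" lets a pattern start with a dash.
    nums=`grep -n -- "$pattern" "$filename" | sed 's/:.*//'`
    for n in $nums
    do
        awk -v x="$n" 'NR >= x - 1 && NR <= x + 2' "$filename"
    done
}

# Small demonstration file (made-up lines, not the real data).
demo=$(mktemp)
printf '%s\n' \
    'w1 header' \
    'w1 Shutoff Top BasePlugback' \
    'w1 Test Type Depth' \
    'w1 001 SQZD 5686' \
    'w1 Volume -----' > "$demo"

# One file name, one pattern.
window=$(context "$demo" BasePlugback)
printf '%s\n' "$window"
rm -f "$demo"
```

Because the window is a fixed size, this only fits data where the number of lines after each match is known in advance; it also rereads the file once per match.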
 
Thanks to each of you who responded. Since I'm a holdout from a vi macro, I was looking to approach the problem the way ND (bigoldbulldog) resolved it with sed logic.
However, since this is a learning environment and OldEd responded first, I rewrote the script using his suggestions. I did make a few changes. The dashes were not always in $4, so that became $NF, and I took out the formatting in the print statement.
Thanks again
Keusch
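
The revised script was not posted, so the following is only a sketch of what those two changes might look like: the dash test moves from $4 to $NF and printf is replaced by a plain print. The sample lines come from the dataset in the question; the anchored pattern /^-----/ and the names sample and revised are assumptions made for illustration.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
30045071110000 Production Shutoff ----------------------------
30045071110000 Shutoff Top BasePlugback
30045071110000 Test Type Depth Depth Depth
30045071110000 001 SQZD 5686 5714
30045071110000 002 SQZD 5750 5808
30045071110000 Production Volume ----------------------------
30045071110000 Oil Cond Gas Wtr
EOF

revised=$(awk '
# Turn printing on at the Plugback heading and skip that line.
$4 == "BasePlugback" { p = 1; next }
# Turn printing off when the LAST field starts with five dashes.
$NF ~ /^-----/ { p = 0 }
# Print while the flag is on and the 2nd field is numeric.
p == 1 && $2 ~ /^[0-9]+$/ { print }
' "$sample")

printf '%s\n' "$revised"
rm -f "$sample"
```

Testing $NF catches a dashed field wherever it falls at the end of the line, which is the case Keusch describes.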
 