printing columns

Status
Not open for further replies.

Guest_imported

Hello,

I want to search for a word in a file, but I also want clean output. So if I search for the word 'the' in a file, I don't just want the matching records; I want something like this:

the guy thinks she's crazy
He's trying to hit the ball
I think the doctor is coming

Can somebody help me with that?
Thanx
Col81
 

awk -f Format.awk -v WORD=the file

--- Format.awk ---
BEGIN {
    pattern = "(^|" IFS ")" WORD "($|" IFS ")" ;
}
$0 ~ pattern {
    print " ",$0 ;
    next ;
}
{
    print
}
--- End of Format.awk ---

Jean Pierre.
 
I'm sorry, my post didn't come out the way I wanted. I want the words 'the' lined up neatly under each other, like so:

aaa the bbbbb
the ccc
ddddd the e
(maybe it still doesn't look right)

I don't know what went wrong but aigles' script doesn't do anything with my data file.

Col81
 
Hi,

In the previous script, replace IFS with FS (IFS is for ksh, not for awk).
In any case, that script doesn't format the output the way you want.


What must the script do?

For each input line
    Find the word position
    If the word is in the line, then memorize the line, the position, and the maximum position
end

For all memorized lines
    Compute the spaces to print before the line text
    Print the spaces and the line
end

Find the word position in the input line:

With gawk, no problem:
[tt]
pattern = "\\<" WORD "\\>" ;    # WORD is the word to search for
pos = match($0, pattern) ;      # pos != 0 if found
[/tt]
In other awk implementations:
[tt]
pattern1 = FS WORD "(^|" FS ")" ;   # WORD in middle/end of line
pattern2 = "^" WORD "(^|" FS ")" ;  # WORD at start of line
pos = match($0, pattern1) ;         # Search WORD in the line
if (pos > 0)                        # If found
    pos += 1 ;                      #   skip FS
else                                # else
    pos = match($0, pattern2) ;     #   search WORD at start of line
[/tt]

Memorize line, word position, maximum position:
[tt]
if (pos != 0) {             # If found
    Line[++Cnt] = $0 ;      #   Memorize line (Cnt=line count)
    Pos[Cnt] = pos ;        #   Memorize word position in line
    if (pos > MaxPos)       #   Set max position
        MaxPos = pos ;
}
[/tt]
Compute spaces to print before line text:
[tt]
for (i=1; i<= MaxPos; i++)      # Build a string of MaxPos spaces
    spaces = spaces " " ;
SpacesCnt = MaxPos-Pos[l] ;     # Space count to print for line l
[/tt]
Print spaces and line:
[tt]
printf "%s", substr(spaces,1,MaxPos-Pos[l]) ;
print Line[l] ;
[/tt]

Now, the complete script:

--- AlignWord.awk (gawk version) ---
BEGIN {
    pattern = "\\<" WORD "\\>" ;
}
{
    pos = match($0, pattern) ;
    if (pos != 0) {
        Line[++Cnt] = $0 ;
        Pos[Cnt] = pos ;
        if (pos > MaxPos)
            MaxPos = pos ;
    }
}
END {
    for (i=1; i<= MaxPos; i++)
        spaces = spaces " " ;
    for (l=1; l<=Cnt; l++) {
        printf "%s", substr(spaces,1,MaxPos-Pos[l]) ;
        print Line[l] ;
    }
}
--- End of AlignWord.awk ---

awk -v WORD=the -f AlignWord.awk file
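
For awks without the gawk \< and \> operators, the same memorize-then-print steps can be sketched as below. This is only a sketch under assumptions: the shell function name align_word is made up for the example, words are taken to be separated by blanks or tabs, WORD is treated as a regular expression (so it should not contain special characters), and the word test pads the line with a blank on each side instead of using the FS-based patterns.

```sh
# align_word WORD < file
# Hypothetical wrapper: prints the lines containing WORD, shifted right so
# that WORD starts in the same column on every line.
align_word() {
    awk -v WORD="$1" '
    {
        # Pad the line with one blank on each side so that a word at the
        # start or end of the line is also surrounded by blanks. The match
        # position in the padded line is the position of the blank before
        # the word, which equals the word position in the original line.
        pos = match(" " $0 " ", "[ \t]" WORD "[ \t]")
        if (pos > 0) {
            Line[++Cnt] = $0          # memorize line
            Pos[Cnt] = pos            # memorize word position
            if (pos > MaxPos)
                MaxPos = pos          # memorize maximum position
        }
    }
    END {
        for (i = 1; i < MaxPos; i++)
            spaces = spaces " "
        for (l = 1; l <= Cnt; l++)
            printf "%s%s\n", substr(spaces, 1, MaxPos - Pos[l]), Line[l]
    }'
}
```

It would be called as: align_word the < file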

Jean Pierre.
 
Thank you aigles, your script works all right.
Do you perhaps also know how I can get the left and right context into variables, say left and right, so that I can use a printf statement:

printf("%30s %s %s\n", left, pattern, right)

I know your solution works very well, but I want to know whether I could do it this way.

col81
 
Hi,

First a correction ...

In other awk implementations:

pattern1 = FS WORD "($|" FS ")" ; # WORD in middle or at end of line
pattern2 = "^" WORD "($|" FS ")" ; # WORD at start of line


Now, another version of the script, using a function that searches for a WORD and returns its position along with the left (RLEFT) and right (RRIGHT) context.

When calling the script, define these variables:
WORD = Word to search for
COL = Column where to print the WORD

--- AlignWord2.awk (gawk version) ---
BEGIN {
    if (WORD == "") exit ;
    if (COL == 0 ) COL = 10 ;
    COL -= 1 ;                          # width of the left context
}
function WordMatch(String, Word,    ere1, ere2, wpos, wlen) {
    ere1 = FS Word "($|" FS ")" ;
    ere2 = "^" Word "($|" FS ")" ;
    wlen = length(Word) ;
    wpos = match(String, ere1) ;
    if (wpos != 0)
        wpos += 1 ;                     # skip the leading FS
    else
        wpos = match(String, ere2) ;
    if (wpos > 0) {
        RLEFT = substr(String, 1, wpos-1) ;
        RRIGHT = substr(String, wpos+wlen) ;
    } else {
        RLEFT = String ;
        RRIGHT = "" ;
    }
    RSTART = wpos ;
    RLENGTH = wlen ;
    return wpos ;
}
{
    if (WordMatch($0, WORD) > 0) {
        RLEFT = substr(RLEFT, RSTART-COL) ;     # keep at most COL characters
        printf "%*s%s%s\n",COL,RLEFT,WORD,RRIGHT ;
    }
}
--- End of AlignWord2.awk ---

awk -v WORD=the -v COL=12 -f AlignWord2.awk file
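
If the goal is only to fill left and right variables and pass them to the printf statement from the question, a shorter sketch is possible. It relies on RSTART-style positions returned by match() and on RLENGTH, which awk sets after every match() call. The function name context_printf is made up for the example, words are assumed to be separated by blanks or tabs, and WORD is again treated as a regular expression.

```sh
# context_printf WORD < file
# Hypothetical wrapper: prints "left word right" with the left context
# right-justified in 30 columns.
context_printf() {
    awk -v WORD="$1" '
    {
        # Pad the line with blanks so WORD is matched as a whole word,
        # even at the start or end of the line.
        p = match(" " $0 " ", "[ \t]" WORD "[ \t]")
        if (p == 0)
            next
        wlen  = RLENGTH - 2               # RLENGTH includes the two blanks
        left  = substr($0, 1, p - 1)      # everything before the word
        word  = substr($0, p, wlen)       # the matched word itself
        right = substr($0, p + wlen)      # everything after the word
        sub(/[ \t]+$/, "", left)          # drop the blank next to the word
        sub(/^[ \t]+/, "", right)
        printf "%30s %s %s\n", left, word, right
    }'
}
```

Note that %30s only pads: a left context longer than 30 characters is printed in full and pushes the word to the right, so it would have to be cut with substr first if strict alignment matters.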
Jean Pierre.
 
Hi aigles,

Can I bother you one last time with this?
Is it possible to adapt your script so that I can specify the number of words in the left and right contexts (so that it may even involve the previous and the next record)?

input:

aaaa bbb cc ddd eee
fff g the hh ii jjj
lll mm

output (-v context=4):

ddd eee fff g the hh ii jjj lll

TIA
col81
 
You need different logic to do that, because of the lookahead.

The file is read word by word and not line by line using the LwrGetWord function. You get the left and right contexts by calling LwrGetLeft and LwrGetRight.
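
To illustrate the word-by-word idea in a smaller form first, here is a sketch that uses a different, simpler technique than the script in this post: it loads every word of the file into one array and only then prints the context, so it needs the whole file in memory and does no column alignment. The function name word_context is made up for the example.

```sh
# word_context WORD CONTEXT < file
# Hypothetical wrapper: for each occurrence of WORD, prints CONTEXT words
# before it, the word itself, and CONTEXT words after it, on one line.
word_context() {
    awk -v WORD="$1" -v CONTEXT="$2" '
    # Collect all words of all records in one array, ignoring line breaks.
    {
        for (i = 1; i <= NF; i++)
            W[++n] = $i
    }
    END {
        for (k = 1; k <= n; k++) {
            if (W[k] != WORD)
                continue
            out = ""
            sep = ""
            # Clip the window at the first and last word of the file.
            for (j = k - CONTEXT; j <= k + CONTEXT; j++)
                if (j >= 1 && j <= n) {
                    out = out sep W[j]
                    sep = " "
                }
            print out
        }
    }'
}
```

With the three-line input from the question, word_context the 4 < file should give the requested line.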

--- Lwr.awk ---
BEGIN {
    if (WORD == "") exit ;
    if (COL == 0 ) COL = 20 ;

    if (CONTEXT == 0 ) CONTEXT = 3 ;
    LwrLen = CONTEXT ;              # words of context on each side
    LwrSiz = CONTEXT * 2 + 1 ;      # circular buffer size: left + word + right
    LwrCnt = 0 ;                    # words stored in the buffer so far
    LwrWrd = 0 ;                    # words returned by LwrGetWord so far
    LwrFld = 0 ;                    # current field in the current record
}

# Store the next word in the circular buffer. The first call stores
# LwrLen+1 words so that the right context is read ahead.
function LwrGetInput(    p) {
    while (1) {
        if (++LwrFld > NF) {
            if (getline <= 0)
                $0 = "" ;
            LwrFld = 1 ;
        }
        p = LwrCnt++ % LwrSiz ;
        LwrCtx[p] = $LwrFld ;
        if (LwrCnt > LwrLen) break ;
    }
}

# Return the next word of the file ("" at end of input).
function LwrGetWord(    p) {
    LwrGetInput() ;
    p = LwrWrd++ % LwrSiz ;
    return LwrCtx[p] ;
}

# Return the LwrLen words before the current word.
function LwrGetLeft(    i,lp,lc) {
    for (i=LwrLen; i>0; i--) {
        lp = (LwrWrd - i + LwrSiz -1) % LwrSiz ;
        lc = lc " " LwrCtx[lp] ;
    }
    sub("^ *","",lc) ;
    return lc
}

# Return the LwrLen words after the current word.
function LwrGetRight(    i,rp,rc) {
    for (i=1; i<=LwrLen; i++) {
        rp = (LwrWrd + i - 1) % LwrSiz ;
        rc = rc LwrCtx[rp] " ";
    }
    sub(" *$","",rc) ;
    return rc
}

{
    ll = COL - 1 ;
    W = LwrGetWord() ;
    while (W != "") {
        if (W == WORD) {
            left = LwrGetLeft() " " ;
            right = " " LwrGetRight() ;
            lleft = length(left) ;
            if (lleft > ll)
                left = substr(left, lleft-ll+1) ;
            printf "%*s%s%s\n",ll,left,W,right;
        }
        W = LwrGetWord() ;
    }
    exit ;
}
--- End of Lwr.awk ---

Call the awk script with these variable definitions:
-v WORD=word          the word you are looking for
-v COL=col            column where to align the word (default 20)
-v CONTEXT=context    size (in words) of the left and right contexts (default 3)

awk -f Lwr.awk -v WORD=the -v CONTEXT=4 file

Jean Pierre.
 