
problem with arrays

Status
Not open for further replies.

jupiler

Technical User
Oct 1, 2002
14
BE
Hello,

While working on the previous script, I ran into a new problem. Given a term list ("termlist.txt") and an input file ("textfile.txt", with one sentence per line), I need to write a script that does the following:

If sentence X of "textfile.txt" contains a term from "termlist.txt", print sentence X to a new file ("newfile.txt"). The difficulty is that a term can consist of more than one word. Hopefully the examples below show what I want to do.

Given "textfile.txt":

John went to the local pub
He saw a man sitting in the corner
The man gave him a chair
John drank a pint
He paid his drink, the value added tax included, and got out

Given "termlist.txt":

local pub
drink
value added tax

Gawk should only select:

John went to the LOCAL PUB
He paid his DRINK, the VALUE ADDED TAX included, and got out

The matched terms also need to be marked in the text, as shown in upper case above. Can someone help me with this? Thanks,

Jupiler
 
Something like this:

nawk -f term.awk textfile.txt > newfile.txt

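#------------- term.awk
BEGIN {
    termlist = "termlist.txt"

    # Read one term per line; the parentheses make "> 0" a comparison
    # on getline's return value rather than part of the redirection.
    while ((getline line < termlist) > 0) {
        # skip blank lines and comments in the term list
        if (line == "" || line ~ /^#/) continue
        termsARR[line]
    }
    close(termlist)
}

{
    found = 0
    # Each term is used as a regular expression here.
    for (ind in termsARR)
        if (match($0, ind)) {
            found++
            gsub(ind, toupper(ind), $0)
        }
    # Print only the sentences that contained at least one term.
    if (found) print
}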

vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
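
For reference, here is a minimal sketch of running this regex-based approach against the sample files from the question. It assumes a POSIX shell, an awk on the PATH (swap in nawk or gawk as needed) and a working mktemp -d. The helper name mark_terms and the scratch directory are illustrative only and do not come from the thread.

```shell
#!/bin/sh
# Sketch: mark term-list hits in upper case and keep only matching sentences.
# mark_terms is a hypothetical helper name; $1 = term list, $2 = text file.
mark_terms() {
    awk -v termlist="$1" '
    BEGIN {
        # Load one term per line; skip blank lines and comment lines.
        while ((getline line < termlist) > 0)
            if (line != "" && line !~ /^#/) terms[line]
        close(termlist)
    }
    {
        found = 0
        for (t in terms)
            if (match($0, t)) {      # t is treated as a regular expression
                found++
                gsub(t, toupper(t))
            }
        if (found) print
    }' "$2"
}

# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'local pub' 'drink' 'value added tax' > "$dir/termlist.txt"
printf '%s\n' \
    'John went to the local pub' \
    'He saw a man sitting in the corner' \
    'The man gave him a chair' \
    'John drank a pint' \
    'He paid his drink, the value added tax included, and got out' \
    > "$dir/textfile.txt"

mark_terms "$dir/termlist.txt" "$dir/textfile.txt" > "$dir/newfile.txt"
cat "$dir/newfile.txt"
```

With the sample data this should leave only the first and last sentences in newfile.txt, with LOCAL PUB, DRINK and VALUE ADDED TAX in upper case. Because the terms are used as regular expressions, a term such as "drink" would also match inside a longer word like "drinking".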
 
# Literal (non-regex) search: walk str from position cnt and return a copy
# in which every occurrence of mat is upper-cased.
function strsearch(str, mat, cnt,    len) {
    len = length(mat)
    if (len == 0 || cnt > length(str) - len + 1)
        return str
    if (substr(str, cnt, len) == mat) {
        # Mark the hit, then continue scanning just past it.
        str = substr(str, 1, cnt - 1) toupper(mat) substr(str, cnt + len)
        return strsearch(str, mat, cnt + len)
    }
    return strsearch(str, mat, cnt + 1)
}

BEGIN {
    # termlist and filename3 are not set in the script; they are assumed
    # to be passed in, e.g. -v termlist=termlist.txt -v filename3=newfile.txt
    while ((getline line < termlist) > 0)
        if (line != "") arr[a++] = line
    close(termlist)
}
{
    found = 0
    for (x in arr) {
        #print "searching for : ", arr[x]
        if (index($0, arr[x]) > 0) {
            found = 1
            $0 = strsearch($0, arr[x], 1)
        }
    }
    # Print each sentence at most once, and only if a term was found.
    if (found) print > filename3
}

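Because it compares plain substrings instead of regular expressions, this
may work in some pathological cases where match() fails, such as terms
that contain regex metacharacters.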

Good Luck.
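
To illustrate the kind of case where literal matching and match() differ, here is a small sketch that uses index() instead of a regular expression. It assumes a POSIX shell, an awk on the PATH and a working mktemp -d. The helper name mark_literal, the mark function and the sample term "e.g." are made up for the illustration and are not from the thread. As a regular expression, "e.g." would also match "eggs", since each dot matches any character.

```shell
#!/bin/sh
# Sketch: literal term marking with index(), so regex metacharacters in a
# term are matched as plain text.
# mark_literal is a hypothetical helper name; $1 = term list, $2 = text file.
mark_literal() {
    awk -v termlist="$1" '
    # Upper-case every literal occurrence of term in str.
    function mark(str, term,    out, pos) {
        out = ""
        while (term != "" && (pos = index(str, term)) > 0) {
            out = out substr(str, 1, pos - 1) toupper(term)
            str = substr(str, pos + length(term))
        }
        return out str
    }
    BEGIN {
        while ((getline line < termlist) > 0)
            if (line != "") terms[line]
        close(termlist)
    }
    {
        found = 0
        for (t in terms)
            if (index($0, t) > 0) {
                found = 1
                $0 = mark($0, t)
            }
        if (found) print
    }' "$2"
}

dir=$(mktemp -d)
printf '%s\n' 'e.g.' > "$dir/termlist.txt"
printf '%s\n' 'He ordered snacks, e.g. crisps' 'He ordered eggs' \
    > "$dir/textfile.txt"
mark_literal "$dir/termlist.txt" "$dir/textfile.txt"
```

Only the first sentence should be selected here, with "e.g." marked as "E.G."; the "eggs" sentence is skipped because the term is compared character for character.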
 