
Status
Not open for further replies.

Guest_imported

hello,

Can someone help me find a script for the following problem? In a text file I need to remove all the "unimportant" words. To remove these words, I have a stoplist (stoplist.txt) in the following format:

a
a's
able
about
above
according
accordingly
across
actually
as
bit
I
little
more
of
this
through
with
you
your

All I need is a script that says: if a word in file x is found in stoplist.txt, remove this word from file x and continue. The result should be saved in another file. So the file "text.txt":

I have learned more through this forum as you can see above. With a little bit of your help.

should become (taking into account the stoplist):

have learned forum can see. help.

Thanks,

Lizebe



 
This is copied from a previous thread - I used it,
though I don't understand it fully - it should work OK

awk 'BEGIN{
while ((getline < "stoplist.txt") > 0) { arr[ $1] = 1}
}
{
if( arr[$1] != 1) print

}' file.txt > newfile.txt

You may have to switch stoplist.txt and file.txt around
DB ;-) Dickie Bird
db@dickiebird.freeserve.co.uk
 
Hi,

I'm afraid the script does not work. I'm not getting a correct output. Allow me to rephrase the problem:

for every word in a text (file.txt),
if this word appears to be in the stoplist (stoplist.txt),
then remove this word from the original text,
if this word does not appear in the stoplist,
continue
print the output

The result of this procedure should be a new textfile with all the words - that were found in stoplist.txt - removed.

Greetings,

Lizebé
 
DB just gave a sample to get you started. Try out -

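awk '
BEGIN {
    # load the stoplist: one word per line
    while ((getline < "stoplist.txt") > 0) { arr[$1]++ }
}
{
    # print every word that is not in the stoplist
    sep = ""
    for (i = 1; i <= NF; i++)
        if (arr[$i] < 1) { printf("%s%s", sep, $i); sep = " " }
    printf("\n")    # always end the line, even when the last word was removed
}' < file.txt > newfile.txt

Cheers,
ND [smile]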

bigoldbulldog@hotmail.com
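
One thing to watch with the word-by-word approach in this thread: awk compares each field exactly as written, so "With" does not match the stoplist entry "with" and "above." does not match "above". A minimal sketch of one way around that is below. The function name remove_stopwords and the normalisation rules are my own assumptions, not something from the posts above. It lower-cases each word, ignores punctuation at either end when looking it up, and hands the trailing punctuation of a removed word to the text kept so far. It assumes plain ASCII words and a non-empty stoplist.

```shell
# remove_stopwords STOPLIST INPUT
# Hypothetical helper: prints INPUT with every stoplisted word removed.
remove_stopwords() {
    awk '
    # First file (the stoplist): remember each word in lower case.
    NR == FNR { if (NF) stop[tolower($1)] = 1; next }

    # Second file (the text): rebuild each line from the words that survive.
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            key = tolower($i)
            # ignore punctuation at both ends for the lookup only
            gsub(/^[^a-z0-9]+|[^a-z0-9]+$/, "", key)
            if (key in stop) {
                # word is dropped, but its trailing punctuation is kept
                trail = $i
                sub(/^.*[A-Za-z0-9]/, "", trail)
                out = out trail
            } else {
                out = out (out == "" ? "" : " ") $i
            }
        }
        print out
    }' "$1" "$2"
}
```

Usage would be something like: remove_stopwords stoplist.txt text.txt > newfile.txt. With the stoplist and text.txt from the first post, this should give "have learned forum can see. help.", which is the result Lizebe asked for. Leading punctuation on a removed word, for example an opening bracket, is simply lost, so treat it as a starting point rather than a finished tool.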
 