How do I remove stop words from a file?

Status
Not open for further replies.

demisheep

Programmer
May 8, 2012
1
US
I am using the following example from Lingua::StopWords:

Code:
use Lingua::StopWords qw( getStopWords );
my $stopwords = getStopWords('en');

my @words = qw( i am the walrus goo goo g'joob );

# prints "walrus goo goo g'joob"
print join ' ', grep { !$stopwords->{$_} } @words;

How do I get it to use my $document, remove stopwords and print the results to a file? See my code here:

Code:
open(FILESOURCE, "sample.txt") or die("Unable to open requested file.");
my $document = <FILESOURCE>;
close (FILESOURCE);

open(TEST, "results_stopwords.txt") or die("Unable to open requested file.");

use Lingua::StopWords qw( getStopWords );
my $stopwords = getStopWords('en');

print join ' ', grep { !$stopwords->{$_} } $document;

I tried these variations:

print join ' ', grep { !$stopwords->{$_} } TEST;


print TEST join ' ', grep { !$stopwords->{$_} } @words;

Basically, how do I read in a document, remove the stop words and then write the result to a new file?
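
One possible approach, offered as a minimal sketch rather than a definitive answer. The attempt above appears to trip on three things: `my $document = <FILESOURCE>;` reads only the first line of the file, `grep` is handed the whole string as a single element instead of a list of words, and the results file is opened without `>`, so it is opened for reading rather than writing. The helper names `remove_stopwords` and `filter_file` below are hypothetical (they are not part of Lingua::StopWords), and the sketch assumes words are separated by whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw( getStopWords );

# Return $text with stop words removed.
# Hypothetical helper, not part of Lingua::StopWords.
sub remove_stopwords {
    my ($text, $stopwords) = @_;
    # split ' ' breaks the text on runs of whitespace
    my @words = split ' ', $text;
    # Lowercase only for the lookup; kept words retain their original case
    return join ' ', grep { !$stopwords->{ lc $_ } } @words;
}

# Read $in_path line by line and write the filtered text to $out_path.
# Hypothetical helper.
sub filter_file {
    my ($in_path, $out_path) = @_;
    my $stopwords = getStopWords('en');

    open(my $in,  '<', $in_path)  or die "Unable to open $in_path: $!";
    open(my $out, '>', $out_path) or die "Unable to open $out_path: $!";

    while (my $line = <$in>) {
        chomp $line;
        print {$out} remove_stopwords($line, $stopwords), "\n";
    }

    close $in;
    close $out or die "Unable to close $out_path: $!";
    return;
}

# Usage: perl remove_stopwords.pl sample.txt results_stopwords.txt
filter_file(@ARGV) if @ARGV == 2;
```

Note that a word with punctuation attached (for example "the,") will not match the stop word list in this sketch, so punctuation may need to be stripped before the lookup, depending on the input.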
 