
Perl: split sentences into words

Status
Not open for further replies.

diera

Programmer
Mar 21, 2011
28
DE
Hi,

I want to split sentences into words and count how often each word occurs. I tried the code below, but it didn't work. The code is taken from a textbook.

----Data----
If milk goes bad if not refrigerated
Jones in rush to contribute for Bears (AP): Kevin Jones burst through the seam, sidestepped a defender as he cut.. so tired ugghhh. . . .#photog #photography
@RWZombie I am attending your Detroit show on 11/27! I want to hear the best of the new stuff. What and Sick Bubblegum plus many classics!
B & B - Vallejo & Napa Valley @onlylilia Annual Holiday party in Dec. Planning now. A concert, wine tasting and Hors d'Oeuvres
Real estate agents, worldwide. Expand your listings under contract, worldwide.\
Costa Rica, small country incredible biodiversity: tropical rain forests, dense cloud forests, active volcanoes. @Teerlink biking>gardening>schoolwork
@JSRWeek In RB, Molly Pitcher has a tasty brunch & I've heard good things about 2Senza too. In AP, Yvonne's Cafe is awesome!
# Check This Video --- Best Real Estate Deals,Hallandale Beach, FL 33009 another top XeeSM user - cool blog
@Fashion_iLIKE thanks for putting me on your special FF list ; ) Ages: 5 to 7 years at Familia! Los quiero a todo! I want to send a special shout to all the Boricuas around the world & those that made the trip to NYC this wknd!
Chabad.org goes mobile:

Code:
use strict;
use warnings;

open( IN, 'sentences.txt' ) or die;

open( OUT, ">word.csv" ) or die;

while (<IN>) {
    chomp;

    $- = lc;             #  Convert to lowercase
    s/[.,:;?"!()]//g;    #  Remove most punctuation
    s/--//g;             #  Remove dashes
    s/  +/  /g;          #  Replace multiple spaces by one space

    if ( not /^$/ ) {    #  Ignore empty lines
        @words = split(/  /);

        foreach $x (@words) {
            ++$tokens;
            ++$freq{$x};
        }
        $types = scalar keys %freq;
        $ratio = $tokens / $types;
        print OUT "$tokens, $types, $ratio\n";

    }
}

Any help is much appreciated. Thank you.
 
Your code is overly complicated. The following will do the same thing:

Code:
use Data::Dumper;

use strict;
use warnings;

my %unique;

while (<DATA>) {
	while (/(\w+)/g) {
		$unique{lc $1}++;
	}
}

print Dumper(\%unique);

__DATA__
If milk goes bad if not refrigerated
Jones in rush to contribute for Bears (AP): Kevin Jones burst through the seam, sidestepped a defender as he cut.. http://bit.ly/b03Bq
so tired ugghhh. . . .#photog #photography
@RWZombie I am attending your Detroit show on 11/27! I want to hear the best of the new stuff. What and Sick Bubblegum plus many classics!
B & B - Vallejo & Napa Valley @onlylilia Annual Holiday party in Dec. Planning now. A concert, wine tasting and Hors d'Oeuvres
Real estate agents, worldwide. Expand your listings under contract, worldwide.\
Costa Rica, small country incredible biodiversity: tropical rain forests, dense cloud forests, active volcanoes. http://short.to/bxr5
@Teerlink biking>gardening>schoolwork
@JSRWeek In RB, Molly Pitcher has a tasty brunch & I've heard good things about 2Senza too. In AP, Yvonne's Cafe is awesome!
# Check This Video --- Best Real Estate Deals,Hallandale Beach, FL 33009   http://tinyurl.com/nqcdhg http://XeeSM.com/JEFF another top XeeSM user - cool blog
@Fashion_iLIKE thanks for putting me on your special FF list ; ) Ages: 5 to 7 years at www.TOYTOPIA.com
Familia! Los quiero a todo! I want to send a special shout to all the Boricuas around the world & those that made the trip to NYC this wknd!
Chabad.org goes mobile: http://www.chabad.org/795454
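
The loop above only counts words. The script in the question also wrote a running token count, type count, and their ratio to word.csv. That script cannot compile under use strict, because @words, $x, $tokens, %freq, $types and $ratio are never declared with my. In addition, $- = lc was presumably meant to be $_ = lc, and as posted the split pattern contains two spaces rather than one. A minimal sketch that keeps the per-line statistics while using the same \w+ match might look like this. The sub name running_stats and the inline sample lines are assumptions for illustration, not part of either post:

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "tokens, types, ratio" row
# for each input line that contains at least one word.
sub running_stats {
    my @lines = @_;
    my ( %freq, @rows );
    my $tokens = 0;

    for my $line (@lines) {
        my @words = lc($line) =~ /(\w+)/g;    # same \w+ match as above
        next unless @words;                   # skip lines with no words

        $freq{$_}++ for @words;               # per-word frequency
        $tokens += @words;                    # running total of words

        my $types = keys %freq;               # distinct words seen so far
        push @rows, join ', ', $tokens, $types, $tokens / $types;
    }
    return @rows;
}

# Sample input only; in practice the lines would come from a file.
print "$_\n" for running_stats(
    'If milk goes bad if not refrigerated',
    'so tired ugghhh. . . .#photog #photography',
);
```

To write the rows to a file instead of the screen, open a lexical filehandle with the three-argument form, for example open( my $out, '>', 'word.csv' ) or die "word.csv: $!";, and print to $out.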
 
Thanks, Miller. Can you suggest a good Perl book for a beginner like me?
 
Hi,

How can I remove the stopwords from the sentences?

____stopwords____
a
about
above
according
across
actually
adj
after
afterwards
again
against
all
almost
alone
along
already
also
although
always
among
amongst
an
and
another
any
anyhow
anyone
anything
anywhere
are
aren't
around
as
at
be
became
because
become
becomes
becoming
been
before
beforehand
begin
beginning
behind
being
below
beside
besides
between
beyond
billion
both
but
by
can
can't
cannot
caption
co
company
corp
corporation
could
couldn't
did
didn't
do
does
doesn't
don't
down
during
each
eg
eight
eighty
either
else
elsewhere
end
ending
enough
etc
even
ever
every
everyone
everything
everywhere
except
few
fifty
first
five
for
former
formerly
forty
found
four
from
further
had
has
hasn't
have
haven't
he
he'd
he'll
he's
hence
her
here
here's
hereafter
hereby
herein
hereupon
hers
herself
him
himself
his
how
.........
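
One common approach is to load the stopword list into a hash and skip any word found in it. The sketch below is only an illustration under a few assumptions: a short inline sample stands in for the full list, the match is widened to [\w']+ so that entries such as "aren't" stay in one piece, and the sub names load_stopwords and count_words are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash from a list of stopwords.
sub load_stopwords {
    my %stop = map { ( lc($_) => 1 ) } @_;
    return \%stop;
}

# Hypothetical helper: count words in the given lines, skipping stopwords.
sub count_words {
    my ( $stop, @lines ) = @_;
    my %freq;
    for my $line (@lines) {
        # [\w']+ keeps contractions such as "aren't" together
        for my $word ( lc($line) =~ /([\w']+)/g ) {
            next if $stop->{$word};    # drop stopwords
            $freq{$word}++;
        }
    }
    return \%freq;
}

# Sample stopwords only; the real list would be read from a file,
# one word per line, with chomp applied to each entry.
my $stop = load_stopwords(qw(a about if for has not));
my $freq = count_words( $stop, 'If milk goes bad if not refrigerated' );
print "$_: $freq->{$_}\n" for sort keys %$freq;
```

A hash lookup keeps the check cheap however long the list grows. Note that the wider pattern also keeps stray leading or trailing apostrophes, so quoted words may need extra trimming.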
 