simple string parsing error, help please!

mkosloff · May 4, 2003

I am coding a simple program that counts the number of lines in a file that match specific patterns, such as containing the string "ing" or beginning with a capital letter. My program runs without printing any output except the top "Word Count" header. If anyone can see what I'm doing wrong, I'd appreciate the help, as I've been stuck on this for a few days now! Thanks!

---------------------------------

dPrint();

exit (0);

sub dPrint()
{
open( @lines,'tt') || die;

while (<@lines>)
{
my @word = split /\s+/, join("",@lines);

my %wc = ();
if ($word =~ /ing/)
{
$wc{$ing}++
}

if ($word =~ /^[A-Z]/)
{
$wc{$cap}++
}

if ($word =~ /^ /)
{
$wc{$space}++
}

if ($word =~ /\.$/)
{
$wc{$period}++
}

if ($word =~ /tt/)
{
$wc{$tts}++
}
}

print "----\tWord \tCount ----\n";

foreach my $ing (sort {$a cmp $b} keys %wc)
{
print "\ting = $wc{$ing}\n";
print "\tcap = $wc{$cap}\n";
print "\tspace = $wc{$space}\n";
print "\tperiod = $wc{$period}\n";
print "\ttts =$wc{$tts}\n";
}
}

icrf · May 4, 2003

change this:

Code:

    open( @lines,'tt') || die;
    
    while (<@lines>)
    {
      my @word = split /\s+/, join("",@lines);

      my %wc = ();

to this:

Code:

    open(FILE,'tt') || die "Unable to open 'tt': $!";
    my %wc = ();
    while (<FILE>)
    {
       my @words = split /\s+/;
       foreach my $word (@words)
       {
           #all your if($word =~ //) tests here
       }

and I think you should be okay. You have to open to a file handle, not an array. The file handle is what goes inside the <> to read from it. Then you split each line in the file into an array of words, then you test each word in another loop. It looks like that's what you were trying to do.
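
A minimal sketch of that read-and-split loop, for anyone who wants to run it without the 'tt' file. The in-memory string, the lexical handle $fh (in place of the bareword FILE above), and the @all_words array are assumptions made for this sketch, not part of the original program.

```perl
use strict;
use warnings;

# Sample text standing in for the 'tt' file (an assumption for this sketch).
my $text = "i am dancing\nthis is fun.\n";

# Open a read-only handle on the in-memory string instead of a real file.
open(my $fh, '<', \$text) or die "Unable to open in-memory data: $!";

my @all_words;
while (my $line = <$fh>) {
    # Split the current line on runs of whitespace.
    my @words = split /\s+/, $line;
    foreach my $word (@words) {
        push @all_words, $word;   # the if ($word =~ //) tests would go here
    }
}
close($fh);

print scalar(@all_words), " words: @all_words\n";
```

The outer loop sees one line at a time and the inner loop sees one word at a time, so any counters have to be declared outside both loops to survive from line to line.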

Hope it helps.

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light

mkosloff · May 4, 2003

Thanks! Now it's producing output, but each counter has a value of "6", which is strange because my file has 7 lines, and each counter should have a value of 1. No idea what the problem is...

------------------------

open(FILE,'tt') || die "Unable to open 'tt': $!";
my %wc = ();
while (<FILE>)
{
my @words = split /\s+/;
foreach my $word (@words)
{
if ($word =~ /ing/)
{
$wc{$ing}++
}

if ($word =~ /^[A-Z]/)
{
$wc{$cap}++
}

if ($word =~ /^ /)
{
$wc{$space}++
}

if ($word =~ /\.$/)
{
$wc{$period}++
}

if ($word =~ /tt/)
{
$wc{$tts}++
}
}
}

print "----\tWord \tCount ----\n";

foreach my $ing (sort {$a cmp $b} keys %wc)
{
print "\ting = $wc{$ing}\n";
print "\tcap = $wc{$cap}\n";
print "\tspace = $wc{$space}\n";
print "\tperiod = $wc{$period}\n";
print "\ttts =$wc{$tts}\n";
}

icrf · May 4, 2003

A few things I didn't notice before. Each of your increment lines lacks a semicolon at the end. Perl accepts that because the semicolon is optional after the last statement in a block, but it's good practice to include it anyway.

Since you're splitting on whitespace, no word can begin with a space, so /^ / should never match and it probably doesn't pay to have it in there.
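
A quick sketch of why that test cannot fire. The sample line with a leading space is hypothetical, and the variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = " how about this\n";   # hypothetical line with a leading space

# The leading whitespace is consumed as a separator, which leaves an
# empty first field rather than a word that starts with a space.
my @words = split /\s+/, $line;

# Count the fields that begin with a space.
my $with_space = grep { /^ / } @words;

print "fields: ", join("|", @words), "\n";
print "fields starting with a space: $with_space\n";
```

The first field comes back empty and the count is zero, so the space counter would never be incremented with this split pattern.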

Your final foreach loop that prints at the bottom is a little odd. It loops over the sorted keys in the hash %wc, but then inside the loop you explicitly print each counter by name on every pass. Let the loop do its work.

Code:

    foreach my $key (sort {$a cmp $b} keys %wc)
    {
        print "\t$key = $wc{$key}\n";
    }

Also, since your input is only seven lines, go ahead and post that so we can test with your data, too.

Happy coding.


mkosloff · May 4, 2003

Okay, thanks for the help. I am still getting a value of "6" for this, and after making the change you suggested in your last post, it only prints one entry of "6" and does not print the name of the variable it's counting.

As for the variable that counts the spacing, I want to count every line that begins with a space. I guess that's not going to do it?

Alright, here's what I'm working with, below it is the data file:

-----------------------------------

open(FILE,'tt') || die "Unable to open 'tt': $!";
my %wc = ();
while (<FILE>)
{
my @words = split /\s+/;
foreach my $word (@words)
{
if ($word =~ /ing/)
{
$wc{$ing}++;
}

if ($word =~ /^[A-Z]/)
{
$wc{$cap}++;
}

if ($word =~ /^ /)
{
$wc{$space}++;
}

if ($word =~ /\.$/)
{
$wc{$period}++;
}

if ($word =~ /tt/)
{
$wc{$tts}++;
}
}
}

print "----\tWord \tCount ----\n";

foreach my $key (sort {$a cmp $b} keys %wc)
{
print "\t$key = $wc{$key}\n";
# print "\tcap = $wc{$cap}\n";
# print "\tspace = $wc{$space}\n";
# print "\tperiod = $wc{$period}\n";
# print "\ttts =$wc{$tts}\n";
}

---------------------

i am dancing
this is fun.
this is very fun.
Hello world
how about this
he believes it
matt is short for matthew

mkosloff · May 5, 2003

if anyone can help me on this, I'd really appreciate it!

icrf · May 5, 2003

First, you should try and get in the habit of including these two lines at the top of every perl script:

Code:

use strict;
use warnings;

They will let you know about many problems. In this case, it came back and told me:

    Global symbol "$ing" requires explicit package name at E:\Documents\test\test.pl line 23.

I didn't notice before, but you're storing things in the hash at key $ing, $cap, etc. Those scalars are never defined anywhere, so each one is undef, which turns into the empty string when used as a hash key. That single empty key is where all six increments were being stored, and it's why the printed key name was blank.
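
A small sketch of that collision. Here $ing and $cap are declared but never assigned, which is an assumption standing in for the undeclared variables in the posted script, and the 'uninitialized' warnings are switched off only so the demonstration runs quietly.

```perl
use strict;
use warnings;
no warnings 'uninitialized';   # silence the warnings this mistake would raise

# Declared but never assigned, so both are undef.
my ($ing, $cap);

my %wc;
$wc{$ing}++;   # undef becomes the key ""
$wc{$cap}++;   # the key is "" again, so the same slot is incremented

my @keys = keys %wc;
print "number of keys: ", scalar(@keys), "\n";
print "count under the empty key: $wc{''}\n";

# With literal (bareword) keys, each counter gets its own slot.
my %fixed;
$fixed{ing}++;
$fixed{cap}++;
print join(", ", map { "$_ = $fixed{$_}" } sort keys %fixed), "\n";
```

The first hash ends up with one key holding both increments, while the second keeps ing and cap apart.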

Take the $ out of the hash keys. For instance, make this kind of change for all the lines in the loop:

Code:

$wc{$ing}++;   # before
$wc{ing}++;    # after

Also, if you want to find lines beginning with a space, you can change your split pattern to handle it:

    my @words = split /(?<!\A)\s+/;

The (?<! pattern) construct is called a negative look-behind assertion. It "matches" only if the pattern does not match what is just before that point. In this case, the pattern is \A, which anchors to the beginning of the string. In short, it splits the line up into words on whitespace, so long as that whitespace is not at the beginning of the line. A leading space therefore stays attached to the first word, where /^ / can find it. Check here for more details and a better explanation:

http://www.perldoc.com/perl5.8.0/pod/perlre.html
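
A short sketch of the look-behind split in action. The sample line is hypothetical, chosen only because it starts with a space.

```perl
use strict;
use warnings;

my $line = " how about this\n";   # hypothetical line with a leading space

# Whitespace at the very start of the string is not treated as a separator,
# so the leading space stays on the first word.
my @words = split /(?<!\A)\s+/, $line;

print "words: ", join("|", @words), "\n";
print "first word starts with a space\n" if $words[0] =~ /^ /;
```

The trailing newline still counts as a separator, and split drops the empty field it leaves at the end, so no chomp is needed for this test.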

Should get output similar to this:

    ---- Word Count ----
    cap = 1
    ing = 1
    period = 2
    space = 1
    tts = 2
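
Putting the fixes from this thread together, a complete version might look like the sketch below. It reads the sample data from an in-memory string through a lexical handle rather than from the 'tt' file, and the leading space on "how about this" is an assumption: the data as posted shows no leading whitespace, which the forum may have stripped.

```perl
use strict;
use warnings;

# The thread's sample data. The leading space on "how about this" is an
# assumption made so that the space counter has something to find.
my $data = <<'END';
i am dancing
this is fun.
this is very fun.
Hello world
 how about this
he believes it
matt is short for matthew
END

open(my $fh, '<', \$data) or die "Unable to open in-memory data: $!";

my %wc;
while (my $line = <$fh>) {
    # Split on whitespace, except whitespace at the very start of the line.
    foreach my $word (split /(?<!\A)\s+/, $line) {
        $wc{ing}++    if $word =~ /ing/;      # contains "ing"
        $wc{cap}++    if $word =~ /^[A-Z]/;   # starts with a capital letter
        $wc{space}++  if $word =~ /^ /;       # starts with a space
        $wc{period}++ if $word =~ /\.$/;      # ends with a period
        $wc{tts}++    if $word =~ /tt/;       # contains "tt"
    }
}
close($fh);

print "----\tWord \tCount ----\n";
foreach my $key (sort keys %wc) {
    print "\t$key = $wc{$key}\n";
}
```

Note that this counts matching words rather than matching lines. For this data the two give the same totals except for tts, where "matt" and "matthew" sit on the same line.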

Sorry for the delay, but a nice storm came rolling through the southeast and Charter cable dropped service for about 24 hours.

Happy coding.

