Count number of occurrences of string in file

mgp77 · Jan 12, 2006

Hello,

I realize that I can write a function to do this, but I was wondering whether there is an existing Perl function that will count the number of occurrences of a particular string in a file. Currently I have the file open; I extract a string from a line and then want to check the remainder of the file to see whether the string appears again. If it doesn't, I want to do something with the string, then continue with the next line, extract its string, and so on. Ideally I don't want to open the file, extract the string, and then close and reopen the file to check whether the string appears more than once. I want to do it all in one step. If I open it twice I will lose my position in the file and won't be able to extract the next string. Any and all help is appreciated.

Thanks

KevinADC · Jan 12, 2006

There could very well be a module written to do this, or a well-known method. Can you post the code you have been using?

mgp77 · Jan 12, 2006

Here is what I'm trying, but it doesn't appear to be working too well:

Code:

use strict;
sub checkForMultipleLabels($);

my %labelNames = ();
open(FILE, "<$ARGV[0]") || die("Cannot open file $ARGV[0]");
open(OUT, ">labelsAndColumns.txt") || die("Cannot open output file");
open(OUT2, ">labels.txt") || die("Cannot open output file");

while (<FILE>)
{	
	if ($_ =~ /Column\s*:\s*(.+)\s{2}\(".*"\.".*""(.*)"\)/i)
	{
		my $count = checkForMultipleLabels($1);
		#label names will only be listed once
		#if (exists $labelNames{$1})
		#{
		#	if ($labelNames{$1} ne $2)
		#	{
		#		print OUT2 "The label $1 has the value $labelNames{$1} stored and also found the value $2\n";
		#	}
		#}
		if ($count eq "yes")
		{
			print OUT2 "The label $1 has the value $2 and also found other different values\n";
		}
		else
		{
			print OUT "$2\t$1\n";
			$labelNames{$1} = $2;
		}
	}

}
close FILE;
close OUT;
close OUT2;

#for my $key ( keys %labelNames ) 
#{
#	print OUT2 "$key\n";
#}

sub checkForMultipleLabels($)
{
	my $string = shift;
	my $count = 0;
	open(FILE2, "<$ARGV[0]") || die("Cannot open file $ARGV[0]");
	open(OUT3, ">>test.txt") || die("Cannot open file test.txt");	# append: this sub runs once per label
	while(<FILE2>)
	{
		# \Q...\E makes the label match literally, even if it contains regex metacharacters
		if ($_ =~ /Column\s*:\s*\Q$string\E\s{2}\(".*"\.".*""(.*)"\)/i)
		{
			print OUT3 "String $string was found in line $_\n";
			$count++;
		}
		if ($count > 1)
		{
			close FILE2;
			close OUT3;
			return "yes";
		}
		
	}
	close OUT3;
	close FILE2;
	return "no";
}

vjcyrano · Jan 12, 2006

You could use this.

my $string = "words";
my $var = qx[grep -r '$string' file.txt | wc -l];

Once you have the string, grep for it in the file and count the lines.
That gives you the number of lines in the file that contain the string; then you can go to the next line and repeat the process.
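
Where no system grep is available, the same idea can be sketched in pure Perl. This is only an illustration: count_in_lines is a hypothetical helper, and the sample lines are made up. It also shows that counting matching lines (what grep piped to wc -l reports) differs from counting every occurrence of the string.

```perl
use strict;
use warnings;

# count_in_lines is a hypothetical helper. It takes the string and a list of
# lines, so the example runs without a real file.
sub count_in_lines {
    my ($string, @lines) = @_;
    my ($matching_lines, $occurrences) = (0, 0);
    for my $line (@lines) {
        # \Q...\E makes the string match literally, even if it contains
        # regex metacharacters such as '.' or '('.
        # The "= () =" idiom counts all matches found by /g on this line.
        my $hits = () = $line =~ /\Q$string\E/g;
        $matching_lines++ if $hits;
        $occurrences += $hits;
    }
    return ($matching_lines, $occurrences);
}

# Made-up sample data; in a real script these would be the lines of the file.
my @sample = ("words and more words\n", "no match here\n", "words\n");
my ($lines, $total) = count_in_lines("words", @sample);
print "lines: $lines, occurrences: $total\n";
```

For the sample above, two lines contain the string but it occurs three times, so which number is wanted should be decided first.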

mgp77 · Jan 13, 2006

I see. I'm working on Windows, so would that solution still work, considering Windows doesn't have the grep command? Also, my situation is a little more complex: I want to search for the string in the file, but it has to appear in a certain context, and I only want to count it if one specific element of the context is different. I'll attempt to explain. My regex from above is

$_ =~ /Column\s*:\s*(.+)\s{2}\(".*"\.".*""(.*)"\)/i

Now I only want to count occurrences where the string I'm searching for appears in the file where the $1 variable in the regex above would be, and only if the $2 portion of the regex is different from a specific string (e.g. hello). Does that make any sense? Can I somehow use the grep command to accomplish this? All help is appreciated!
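
One way to read this requirement is "find every label ($1) that shows up with more than one distinct value ($2)". Under that reading, a single pass with a hash of hashes avoids reopening the file at all. The following is a rough sketch, not a tested solution for the real data: labels_with_conflicts is a hypothetical helper, and the sample lines are invented to fit the regex, since no real data appears in the thread.

```perl
use strict;
use warnings;

# The pattern is the one quoted in the post; everything else is assumed.
my $pattern = qr/Column\s*:\s*(.+)\s{2}\(".*"\.".*""(.*)"\)/i;

# labels_with_conflicts is a hypothetical helper: one pass over the lines,
# recording every distinct value seen for each label.
sub labels_with_conflicts {
    my @lines = @_;
    my %values_for;    # label => { value => 1, ... }
    for my $line (@lines) {
        next unless $line =~ $pattern;
        my ($label, $value) = ($1, $2);    # copy the captures right away
        $values_for{$label}{$value} = 1;
    }
    # Keep only labels that were seen with more than one distinct value.
    my %conflicts;
    for my $label (keys %values_for) {
        my @values = sort keys %{ $values_for{$label} };
        $conflicts{$label} = \@values if @values > 1;
    }
    return %conflicts;
}

# Invented sample lines; a real script would read them from the input file.
my @sample = (
    qq{Column : Name  ("DB"."T1""CUST_NAME")\n},
    qq{Column : Name  ("DB"."T2""CLIENT_NAME")\n},
    qq{Column : Id  ("DB"."T1""CUST_ID")\n},
);
my %conflicts = labels_with_conflicts(@sample);
for my $label (sort keys %conflicts) {
    print "The label $label has several values: @{ $conflicts{$label} }\n";
}
```

Because the whole file is summarized in the hash first, the position in the file never has to be saved or restored.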

KevinADC · Jan 13, 2006

You can use Perl's grep function instead of the operating system's grep command. That's not to say it would work for your particular application, though. Your explanation makes sense, but it's still confusing; maybe some example lines of data, instead of the regexp, would help clear it up.

vjcyrano · Jan 13, 2006

open(DAT, "data.txt") || die "Cannot open data.txt: $!";

@local = <DAT>;

$var = scalar grep(/Data.*and/, @local);

print $var;

Replace 'Data.*and' with a regular expression for the string you are looking for.

mgp77 · Jan 13, 2006

Can you explain to me what is happening in the lines

@local = <DAT>;

$var = scalar grep(/Data.*and/, @local);

Is that opening the file, assigning each line of the file to an element of the array, then going through the array elements one by one to see whether the regex /Data.*and/ matches, and returning the total number of matches? If so, that is getting closer to what I want, but I would want an if clause in there that checks each match to see whether a specific part of the regex, the $2 portion in the post above, is a particular value, and counts only those that are not equal to that value.
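
For what it's worth, grep in scalar context does return the number of elements for which its test is true, and the test can be a block rather than a bare regex, so the extra condition can go inside it. A minimal sketch, assuming the line format implied by the regex in the earlier post; count_other_values is a hypothetical helper and the sample lines are invented:

```perl
use strict;
use warnings;

# count_other_values is a hypothetical helper. It counts the lines where
# $label sits in the $1 position of the earlier regex and the value in the
# $2 position differs from $known.
sub count_other_values {
    my ($label, $known, @lines) = @_;
    # Assigning grep to a scalar gives the number of lines for which the
    # block is true. The label is matched literally via \Q...\E, so the only
    # capture group left is the value, which is $1 inside this block.
    my $count = grep {
        /Column\s*:\s*\Q$label\E\s{2}\(".*"\.".*""(.*)"\)/i && $1 ne $known
    } @lines;
    return $count;
}

# Invented sample lines; with a real file this would be: my @lines = <DAT>;
my @lines = (
    qq{Column : Name  ("DB"."T1""hello")\n},
    qq{Column : Name  ("DB"."T2""goodbye")\n},
    qq{Column : Id  ("DB"."T1""goodbye")\n},
);
print count_other_values("Name", "hello", @lines), "\n";
```

Reading the whole file into an array is fine for small files; for very large ones, the same test could be applied line by line in a while loop instead.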

tchatzi · Jan 14, 2006

Since you are working on Windows, you should try Cygwin so you can have whatever Unix command you want.
Then it is just a matter of

grep -c 'my string' myfile.txt

Or you can give us a sample of the file and the string you are looking for, and we can try to find a search pattern together.

``The wise man doesn't give the right answers,
he poses the right questions.''
TIMTOWTDI

glg1 · Jan 14, 2006

I found a great little application, Windows Grep, that has the functionality you might need. It's a bit more powerful than standard Unix grep in that it has replace, save, etc.
It's not the world's fastest program, but for a few files it's perfectly adequate.

http://www.wingrep.com/

There are a number of grep for windows programs out there, but this one has worked well for me.

Cheers, - Happy searching,
George
