Grep for occurrence of a word and count the number of lines

Status
Not open for further replies.

mrimagepueblo

Programmer
Dec 15, 2003
52
US
I've gone all around this but am not getting what I need. I want to count a word in a text file, but instead of the total number of times it occurs, I only want the number of lines it occurs in.
This is what I have, but it's actually counting all the occurrences. I've tried all kinds of different things with grep and word count, but I'm not getting what I need. Once it works, I will duplicate the magical formula for a list of cities I have.

$Alburg = `grep -c 'Alburg' < $data_file_path`;
 
how about

$Alburg = `grep 'Alburg' < $data_file_path |\
wc -l`;

gives number of lines that contain 'Alburg'

or

$Alburg = `tr '[:space:]' '\n' $data_file_path|\
grep -c 'Alburg'`;

gives the total number of occurrences of 'Alburg'




HTH,

p5wizard
 
The first example you gave me returns the same results as my example. The second example gives me nothing.
Just curious: I tried with and without the spaces and the new line as in your examples. What is the significance of the backslash \? It doesn't seem to change the results either.
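
A minimal sketch of the two points raised here, assuming a POSIX-like shell receives the command. A backslash at the end of a line only joins that line to the next one, and after a pipe the shell keeps reading anyway, which would explain why removing it changes nothing. Separately, tr reads standard input only, so a file name passed as an extra argument is typically rejected and nothing reaches grep. The sample file and variable names below are made up for illustration.

```shell
# Hypothetical sample data, not the poster's real file.
tmp=$(mktemp)
printf 'one Alburg two Alburg\nno match here\nAlburg again\n' > "$tmp"

# The trailing backslash joins this line with the next one,
# so the two lines below form a single pipeline.
lines=$(grep 'Alburg' < "$tmp" |\
wc -l)

# tr takes no file name argument; feed the file on standard input.
# Each run of whitespace becomes one line break, so grep -c then
# counts whitespace-separated tokens that contain the word.
words=$(tr -s '[:space:]' '\n' < "$tmp" | grep -c 'Alburg')

# $(( )) strips any padding some wc implementations add.
echo "lines=$((lines)) words=$((words))"
rm -f "$tmp"
```

Note that splitting on whitespace does not separate pipe-delimited fields; for that the delimiter itself has to be translated as well.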



 
How sure are you that the first example is not producing the right answer?
My gut reaction would have been 'grep "Alburg" | wc -l' as in that example.



Trojan.
 
The file is a pipe-delimited file on the server. I am importing it into Microsoft Access and filtering records by town, and when I search with my CGI script it shows the same number of results as the Access filter. So, absolutely sure.
 
like

other text|Alburg|some more text|and|then|some
this text|line|does not|contain|the word|i'm looking for
this line|does|contain|Alburg|in fact|even|twice|Alburg|there
Alburg|here|once

?

# cat data.txt
other text|Alburg|some more text|and|then|some
this text|line|does not|contain|the word|i'm looking for
this line|does|contain|Alburg|in fact|even|twice|Alburg|there
Alburg|here|once
# grep -c Alburg data.txt
3
# grep Alburg data.txt|wc -l
3
# cat data.txt|tr '| \t' '\n\n\n'|grep -c Alburg
4
# cat data.txt|tr '| \t' '\n\n\n'|grep Alburg|wc -l
4


You might also want to use -w flag on grep to count words "Alburg" and not e.g. "Alburger"

See the man pages for tr and grep
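
As a rough illustration of what -w changes, here is a small sketch. It assumes a grep that supports -w (GNU and BSD grep both do); the sample rows are invented. Whole-word matching drops "Alburger", but a line with "Alburg Village" still counts, because "Alburg" is a complete word there.

```shell
# Hypothetical sample data; contents are made up for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1|Alburg|Vermont
2|Alburger Lane|Vermont
3|Alburg Village|Vermont
4|Eden|Vermont
EOF

# Substring match: every line containing the letters "Alburg".
substring_lines=$(grep -c 'Alburg' "$tmp")

# Whole-word match: "Alburger" is excluded, "Alburg Village" is not.
word_lines=$(grep -cw 'Alburg' "$tmp")

echo "substring: $substring_lines, whole word: $word_lines"
rm -f "$tmp"
```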


HTH,

p5wizard
 
I'm really stymied now. I copied p5wizard's first example directly into a file on the server called test.txt, then from a telnet prompt typed grep -c Alburg data.txt and got 3, which was to be expected. In my own examples I don't get the correct count at all; in fact the number returned doesn't make any sense, it appears to be a small but random number. I expect 5 as the result, but get 30. My thought is that the file is coming in as tab delimited and I'm simply converting it to a pipe delimited file with
perl -pi -e 's/\t/\|/g' idx_1.txt
but it makes no difference even if I count in the original tab delimited file.

Could I be having problems, because the file is possibly coming in as a DOS tab delimited file and I'm on a Unix box?
 
Hi

No, the DOS end-of-line is \r\n and Unix uses only \n, so you will have an uninterpreted extra character at the end of each line, shown as ^M in some editors. If in doubt, remove the \r:
Code:
sed 's/\r$//' dosfile > unixfile
# or
tr -d '\r' < dosfile > unixfile
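
To check whether a file really has DOS line endings before converting it, something like the following sketch may help. It assumes a POSIX-like shell; the two sample rows are invented. It also shows one way a stray \r can affect counts: a pattern anchored at the end of the line stops matching until the \r is removed.

```shell
# Hypothetical two-line file with DOS (CRLF) line endings.
tmp=$(mktemp)
printf 'a|Alburg\r\nb|Eden\r\n' > "$tmp"

# Hold a literal carriage return in a variable (portable across shells).
cr=$(printf '\r')

# Number of lines that contain a carriage return.
dos_lines=$(grep -c "$cr" "$tmp")

# With the \r in place, "Alburg" is not the last thing on the line.
# grep -c exits non-zero on a zero count, hence the "|| true".
anchored_before=$(grep -c 'Alburg$' "$tmp") || true

# After stripping the \r the anchored pattern matches again.
anchored_after=$(tr -d '\r' < "$tmp" | grep -c 'Alburg$')

echo "CR lines: $dos_lines, before: $anchored_before, after: $anchored_after"
rm -f "$tmp"
```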

Try p5wizard's advice regarding the use of -w.

Feherke.
 
This command works (it's another city):
grep -w 'Eden' idx_1.txt|wc -l
So I'm using the tick marks to get an exact match, but Alburg, for example, can appear as Alburg Village or even Alburg Elementary in the same line, and I only want a count of 1 for the city of Alburg in that line.
So I'm halfway there now. How do I match just 'Alburg' and nothing else?
I read in a couple of places that I could use \<Alburg\>, but that gives results in the thousands. I even tried '\<Alburg\>'.
 
I bet you have occurrences of the word in a different field.
You probably need to restrict the match to ONLY the field in question.
Grep will normally match anywhere in the record.


Trojan.
 
Hi

No, \< and \> match the beginning and the end of a word, so for "Alburg Village" they will not make any difference. If the values are not padded, then try to include the delimiter too, in your case the pipe:
Code:
grep -c '|Alburg|' idx_1.txt
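
One caveat with the pattern above, sketched here with the sample rows posted earlier in the thread: '|Alburg|' needs a pipe on both sides, so a value in the first or last field of a line is missed. An extended-regex variant (grep -E) that also accepts start-of-line and end-of-line as boundaries is one possible workaround; treat it as a sketch, not a tested fix for the real data.

```shell
# Sample rows copied from earlier in the thread.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
other text|Alburg|some more text|and|then|some
this text|line|does not|contain|the word|i'm looking for
this line|does|contain|Alburg|in fact|even|twice|Alburg|there
Alburg|here|once
EOF

# Pipe required on both sides: the last row (first field) is missed.
inner_only=$(grep -c '|Alburg|' "$tmp")

# Boundary is either a pipe or the start/end of the line.
any_field=$(grep -cE '(^|\|)Alburg(\||$)' "$tmp")

echo "inner fields only: $inner_only, any field: $any_field"
rm -f "$tmp"
```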

Feherke.
 
Since the data file is pipe delimited, how would you construct the statement to search only on field #10?
 
well kinda yes, and kinda no. I am getting a good count now with this command.
$Alburg= `grep '|Alburg|Vermont|' < $data_file_path | wc -l`;

I added the next field, which would always be unique to the string of text. Seems like it shouldn't have to be that way.
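
For the question of restricting the match to a single field, awk is one common option, since it splits each line on a delimiter and can compare one field exactly. The sketch below is an assumption-laden illustration: the sample rows are invented, and the town sits in field 2 here, whereas the thread mentions field #10 in the real file (set town_field accordingly).

```shell
# Hypothetical sample data; the real file layout is not shown in the thread.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1|Alburg|Vermont
2|Alburg Village|Vermont
3|Eden|Alburg Elementary
4|Alburg|Vermont
EOF

# Assumed position of the town column in this sample.
town_field=2

# Count lines whose town field is exactly "Alburg".
# "n + 0" prints 0 instead of an empty string when nothing matches.
exact=$(awk -F'|' -v f="$town_field" -v town='Alburg' \
    '$f == town { n++ } END { print n + 0 }' "$tmp")

# For contrast: a plain grep counts every line mentioning Alburg anywhere.
anywhere=$(grep -c 'Alburg' "$tmp")

echo "exact field match: $exact, anywhere on the line: $anywhere"
rm -f "$tmp"
```

Because the comparison is on the whole field, "Alburg Village" and "Alburg Elementary" in other columns do not inflate the count, and no neighbouring field such as Vermont has to be added to the pattern.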

Everyone's answers were in line with what I have read and tried. I just don't get why some words worked perfectly and others don't.

Let's put this one to rest. I'll spend more time on it after a night's sleep and if I come up with some definitive solution, I'll post it here.

THANK EVERYONE SO MUCH FOR YOUR HELP !!!
 