Grep for occurrence of a word and count the number of lines

mrimagepueblo · Sep 5, 2005

I've gone all around this but am not getting what I need. I want to count the number of times a word appears in a text file, but instead of showing the total number of times it occurs, I only want to show the number of lines it occurs in.
This is what I have, but it's actually counting all the occurrences. I've tried all kinds of different things with grep and word count, but I'm not getting what I need. I will duplicate the magical formula for a list of cities I have.

$Alburg = `grep -c 'Alburg' < $data_file_path`;

p5wizard · Sep 5, 2005

how about

$Alburg = `grep 'Alburg' < $data_file_path |\
wc -l`;

gives number of lines that contain 'Alburg'

or

$Alburg = `tr '[:space:]' '\n' < $data_file_path |\
grep -c 'Alburg'`;

gives the total number of occurrences of 'Alburg'
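
To make the difference between the two counts concrete, here is a small sketch on made-up data (the temporary file and its three lines are assumptions for illustration, not the real data file). One detail worth noting: tr reads only standard input, so the file has to be redirected in with <.

```shell
# Hypothetical sample: 3 lines, 2 of them mention Alburg, 3 mentions in total.
sample=$(mktemp)
printf '%s\n' \
  'Alburg Village is in Alburg' \
  'Eden' \
  'Alburg' > "$sample"

# Number of LINES containing the string (grep -c and grep | wc -l agree on this).
line_count=$(grep -c 'Alburg' "$sample")

# Number of OCCURRENCES: put every word on its own line first, then count.
# tr takes no file operand, so the file is fed on standard input.
word_count=$(tr '[:space:]' '\n' < "$sample" | grep -c 'Alburg')

echo "lines: $line_count, occurrences: $word_count"
rm -f "$sample"
```

On this sample the first count is 2 and the second is 3.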

HTH,

p5wizard

mrimagepueblo · Sep 5, 2005

The first example you gave me returns the same results as my example. The second example gives me nothing.
Just curious: I tried with and without the spaces and the newline in your examples. What is the significance of the backslash \? It doesn't seem to change the results either.

TrojanWarBlade · Sep 5, 2005

How sure are you that the first example is not producing the right answer?
My gut reaction would have been 'grep "Alburg" | wc -l' as in that example.

Trojan.

mrimagepueblo · Sep 5, 2005

The file is a pipe-delimited file on the server. I am importing it into Microsoft Access and filtering records by the town, and when I search with my CGI script it shows the same number of results as the Access filter. So, absolutely sure.

p5wizard · Sep 5, 2005

mrimagepueblo · Sep 6, 2005

I'm really stymied now. I copied p5wizard's first example directly into a file on the server called test.txt and then from a telnet prompt typed grep -c Alburg data.txt and got a result of 3, which was to be expected. I don't get the correct count at all in my examples; in fact, the number returned doesn't make any sense. It appears to be a small but random number: I expect 5 as the result, but get 30. My thought is that the file is coming in as tab delimited and I'm simply converting it to a pipe-delimited file with
perl -pi -e 's/\t/\|/g' idx_1.txt. It doesn't matter even if I count in the original tab-delimited file.

Could I be having problems, because the file is possibly coming in as a DOS tab delimited file and I'm on a Unix box?

feherke · Sep 6, 2005

Hi

No, the DOS end-of-line is [tt]\r\n[/tt] and Unix uses only [tt]\n[/tt], so you will have an uninterpreted extra character at the end of each line, shown as [tt]^M[/tt] in some editors. If in doubt, remove the [tt]\r[/tt]:

Code:

sed 's/\r$//' dosfile > unixfile
# or
tr -d '\r' < dosfile > unixfile

Try p5wizard's advice regarding the use of -w.
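
A quick way to see what a stray carriage return does to an exact match, sketched on a made-up two-line file (the file and its contents are assumptions for illustration): with DOS line endings each line ends in \r, so a whole-line match with -x fails until the \r is removed.

```shell
# Hypothetical two-line file with DOS (\r\n) line endings.
dosfile=$(mktemp)
printf 'Alburg\r\nEden\r\n' > "$dosfile"

# -x requires the whole line to equal the pattern; the hidden \r prevents that.
# grep exits non-zero when nothing matches, hence the "|| true".
with_cr=$(grep -c -x 'Alburg' "$dosfile" || true)

# After deleting the carriage returns, the same match succeeds.
without_cr=$(tr -d '\r' < "$dosfile" | grep -c -x 'Alburg')

echo "with CR: $with_cr, CR stripped: $without_cr"
rm -f "$dosfile"
```

The same effect applies to any pattern anchored at the end of the line, or to a comparison against the last field of a record.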

Feherke.

http://rootshell.be/~feherke/

mrimagepueblo · Sep 6, 2005

This command works (it's another city):
grep -w 'Eden' idx_1.txt|wc -l
So I'm using the tick marks to get an exact match, but Alburg, for example, can have Alburg Village in the same line or even Alburg Elementary, and I only want a count of 1 for the city of Alburg in that line.
So I'm halfway there now. How do I match just 'Alburg' only?
I read in a couple of places I could use \<Alburg\> but that gives results in the thousands. I even tried '\<Alburg\>'.

TrojanWarBlade · Sep 6, 2005

I bet you have occurrences of the word in a different field.
You probably need to restrict the match to ONLY the field in question.
Grep will normally match anywhere in the record.
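
As a rough illustration of that point, here is a sketch on made-up records (the id|school|town|state layout is an assumption, not the real file format). Only one record has the town Alburg, yet -w counts every line where Alburg appears as a whole word in any field.

```shell
# Hypothetical pipe-delimited records: id|school|town|state.
records=$(mktemp)
printf '%s\n' \
  '1|Alburg Elementary|Alburg|Vermont' \
  '2|Eden Central|Eden|Vermont' \
  '3|Lake View|Alburg Village|Vermont' \
  '4|Alburg Road Store|Swanton|Vermont' > "$records"

# -w only requires Alburg to be a whole word somewhere on the line,
# so records 1, 3 and 4 all count even though only record 1 has town Alburg.
whole_word_lines=$(grep -c -w 'Alburg' "$records")

echo "lines with the word Alburg anywhere: $whole_word_lines"
rm -f "$records"
```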

Trojan.

feherke · Sep 6, 2005

Hi

No, [tt]\<[/tt] and [tt]\>[/tt] match the beginning and end of a word, so for "Alburg Village" they will not make any difference. If the values are not padded, then try to include the delimiter too, in your case the pipe:

Code:

grep -c '|Alburg|' idx_1.txt
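
A small sketch of how the delimiter trick behaves, again on made-up records (the id|school|town|state layout is an assumption for illustration). -F is used here so the pattern is taken as a fixed string rather than a regular expression.

```shell
# Hypothetical pipe-delimited records: id|school|town|state.
records=$(mktemp)
printf '%s\n' \
  '1|Alburg Elementary|Alburg|Vermont' \
  '2|Eden Central|Eden|Vermont' \
  '3|Lake View|Alburg Village|Vermont' \
  '4|Alburg Road Store|Swanton|Vermont' > "$records"

# Only a field that is exactly Alburg has a pipe directly on both sides,
# so "Alburg Village" and "Alburg Elementary" no longer match.
delimited_lines=$(grep -c -F '|Alburg|' "$records")

echo "lines with a field equal to Alburg: $delimited_lines"
rm -f "$records"
```

Two limits of this approach: a value in the first or last column has a pipe on one side only and would be missed, and the pattern matches whichever column holds the exact value, not one particular column.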

Feherke.

http://rootshell.be/~feherke/

mrimagepueblo · Sep 6, 2005

Since the data file is pipe delimited, how would you construct the statement to search only field #10?

feherke · Sep 6, 2005

hi

Code:

cut -d '|' -f 10 < idx_1.txt | grep -c -x Alburg
# or
awk -F '|' '$10=="Alburg" { i++ } END { print i+0 }' idx_1.txt
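
To check that both forms agree, here is a sketch on made-up 11-field records (the x placeholders and the column layout are assumptions; only the position of the town in field 10 follows the question). Record 4 has Alburg in field 9, which a field-10 test must ignore.

```shell
# Hypothetical records with the town in field 10.
records=$(mktemp)
printf '%s\n' \
  '1|x|x|x|x|x|x|x|Alburg Elementary|Alburg|Vermont' \
  '2|x|x|x|x|x|x|x|Eden Central|Eden|Vermont' \
  '3|x|x|x|x|x|x|x|Lake View|Alburg Village|Vermont' \
  '4|x|x|x|x|x|x|x|Alburg|Swanton|Vermont' > "$records"

# Extract field 10 only, then require the whole extracted value to be Alburg.
cut_count=$(cut -d '|' -f 10 < "$records" | grep -c -x 'Alburg')

# Same test in awk; n + 0 prints 0 instead of an empty line when nothing matches.
awk_count=$(awk -F '|' '$10 == "Alburg" { n++ } END { print n + 0 }' "$records")

echo "cut+grep: $cut_count, awk: $awk_count"
rm -f "$records"
```

Both report 1 here, because only record 1 has exactly Alburg in field 10.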

Feherke.

http://rootshell.be/~feherke/

TrojanWarBlade · Sep 6, 2005

Good solutions.
mrimagepueblo,
Do these solutions solve your problem?

Trojan.

mrimagepueblo · Sep 6, 2005

Well, kinda yes and kinda no. I am getting a good count now with this command:
$Alburg= `grep '|Alburg|Vermont|' < $data_file_path | wc -l`;

I added the next field, which would always be unique to the string of text. Seems like it shouldn't have to be that way.
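
One possible explanation, sketched on made-up records: if some other column can also hold exactly Alburg (as suggested earlier in the thread, though never confirmed for the real file), then '|Alburg|' alone over-counts, and adding the neighbouring field pins the match to the town column. The id|school|town|state|country layout below is an assumption for illustration.

```shell
# Hypothetical records: id|school|town|state|country.
records=$(mktemp)
printf '%s\n' \
  '1|Alburg Elementary|Alburg|Vermont|US' \
  '2|Eden Central|Eden|Vermont|US' \
  '3|Lake View|Alburg Village|Vermont|US' \
  '4|Alburg|Swanton|Vermont|US' > "$records"

# Matches an exact Alburg in ANY middle column: records 1 and 4.
delimiter_only=$(grep -c -F '|Alburg|' "$records")

# Requiring the next field as well leaves only record 1.
with_next_field=$(grep -c -F '|Alburg|Vermont|' "$records")

echo "delimiter only: $delimiter_only, with next field: $with_next_field"
rm -f "$records"
```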

Everyone's answers were in line with what I have read and tried. I just don't get why some words work perfectly and others don't.

Let's put this one to rest. I'll spend more time on it after a night's sleep and if I come up with some definitive solution, I'll post it here.

THANK EVERYONE SO MUCH FOR YOUR HELP !!!
