Tek-Tips is the largest IT community on the Internet today!

Word Count?

Status
Not open for further replies.

walks

Technical User
May 7, 2001
Can someone tell me how I could use a string to do a word count on a text file?
 
Hi, you need to read up on the STL algorithm [tt]count_if()[/tt], which is declared in [tt]<algorithm>[/tt].

You can write your own simple function that decides whether a character ends a 'word' and pass it to [tt]count_if()[/tt] as the third parameter.
In the simplest form, you can count the number of spaces/line breaks and add 1 to the total, but this will not give a true figure if someone entered more than one space between words.

There's also the ANSI C function [tt]strtok()[/tt], declared in [tt]<string.h>[/tt] (or [tt]<cstring>[/tt]), which splits a string into tokens and can be used for a similar job.

With [tt]count_if()[/tt], you're going to have to add some additional checking to ensure that a run of consecutive spaces/line breaks counts only once towards the final total. [tt]strtok()[/tt] already treats a run of delimiter characters as a single separator.
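For what it's worth, here is a minimal sketch of the [tt]strtok()[/tt] route. The function name [tt]CountWordsStrtok[/tt] and the delimiter set (space, tab, carriage return, newline) are my own assumptions, not anything fixed by the library, and a 'word' here just means any run of non-delimiter characters.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: counts delimiter-separated tokens with strtok().
// Assumption: words are separated by spaces, tabs, CRs or newlines only.
std::size_t CountWordsStrtok(const std::string& text)
{
    // strtok() writes '\0' into its input, so work on a writable,
    // null-terminated copy instead of the caller's string.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    const char* delimiters = " \t\r\n";
    std::size_t count = 0;

    // The first call passes the buffer; later calls pass nullptr to continue.
    for (char* token = std::strtok(buffer.data(), delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters))
    {
        ++count;
    }
    return count;
}
```

Calling [tt]CountWordsStrtok("The  quick\tbrown\n\nfox")[/tt] should give 4, because the doubled space and the doubled newline each act as a single separator. Bear in mind that [tt]strtok()[/tt] keeps hidden state between calls, so it is not a good fit for code that tokenizes two strings at once.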

[sup]programmer (prog'ram'er), n A hot-headed, anorak wearing, pimple-faced computer geek.[/sup]​
 
Just to start you off, here's a simple one which counts every whitespace or non-printing character, then adds 1 to the total. Obviously, you need to do something about double spaces, etc., but this should give you the general idea:

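[tt]
#include <cctype>
#include <string>
#include <iostream>
#include <algorithm>

// True for whitespace and for any other non-printing character.
bool IsBreak(char c)
{
    // Cast first: passing a negative char to the <cctype> functions is undefined.
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::isspace(uc) || !std::isprint(uc);
}

int main()
{
    std::string str = "The quick brown fox jumped over the lazy dog.";

    // One more word than there are breaks (only true for single breaks).
    const auto wCount = 1 + std::count_if(str.begin(), str.end(), IsBreak);

    std::cout << "Word count is: " << wCount << std::endl;
    return 0;
}
[/tt]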
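If you'd rather not deal with double spaces by hand, one alternative (a sketch, not the approach used in the code above) is to let the stream extraction operator do the splitting, since [tt]operator>>[/tt] on a [tt]std::string[/tt] skips any run of whitespace. The name [tt]CountWordsStream[/tt] is made up for this example.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words in a string.
// Assumption: a "word" is any run of non-whitespace characters.
std::size_t CountWordsStream(const std::string& text)
{
    std::istringstream in(text);

    // Each increment of the iterator extracts one word with operator>>,
    // which skips leading whitespace of any length.
    std::istream_iterator<std::string> first(in);
    std::istream_iterator<std::string> last; // end-of-stream marker

    return static_cast<std::size_t>(std::distance(first, last));
}
```

With the fox sentence from the example above, this should return 9 even if extra spaces or tabs are added between the words.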
 
Thanks for the help, guys, but I'm a little confused by the stuff suggested. This is my first semester of writing C++, so I'm still struggling with it.

Essentially, I will be reading in a text file which will more than likely look like a piece of C++ code. I have to account for tabs, double spaces, etc. I need to count the total number of lines (easily handled), words, non-whitespace characters, letters, digits, and punctuation characters.

I'm sure there are some sort of string commands to tell the difference between everything. Could someone tell me where I could find them?
 
CString str;
...
str.SomeMethod // Find a method that can help you. I'm from Russia :)
 
There are several useful functions declared in [tt]<ctype.h>[/tt] (or [tt]<cctype>[/tt]) which examine a single character:

[tt]
isalpha() // returns non-zero if the parameter is
// any upper or lower case letter

isdigit() // returns non-zero if the parameter is
// any digit

iscntrl() // returns non-zero if the parameter is
// any control character such as tab or new line

ispunct() // returns non-zero if the parameter is
// some kind of punctuation character
[/tt]

You'll need to know how to read strings from files and how to use loops to work your way through each line of the text. Remember that a string is simply an array of characters, which is why you should examine each line of text in a loop, looking at each individual character in turn.

 

#include <iostream>
#include <string>

using namespace std;

int main()
{
    // The string to be searched
    string text = "Smith, where Jones had had \"had had\", had had \"had\"."
                  " \"Had had\" had had the examiners' approval.";

    string word = "had"; // Substring to be found

    cout << endl << "The string is: " << endl << text << endl;

    // Count the number of non-overlapping occurrences of word in text
    int count = 0; // Count of substring occurrences

    // Use string::size_type rather than int so the npos comparison is reliable
    for (string::size_type index = 0;
         (index = text.find(word, index)) != string::npos;
         index += word.length(), count++)
        ;

    cout << "Your text contained "
         << count << " occurrences of \""
         << word << "\"."
         << endl;

    return 0;
}
 
A proposal I made in a previous post, which I will still stick with for its versatility, is to traverse the entire string:


look at first char in string
while there are more chars in the string ('\0' not encountered)
    if a letter is encountered and bool wordFound is false
        set bool wordFound to true
        increment wordCount
    else if a non-alpha char is encountered
        set bool wordFound to false
    look at next char in string

This algorithm will account for single spaces, double spaces, quinti-von-dipple-di-tripledy spaces (I like Dr. Seuss [tongue]). It will also account for smileys, oddly spaced punctuation, tabs, newlines, etc.
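The steps above could be written in C++ roughly like this. It is only a sketch: the function name [tt]CountWordsByFlag[/tt] is made up here, and it walks a [tt]std::string[/tt] by index rather than looking for the '\0' terminator.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper following the wordFound flag idea described above.
// Assumption: a "word" is a run of alphabetic characters only.
std::size_t CountWordsByFlag(const std::string& text)
{
    std::size_t wordCount = 0;
    bool wordFound = false; // true while we are inside a word

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        // Cast so that isalpha() never sees a negative value.
        const unsigned char c = static_cast<unsigned char>(text[i]);

        if (std::isalpha(c))
        {
            if (!wordFound)
            {
                // First letter after a gap: a new word starts here.
                wordFound = true;
                ++wordCount;
            }
        }
        else
        {
            // Any non-alpha character ends the current word.
            wordFound = false;
        }
    }
    return wordCount;
}
```

One consequence of treating every non-alpha character as a break is that apostrophes and digits split words, so "don't" counts as two words and "abc123def" also counts as two. Whether that is what you want depends on how your assignment defines a word.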
 