A problem involving strtok

Status
Not open for further replies.

liberty4all

Programmer
Aug 27, 2002
Hey everyone,

I am writing a program that reads from a file, separates the words from punctuation marks, and arranges the words in alphabetical order in a linked list.
I have a loop that deals with one line at a time, taking care of every word. It works fine with a 1-line file, but when it tries to read the 2nd line, all hell breaks loose. Let me show you my loop...

void main(){

    char* token;
    char seps[]=" , ; : ! ? - . \n \t ";
    char line[128];
    WordNode* List;

    /*the loop*/
    while (fgets(line,128,f)){
        token = strtok(line,seps);
        while (token!=NULL){
            AddWord(&List,token);
            token=strtok(NULL,seps);
        }
    }
}

The first line works perfectly, then when the program hits the while statement for the 2nd time, the list completely changes. The words that were once arranged in the right order now contain weird parts of the remaining text in the file.

*pulls hair*
Someone, please help me....

 
Me again, just signed up here.

Anyway, if I replace the " AddWord(&List,token); " call with a simple printf, it works and prints out all the words in the file. However, if I replace AddWord with the most trivial function in the world, " push_front ", which adds words to the beginning of the list, it's still messed up.
 
strtok does not create new strings.
Each token returned by strtok is a pointer in the original string you provide.
An example (I show the ending \0 character of each line).
Say line is:
Code:
line="Four words in line\0"
the tokens returned by strtok are:
Code:
tok1=line="Four\0"
tok2=line+5="words\0"
tok3=line+11="in\0"
tok4=line+14="line\0"
strtok replaces the first separator character after each token with a \0 and returns a pointer to the start of that token inside line.
Now if you read another line in the same line variable:
Code:
fgets(...) -> "Five words in this line\0"
The tokens of the previous line change even before you call strtok because they really point to somewhere in the line variable. So, just after the fgets, you get:
Code:
tok1=line="Five words in this line\0"
tok2=line+5="words in this line\0"
tok3=line+11="in this line\0"
tok4=line+14="this line\0"

To solve this, you have to copy the value returned by strtok to a new location that is different for each line of the file.
Here I duplicate the whole line before calling strtok on the copy. Note that I do not free the allocated memory, because the tokens point into it.

Code:
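while (fgets(line,128,f)){
   /* one private copy per line; the tokens will point into it */
   char *tmpLine = malloc(strlen(line)+1);
   if (tmpLine == NULL)
       break;   /* out of memory: stop reading */
   strcpy(tmpLine, line);
   token = strtok(tmpLine,seps);
   while (token!=NULL){
       AddWord(&List,token);
       token=strtok(NULL,seps);
   }
   /* tmpLine is deliberately not freed: the list still points into it */
}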
With this method, there is no easy way to free the allocated memory: free must be given the pointer that malloc returned, and a token is equal to that pointer only when it is the first token of a line that starts with a word.

Another, cleaner solution is to have AddWord allocate its own memory and copy the token, instead of copying the whole line after the fgets. Each list node then owns its word, and you can free it whenever you want.
 
How does this strtok loop work?

while (fgets(line,128,f)) // reads at most 127 characters of one line
{
    char *tmpLine = malloc(sizeof(char)*(strlen(line)+1));
    strcpy(tmpLine, line);
    token = strtok(tmpLine,seps);

    // Does strtok return only the first string delimited by seps?
    // What happens to the rest of the string?
    // Suppose it finds, say, 5 tokens: token is a single pointer,
    // not an array that could store 5 addresses.
    // So how does this work?

    while (token!=NULL){
        AddWord(&List,token);
        token=strtok(NULL,seps);
    }
}

Could someone please clarify this?
 