Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Odd strtok issue

Status
Not open for further replies.

Guest42

Programmer
Nov 9, 2005
28
US
I'm attempting to write a method that splits a string up (according to a delimiter) into an array of strings, which returns at the end of the function. This problem gets a little hairy. Here's the current function:

char** splitString(char *c_str, int len, char delim)
{
char **temp;
char *str;
int i;

str = malloc(sizeof(c_str));
strcpy(str, c_str);
printf("%s\n", str);

temp = malloc(len*sizeof(char*));
temp[0] = malloc(MAXCHAR*sizeof(char));
temp[0] = strtok(str, &delim);

for(i = 1; i < len; i++)
{
temp = malloc(MAXCHAR*sizeof(char));
temp = strtok(NULL, &delim);
}

return temp;
}

The issue is as follows:
Say *stringX = "part1;part2;part3". You then have **splitup = splitString(stringX, 3, ';').

Well, originally, inside the function I used c_str directly with strtok() instead of copying it into str. temp would return as a perfectly split up string, but if you tried to use stringX after sending it through the function, it would be left with only "part1", and the rest would be gone.

So I tried using a temporary string (str), where stringX would not be affected, and that is the current state of the function. The initial strtok(str, &delim) returns "part1" fine, but the second call returns crazy characters, and the third call returns null. So while stringX is unaffected, temp doesn't return a very useful string.

How do I just make this thing return temp without going crazy on me?
 
1. Please use the [code][/code] tags for posting code.

2. str = malloc(sizeof(c_str));
c_str is a pointer, which means sizeof gets the size of the pointer, not the length of the string it's pointing to.
What you need here is
[tt] str = malloc(strlen(c_str)+1);[/tt]

3. temp[0] = malloc(MAXCHAR*sizeof(char));
You don't need to malloc any of your temp pointers.
All you're doing is setting each pointer to each successive return result of strtok.

4. temp[0] = strtok(str, &delim);
delim is a string - which means it needs a \0 in there.
Pointing at a single char has no \0 associated with it.
Pass the delimiter as a char*, not as a char.
[tt]splitup = splitString(stringX, 3, ";");[/tt]

--
 
Thanks. I understand all those changes, but unfortunately, it's still spewing out trash. So I put it in a seperate test program:

Code:
#include <stdio.h>
#include <string.h>

char** splitString(char *c_str, int len, char *delim)
{
    char **temp;
    char *str;
    int i;
    
    str = malloc(strlen(c_str)+1);
    strcpy(str, c_str);
    printf("%s\n", str);//At this point, str is "part1;part2;part3"
    temp = malloc(len*sizeof(char*));
    temp[0] = strtok(str, delim);
    printf("%s\n", str); //At this point, str is "part1"
    
    for(i = 1; i < len; i++)
    {
        temp[i] = strtok(NULL, delim);
    }
    
    return temp;
}

int main(void)
{
    char *t = "part1;part2;part3";
    char **u;
    u = splitString(t, 3, ";");
    printf("%s\n", t);
    system("pause");
    return 0;
}

As I put in the comments, it seems like, as soon as I put str through strtok, it gets butchered down to "part1", and so when I attempt to put it through, it gives me trash.

There must be something I am missing?
 
Apart from changing the main method slightly, it seems to be working as advertised.

You have got a memory leak in there though, and I am also not sure of the value of a split() function which requires you pass in the number of tokens - but never mind.
Oh, also, strtok() is not thread-safe/ re-entrant, so should really be advoided IMO.

Code:
int main(void)
{
    char *t = "part1;part2;part3";
    char **u;
    u = splitString(t, 3, ";");
    printf("OUT : %s\n", u[0]);
    printf("OUT : %s\n", u[1]);
    printf("OUT : %s\n", u[2]);
    system("pause");
    return 0;
}

--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
I tell a lie - there isn't a memory leak after all.

--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
Yes there is. I see some malloc()'s but no free()'s.
 
Well, the output seems to be better, and yet str still gets cut down, and I have no idea why. The reason this bothers me is that I am just trying to get the hang of pointers, but I don't really undersstand why this is happening. Coming over from Java where I don't have to worry about this stuff, it's tricky for me.

In anycase, I would like to make it so you don't have to pass the number of tokens, but I was struggling with making the pointers work right first. In any case, if strtok() is not so safe, what would be the recommended path?

As far as the memory leak goes, do I have to use free() on anything I malloc() when I am done using it? I thought that since str is completely contained within the function that it would free up along with the end of the function, but I guess I don't know.
 
> Well, the output seems to be better, and yet str still gets cut down, and I have no idea why
Because strtok() modifies the string you're tokenising, by filling all the characters which match delim with \0.
This is mainly why you take a copy of it before you start.

> do I have to use free() on anything I malloc() when I am done using it?
Yes.
Assuming that the string you're tokenising doesn't begin with a delimiter, then in main you would do.
Code:
free( u[0] );  /* the result of str = malloc(strlen(c_str)+1);*/
free( u ); /* the result of temp = malloc(len*sizeof(char*)); */
If the string does begin with characters which match delim, then things are a bit trickier.

> I would like to make it so you don't have to pass the number of tokens
That's fairly easy to do once the fixed number approach is working to your satisfaction.

> In any case, if strtok() is not so safe, what would be the recommended path?
There are several possible solutions, say using sscanf, strcspn. All approaches mean you have to do a fair amount of detailed memory management by yourself - allocating space for each string segment for example.
There isn't a really good answer to this one I'm afraid.

--
 
If you malloc() something, then you must free it - whether its in scope or not. Leaving a function where you malloc'ed something does not free it, because you have added it to the heap.

The below version is a modification of what you are doing, but without any memory leaks.

Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>


void splitString(char *c_str, char **ret, char *delim)
{
    int i = 0;
    char* tmp;    
    char str[strlen(c_str)+1];


    strcpy(str, c_str);
    
    tmp = strtok(str, delim);
    while (tmp) {
        ret[i] = malloc(strlen(tmp) * sizeof(char) +1);
	strcpy(ret[i++], tmp);
	tmp = strtok(NULL, delim);
    }
    
}

int main(void)
{
	while (1) {
		char *t = "part1;part2;part3";
		char **u = malloc(3* sizeof(char*));
		splitString(t, u, ";");
		printf("OUT : %s\n", u[0]);
		printf("OUT : %s\n", u[1]);
		printf("OUT : %s\n", u[2]);

		free (u[0]);
		free (u[1]);
		free (u[2]);
		free(u);
	}
    return 0;
}

--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
Hah. I guess that is a much better way of doing things, sedj. I can only hope that once I become more pointer savy I will have wit left to be more clever about how I write my programs. But at least for now I can look at it and learn something from it.

Would it be reasonable to allocate **ret inside of the function, rather than allocating u before hand? I'm guessing that there are useful functions in the string.h lib that would allow me count the number of tokens my starting string will have, thus allocating a proper number of space.

int properNumber = [Count total tokens here]
**ret = malloc(properNumber*sizeof(char*));

Would there be anything wrong with that?
 
That seems OK to me.

An easy way to count the number of tokens is to just loop the string that needs to be tokenized, counting the number of characters that are the delimeters (might need two loops, one to loop the string, and one to loop the delimeter string).



--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
Still, it bothers me, looking back, that strtok() is unsafe for threading. Not to drag this forum thread on longer than it needs, but could you explain a little to me about what the issues with strtok() are that make it unsafe for threading? What are the potential negative effects? I don't mean to be a pain in the ass about it, but I don't want to be ignorant as to why, so I have to keep asking questions.
 
strtok(), gmtime(), localtime() and a bunch of other C library functions are not thread safe because they use an internal static variable to remember state. So if two threads call that same function at the same time, they are both sharing the same static variable and one of the threads could end up with the wrong return value.
 
In the interests of truth: in MS VC++ strtok function is thread-safe (if you link with multithread RTL version, of course). This C and C++ implementation allocates RTL internal static storage on per-thread basis.
Never call this function simultaneously for different strings in the (only) thread:
Code:
p1 = strtok(s1,d1);
p2 = strtok(s2,d2);
/* Don't try to parse both s1 and s2 simultaneously: */
/* strtok knows only s2 now... */
Be careful when use strtok to parse CSV export files: strtok skips leading delimiters so it skips empty (NULL) fields. Better use your own (simple) scanner code in this case (based on strpbrk and/or strcspn, for example).
 
I understand what you're saying cpjust. Makes perfect sense when I think about how strtok works, but I didn't know the reason for it. And thanks for the info, ArkM. For my purposes, I don't think strtok will be dangerous, but I'll make sure to keep this in mind.

Anyway, I want to thank everyone who answered my questions. This had been the most helpful forum thread I've experienced, and I feel that my understanding of pointers (and C in general) is much much better for it. I am very appreciative.

I'm going to go forge ahead in my learnings on my own again, though needless to say there will be more questions. But they will probably be a matter for a different thread. Thanks again!

-Guest42
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top