switched systems, now my code doesn't work

Status
Not open for further replies.

esmithbda

IS-IT--Management
Jun 10, 2003
304
US
Good day,

I had written some code in C on two Linux systems, which was working great on them for months, with no errors and very fast.
One was an AMD processor and I compiled that code with the newest gcc (3.x). The other was a dual PIII machine and I compiled that with the newest Intel compiler for Linux.
I had no errors and no issues with the code.

I have since moved and no longer have those machines. Instead I now have a FreeBSD machine. This machine is an AMD machine, but it has gcc 2.95.4.

My code compiles fine with no errors, and then here is the annoying part:
Out of 10 runs, it will work 8 or 9 times, but occasionally it will seg fault and dump core.

At first I thought I had traced the issue down to where if a file that it needed to create was already there, then it wouldn't crash, but if it had to create it, then it would crash... But that is not the case after further investigation.

I recompiled the code with "gcc -g", got it to dump core, and ran the program and the core through gdb.
It gave me:
#0 0x8048f4e in analyzeThis (fileName=0x32393839, returnString=0x32373638 <Address 0x32373638 out of bounds>)
at analyze.c:393
393 printf("Filename: %s\n", *fileName);


Line 393 is where I added in a line to see what it saw as the filename getting passed in. But it looks like it is saying the real issue is with the returnString reference being out of bounds.
This is all confusing to me since it all worked perfectly on the other systems, with no change in the code at all.

The "analyzeThis" function that it refers to is called like this:
char analysisData[100];
analyzeThis(&argv[1], analysisData);

Then the function itself, heavily truncated from the one I have, looks like this:
char *analyzeThis(char *fileName[], char *returnString)
{
    FILE *pFile;
    printf("Filename: %s\n", *fileName);
    /* I have another function that gets called earlier in the program that handles the files in the exact same way and it never dies */
    pFile = fopen(*fileName, "r");
    if (pFile == NULL) {
        perror("Error opening file");
        printf("there was an error opening the file, we are leaving now\n");
        exit(1);
    }
    else {
        /* do some stuff */
    }

    /* these are obviously acted on in the function, but it is large and I don't really want all of the code in here - this is the stuff it is failing on */

    sprintf(returnString, "A: %f\nB: %s\n", myA, myB);

    return returnString;

}


I don't know if I just have really poor programming style and the times it was all working on the other systems was just because those compilers are a lot more friendly than this one? Or if it is just some basic differences in the OS/compiler versions.

Any help or suggestions would make my week.

Thanks
 
The problem is probably in the way char *fileName[] is interpreted. If you are only passing one filename, why not just remove the level of indirection? Try
Code:
analyzeThis(argv[1], analysisData);

char *analyzeThis(char *fileName, char *returnString)
...
;
    printf("Filename: %s\n", fileName);
    pFile = fopen(fileName, "r");
It may be taking the wrong level of indirection, thus causing the crash.
 
Well, that does look like a stupid mistake there - thanks for pointing that out.
As is probably obvious, I'm relatively new to this and, as with many newbies, have had issues with pointers, especially when dealing with strings.

That makes more sense now, and sure enough, after making those changes, as well as another function that is called the same way, it no longer dumps core.

That said, it still fails regularly, which it never did before on the other systems.

The gist of the code is that it is called as "programName inputFile outputFile".
The inputFile has data that it pulls out and analyzes, and then in the end, makes a decision and writes that out to the outputFile.

Now instead of dumping core, it is telling me "Error opening file: Bad address" in reference to the same spot in the code above at the perror call.
BUT, it doesn't do this all of the time - with the exact same inputFile it will sometimes work, and sometimes not.

Any ideas?
 
Whilst your original code was unusual, there was nothing actually wrong with it. By itself, it was not the problem.

xwb's suggestion of removing one level of indirection certainly simplifies things, and it's something I would have suggested that you do.
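
To illustrate why the original call was unusual but valid, here is a minimal sketch. The two function names are hypothetical and not from the original program; they only show that dereferencing a `char *fileName[]` parameter called with `&argv[1]` yields the same pointer as passing `argv[1]` directly.

```c
/* Hypothetical stand-in for the original style: the caller passes &argv[1],
   and the function dereferences once to reach the filename string. */
const char *viaPointerToPointer(char *fileName[])
{
    return *fileName;
}

/* Hypothetical stand-in for the simplified style: the caller passes argv[1]
   and the function uses the string pointer as it is. */
const char *viaPointer(char *fileName)
{
    return fileName;
}
```

Called as `viaPointerToPointer(&argv[1])` and `viaPointer(argv[1])`, both return the same address, so the extra level of indirection by itself should not cause a crash.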

> #0 0x8048f4e in analyzeThis (fileName=0x32393839, returnString=0x32373638 <Address 0x32373638 out of
This is where you need to know your ASCII character set inside out - those addresses look like "2989" and "2768" when interpreted as strings.
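
If you don't have the ASCII table memorised, a small helper can do the decoding. This is only a sketch; `bytes_as_text` is a hypothetical name, and it reads the value most significant byte first, which is how the addresses are printed by gdb.

```c
/* Hypothetical helper: write the four low-order bytes of a value into out,
   most significant byte first, as a NUL-terminated string.
   Bytes outside the printable ASCII range are shown as '.'. */
void bytes_as_text(unsigned long value, char out[5])
{
    int i;

    for (i = 0; i < 4; i++) {
        /* Shift the wanted byte down to the bottom and mask it off. */
        unsigned char c = (unsigned char)((value >> (8 * (3 - i))) & 0xFF);
        out[i] = (c >= 0x20 && c < 0x7F) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

Feeding it `0x32393839` and `0x32373638` gives "2989" and "2768". Note that on a little-endian machine such as x86 the bytes sit in memory in the opposite order, so the text that actually overwrote the pointers may have read "9892" and "8672"; it is worth searching for both orders.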

My guess is, something is writing to memory it shouldn't do, and this is merely the visible side effect of that. My first step would be to see if those numeric strings were present in any other code or the files being processed. Knowing where such strings get manipulated may lead you to the cause of the problem.

> because those compilers are a lot more friendly than this one?
Not necessarily. A different compiler will simply allocate memory in different ways, so memory corruption problems can go unnoticed for a very long time (they always hit something unused). A change of compiler rearranges its memory usage, and suddenly, something valuable is in harm's way, and bang!, it's game over for your program.
 
This program is technically only run once on a data set, but there are over 2000 datasets right now.

The Linux servers that I used before only had 256MB of RAM; this new FreeBSD machine has a gig. It is amusing that with the larger memory space, it is now running into things. That said, this is a server doing other things, whereas before, the machines' sole reason for existing was to run this code.

Like I said, I am probably doing something wrong with pointers and as you say, this is the side effect that is coming from all of that.

I find it curious that if I run it back to back 10 times (type it out on the command line and let it run, then press the up arrow to recall that command and bang it out again, etc), it will work perfectly and return the correct result 5-8 of those times. The rest of the time it will give an error.
Very strange to me - the worst bugs are the ones that are random.

That said, there are some parts of the code that are random, perhaps that is part of the issue. If the random number generated is in the "right/wrong" spot, then I have the issue?

Don't know.

Thank you so much for the tips - when I get a chance again to go through the code and the data for the example, I will see if I can find something else that might be the culprit.
 
At a guess something is overwriting one of your strings in memory.
1) Is the analysisData big enough? Has this been declared on the stack, heap or in global memory?
2) Is your input buffer big enough? Is it on the stack or heap?

Also, does any data contain the sequences 2989 and 2768? At a guess, some buffer is being overwritten.

Try passing in the size of analysisData to the routine and only using that size of string in your result.
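
Something along these lines, as a rough sketch only. `formatResult` is a hypothetical name, and I am assuming from the "%f" and "%s" in your sprintf that myA is a double and myB is a string. It uses snprintf, so the output can never run past the end of the buffer.

```c
#include <stdio.h>

/* Hypothetical replacement for the final sprintf in analyzeThis.
   size is the total size of returnString, e.g. sizeof analysisData.
   Returns 0 if the whole result fitted, -1 if it was truncated or failed. */
int formatResult(char *returnString, size_t size, double myA, const char *myB)
{
    /* snprintf writes at most size - 1 characters plus the terminating NUL
       and returns the length the full result would have needed. */
    int n = snprintf(returnString, size, "A: %f\nB: %s\n", myA, myB);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The caller would then use `formatResult(analysisData, sizeof analysisData, myA, myB)` and treat a return of -1 as a sign that the buffer is too small.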
 
Well, thank you everyone for your tips.

I finally had a less stressful day at work, came home and made myself dinner, stretched out on the couch and really read through the code.

The issue was an odd one and I really don't see how this never caused an issue on the Linux systems.

This whole thing was puzzling to me because the line that this was failing at was very early on in the function, and the function call was very early on in the program. So basically, not a whole heck of a lot was going on prior to it.
So if there was something corrupting the memory, there was very little where it could be doing it.

I made changes to some variable names just to clarify things for myself. Then I made some of the strings a little bigger in case they were getting squeezed with more than they were set up for (even though I had gone over that a thousand times before to make sure that each was small, but also plenty big for what it would need).

And then I saw a line... a line directly above the line where this was breaking. It was a sprintf() call that was taking some variables that were populated (randomly) above and formatting them into a string - this string was the starting point for a genetic algorithm, the next step was to read in some data and then learn and grow the algorithm as it reacted to the data.

So I put in a printf() call directly over the sprintf() call so that I could see what it was passing on...

Sure enough, what was supposed to be a series of numbers... started with a "d".

I looked at the code and there it was:
Code:
    /* create a string out of the numbers */
    sprintf(bestAlg, "d%,%d,%0.2f", conditionOne, conditionTwo, conditionThree);

There was a "d%" where there should have been a "%d" - so sloppy.
I have no idea at all how this worked all the time on the Linux systems, and some of the time on this system.
Essentially, since this was just the starting point, it was quickly overwritten with new data (correctly) very early in the program - so if it somehow made it through trying to figure out what to do with the "d%" then it was fine....
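
For anyone who finds this later, here is a sketch of the corrected call. The wrapper name `buildStartingAlg` is hypothetical, and the types are an assumption taken from the conversions in the format string: two ints and a double. In the broken string, "%," is not a valid conversion specification, so the behaviour is undefined, and the arguments no longer line up with the conversions either (the "%d" would take conditionOne and the "%0.2f" would be handed conditionTwo). That would fit with it appearing to work on one system and crashing at random on another.

```c
#include <stdio.h>

/* Hypothetical wrapper around the corrected format string.
   Assumes conditionOne and conditionTwo are ints and conditionThree
   is a double; size is the total size of the bestAlg buffer.
   Returns 0 if the whole string fitted, -1 otherwise. */
int buildStartingAlg(char *bestAlg, size_t size,
                     int conditionOne, int conditionTwo, double conditionThree)
{
    /* "%d,%d,%0.2f": every conversion now has a matching argument. */
    int n = snprintf(bestAlg, size, "%d,%d,%0.2f",
                     conditionOne, conditionTwo, conditionThree);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With the values 3, 7 and 0.5 this produces the string "3,7,0.50".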


Anyway, thanks again everyone. In reference to the 2989 and the 2768 - none of the data I look at contains those values. There is a data set in another directory (one that this program had no obvious awareness of) that was 2768 lines long, but I can't guarantee that it was there when the program was running with those exact errors.

I am a very happy man after figuring that out, and humbled/embarrassed at the mistake.

Thanks,

Eric
 
Glad to hear you finally tracked it down :)

> sprintf(bestAlg, "d%,%d,%0.2f", conditionOne, conditionTwo, conditionThree);
Here's a tip for future reference

Code:
gcc -W -Wall prog.c

will diagnose incorrect printf()-style and scanf()-style format strings at compile time, along with many other inconsistencies. It may be worth trying that now just to see what it says...
 
thanks - that did in fact point it out, as well as show unused variables (well, one).

Like I said, in terms of programming in C, I'm a newbie, so I'm still getting all the little things down. Been programming for years, but have been spoiled by Perl and Java - now I need the speed of C and am getting used to the fact that one has to be a bit more exact in the way things are handled.

I appreciate the help - this board is a great resource.
 