Tek-Tips is the largest IT community on the Internet today!


parsing Record

Status
Not open for further replies.

logic4fun (Programmer, US), Apr 24, 2003
Hi all,
I have huge data files with different line lengths, and I am trying to load them into tables. I would like to write a generic function that reads through a file; as I said, it has to deal with a huge quantity of data.

Here is what I wrote, but I am running into problems when I get two commas in a row.

My function is as follows:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
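#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE *fp1;           /* input file */
int debugflag = 1;   /* print every token when set */
int gargc;           /* global copies of main()'s arguments */
char **gargv;

void parseIt(int argc, char *argv[])
{
    char string[1028];
    char *token[120];
    char fname[128];
    int a, b;
    size_t i;

    gargc = argc;
    gargv = argv;

    /* build "<dir>/<file>" from the first two command-line arguments */
    snprintf(fname, sizeof(fname), "%s/%s", gargv[1], gargv[2]);

    if ((fp1 = fopen(fname, "r")) == NULL) {
        fprintf(stdout, "Cannot open file\n");
        exit(1);
    }

    while (fgets(string, 1027, fp1) != NULL) {

        /* replace the newline with a string terminator */
        for (i = 0; i < strlen(string); i++) {
            if (string[i] == '\n') {
                string[i] = '\0';
            }
        }

        a = 0;

        /* NOTE: strtok() treats ",," as a single delimiter,
           so empty fields are skipped here */
        token[a++] = strtok(string, ",");

        while ((token[a++] = strtok(NULL, ",")) != NULL)
            ;
        a--;

        if (debugflag) {
            for (b = 0; b < a; b++)
                printf("%i: \"%s\"\n", b + 1, token[b]);
        }
    }
    fclose(fp1);
}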

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A sample file is:

A,123,24,hello
A23,Howrya
Junk,Junk1,,junk2,,junk3


It works fine for the first two lines. The third line is parsed too, but I get 4 fields instead of 6; I want an empty or null value in the places where two commas appear together. A zero for NULL would also do; in fact it would be ideal if the function could do that. All of the above is an effort to rewrite a chunk of code that uses "sscanf" and runs into a lot of buffer overflow problems.

I believe I have to use strrchr, but I am not sure.

Any suggestions for the chunk above?

Thanks in advance,
Logic4fun
 
Hey there, you need to click the Process TGML link in the Step 2 Options row of the Your Reply table on this page. When you post code, avoid using the letter i or the letter b as an array index variable, since [i] and [b] are TGML tags (italic and bold) and get swallowed. Alternatively, you can post your code inside a TGML code element.

-pete
 
Hi:

The strtok function treats two delimiters in a row as one, so the empty field between them is skipped. It looks like you'd like it to deliver a null in your array for that field instead.

In the past, I posted a replacement for strtok. You can view it here:

thread205-294161

Regards,

Ed


 
Hi all,
The question above also leads me into some other problems. As I said, I actually need to parse a record, and I don't know the lengths of the individual fields. Right now I am using sscanf to read them into variables. Alternatively, I can use whatever you suggest to read them into variables, but MY QUESTION is how to determine the length of each element. When an element is longer than the size I declared, the program throws a segmentation fault and exits. I want the process to continue. Is there any way to check for it?

For example:

char blah[20];

When I assign something to the above that does not fit, it throws a segmentation fault. How should I check the size of the element dynamically, since I don't know the size of the element in advance?

My brain hurts.

Any suggestions for both of my questions above? :-(

Thanks,
logic4fun
 
Logic4Fun,
From the code posted earlier, it appears that you assume a line will be at most 1027 characters long. If a line is longer than that, the code will treat it as more than one line. This may not be what you want.

A second thing is that fgets reads at most one byte less than the size passed as its second parameter, leaving room for the terminating '\0'. This means that with fgets(string,1027,fp1), fgets will read up to 1027-1=1026 bytes.

If the string contains a '\n', it will always be the last character before the terminating '\0'. This means that you can check for it the following way:
Code:
size_t len = strlen(string);
if (len > 0 && string[len-1] == '\n')
   string[len-1] = '\0';
This is much better than cycling through the whole string.

If I have some time today, I'll try to code it.
 
logic4fun,

Did you model the parsing of the data before you started writing your code? It sounds like maybe you took off writing code before you even thought about how to do it. If you did, that would account for most of your problems. It's difficult enough to write code without the added complexity of writing code when you don't even know what it is supposed to do.

Failing to plan is planning to fail [wink]


-pete
 
Palbano,
Yeah, it's actually code that has been running in production for years (since before I started coding); it was compiled with XLC on an AIX box. There it was fine and never threw any segmentation faults. All of these buffer-overflow problems start when it has to run on a Linux box compiled with GCC.

All of the code uses the sscanf function to read data into different variables, and when the junk data for a variable is larger than the variable's size, it bombs out with a segmentation fault. So I am planning to write a replacement for that sscanf chunk.
 
>> Problem starts of all these buffer overflows when it has
>> to run on LINUX box with a GCC compiler.

So the sscanf() calls result in seg faults? Since I believe that sscanf() works correctly, I would first assume that the original code is incorrect and that the old compiler/system did some correcting to protect the innocent [lol]

I went back and re-read the thread, and I do not see any code posted that uses sscanf(). It is difficult to help with so little information, so please don't take anything I say the wrong way. Perhaps you took a wrong turn somewhere, and there is actually a smaller, simpler solution involving the original code's use of sscanf()? My first guess would be that the memory allocated for sscanf() to write into is insufficient. This could mean there is a simple solution to this problem.

Or I could be completely misunderstanding everything you have posted [lol]


-pete
 