
Parse Tag Value ascii file

Status
Not open for further replies.

zephan

Programmer
Jan 14, 2002
Hi All,
I've got a file in the following format:
tag1 value1 tag2 value 2 tag3 value3 (end of record)
tag4 value4 ...... tag10 value10 (end of record)
....
end of file

I wrote a Perl script that parses this file; it works fine but is slow.

How can I emulate Perl's split function in C, to split each line of this file based on a separator (a space in this case)? Or how can I embed Perl in C code (HP-UX)? Or does anyone have a suggestion for me? I'm not that familiar with C and its I/O functions.
Thanx
 
> I wrote a Perl script that parses this file; it works fine but is slow.
How slow - say 10 seconds for 100K lines?

Perl itself is written in C, and is pretty efficient at routine tasks like splitting lines into fields (and it takes care of all the memory allocation you would have to do yourself in your own C program).

What else do you do with the fields once you have extracted them?

Reading files is inherently slow (disk seek times are in milliseconds, which is 1,000,000 times slower than your processor clock speed).

Code:
#include <stdio.h>
/* raw timing: how long just to read the file, with no per-line processing */
int main ( int argc, char *argv[] ) {
  char buff[BUFSIZ];
  FILE *fp = ( argc > 1 ) ? fopen ( argv[1], "r" ) : NULL;
  if ( fp != NULL ) {
    while ( fgets( buff, BUFSIZ, fp ) != NULL );
    fclose( fp );
  }
  return 0;
}
Compile this program and compare times with
time ./a.out datafile.txt
time ./myperlprog datafile.txt > results.txt

Unless simply reading the file takes only 10% to 20% of the time it takes Perl to do the whole job, rewriting it all in C isn't going to improve things that much.

E.g. a 'word count' with a redundant split, compared with just reading the file (the C program above) and with the 'wc' program itself:
Code:
$ time ./a.out test1.txt

real    0m0.013s
user    0m0.002s
sys     0m0.003s
$ time perl -e 'my $sum=0;while(<>){split;$sum++;}print "$sum\n";' test1.txt > tmp

real    0m0.037s
user    0m0.020s
sys     0m0.004s
$ time wc -l test1.txt > tmp

real    0m0.016s
user    0m0.002s
sys     0m0.001s

--
 
Since I didn't have enough samples, I put the C program and the Perl one into a Perl script that runs each one 200 times.
Here are the results for the C one:
Code:
time perl testc.pl > c.txt
real        1.2
user        0.4
sys         0.5

and for the Perl one

Code:
time perl testc.pl > c.txt
real        2.8
user        1.9
sys         0.6


Both programs do nothing but read tagged values from a text file, sort them, and write them out as CSV.

You're right: the Perl script I wrote re-reads a parameter file each time, which I hard-coded in the C program. I'll pay more attention to I/O when comparing performance.

Back to my question: strtok did the job of splitting the blank-delimited values.

Thanks
 