Processing multi-column data files

Status
Not open for further replies.

fabien

Technical User
Sep 25, 2001
299
AU
Hi!

I have written a C program that takes a file with 4 columns of data as input (X Y T V) and does a coordinate transformation on the X, Y values.

In the program I read the data as follows:
// get the entire line
fin2.getline(line,sizeof(line));
strcpy(line1,line);

data1=strtok(line, " ");
data2=strtok(NULL, " ");
data3=strtok(NULL, " ");
data4=strtok(NULL, " ");
data5=strtok(NULL, " ");


// check if line starts with # or blank line then just write it out

if (data1==NULL)
{
// case blank line
fprintf(fout,"%s\n"," ");

} else if (data1[0]=='#') {
// writing data out
fprintf(fout,"%s\n",line1);
} else {
// do the processing....

Because the file contains some blank lines and some lines starting with #, I have to copy those to the output file unchanged.

The processing only ever applies to X and Y, the first two columns; the T and V columns are copied to the output file as they are.

Finally, my question: how can I make a general program that reads a file with any number of columns (3, 4, 5, ...) without the user specifying the count, captures X and Y, processes them with function(X,Y), and then simply copies the other columns straight to the output in the same format?

Many thanks
 
Is it a C program? See your line #2:
Code:
fin2.getline(line,sizeof(line));
Well, in C++ try this (working;) improvisation:
Code:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

void CvtXY(istream& in, ostream& out)
{
  string line, tail;
  int x, y;

  while (getline(in,line)) // not eof...
  {
    if (line.length() >= 3)
    {
      istringstream s(line); // replaces the deprecated istrstream
      if (s >> x && s >> y) // we have 2 integers (X, Y?)
      {
        getline(s,tail); // rest of the line, leading separator included
        // Process X, Y as you wish, then (e.g. + 1000)...
        out << x+1000 << " " << y+1000 << tail << endl;
        continue; // Sorry, structured programming fans...
      }
    }
    out << line << endl; // anything else is copied unchanged
  }
}
Of course, in true C we need strtok(). I think it's not so hard to translate the C++ algorithm above...
 
A C answer
Code:
#include <stdio.h>

int main ( ) {
  char buff[BUFSIZ];
  while ( fgets( buff, BUFSIZ, stdin ) != NULL ) {
    int x, y, n;
    if ( sscanf( buff, "%d%d%n", &x, &y, &n ) == 2 ) {
      fprintf( stdout, "%d %d%s", x*2, y*2, &buff[n] );
    } else {
      /* some other kind of line, just print it */
      fputs( buff, stdout );
    }
  }
  return 0;
}
Just replace stdin and stdout with suitable FILE* variables of your choice.

Using stdin and stdout does make it easy to test, though: simply run the program and type in example lines from your data file.
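
As a minimal sketch of that substitution (not part of the original reply), the same loop can be written as a function that takes the two streams as parameters. The names transform_line and convert_stream are hypothetical, the doubling only stands in for the real coordinate transformation, and using double with %lf is an assumption for data that has decimals.

```c
#include <stdio.h>

/* Hypothetical helper: doubles the first two numbers on a line and copies
   the rest of the line unchanged. Returns 1 if the line was transformed,
   0 if it was copied as it is (comment, blank line, anything else). */
int transform_line(const char *line, char *out, size_t size) {
    double x, y;
    int n = 0;
    if (sscanf(line, "%lf%lf%n", &x, &y, &n) == 2) {
        /* &line[n] is everything after Y, including its leading spaces */
        snprintf(out, size, "%.2f %.2f%s", x * 2, y * 2, &line[n]);
        return 1;
    }
    snprintf(out, size, "%s", line);
    return 0;
}

/* The fgets() loop, with the streams passed in instead of stdin/stdout. */
void convert_stream(FILE *in, FILE *out) {
    char buff[BUFSIZ], result[BUFSIZ + 64];
    while (fgets(buff, sizeof buff, in) != NULL) {
        transform_line(buff, result, sizeof result);
        fputs(result, out);
    }
}
```

Calling convert_stream(stdin, stdout) keeps the type-it-in testing described above, while passing FILE* values obtained from fopen() works on real files.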


--
 
Thanks guys. Salem: I am trying to implement your solution so I wrote the following:
char line[1000]=" ";
char line1[1000]=" ";
char functionname[200]= " ";

for (i=0;i<nlines;i++) {

printf("%d\n",i);
/* if (i==(n*d)) {printf("%d%% completed\r",n*d); fflush(stdout); d++;}*/
// get the entire line
fgets( line, 1000, finput );
strcpy(line1,line);
data1=strtok(line, " ");
// check if line starts with # or blank line then just write it out

if (data1==NULL) {
// case blank line
fprintf(fout,"%s\n"," ");

} else if (data1[0]=='#') {
// writing data out
fprintf(fout,"%s\n",line1);

} else if (data1[0] == 'F') {
printf("toto\n");
/* case AVF format */
sscanf( line1, "%s%f%f%n", functionname, &xin, &yin, &t );
// write the output
printf("toto\n");
printf("%s%d %d%s\n", functionname, xin*2, yin*2, &line1[t] );
printf("toto\n");
fprintf( fout, "%s%d %d%s", functionname, xin*2, yin*2, &line1[t] );


} else {
/* case X Y .. format */
sscanf( line1, "%f%f%n", &xin, &yin, &t );
fprintf( fout, "%d %d%s", xin*2, yin*2, &line1[t] );

}


}

But I get a segmentation fault with the following file:
#
#FIELDS = Function ID, X, Y, Time, Vave
#FUNCTION_TYPE = TVave
#LINEAR_UNITS = FEET
#DATUM = 0.000000
#
Function1 385000.00 3450000.00 0.0000 6547.0977
Function1 385000.00 3450000.00 9.6197 6547.0977
Function1 385000.00 3450000.00 19.6197 6547.8047
Function1 385000.00 3450000.00 29.6197 6549.6675
Function1 385000.00 3450000.00 39.6197 6548.8813

Note that in this specific case I have an additional format "char X Y Z T" which I did not mention before. That is what I test in the code above.

 
Post an example of each type of line in your file


--
 
The data files will be either
#
#FIELDS = Function ID, X, Y, Time, Vave
#FUNCTION_TYPE = TVave
#LINEAR_UNITS = FEET
#DATUM = 0.000000
#
Function1 385000.00 3450000.00 0.0000 6547.0977
Function1 385000.00 3450000.00 9.6197 6547.0977
Function1 385000.00 3450000.00 19.6197 6547.8047
Function1 385000.00 3450000.00 29.6197 6549.6675
Function1 385000.00 3450000.00 39.6197 6548.8813


This is a special format: some comment lines with # at the beginning,
then rows of Char, X, Y, float, float

or
385000.00 3450000.00 0.0000
385000.00 3450000.00 9.6197
385000.00 3450000.00 19.6197

(no comments 3 columns X,Y,Z all floats)
or

385000.00 3450000.00 0.0000 6547.0977
385000.00 3450000.00 9.6197 6547.0977
385000.00 3450000.00 19.6197 6547.8047

(no comments, 4 columns X,Y,Z,V all floats)

but I also want to cover the case of X, Y followed by any number of columns.

Many thanks!

 
Maybe this
Code:
#include <stdio.h>

int main ( ) {
  char buff[BUFSIZ];
  while ( fgets( buff, BUFSIZ, stdin ) != NULL ) {
    char func[BUFSIZ];
    int x, y, n;
    if ( buff[0] == '#' ) {
      /* comment line */
      fputs( buff, stdout );
    } else
    if ( sscanf( buff, "%d%d%n", &x, &y, &n ) == 2 ) {
      printf( "%d %d%s", x*2, y*2, &buff[n] );
    } else
    if ( sscanf( buff, "%s%d%d%n", func, &x, &y, &n ) == 3 ) {
      printf( "%s %d %d%s", func, x*2, y*2, &buff[n] );
    } else {
      /* some other kind of line, just print it */
      fputs( buff, stdout );
    }
  }
  return 0;
}
With sscanf(), you can have several attempts at deciding what is in the buffer without actually changing the buffer.
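
One way to picture that, as a rough sketch rather than anything taken from the posts: because sscanf() only reads the buffer, the line can be passed as const char * and tried against each layout in turn, whereas strtok() overwrites delimiters with '\0' as it goes. The enum and the function name classify_line are made up for illustration, and %lf assumes the coordinates may have decimals.

```c
#include <stdio.h>

enum line_kind { LINE_COMMENT, LINE_XY, LINE_NAME_XY, LINE_OTHER };

/* Hypothetical classifier: the const qualifier documents that no attempt
   modifies the line, so every sscanf() call sees the same text. */
enum line_kind classify_line(const char *buff) {
    char name[256];
    double x, y;

    if (buff[0] == '#')
        return LINE_COMMENT;
    if (sscanf(buff, "%lf%lf", &x, &y) == 2)             /* X Y ...      */
        return LINE_XY;
    if (sscanf(buff, "%255s%lf%lf", name, &x, &y) == 3)  /* Name X Y ... */
        return LINE_NAME_XY;
    return LINE_OTHER;                                    /* blank, etc.  */
}
```

The %255s width keeps the name from overflowing its 256-byte array, which an unbounded %s would not guarantee.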


--
 
Still no luck!

I replaced your code with

char buff[1000];
while ( fgets( buff, 1000, finput ) != NULL ) {
char func[1000];
int x, y;
int n;
if ( buff[0] == '#' ) {
/* comment line */
fputs( buff, stdout );
printf("comment\n");
} else if ( sscanf( buff, "%d%d%n", &x, &y, &n ) == 2 ) {
printf("format1\n");
printf( "%d %d%s", x*2, y*2, &buff[n] );
} else if ( sscanf( buff, "%s%d%d%n", func, &x, &y, &n ) == 3 ) {
printf( "%s %d %d%s", func, x*2, y*2, &buff[n] );
printf("format2\n");
} else {
/* some other kind of line, just print it */
fputs( buff, stdout );
printf("other\n");
}
}


and when I run it on the data file in the format2 I described above, every line is reported as either "comment" (#) or "other".

Note that I replaced BUFSIZ with 1000. Since my data are floats, I also tried replacing %d with %f, with the same result.
 
Sorry, I missed all those doubles in my last post. You're right to change those %d to %f (%lf in sscanf(), since the variables below are double).

Code:
#include<stdio.h>

void parse ( char *buff ) {
    char func[BUFSIZ];
    int    n, r1 = 0, r2 = 0;
    double x, y;
    printf( "%s", buff );
    if ( buff[0] == '#' ) {
        /* comment line */
        fputs( buff, stdout );
    } else
    if ( (r1=sscanf( buff, "%lf %lf %n", &x, &y, &n )) == 2 ) {
        printf( "%f %f %s", x*2, y*2, &buff[n] );
    } else
    if ( (r2=sscanf( buff, "%s %lf %lf %n", func, &x, &y, &n )) == 3 ) {
        printf( "%s %f %f %s", func, x*2, y*2, &buff[n] );
    } else {
        /* some other kind of line, just print it */
        fputs( buff, stdout );
    }
    printf( "scan results are %d %d\n", r1, r2 );
}

int main ( ) {
    char    t1[] = "#DATUM = 0.000000\n";
    char    t2[] = "      Function1    385000.00   3450000.00       0.0000    6547.0977\n";
    char    t3[] = "   385000.00   3450000.00       0.0000\n";
    parse(t1);
    parse(t2);
    parse(t3);
    return 0;
}
Try running this test program.
This is the output I get
Code:
#DATUM = 0.000000
#DATUM = 0.000000
scan results are 0 0
      Function1    385000.00   3450000.00       0.0000    6547.0977
Function1 770000.000000 6900000.000000 0.0000    6547.0977
scan results are 0 3
   385000.00   3450000.00       0.0000
770000.000000 6900000.000000 0.0000
scan results are 2 0

> Note that I replaced BUFSIZ with 1000
Unless you know BUFSIZ is very small (like 512) on your system, and the lines are very long, I'd suggest you stick with BUFSIZ.
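
Whatever size is chosen, fgets() stores at most size-1 characters, so an over-long line arrives in pieces. A small check like the following can flag a piece that does not end in a newline. The name line_is_complete is hypothetical and this is only a sketch.

```c
#include <string.h>

/* Returns 1 if the text read by fgets() ends with '\n', 0 otherwise.
   A 0 means the line was longer than the buffer, or it is the last
   line of a file that has no trailing newline. */
int line_is_complete(const char *buff) {
    size_t len = strlen(buff);
    return len > 0 && buff[len - 1] == '\n';
}
```

Testing each buffer this way after fgets() shows whether the chosen size is actually large enough for the data at hand.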

--
 
Hi Salem!

This worked, many thanks!!

Just one clarification on sscanf( buff, "%s %lf %lf %n", func, &x, &y, &n )) == 3: how does this work exactly?

 
> how does this work exactly?
Not sure what you mean.

There are a couple of features which escape the casual sscanf() user.
The first is the return result, which is the number of successful "conversions and assignments". This tells you how many data items were converted.
n = sscanf(buff,"%d %d", &x, &y );
If n were 1, say, you would know that x was converted but y does not hold a valid value.

The %n is unique, since it measures progress through the string. It is an assignment (but not a conversion), so it doesn't increment the result count. It allows you to step through a string using several sscanf() calls, or in the case above to figure out where the start of the rest of the line is.
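
Both points can be seen in a short sketch (the function names are invented for illustration and are not from the thread). count_converted() simply returns what sscanf() reports, and count_columns() uses %n to advance through a line one number at a time, which is one possible way to cope with any number of columns.

```c
#include <stdio.h>

/* Returns the sscanf() result for a two-integer format:
   2 if both converted, 1 if only the first did, 0 or EOF otherwise. */
int count_converted(const char *text) {
    int x, y;
    return sscanf(text, "%d %d", &x, &y);
}

/* Counts the leading numeric columns on a line. %n stores how many
   characters were consumed, so the pointer can be moved past each number. */
int count_columns(const char *line) {
    double value;
    int n = 0, count = 0;
    while (sscanf(line, "%lf%n", &value, &n) == 1) {  /* %n is not counted */
        count++;
        line += n;
    }
    return count;
}
```

The loop stops at the first field that is not a number or at the end of the line, so a trailing newline is harmless.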



--
 