Problem converting ASCII file to binary

Status
Not open for further replies.

mattlacey

Programmer
Oct 11, 2001
I have a very large (half a million plus records) data file that is made up of records of 25 text characters.

3 sample records
----------------
AB10 043SAA1011NYYYYYN?
AB103 043SAA1011NYYYYYN?
AB103A 043SAA1011NYYYYYN?
(displayed on separate rows for ease of viewing, but the actual file does not contain carriage returns)

Because this file is very large (10 MB+), I wish to compact it to make it easier to distribute (possibly looking at weekly updates to 5000 users). To do this I wish to turn it into a binary file. This will also have the added bonus of making the file unreadable to the human eye.

My problem is that my output file is identical to the input one: same size, and also readable in Notepad.

Can anyone spot what I'm doing wrong and point me in the right direction?

Alternatively, any suggestions for making the file smaller without increasing the time it takes to read would be welcome.

Putting the data in a database is not possible, as the program using the data has to run on Windows AND *nix and we don't want to have to support multiple versions of the data.


I've spent the last three days on this and searching all over the web and am posting here in desperation.


Code below
***************************************

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RECLEN 25 /*each record is 25 text characters*/

typedef struct gazrec GAZ;

struct gazrec {
    char details[RECLEN]; /*raw record, not '\0' terminated*/
};

int main(void) {
    char c[RECLEN + 1] = {0}; /*one extra byte for the '\0' that fgets appends*/
    FILE *file;
    GAZ *gazrec = NULL;
    GAZ *tmp;
    int reccnt = 0;
    int i;
    FILE *fp;

    file = fopen("test.asc", "r"); /*open text file*/
    if(file==NULL) {
        printf("Error: can't open file.\n");
        return 1;
    }

    while(fgets(c, sizeof c, file)!=NULL) { /*get (size of structure) bytes from file*/
        tmp = realloc(gazrec, (reccnt+1)*sizeof(GAZ)); /*increase size of array to include room for new record*/
        if(tmp==NULL) {
            printf("Error: out of memory.\n");
            free(gazrec);
            fclose(file);
            return 1;
        }
        gazrec = tmp;
        for(i=0 ; i<RECLEN ; i++) { /*copy data from file into array*/
            gazrec[reccnt].details[i] = c[i];
        }
        printf("'%.*s' \n", RECLEN, gazrec[reccnt].details); /*display current record so can see something happening*/
        reccnt++;
    }

    fclose(file); /*close text file*/

    fp = fopen("test.bin", "wb"); /*open binary file*/
    if(fp==NULL) {
        printf("Error: can't create file.\n");
        free(gazrec);
        return 1;
    }
    fwrite(gazrec, sizeof(GAZ), reccnt, fp); /*write array to binary file*/
    fclose(fp); /*close binary file*/

    free(gazrec); /*free memory allocated to array (free(NULL) is harmless)*/

    return 0;
}
 
>To do this I wish to turn it into a binary file

And how would that make it considerably more compact?
The difference would only be (I guess) that for every byte
'0'..'9', 'A'..'Z', etc
you'd just store another byte
0..9, theChar-'A', etc

Compacting data is a whole science in itself; how about just zipping the lot?


/Per
if (typos) cout << "My fingers are faster than my brain. Sorry for the typos.";
 
Although you're opening the source and destination files in different modes, you are not converting the data. The character 'A' (decimal 65, IIRC) is the same regardless of the file mode you use. For example, if you read the characters "AB10" (65, 66, 49, 48) from a text file, and then write out 65, 66, 49, 48 ("AB10") to a binary file, you've changed nothing.
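
To see this for yourself, here is a minimal sketch that lists the numeric value of each character in a string. The helper name byte_values is made up for illustration, and the values assume an ASCII character set. These are the bytes that end up on disk whichever mode the file was opened in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the decimal value of every byte of `text`
   into `out`, separated by spaces. Assumes an ASCII character set. */
void byte_values(const char *text, char *out, size_t outsize)
{
    size_t used = 0;
    size_t i;

    assert(text != NULL && out != NULL && outsize > 0);
    out[0] = '\0';
    for (i = 0; text[i] != '\0'; i++) {
        int n = snprintf(out + used, outsize - used, "%s%d",
                         i == 0 ? "" : " ", (unsigned char)text[i]);
        if (n < 0 || (size_t)n >= outsize - used) {
            out[used] = '\0'; /* buffer full: drop the partial number and stop */
            break;
        }
        used += (size_t)n;
    }
}
```

Calling byte_values("AB10", buf, sizeof buf) on an ASCII system leaves "65 66 49 48" in buf, i.e. the text is already just a run of bytes.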

You need to convert your data from character arrays to different data types. For example, I notice the characters "NYYYYYN" in each record. If these are boolean values, you could convert them into a single-byte bitfield (00111110). Also, numeric values could be stored as integers.
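
A rough sketch of the bitfield idea, assuming the seven characters really are independent yes/no flags (the original post does not say what they mean). The function names pack_flags and unpack_flags are made up for illustration, and anything that is not 'Y' is treated as 'N'.

```c
#include <assert.h>
#include <stddef.h>

/* Pack `count` (at most 8) 'Y'/'N' characters into one byte.
   The first flag ends up in the highest bit used, so "NYYYYYN"
   becomes binary 0111110 (0x3E).
   Assumption: any character other than 'Y' counts as 'N'. */
unsigned char pack_flags(const char *flags, size_t count)
{
    unsigned char packed = 0;
    size_t i;

    assert(flags != NULL && count <= 8);
    for (i = 0; i < count; i++)
        packed = (unsigned char)((packed << 1) | (flags[i] == 'Y'));
    return packed;
}

/* Reverse of pack_flags: expand the low `count` bits back into
   'Y'/'N' characters. `out` needs room for `count` chars and is
   not '\0' terminated. */
void unpack_flags(unsigned char packed, size_t count, char *out)
{
    size_t i;

    assert(out != NULL && count <= 8);
    for (i = 0; i < count; i++)
        out[i] = ((packed >> (count - 1 - i)) & 1) ? 'Y' : 'N';
}
```

This turns seven bytes per record into one. Whether the saving is worth the extra decoding step in the reader depends on the rest of the record layout.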

I hope this helps,

Jason
 
A question and a quick point..

gazrec = realloc(gazrec, (reccnt+1)*sizeof(GAZ)); /*increase size of array to include room for new record*/

You run this operation for every record in your data set. If your data file is purely n records of 25 chars each, why not get the size of the file in bytes and size your array once (file size / sizeof struct)?

This should speed up your code.
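
Something along these lines, as a sketch only. The function name load_records and the RECLEN constant are made up for illustration. It assumes the file holds nothing but 25-byte records, and it relies on fseek/ftell reporting the size of a binary stream, which works on Windows and *nix but is not strictly guaranteed by the C standard.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define RECLEN 25 /* bytes per record, as described in the question */

/* Hypothetical helper: reads a whole file of fixed-length records with a
   single allocation. Returns a malloc'd buffer (the caller frees it) and
   stores the number of complete records in *count.
   Returns NULL and sets *count to 0 on any failure or if the file holds
   less than one record. */
char *load_records(const char *path, size_t *count)
{
    FILE *fp;
    long size;
    char *buf = NULL;

    assert(path != NULL && count != NULL);
    *count = 0;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* find the file size by seeking to the end */
    if (fseek(fp, 0L, SEEK_END) == 0 && (size = ftell(fp)) >= RECLEN) {
        size_t n = (size_t)size / RECLEN; /* a trailing partial record is ignored */

        rewind(fp);
        buf = malloc(n * RECLEN);         /* one allocation instead of n reallocs */
        if (buf != NULL) {
            if (fread(buf, RECLEN, n, fp) == n) {
                *count = n;
            } else {
                free(buf);
                buf = NULL;
            }
        }
    }

    fclose(fp);
    return buf;
}
```

Record i then starts at buf + i * RECLEN, so no per-record struct copy is needed either.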

If you are concerned about people viewing your data, and you have access to the source code of the Windows and Unix apps, could you not just add (or subtract) a simple value to each char as the binary file is converted, and then reverse the operation when it is installed into the other application? (It will protect you from casual browsing anyway.)
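
As a sketch of what I mean (the names obscure and reveal and the value 0x5A are arbitrary choices for illustration, and this assumes 8-bit bytes). To be clear, this only hides the text from someone opening the file in an editor. It is not encryption and it does not make the file any smaller.

```c
#include <assert.h>
#include <stddef.h>

#define OBFUSCATION_KEY 0x5A /* arbitrary value, chosen for illustration only */

/* Add a fixed value to every byte; the result wraps around modulo 256. */
void obscure(unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + OBFUSCATION_KEY);
}

/* Subtract the same value again to get the original bytes back. */
void reveal(unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] - OBFUSCATION_KEY);
}
```

The converter would call obscure on each record before fwrite, and the reading application would call reveal after fread. Both sides must be built with the same key value.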

Compression is a different ball game, as without knowing more of your dataset, and what things represent it will be hard to advise you.

HTH,

K
 