using a 32 bit number read from a data file 2

Status
Not open for further replies.

Ghodmode

Programmer
Feb 17, 2004
177
NZ
I'm reading a data file and four of the bytes together are supposed to be a 32-bit, little-endian, unsigned integer identifying the number of records in this data file.

If I look at the file in my hex editor, I can see that the "Unsigned 32-bit" value starting at the first of these 4 bytes is 64745. That is the number of records in the file, so I know I'm looking in the right place.

As you'll see in my code below, I am trying to print the value. This isn't strictly necessary, but I do need to use it in a for loop later in the program.

Basically, I'm reading the file like this...
Code:
long c;

while ( (c = getchar()) != EOF ) {
    /* Read the header */
    for ( i = 0; i < 32; i++ ) {
        /* ... */
        if ( (i >= 4) && (i <= 7) ) {
            printf( "Record count(%d): %i\n", i, c );
            printf( "Record count(%d): %d\n", i, c );
            printf( "Record count(%d): %g\n", i, c );
            printf( "Record count(%d): %f\n", i, c );
            printf( "Record count(%d): %x\n", i, c );
        }
        /* ... */
    }
/* do stuff with the data */
}
/* ... */

So, I'm not really getting anything useful from my printfs, but I realize that may be a limitation of printf.

The math is my weakest area. I don't understand what "little-endian" is and my understanding of hexadecimal is pretty limited, too. I know that the right-most digit is multiplied by 1, the next one by 16, etc... I've read K&R and a few other books, but there are some parts I just don't understand.

How do I use the value effectively?

Thank you

--
-- GhodMode
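
For readers following along: "little-endian" just means the least significant byte is stored first, in the same way that the right-most hex digit is the one multiplied by 1. A minimal sketch of the arithmetic, assuming 8-bit bytes. The function name le_bytes_to_u32 is hypothetical and not from the thread:

```c
#include <stdint.h>

/* Hypothetical helper: combine four little-endian bytes into one
   unsigned 32-bit value. b[0] is the least significant byte, so it
   is weighted by 1, b[1] by 256, b[2] by 65536, b[3] by 16777216. */
uint32_t le_bytes_to_u32(const unsigned char b[4])
{
    return  (uint32_t)b[0]
         + ((uint32_t)b[1] * 256u)
         + ((uint32_t)b[2] * 65536u)
         + ((uint32_t)b[3] * 16777216u);
}
```

64745 is 0x0000FCE9, so in a little-endian file it appears as the bytes E9 FC 00 00, and the function gives 233 + 252*256 = 64745 for them.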
 
1. getchar() reads a char from stdin (from the console, or from a file redirected to it with < on the command line). Is that your data file? Apropos, if you have a task like this (parsing external files), you must learn more about internal data representation (it's not a math issue, it's a hardware issue;).
2. Suppose we have an opened C stream:
Code:
/* needs <stdio.h> and <stdlib.h> */
unsigned int c; /* if sizeof(int) == 4 on your comp */
FILE* f;
f = fopen("mydatafilename","rb"); /* in binary mode */
if (!f) /* Never forget error handling */
{
   printf("*** Can't open file.\n");
   exit(1);
}
if (fread(&c,4,1,f) != 1) /* one item of 4 bytes */
{
   printf("*** Can't read data.\n");
   exit(2);
}
printf("%u\n",c);
/* and so on, for example, all 4 numbers: */
unsigned cc[4];
if (fread(cc,4,4,f) != 4) /* 4 items of 4 bytes: cc[0], cc[1] etc */
{
   printf("*** Can't read data.\n");
   exit(2);
}
Test your comp:
Code:
unsigned c = 0xadde;
const unsigned char* p = (const unsigned char*)&c;
printf("sizeof(c) == %d; %2x%2x\n",
       (int)sizeof(c),p[0],p[1]);
If you see
sizeof(c) == 4; dead
then you have a little-endian computer with a suitable int size.
Now you may use fread() as in my snippet above...
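
As a sketch of how this might look for the file in question: the loop in the original post suggests the record count sits at header bytes 4 to 7, so one could seek there before calling fread(). The helper name read_record_count is made up for illustration, and the code only gives the right answer on a little-endian host with 8-bit bytes, which is what the test above checks:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: read the 32-bit record count stored at byte
   offset 4 of the header (offset taken from the original post's loop).
   Returns 0 on success, -1 on a seek or read failure.
   Assumes the host is little-endian, like the file. */
int read_record_count(FILE *f, uint32_t *count)
{
    if (fseek(f, 4L, SEEK_SET) != 0)
        return -1;
    if (fread(count, sizeof *count, 1, f) != 1)
        return -1;
    return 0;
}
```

After a successful call the value can be printed with printf("%lu\n", (unsigned long)count) and used as the bound of the for loop.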
 
Why don't you just "read" 4 bytes from the stream and store it in a long by using the address of the long to hold the data?
If you need to reverse it then you might like to create a union of one long and a 4 char array to allow you to byte swap.
You could even read 1 byte at a time (maybe with your getchar) into the bytes of the union.

Code:
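#include <stdio.h>

long get_long(void);

int main(void)
{
  printf("Value is %ld\n", get_long()); /* %ld matches long */
  return 0;
}

/* Read 4 bytes from stdin into the first 4 bytes of a long. */
long get_long(void)
{
  union number {
    long value;
    unsigned char bytes[4];
  } number;
  number.value = 0; /* clear any bytes of the long beyond the first 4 */
  number.bytes[0] = getchar();
  number.bytes[1] = getchar();
  number.bytes[2] = getchar();
  number.bytes[3] = getchar();
  return number.value;
}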

If the value is the wrong way round, simply reverse the order in which you store the bytes.
e.g.
Code:
  number.bytes[3] = getchar();
  number.bytes[2] = getchar();
  number.bytes[1] = getchar();
  number.bytes[0] = getchar();



Trojan.
 
Thank you both for your replies so far...

ArkM:
I knew about most of the file processing details. Actually, right now, I'm testing with cat file.dat | program. I was going to do the proper file processing code later.

I should've realized that I actually would need all 4 bytes to account for the record count in all cases.

I will have to test this on a few different computers to be certain. I work on Linux, but I use vmware to test for Win98 and XP. My customer is using Win98 and XP at his sites.


TrojanWarBlade:
I think I can use your solution immediately. Is a union similar to a struct? Feel free to answer RTFM :) I literally skipped that chapter(6). I figured I could use separate arrays until I wasn't so pressed for time :)

I think ArkM's is actually closer to what I need to end up with, but you've both pointed out my basic problem: that I'm not reading or using all of the bytes needed.


My problem comes up because I have to write a console program for a Win98 computer with an unusual configuration. It doesn't actually have Windows on it, it boots straight into the command prompt (after showing the perty Win98 logo screen). So, it's basically something like DOS 7. It has no CD-ROM drive. So, I decided it would be best to avoid installing things. I'm just creating this small program to extract data from a file and turn it into a CSV.

I'm proof that you can't really learn to program just by reading the books. I read constantly, but I get stumped by things like this that I would probably get if I had some experience. I usually work with LAMP... So, mostly interpreted languages.

Thank you for your help. I think you've provided me with the solution.

--
-- Ghodmode
 
A union is like a struct but instead of all the members occupying sequential space, they share the same space.
The union I showed you gives you the ability to manage each byte of a long individually.



Trojan.
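
To make the difference concrete, here is a small sketch. The type names side_by_side and overlay and the function bytes_as_host_u32 are invented for the example, and it assumes 8-bit chars so that a uint32_t is exactly four bytes wide:

```c
#include <stdint.h>

/* In a struct each member gets its own storage, one after the other. */
struct side_by_side {
    uint32_t      value;
    unsigned char bytes[4];
};

/* In a union all members start at the same address and overlap,
   so bytes[] is a byte-by-byte view of value. */
union overlay {
    uint32_t      value;
    unsigned char bytes[4];
};

/* Hypothetical helper: store four bytes and read them back as one
   number. The result depends on the host's byte order. */
uint32_t bytes_as_host_u32(unsigned char b0, unsigned char b1,
                           unsigned char b2, unsigned char b3)
{
    union overlay u;
    u.bytes[0] = b0;
    u.bytes[1] = b1;
    u.bytes[2] = b2;
    u.bytes[3] = b3;
    return u.value;
}
```

Here sizeof(struct side_by_side) is at least 8, while sizeof(union overlay) is 4. On a little-endian machine, bytes_as_host_u32(0xE9, 0xFC, 0x00, 0x00) returns 64745.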
 
The union does not solve the little/big-endian problem. Moreover, strictly speaking, it is possible (in some standard-compliant implementations) for a char to occupy the same amount of storage as a long.
Ghodmode's problem is not OS-specific; it is platform-specific. For example, all x86 machines (running Linux, MS-DOS, Windows or anything else) are little-endian, etc...
In that case a long* and a char* mapped onto a long (with a union) come to the same thing (because what we really want is a void*;)...
 
That's all true but somewhat unrealistic.
The union does allow you to reverse the mapping simply by storing the bytes in reverse order. Indeed, it is perfectly possible to create 2 functions, one for big endian and one for little endian. Then you could pick and choose at will. That should have been blindingly obvious.
The byte width issue is true but very unlikely to be a problem. 8-bit bytes are the de facto standard, and all PCs and most modern systems use them.
As you can tell from Ghodmode's posts, he has 8-bit bytes, so your points are moot.
Why try to overcomplicate an issue when the guy just wants a solution and help?
If you are trying to score points over me then you've completely lost the point of these forums. We are here to help people solve their problems.


Trojan.
 
Sorry, TrojanWarBlade,
but I can't understand (or accept) your words about scoring points over you, being unrealistic, overcomplicating the issue, etc.
Byte size is another matter. There are no bytes as such in the C language; char is not an alias for byte. Remember bit fields in C structures: the language does not define their bit alignment within a machine word, so there we have an example of an (almost) useless concept in this wonderful language. An implementation may allocate a 32-bit (or 64-bit;) word for a char, and so on.
The only portable way to deal with things as unportable as little/big-endian data is the mask/shift approach.
I think we all work with the C language, and we may study it more deeply than at the obvious level.
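
A rough sketch of the mask/shift approach, for what it is worth. The function names read_u32_le and read_u32_be are hypothetical. The code assumes the file stores each value as four 8-bit octets and makes no assumption about the byte order of the machine running it:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: read four octets from f, least significant
   first. Returns 0 on success, -1 if the stream ends early. */
int read_u32_le(FILE *f, uint32_t *out)
{
    uint32_t value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        int c = getc(f);            /* int, so EOF can be detected */
        if (c == EOF)
            return -1;
        /* mask to 8 bits, then shift into bit position 0, 8, 16, 24 */
        value |= ((uint32_t)c & 0xFFu) << (8 * i);
    }
    *out = value;
    return 0;
}

/* Same idea, but most significant octet first (big-endian data). */
int read_u32_be(FILE *f, uint32_t *out)
{
    uint32_t value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        int c = getc(f);
        if (c == EOF)
            return -1;
        /* make room for the next octet, then append it */
        value = (value << 8) | ((uint32_t)c & 0xFFu);
    }
    *out = value;
    return 0;
}
```

Either function can be called with stdin or with a stream opened in "rb" mode. Picking between them is then a matter of what the file format says, not of what the host CPU does.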
 
As I pointed out.
I agree about the bytes and char issue. I agree that a byte may not be 8 bits and that a char may not be 8 bits, and indeed that a byte and a char may not be the same thing.
I stand by the point though that in general it IS the case.
Also, you make points about portability and C is certainly not the best language to write portable code with! This thread demonstrates that point.
My observations are that this guy wants to read data from a file and I have tried to help him do that. I have not put theoretical arguments in the way to demonstrate what a futile task it would be. This seems to be exactly what you are trying to do.
If Ghodmode explains that he needs this to run on all platforms, including 5-bit microcontroller platforms, then I will happily retract my comments and suggestions.
If, however, he simply wants to use a normal, run of the mill, 32 bit (8 bit byte) platform to read this data then I stand by the point that you are completely clouding the issue.


Trojan.
 