Writing bits to file

ADoozer · Nov 18, 2006

Having a rough day today, so I thought I'd throw out a post for help.

The scenario is this: I have a program that creates datasets. With my original coding the data was output as padded chars, like this:

Code:

type 1 data
[byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][qual]
type 2 data
[byte][byte][byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][  data1   ][qual]
type 3 data
[byte][byte][byte][byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][  data1   ][res1][qual]
type 4 data
[byte][byte][byte][byte][byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][  data1   ][res1][res2][qual]

Unfortunately the dataset files are getting absolutely huge (several gigabytes), so I have attempted to remove the padding and form a new structure based on an unsigned long long (since, unpadded, the maximum data length is 56 bits). The new structure looks like this:

Code:

type 1 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ qu<pad>]
type 2 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ data1  ][        ][qu< pad>]
type 3 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ data1  ][        ][res1  qu] 
type 4 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ data1  ][        ][res1  re][s2  qu<>]

However, I'm faced with an endian problem when writing the info to file.

Code:

//example type 1
fstream myFile("C:\\SomeFile.txt",ios::out | ios::binary);
ULONGLONG thedata=0x0000000020043300;
myFile.write ((char*)&thedata, 4);
myFile.close();
//when viewed in the file this becomes "Hex: 00 33 04 20"
//(the low-order byte is written first on a little-endian machine)
//ideally I need "Hex: 20 04 33 00" so I can retrieve
//the type and determine how many bytes belong to this
//data

I can think of several ways to code around it, but was wondering if anyone knew of a built-in function to save me some time.

Thanks for any input.

If somethings hard to do, its not worth doing - Homer Simpson

too much 49374'ing, im 57005... need 12648430

cpjust · Nov 18, 2006

Does it matter what the data looks like in the file? Is anyone going to read the file themselves or are they always going to use your program to read the data?

If they're only going to use your program, then reading the data back in the same way you wrote it should look normal again, shouldn't it?

Salem · Nov 18, 2006

Something like this perhaps?

Code:

$ cat foo.cpp && g++ foo.cpp && ./a.exe && od -Ax -t x1 test.bin
#include <iostream>
#include <fstream>
using namespace std;

class bitrec {
    int nBits;
    unsigned char bits;
  public:
    bitrec ( ) {
        bits = 0;
        nBits = 7;
    }
    void write ( ofstream &f ) {
        f.write( reinterpret_cast<char*>(&bits), 1 );
        bits = 0;
        nBits = 7;
    }
    void finish ( ofstream &f ) {
        if ( nBits != 7 ) {
            write( f );
        }
    }
    void add ( ofstream &f, unsigned char bit ) {
        bits |= bit << nBits;
        if ( --nBits < 0 ) {
            write( f );
        }
    }
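    // Write the low 'count' bits of val, most significant bit first.
    // (The parameter was renamed so it no longer shadows the nBits member.)
    ofstream &writeBits ( ofstream &f, unsigned long val, int count ) {
        for ( int i = count-1 ; i >= 0 ; i-- ) {
            unsigned char bit = (val>>i) & 1;
            add( f, bit );
        }
        return f;
    }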
    ~bitrec ( ) {
        if ( nBits != 7 ) {
            cerr << "Missing call to .finish" << endl;
        }
    }
};

int main ( ) {
    bitrec hold;
    ofstream f("test.bin",ios::binary);
    hold.writeBits( f, 0xA, 4 );
    hold.writeBits( f, 0x55, 8 );
    hold.writeBits( f, 0x3, 2 );
    hold.writeBits( f, 0x3, 2 );
    hold.writeBits( f, 0x3, 2 );
    hold.finish( f );
    f.close();
    return 0;
}

000000 a5 5f c0
000003
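
Reading such a file back needs the mirror image of the class above. The following is only a minimal sketch of that reading side, assuming the bits were written most significant bit first as in bitrec. The class name bitreader and its readBits member are hypothetical and not part of the post above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical counterpart to bitrec: hands bits back MSB first.
class bitreader {
    std::istream &in;
    unsigned char bits;  // byte currently being consumed
    int nBits;           // index of the next bit to return; -1 means "fetch a byte"
  public:
    explicit bitreader ( std::istream &s ) : in(s), bits(0), nBits(-1) { }

    // Read 'count' bits (0..32) into the low bits of val.
    // Returns false if the stream runs out before 'count' bits were read.
    bool readBits ( int count, unsigned long &val ) {
        assert( count >= 0 && count <= 32 );
        val = 0;
        for ( int i = 0 ; i < count ; i++ ) {
            if ( nBits < 0 ) {
                char c;
                if ( !in.get( c ) ) return false;
                bits = static_cast<unsigned char>( c );
                nBits = 7;
            }
            val = (val << 1) | ((bits >> nBits) & 1u);
            nBits--;
        }
        return true;
    }
};
```

Fed the three bytes a5 5f c0 from the dump above, readBits(4), readBits(8) and three calls to readBits(2) should give back 0xA, 0x55, 0x3, 0x3 and 0x3 in that order.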


ADoozer · Nov 18, 2006

cpjust:
Let us assume I have a type 1, 2, 3 and 4 record in a row.

Ideally I want

Code:

type 1: 20 04 33 00
type 2: 50 02 00 40 00 C0
type 3: 70 02 00 40 00 83
type 4: 90 02 00 40 00 82 0C

This way I can read in the first byte and check bits 5/6/7 to find the type (and hence move to the next "record").

If the data is saved as

Code:

type 1: 00 33 04 20
type 2: C0 00 40 00 02 50
type 3: 83 00 40 00 02 70
type 4: 0C 82 00 40 00 02 90

then I can't read it back, let alone anybody else.
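
One way to get the first of those two layouts on any machine is to shift the bytes out by hand, most significant byte first, instead of writing the integer's memory image. The sketch below assumes the record lengths implied by the packed layouts in the first post (4, 6, 6 and 7 bytes for types 1 to 4) and that the type sits in the top three bits of the first byte. The function names writeBigEndian, recordLength and countRecords are made up for illustration, and the two remaining types are not covered because their layouts are not given in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Write the low 'count' bytes of 'value', most significant byte first,
// whatever the byte order of the machine running the code.
void writeBigEndian ( std::ostream &out, unsigned long long value, int count ) {
    assert( count >= 1 && count <= 8 );
    for ( int i = count - 1 ; i >= 0 ; i-- ) {
        out.put( static_cast<char>( (value >> (8 * i)) & 0xFF ) );
    }
}

// Record length in bytes, looked up from the type in bits 5/6/7 of the
// first byte. Lengths are assumed from the layouts in the first post;
// 0 is returned for any type whose layout is not shown there.
std::size_t recordLength ( unsigned char firstByte ) {
    switch ( firstByte >> 5 ) {
        case 1: return 4;
        case 2: return 6;
        case 3: return 6;
        case 4: return 7;
        default: return 0;
    }
}

// Walk a buffer record by record, using only the first byte of each
// record to find the next one. Returns the number of complete records.
std::size_t countRecords ( const std::string &data ) {
    std::size_t pos = 0, n = 0;
    while ( pos < data.size() ) {
        std::size_t len = recordLength( static_cast<unsigned char>( data[pos] ) );
        if ( len == 0 || pos + len > data.size() ) break;
        pos += len;
        n++;
    }
    return n;
}
```

With this, writeBigEndian(myFile, 0x20043300ULL, 4) would put 20 04 33 00 into the file, so the type is always in the first byte that is read back.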

Salem:
That's another workaround, which is slightly neater than mine.

cpjust · Nov 18, 2006

Could you use another file to keep track of what type of data is stored in each position? Assuming you only have (and only will have) 4 data types, 2 bits would be all you need to represent each data type.

So if you have data in this order:
[type1][type3][type2][type1][type3][type4][type2][type2]...

You could map them like this:
[byte1][byte2]
[0x24] [0xB5]

Of course you'd have to read the whole map up to the location you want, then calculate the offset by adding together the sizes of all the types up to the index you want...
If that's something you'll be doing a lot, maybe inserting hard offset values at specific locations would help so you can skip ahead without reading the whole map file.
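
The map idea can be sketched roughly as follows, assuming four types packed as 2 bits each (type 1 = 00 up to type 4 = 11), four entries per byte with the first entry in the top two bits, which is the packing that yields 0x24 0xB5 for the sequence above. The record sizes of 4, 6, 6 and 7 bytes are taken from the packed layouts in the first post, and the function names packTypeMap and recordOffset are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack record types 1..4 into 2 bits each, four entries per byte,
// first entry in the top two bits of each byte.
std::vector<unsigned char> packTypeMap ( const std::vector<int> &types ) {
    std::vector<unsigned char> map( (types.size() + 3) / 4, 0 );
    for ( std::size_t i = 0 ; i < types.size() ; i++ ) {
        assert( types[i] >= 1 && types[i] <= 4 );
        int shift = 6 - 2 * static_cast<int>( i % 4 );
        map[i / 4] |= static_cast<unsigned char>( (types[i] - 1) << shift );
    }
    return map;
}

// Byte offset of record 'index' in the data file: the sum of the sizes
// of all records before it, read from the map.
// Sizes assumed here: 4, 6, 6, 7 bytes for types 1..4.
std::size_t recordOffset ( const std::vector<unsigned char> &map, std::size_t index ) {
    static const std::size_t size[4] = { 4, 6, 6, 7 };
    assert( index <= map.size() * 4 );
    std::size_t offset = 0;
    for ( std::size_t i = 0 ; i < index ; i++ ) {
        int shift = 6 - 2 * static_cast<int>( i % 4 );
        offset += size[ (map[i / 4] >> shift) & 3 ];
    }
    return offset;
}
```

The loop in recordOffset is the "read the whole map up to the location you want" cost mentioned above; storing running offsets at fixed intervals would let a lookup start from the nearest stored value rather than from zero.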

ADoozer · Nov 19, 2006

There are 6 data types in all, hence the 3 bits.

As for keeping a second file with type information: considering that an average dataset holds circa 600 million records, I think it will be easier to write to and read from one file only. (With respect to file operations, I'm not sure of the overheads of reading/writing one file versus reading/writing two.)

cpjust · Nov 19, 2006

Wouldn't that make searching through the file for a specific record extremely time consuming?
What about having 1 file for each data type? That way all the record sizes are equal in each file and going to a specific record can be done in one step. It would also allow you to delete a record and re-use its location for a new record...
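
With one file per type, finding a record reduces to a multiplication and a seek. A minimal sketch, assuming each file holds nothing but records of one fixed size; the function name readFixedRecord is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read record number 'index' (0-based) from a stream that holds only
// records of 'recordSize' bytes. Returns false if that record is missing.
bool readFixedRecord ( std::istream &in, std::size_t recordSize,
                       std::size_t index, std::string &record ) {
    assert( recordSize > 0 );
    in.clear();  // drop any error state left by an earlier failed read
    in.seekg( static_cast<std::streamoff>( index * recordSize ), std::ios::beg );
    record.resize( recordSize );
    in.read( &record[0], static_cast<std::streamsize>( recordSize ) );
    return in.gcount() == static_cast<std::streamsize>( recordSize );
}
```

The same arithmetic gives the position at which a deleted record's slot could be overwritten with a new one.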
