writing bits to file

Status
Not open for further replies.

ADoozer

Programmer
Dec 15, 2002
3,487
AU
Having a rough day today, so I thought I'd throw out a post for help.

The scenario is this: I have a program that creates datasets. With my original coding, the data was output as padded chars, like so:

Code:
type 1 data
[byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][qual]
type 2 data
[byte][byte][byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][  data1   ][qual]
type 3 data
[byte][byte][byte][byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][  data1   ][res1][qual]
type 4 data
[byte][byte][byte][byte][byte][byte][byte][byte][byte][byte]
[type][   set1   ][   set2   ][  data1   ][res1][res2][qual]

Unfortunately the dataset files are getting absolutely huge (several gigabytes), so I have attempted to remove the padding and form a new structure based on an unsigned long long (unpadded, the maximum data length is 56 bits). The new structure looks like this:

Code:
type 1 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ qu<pad>]
type 2 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ data1  ][        ][qu< pad>]
type 3 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ data1  ][        ][res1  qu] 
type 4 data
[nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb][nibbnibb]
[typset1 ][      se][t2      ][ data1  ][        ][res1  re][s2  qu<>]

However, I'm faced with an endian problem when writing the info to file.

Code:
//example type 1
fstream myFile("C:\\SomeFile.txt",ios::out | ios::binary);
ULONGLONG thedata=0x0000000020043300;
myFile.write ((char*)&thedata, 4);
myFile.close();
//viewed in the file, this becomes "Hex: 00 33 04 20"
//ideally I need "Hex: 20 04 33 00" so I can read the
//type from the first byte and determine how many bytes
//belong to this record

I can think of several ways to code around it, but was wondering if anyone knew of a built-in function to save me some time.

Thanks for any input.
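For reference, one of the ways to code around it can be sketched as below. This is only a minimal sketch, not a built-in: `writeBigEndian` is a made-up helper name, and it assumes the record is held in a 64-bit unsigned value with the record's bytes in the low `nBytes` bytes. It shifts the bytes out most significant first, so the result does not depend on the endianness of the machine.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>  // only needed for the in-memory usage shown after the block
#include <string>

// Hypothetical helper (not part of any library): write the low `nBytes`
// bytes of `value` to `out`, most significant byte first.
void writeBigEndian(std::ostream &out, std::uint64_t value, int nBytes) {
    assert(nBytes >= 1 && nBytes <= 8);
    for (int i = nBytes - 1; i >= 0; --i) {
        // Isolate byte i (byte 0 is the least significant byte of value).
        const unsigned char byte =
            static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
        out.put(static_cast<char>(byte));
    }
}
```

With the type 1 example, `writeBigEndian(myFile, 0x20043300, 4)` should put the bytes 20 04 33 00 into the file in that order. Any `std::ostream` works, so a `std::ostringstream` can be used to try it without touching the disk.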

If something's hard to do, it's not worth doing - Homer Simpson

too much 49374'ing, I'm 57005... need 12648430
 
Does it matter what the data looks like in the file? Is anyone going to read the file themselves or are they always going to use your program to read the data?

If they're only going to use your program, then reading the data back in the same way you wrote it should look normal again, shouldn't it?
 
Something like this perhaps?
Code:
$ cat foo.cpp && g++ foo.cpp && ./a.exe && od -Ax -t x1 test.bin
#include <iostream>
#include <fstream>
using namespace std;

class bitrec {
    int nBits;
    unsigned char bits;
  public:
    bitrec ( ) {
        bits = 0;
        nBits = 7;
    }
    void write ( ofstream &f ) {
        f.write( reinterpret_cast<char*>(&bits), 1 );
        bits = 0;
        nBits = 7;
    }
    void finish ( ofstream &f ) {
        if ( nBits != 7 ) {
            write( f );
        }
    }
    void add ( ofstream &f, unsigned char bit ) {
        bits |= bit << nBits;
        if ( --nBits < 0 ) {
            write( f );
        }
    }
    ofstream &writeBits ( ofstream &f, unsigned long val, int count ) {
        // emit the low `count` bits of val, most significant bit first
        for ( int i = count-1 ; i >= 0 ; i-- ) {
            unsigned char bit = (val>>i) & 1;
            add( f, bit );
        }
        return f;
    }
    ~bitrec ( ) {
        if ( nBits != 7 ) {
            cerr << "Missing call to .finish" << endl;
        }
    }
};

int main ( ) {
    bitrec hold;
    ofstream f("test.bin",ios::binary);
    hold.writeBits( f, 0xA, 4 );
    hold.writeBits( f, 0x55, 8 );
    hold.writeBits( f, 0x3, 2 );
    hold.writeBits( f, 0x3, 2 );
    hold.writeBits( f, 0x3, 2 );
    hold.finish( f );
    f.close();
    return 0;
}

000000 a5 5f c0
000003

--
 
cpjust:
Let us assume I have records of type 1, 2, 3 and 4 in a row:

Ideally I want

Code:
Hex: 20 04 33 00 | 50 02 00 40 00 C0 | 70 02 00 40 00 83
Hex: 90 02 00 40 00 82 0C

This way I can read in the first byte and check bits 5/6/7 (the top three bits) to find the type, and hence how far to skip to reach the next record.

If the data is saved as

Code:
Hex: 00 33 04 20 | C0 00 40 00 02 50 | 83 00 40 00 02 70
Hex: 0C 82 00 40 00 02 90

then I can't read it back, let alone anybody else.
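To illustrate why the byte order matters for reading, here is a rough sketch of walking the "ideal" layout. The function names are made up, and the record sizes (4, 6, 6 and 7 bytes for types 1 to 4) are taken from the hex example in this post. Sizes for any other type are not given in this thread, so the sketch simply stops when it meets one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The type lives in bits 5/6/7 (the top three bits) of the first byte.
unsigned recordType(std::uint8_t firstByte) {
    return firstByte >> 5;
}

// Record sizes in bytes for types 1-4, taken from the hex example.
// Anything else is treated as unknown and reported as 0.
std::size_t recordSize(unsigned type) {
    switch (type) {
        case 1: return 4;
        case 2: return 6;
        case 3: return 6;
        case 4: return 7;
        default: return 0;
    }
}

// Walk the buffer and collect the start offset of every record.
// Stops at an unknown type or at a record that runs past the end.
std::vector<std::size_t> recordOffsets(const std::vector<std::uint8_t> &buf) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::size_t size = recordSize(recordType(buf[pos]));
        if (size == 0 || pos + size > buf.size()) {
            break;
        }
        offsets.push_back(pos);
        pos += size;
    }
    return offsets;
}
```

Fed the 23 bytes of the first hex dump, this should find records starting at offsets 0, 4, 10 and 16. Fed the byte-swapped dump, the first byte is 00, which carries no valid type, so the walk stops immediately.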

Salem:
That's another workaround :) which is slightly neater than mine.

 
Could you use another file to keep track of what type of data is stored in each position? Assuming you only have (and only will have) 4 data types, 2 bits would be all you need to represent each data type.

So if you have data in this order:
[type1][type3][type2][type1][type3][type4][type2][type2]...

You could map them like this:
[byte1][byte2]
[0x24] [0xB5]

Of course you'd have to read the whole map up to the location you want, then calculate the offset by adding together the sizes of all the types up to the index you want...
If that's something you'll be doing a lot, maybe inserting hard offset values at specific locations would help so you can skip ahead without reading the whole map file.
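A rough sketch of the map idea, assuming exactly four types coded as type minus 1 in two bits, packed four to a byte with the first record in the top two bits. The function names are made up, and the per-type record sizes used for the offset calculation (4, 6, 6, 7 bytes) are an assumption borrowed from the hex example earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a sequence of types (1..4) into a map: four 2-bit codes per byte,
// first record in the top two bits, code = type - 1.
std::vector<std::uint8_t> packTypeMap(const std::vector<int> &types) {
    std::vector<std::uint8_t> typeMap((types.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < types.size(); ++i) {
        const unsigned code = static_cast<unsigned>(types[i] - 1) & 0x3u;
        const unsigned shift = 6u - 2u * static_cast<unsigned>(i % 4);
        typeMap[i / 4] |= static_cast<std::uint8_t>(code << shift);
    }
    return typeMap;
}

// Look up the type (1..4) of the record at `index`.
int typeAt(const std::vector<std::uint8_t> &typeMap, std::size_t index) {
    const unsigned shift = 6u - 2u * static_cast<unsigned>(index % 4);
    return static_cast<int>((typeMap[index / 4] >> shift) & 0x3u) + 1;
}

// Byte offset of record `index` in the data file: the sum of the sizes
// of all records before it. Sizes for types 1..4 are assumed values.
std::size_t offsetOf(const std::vector<std::uint8_t> &typeMap,
                     std::size_t index) {
    static const std::size_t sizes[] = {4, 6, 6, 7};
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i) {
        offset += sizes[typeAt(typeMap, i) - 1];
    }
    return offset;
}
```

Packing the sequence 1, 3, 2, 1, 3, 4, 2, 2 this way should give the two bytes 0x24 and 0xB5 shown above. The loop in `offsetOf` is the "read the whole map up to the location you want" cost, which is what periodic hard offsets would cut down.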
 
There are 6 data types in all, hence the 3 bits.

As for keeping a second file with type information: considering an average dataset holds circa 600 million records, I think it will be easier to write to and read from one file only. (I'm not sure of the overhead of reading/writing one file vs. reading/writing two files.)

 
Wouldn't that make searching through the file for a specific record extremely time consuming?
What about having 1 file for each data type? That way all the record sizes are equal in each file and going to a specific record can be done in one step. It would also allow you to delete a record and re-use its location for a new record...
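A minimal sketch of the "one step" lookup, assuming a file (or any seekable stream) that holds records of a single fixed size. `readFixedRecord` is a made-up name and the record size is whatever applies to that file's type.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>  // only needed for the in-memory usage shown after the block
#include <string>
#include <vector>

// Hypothetical helper: read record number `index` (0-based) from a stream
// that contains only records of `recordSize` bytes each.
// Returns false if the record does not exist or is incomplete.
bool readFixedRecord(std::istream &in, std::size_t recordSize,
                     std::size_t index, std::vector<char> &out) {
    out.resize(recordSize);
    in.clear();  // drop any EOF/fail state left over from an earlier read
    // With equal-sized records, the position is a single multiplication.
    in.seekg(static_cast<std::streamoff>(index * recordSize), std::ios::beg);
    in.read(out.data(), static_cast<std::streamsize>(recordSize));
    return in.gcount() == static_cast<std::streamsize>(recordSize);
}
```

The same call works on a `std::ifstream` opened with `std::ios::binary`, or on a `std::istringstream` for a quick check. Re-using a deleted record's slot would be the mirror image with `seekp` and `write` on an output stream.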
 