Which object? 2

tgreer · Jan 5, 2006

I'm starting a new project, and trying to come up with the best way to store some data.

I'm processing large binary files. These are print-streams. Each file will contain many documents, and each document will contain a differing number of pages. At some point I'll need to re-sequence documents and/or pages. That's no problem.

I need a class that contains lots of information about this file, including the

1) Byte offset of each document, plus its sequence in the file. I need to know where document 1 starts, document 16, etc.

2) Byte offset of each page, and its sequence within a document.

What I really need is an ISAM structure. Does C# have one?

Barring that, what controls/types would you use to store this data?

I thought of having two dictionaries, one for the documents, and one for the pages. The dictionaries would be <string,int>, with the string being sequence number. For the pages, that would be a sequence number containing the document sequence number:

"0000000001-0000000001" = page 1 of document 1.

I also need to be able to serialize/de-serialize this data.

Are dictionaries the ticket?

Thomas D. Greer

http://www.tgreer.com

chiph · Jan 5, 2006

1) Byte offset of each document, plus its sequence in the file. I need to know where document 1 starts, document 16, etc.

The requirement that order be maintained says ArrayList or List<> to me. In the List I would store some objects which describe the document -- byte offset, length, etc.

2) Byte offset of each page, and its sequence within a document.

Again, the requirement that order be maintained says to me to use an ArrayList or List<> to store this info. I'd create an object to store info about the page, such as the byte offset (within the file, or within the document??).

You would then have each document object contain your page list.

By marking those classes as Serializable, it'll be pretty easy to serialize/deserialize them.
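A minimal sketch of that layout, assuming hypothetical class names (DocumentEntry, PageEntry, IndexStore) and a string Key field that are not from the original post. It uses XmlSerializer for the round trip; XmlSerializer only needs public types with parameterless constructors, while the [Serializable] attribute is what the runtime's binary serialization looks for.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical page record: where the page starts and its order in the document.
[Serializable]
public class PageEntry
{
    public int Sequence;     // position of the page within its document
    public long ByteOffset;  // assumed to be measured from the start of the file
}

// Hypothetical document record that owns its own page list.
[Serializable]
public class DocumentEntry
{
    public string Key;       // assumed identifier, e.g. "0000000001"
    public int Sequence;     // position of the document within the file
    public long ByteOffset;  // where the document starts in the file
    public List<PageEntry> Pages = new List<PageEntry>();
}

public static class IndexStore
{
    // Writes the whole document list to the stream as XML.
    public static void Save(List<DocumentEntry> documents, Stream output)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<DocumentEntry>));
        serializer.Serialize(output, documents);
    }

    // Reads a document list previously written by Save.
    public static List<DocumentEntry> Load(Stream input)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<DocumentEntry>));
        return (List<DocumentEntry>)serializer.Deserialize(input);
    }
}
```

Because each DocumentEntry carries its pages, reordering documents in the list moves their page information with them.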

Chip H.

____________________________________________________________________
Donate to Katrina relief:

http://s1.amazon.com/paypage/PELYGQVJ8Q7IB/103-6821258-5919825

If you want to get the best response to a question, please read FAQ222-2244 first

tgreer · Jan 5, 2006

I don't have to maintain sequence, per se. I need to be able to get any arbitrary document, or any arbitrary page within the document or file.

I'll look at List<>, thanks for the suggestion.

Thomas D. Greer

http://www.tgreer.com

tgreer · Jan 6, 2006

A follow-up question then would be, given either a specific key or value for a dictionary item, is it possible to determine the index of that entry?

In other words, given:

_dictionary["somekey"] is the 1st item in the dictionary, how would you return "0"?

Thomas D. Greer

http://www.tgreer.com

chiph · Jan 6, 2006

The IndexOf method.

Chip H.

JurkMonkey · Jan 6, 2006

Couldn't you use something similar to a pair of Hashtables?

Specify a key and retrieve the value back?

Hashtable htPageLibrary = (Hashtable)htDocumentLibrary["Document1"];

PageClass page = (PageClass)htPageLibrary["Page26"];

Console.WriteLine(page.ParentDocument);
Console.WriteLine(page.PageNumber);
...
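The same two-level lookup can be sketched with the generic Dictionary from the 2.0 framework, which avoids the casts. PageLibrary, its Add and Find methods, and the ByteOffset field are hypothetical names chosen for illustration; PageClass is assumed to look roughly like the one implied above.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the page object used in the post above.
public class PageClass
{
    public string ParentDocument;
    public int PageNumber;
    public long ByteOffset;  // hypothetical field for the page's position in the file
}

// Hypothetical wrapper around a dictionary of per-document page dictionaries.
public class PageLibrary
{
    private readonly Dictionary<string, Dictionary<string, PageClass>> documents =
        new Dictionary<string, Dictionary<string, PageClass>>();

    // Stores a page under its document key and page key,
    // creating the inner dictionary on first use.
    public void Add(string documentKey, string pageKey, PageClass page)
    {
        Dictionary<string, PageClass> pages;
        if (!documents.TryGetValue(documentKey, out pages))
        {
            pages = new Dictionary<string, PageClass>();
            documents.Add(documentKey, pages);
        }
        pages[pageKey] = page;
    }

    // Returns the page, or null if either key is unknown.
    public PageClass Find(string documentKey, string pageKey)
    {
        Dictionary<string, PageClass> pages;
        PageClass page;
        if (documents.TryGetValue(documentKey, out pages) &&
            pages.TryGetValue(pageKey, out page))
        {
            return page;
        }
        return null;
    }
}
```

Note that a plain Dictionary makes no promise about enumeration order, so this form suits lookups by key rather than "next entry" questions.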

tgreer · Jan 6, 2006

Yes, I could.

If I have a dictionary, hashtable, or list of byte offsets:

Document 1 starts at byte 10671 in the file.
Document 2 starts at byte 888912 in the file.

and I want to store Document 1 in a string, then I need to retrieve both values, and calculate the bytes to read.

For reasons having to do with compatibility with legacy programs, each Document is given a string as a "key".

So, given a specific dictionary key, I need to get the value for its entry, and the NEXT entry, to calculate the bytes to read.

I don't see an "IndexOf" property for a Dictionary, Chip.
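One way to get both the position of a key and the next entry's offset is to keep the offsets in a List in file order and use a Dictionary only to map each key to its position in that list. The sketch below assumes entries are added in ascending byte-offset order; OffsetIndex and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical index: offsets in file order, plus a key-to-position lookup.
public class OffsetIndex
{
    private readonly List<long> offsets = new List<long>();
    private readonly Dictionary<string, int> positions = new Dictionary<string, int>();

    // Entries are assumed to be added in ascending byte-offset order.
    public void Add(string key, long offset)
    {
        positions.Add(key, offsets.Count);
        offsets.Add(offset);
    }

    // Zero-based position of the key in file order.
    public int IndexOf(string key)
    {
        return positions[key];
    }

    // Byte offset at which the keyed entry starts.
    public long OffsetOf(string key)
    {
        return offsets[positions[key]];
    }

    // Bytes from this entry's start to the next entry's start,
    // or to fileLength when this is the last entry.
    public long LengthOf(string key, long fileLength)
    {
        int i = positions[key];
        long next = (i + 1 < offsets.Count) ? offsets[i + 1] : fileLength;
        return next - offsets[i];
    }
}
```

With the two offsets quoted above, Document 1 would span 888912 - 10671 = 878241 bytes.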

Thomas D. Greer

http://www.tgreer.com

chiph · Jan 7, 2006

Sorry, it's on the ArrayList object. I don't use Dictionary objects all that often.

Chip H.

tgreer · Jan 7, 2006

An ArrayList might work, if it were type-safe.

One program/class processes the file and generates all of the keys / records all the byte positions. What determines a page/document differs with every file, so it has to process a lot of rules and expressions as it reads through the file.

Then, it needs to serialize this data, and a separate 3rd-party program is used for address correction (address data is extracted).

Next, I need to sort the documents/pages based on the address correction, so I need to deserialize the byte information, retrieve specific chunks of data, and re-organize the file.

Thanks for your help... perhaps Dictionaries aren't the right structure for this.

Thomas D. Greer

http://www.tgreer.com

chiph · Jan 7, 2006

If you're using the 2.0 framework, you can use the List<> generic.

Code:

List<Document> documents = new List<Document>();
// ..
// Fill list here with some documents.
// ..

// Create document to find
Document d = new Document();
d.title = "Widget Maintenance Procedures";
d.pages = new List<Page>();

// Find its index in the list
int i = documents.IndexOf(d);

In order to compare your objects, you need to either override the Equals method (as in v1.1) or implement the generic System.IEquatable<T> interface, which amounts to the same thing.
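A minimal sketch of such a Document class, assuming two documents count as equal when their titles match (that rule is an assumption, as is the ByteOffset field on Page). List<T>.IndexOf uses the default equality comparer, which calls IEquatable<T>.Equals when the type implements it.

```csharp
using System;
using System.Collections.Generic;

public class Page
{
    public long ByteOffset;  // hypothetical field
}

public class Document : IEquatable<Document>
{
    public string title;
    public List<Page> pages = new List<Page>();

    // Assumed equality rule: same title means same document.
    public bool Equals(Document other)
    {
        return other != null && title == other.title;
    }

    // Keep the object-based Equals consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as Document);
    }

    // Equal documents must produce equal hash codes.
    public override int GetHashCode()
    {
        return title == null ? 0 : title.GetHashCode();
    }
}
```

Without this, IndexOf falls back to reference equality and a freshly created probe object is never found, so it returns -1.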

Chip H.

tgreer · Jan 9, 2006

I just realized I never gave you stars for this, Chip. You're right, creating a class for the Document, and then implementing a list of Document objects, is the proper way to do this. Thanks again.

Thomas D. Greer

http://www.tgreer.com
