Optimize I/O 2

Swi (Programmer)

I have a fixed-width text file with 6+ million records, each over 900 characters long.

I also have a match file, also fixed width, that I am reading into a Dictionary object.

I read through the text file, check whether a value (key) exists in the dictionary, and if it does, append the item data and write the record out.
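
Roughly, the loop looks like this (a simplified sketch; the file names and key positions are placeholders, not the real layout):

Code:
' Simplified sketch of the current approach (placeholder names/positions).
Dim fso As New Scripting.FileSystemObject
Dim dict As New Scripting.Dictionary
Dim tsIn As TextStream, tsOut As TextStream, tsMatch As TextStream
Dim sLine As String, sKey As String

' Load the match file into the dictionary (key in cols 1-10 is just an example).
Set tsMatch = fso.OpenTextFile("C:\data\match.txt", ForReading)
Do Until tsMatch.AtEndOfStream
    sLine = tsMatch.ReadLine
    dict(Mid$(sLine, 1, 10)) = Mid$(sLine, 11)   ' key -> item data
Loop
tsMatch.Close

' Walk the big file and write out matched records with the item data appended.
Set tsIn = fso.OpenTextFile("C:\data\big.txt", ForReading)
Set tsOut = fso.CreateTextFile("C:\data\out.txt", True)
Do Until tsIn.AtEndOfStream
    sLine = tsIn.ReadLine
    sKey = Mid$(sLine, 1, 10)
    If dict.Exists(sKey) Then
        tsOut.WriteLine sLine & dict(sKey)
    End If
Loop
tsIn.Close
tsOut.Close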

What can I do to speed up the reading/writing process?

Thanks.

Swi
 
I haven't tried this, but...

Have you tried using the Jet provider to open the file, filter the rows, and then write from there? Because you have a fixed-width file, you'll need to use a Schema.ini file as described here:


And here:


Specifically, I'm thinking that the real cause of the performance problem is determining whether the key exists in the dictionary. By using ADO and the Jet provider, that process may be faster, so the file I/O time is no longer a problem.
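
Not tested, but as a rough sketch it might look something like this (the key width, column names, folder, and file name are placeholders, not your actual layout):

Code:
' Schema.ini placed in the same folder as bigfile.txt (widths are placeholders):
' [bigfile.txt]
' Format=FixedLength
' ColNameHeader=False
' Col1=KeyField Text Width 10
' Col2=RestOfRecord Text Width 890

Dim cn As New ADODB.Connection
Dim rs As New ADODB.Recordset

cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=C:\data\;" & _
        "Extended Properties=""text;HDR=No;FMT=Fixed"""

' Let Jet do the filtering instead of checking each key in a Dictionary.
rs.Open "SELECT * FROM [bigfile.txt] WHERE KeyField = 'SOMEKEY'", cn, _
        adOpenForwardOnly, adLockReadOnly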


-George
Microsoft SQL Server MVP
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Well, a 6-million-record file of 900-character records looks like a 5.4GB file (assuming ANSI) from here. How are you reading it?

The Jet Text IISAM doesn't handle files over ~2.1GB any more than VB6 native I/O or the FSO do as far as I know.

I suspect that you'd need to use a 3rd-party I/O library or some API-based code to process files of such a size. Even then, about the only thing you can do about performance is to read using a large block size (64KB to 512KB) and deblock records yourself. Using a block size that is a whole multiple of the record size would help by letting you just index through the block record by record fairly cheaply. You still need to examine every record in order to locate your targets. Writing would be a similar process: accumulating large blocks and actually writing less often, as blocks fill up.
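
Untested, but the API route might look roughly like this (the path and record length are placeholders, and it assumes ANSI data where every record is exactly RECLEN bytes including the CrLf):

Code:
Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
    ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
    ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
    ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
    ByVal hTemplateFile As Long) As Long
Private Declare Function ReadFile Lib "kernel32" ( _
    ByVal hFile As Long, ByVal lpBuffer As String, _
    ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, _
    ByVal lpOverlapped As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long

Private Const GENERIC_READ As Long = &H80000000
Private Const FILE_SHARE_READ As Long = &H1
Private Const OPEN_EXISTING As Long = 3
Private Const RECLEN As Long = 902              ' 900 chars + CrLf (example)
Private Const RECS_PER_BLOCK As Long = 512      ' roughly a 450KB block

Private Sub ProcessBigFile()
    Dim hFile As Long, lBytesRead As Long, i As Long
    Dim sBlock As String, sRecord As String

    hFile = CreateFile("C:\data\big.txt", GENERIC_READ, FILE_SHARE_READ, _
                       0&, OPEN_EXISTING, 0&, 0&)
    If hFile = -1 Then Exit Sub                  ' INVALID_HANDLE_VALUE

    ' Allocate one block that is a whole multiple of the record size.
    sBlock = String$(RECLEN * RECS_PER_BLOCK, vbNullChar)
    Do
        If ReadFile(hFile, sBlock, Len(sBlock), lBytesRead, 0&) = 0 Then Exit Do
        If lBytesRead = 0 Then Exit Do           ' end of file

        ' Deblock: index through the block record by record.
        For i = 0 To (lBytesRead \ RECLEN) - 1
            sRecord = Mid$(sBlock, i * RECLEN + 1, RECLEN)
            ' ... key lookup and output buffering go here ...
        Next
    Loop
    CloseHandle hFile
End Sub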

How huge is your "match file" though? It might be a lot quicker to take the key values from the match file, build one huge String from them, and use InStr() to locate matches. I'd probably append a "stopper" to each value, such as a "$" or vbNullChar, to avoid hits on misaligned values that might false-match.
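
Something like this, roughly (the key position/width is an example, and arrMatchKeys / sRecord are assumed to hold your match keys and the current record):

Code:
' Sketch of the big-string lookup idea.
Dim sKeys As String, sKey As String

' Build one long string of all match keys, each bracketed by a stopper
' character so a misaligned substring can't false-match.
sKeys = vbNullChar & Join(arrMatchKeys, vbNullChar) & vbNullChar

' Then, for each record in the big file:
sKey = Mid$(sRecord, 1, 10)                      ' example key position/width
If InStr(sKeys, vbNullChar & sKey & vbNullChar) > 0 Then
    ' Record matches: append the item data and buffer it for output.
End If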

In any case for optimal performance you might have to write some rather specific code that matches the requirements you have. Anything very generic will involve some performance tradeoffs.
 
I am using the FSO to read the file, as ADO and VB6 native I/O, as mentioned above, do not handle the size. I can't find documentation, but I can tell you that FSO is definitely reading the file.

Do you have any 3rd-party I/O library or API-based examples?

Thanks for the comments.

Swi
 
Thanks. I saw that last night while browsing also. Thanks again!

Swi
 