Optimize I/O 2

Swi (Programmer)

I have a fixed-width text file with 6+ million records, each over 900 characters long.

I also have a match file, also fixed width, that I am reading into a Dictionary object.

I read through the text file, check whether a value (key) exists in the dictionary, and if it does, append the item data and write the record out.
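
Roughly, the loop looks like this (a simplified sketch; the file names and key positions are placeholders, not the real layout):

Code:
' Simplified sketch of the current approach (placeholder names/positions).
Dim fso As New Scripting.FileSystemObject
Dim dict As New Scripting.Dictionary
Dim tsIn As TextStream, tsOut As TextStream, tsMatch As TextStream
Dim sLine As String, sKey As String

' Load the match file into the dictionary (key in cols 1-10 is just an example).
Set tsMatch = fso.OpenTextFile("C:\data\match.txt", ForReading)
Do Until tsMatch.AtEndOfStream
    sLine = tsMatch.ReadLine
    dict(Mid$(sLine, 1, 10)) = Mid$(sLine, 11)   ' key -> item data
Loop
tsMatch.Close

' Walk the big file and write out matched records with the item data appended.
Set tsIn = fso.OpenTextFile("C:\data\big.txt", ForReading)
Set tsOut = fso.CreateTextFile("C:\data\out.txt", True)
Do Until tsIn.AtEndOfStream
    sLine = tsIn.ReadLine
    sKey = Mid$(sLine, 1, 10)
    If dict.Exists(sKey) Then
        tsOut.WriteLine sLine & dict(sKey)
    End If
Loop
tsIn.Close
tsOut.Close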

What can I do to speed up the reading/writing process?

Thanks.

Swi
 
I haven't tried this, but...

Have you tried using the Jet provider to open the file, filter the rows, and then write from there? Because you have a fixed-width file, you'll need to use a Schema.ini file as described here:


And here:


Specifically, I'm thinking that the real cause of the performance problem is determining whether the key exists in the dictionary. By using ADO and the Jet provider, that process may be faster, so the file I/O time is no longer a problem.
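
Not tested, but as a rough sketch it might look something like this (the key width, column names, folder, and file name are placeholders, not your actual layout):

Code:
' Schema.ini placed in the same folder as bigfile.txt (widths are placeholders):
' [bigfile.txt]
' Format=FixedLength
' ColNameHeader=False
' Col1=KeyField Text Width 10
' Col2=RestOfRecord Text Width 890

Dim cn As New ADODB.Connection
Dim rs As New ADODB.Recordset

cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=C:\data\;" & _
        "Extended Properties=""text;HDR=No;FMT=Fixed"""

' Let Jet do the filtering instead of checking each key in a Dictionary.
rs.Open "SELECT * FROM [bigfile.txt] WHERE KeyField = 'SOMEKEY'", cn, _
        adOpenForwardOnly, adLockReadOnly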


-George
Microsoft SQL Server MVP
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Well, a 6-million-record file of 900-character records looks like a 5.4GB file (assuming ANSI) from here. How are you reading it?

The Jet Text IISAM doesn't handle files over ~2.1GB any more than VB6 native I/O or the FSO do as far as I know.

I suspect that you'd need to use a 3rd-party I/O library or some API-based code to process files of such a size. Even then, about the only thing you can do about performance is to read using a large block size (64KB to 512KB) and deblock records yourself. Using a block size that is a whole multiple of the record size would help by letting you just index through the block record by record fairly cheaply. You still need to examine every record in order to locate your targets. Writing would be a similar process: accumulating large blocks and actually writing less often, as blocks fill up.
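
Untested, but the API route might look roughly like this (the path and record length are placeholders, and it assumes ANSI data where every record is exactly RECLEN bytes including the CrLf):

Code:
Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
    ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
    ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
    ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
    ByVal hTemplateFile As Long) As Long
Private Declare Function ReadFile Lib "kernel32" ( _
    ByVal hFile As Long, ByVal lpBuffer As String, _
    ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, _
    ByVal lpOverlapped As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long

Private Const GENERIC_READ As Long = &H80000000
Private Const FILE_SHARE_READ As Long = &H1
Private Const OPEN_EXISTING As Long = 3
Private Const RECLEN As Long = 902              ' 900 chars + CrLf (example)
Private Const RECS_PER_BLOCK As Long = 512      ' roughly a 450KB block

Private Sub ProcessBigFile()
    Dim hFile As Long, lBytesRead As Long, i As Long
    Dim sBlock As String, sRecord As String

    hFile = CreateFile("C:\data\big.txt", GENERIC_READ, FILE_SHARE_READ, _
                       0&, OPEN_EXISTING, 0&, 0&)
    If hFile = -1 Then Exit Sub                  ' INVALID_HANDLE_VALUE

    ' Allocate one block that is a whole multiple of the record size.
    sBlock = String$(RECLEN * RECS_PER_BLOCK, vbNullChar)
    Do
        If ReadFile(hFile, sBlock, Len(sBlock), lBytesRead, 0&) = 0 Then Exit Do
        If lBytesRead = 0 Then Exit Do           ' end of file

        ' Deblock: index through the block record by record.
        For i = 0 To (lBytesRead \ RECLEN) - 1
            sRecord = Mid$(sBlock, i * RECLEN + 1, RECLEN)
            ' ... key lookup and output buffering go here ...
        Next
    Loop
    CloseHandle hFile
End Sub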

How huge is your "match file" though? It might be a lot quicker to take the key values from the match file, build one huge String from them, and use InStr() to locate matches. I'd probably append a "stopper" to each value, such as a "$" or vbNullChar, to avoid hits on misaligned values that might false-match.
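
Something like this, roughly (the key position/width is an example, and arrMatchKeys / sRecord are assumed to hold your match keys and the current record):

Code:
' Sketch of the big-string lookup idea.
Dim sKeys As String, sKey As String

' Build one long string of all match keys, each bracketed by a stopper
' character so a misaligned substring can't false-match.
sKeys = vbNullChar & Join(arrMatchKeys, vbNullChar) & vbNullChar

' Then, for each record in the big file:
sKey = Mid$(sRecord, 1, 10)                      ' example key position/width
If InStr(sKeys, vbNullChar & sKey & vbNullChar) > 0 Then
    ' Record matches: append the item data and buffer it for output.
End If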

In any case for optimal performance you might have to write some rather specific code that matches the requirements you have. Anything very generic will involve some performance tradeoffs.
 
I am using the FSO to read the file, as ADO and VB6 native I/O, as mentioned above, do not handle the size. I can't find documentation, but I can tell you that FSO is definitely reading the file.

Do you have any 3rd-party I/O library or API-based examples?

Thanks for the comments.

Swi
 
Thanks. I saw that last night while browsing also. Thanks again!

Swi
 