
Export DBF to CSV 2


Olaf Doschke

Programmer
Today I needed a routine that handles Memo and other field types better than COPY TO or EXPORT do, so I took a solution from an MS KB article and modernized it a bit.

Have a look at faq184-7770

Whether this is useful or not will depend on the client reading in the CSV. For example, you can't read back multi-line memo fields exported with this routine via APPEND FROM txtfile, but that's not a fault of the CSV; the CSV standard for multi-line text is to output it exactly the way this routine does, including line feeds.
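For illustration, here is a minimal sketch (my own example, not the FAQ code) of how a memo value with embedded line feeds is typically quoted for CSV: wrap the value in double quotes, double any embedded quotes, and keep the line feeds inside the quoted field.
Code:
* Minimal sketch, not the FAQ code: quote a memo value the CSV way,
* doubling embedded quotes and keeping the line feeds inside the field.
LOCAL lcMemo, lcCsvField
lcMemo = "line one" + CHR(13) + CHR(10) + [line "two"]
lcCsvField = ["] + STRTRAN(m.lcMemo, ["], ["] + ["]) + ["]
? m.lcCsvField && prints the value quoted, quotes doubled, line feed kept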

Bye, Olaf.
 
Nice :)

Respectfully,
Vilhelm-Ion Praisach
Resita, Romania
 
Our clients require almost all data to be exportable to CSV. And since most of our tables also have at least one memo field (Oracle NCLOB), your function will be a great tool for us. Uhm, can I give you more than one star? [2thumbsup]

kilroy [knight]
philippines

"Once a king, always a king. But being a knight is more than enough."
 
That's easy, because we are a community :)

Respectfully,
Vilhelm-Ion Praisach
Resita, Romania
 
Thanks, kilroy and Vilhelm-Ion.

You could also rate the FAQ, even if it doesn't matter much to me. Do so once it has come in handy.
In my personal version I already made a slight change to also add a DELETED column, so a DBF can be exported including deleted records with their deletion flag, even though there is no official need to do so (see the sketch below).
I want to change the code from interaction to parameterization, too. Once everything is as we'll use it in production, I'll make an update to the FAQ. Feel free to comment and add feature requests.
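A hypothetical sketch of that DELETED column idea (demo cursor and names are mine, not the FAQ code): export the deletion flag as an extra column, with SET DELETED OFF so the scan still visits deleted records.
Code:
* Hypothetical sketch of the DELETED column idea, not the FAQ code.
CREATE CURSOR demo (cName C(10))
INSERT INTO demo VALUES ("kept")
INSERT INTO demo VALUES ("gone")
GO 2
DELETE            && mark record 2 as deleted
SET DELETED OFF   && so the scan still visits deleted records
SCAN
    * deletion flag as first CSV column, then the field value(s)
    ? IIF(DELETED(), "T", "F") + "," + ALLTRIM(demo.cName)
ENDSCAN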

Besides, I remember the outcome of a discussion about performance was rather to use an HTML table as an intermediate format to import into Excel and then save as xls, xlsb, xlsx or (of course) also csv. I just warn that Excel has its own mind about handling cell values and automatically converts them to what it thinks is more suitable, including for example wrong date/time conversions, turning a numeric string with possibly important leading zeros into a number, and more conversions I don't even want to know about.
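As an illustration of that warning (plain Excel automation, my own example, not part of either FAQ): pre-formatting the target column as Text keeps a value like "00123" from being converted to the number 123.
Code:
* Illustration only: format column A as Text before writing, so leading
* zeros survive instead of Excel converting the string to a number.
LOCAL oExcel, loBook, loSheet
oExcel = CREATEOBJECT("Excel.Application")
loBook = oExcel.Workbooks.Add()
loSheet = loBook.Sheets(1)
loSheet.Columns(1).NumberFormat = "@"  && "@" = Text format
loSheet.Cells(1, 1).Value = "00123"    && stays "00123", not 123
oExcel.Visible = .T.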

Anyway, if the destination is Excel, faq184-4704 is most probably a good choice. It would need to be expanded to support all field types, but that's possible after reading the description.

Bye, Olaf.
 
Olaf Doschke said:
You could also rate the FAQ
My pleasure :)

I went through something similar recently, when I created exporttoxlsx.
At that time I made some tests to improve the speed.

I'm aware you already know what's next, but this is a forum, so please allow me to make these suggestions (a combined sketch follows after the timings below):

1) "==" is faster than "="
2) Transform() is slow; you already preferred the faster IIF() in the case of logical fields. TRANSFORM() can be replaced with LTRIM(STR()) (for double, float, currency and integer) and with DTOC() and TTOC() respectively (for date and datetime)
3) the order of the CASE branches has a big impact on the overall performance; the most frequent cases must come first
4) instead of testing If lnCount < lnFieldCount inside the For loop, it's much faster to remove the extra trailing comma after the For loop with lcFieldString = RTRIM(m.lcFieldString,0,",")
5) the mdot prefix for the memory variables significantly reduces the execution time (especially for tables with dozens of fields)

On a table with 12 fields and 4898957 rows, I gained about 5%
On a table with 15 fields and 315933 rows, I gained about 8%
On a table with 33 fields and 70381 rows, I gained about 13%
On a table with 82 fields and 151172 rows, I gained about 27%
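A minimal sketch of how points 1) to 5) could look combined in a per-record field loop (the demo cursor and variable names are mine, not the FAQ's code):
Code:
* Sketch only: most frequent CASE branches first, type-specific
* conversions, == comparisons, mdot prefixes, and the trailing comma
* trimmed once after the loop instead of an IF per field.
CREATE CURSOR demo (cName C(10), nQty I, dWhen D, lDone L)
INSERT INTO demo VALUES ("widget", 3, DATE(), .T.)
LOCAL lnCount, lnFieldCount, lcFieldString, lcType, luValue
lnFieldCount = FCOUNT()
lcFieldString = ""
FOR lnCount = 1 TO m.lnFieldCount
    luValue = EVALUATE(FIELD(m.lnCount))
    lcType = VARTYPE(m.luValue)
    DO CASE
    CASE m.lcType == "C"   && put the most frequent type first
        lcFieldString = m.lcFieldString + ALLTRIM(m.luValue) + ","
    CASE m.lcType == "N"   && integer, numeric, float, double
        lcFieldString = m.lcFieldString + LTRIM(STR(m.luValue)) + ","
    CASE m.lcType == "D"
        lcFieldString = m.lcFieldString + DTOC(m.luValue) + ","
    CASE m.lcType == "T"
        lcFieldString = m.lcFieldString + TTOC(m.luValue) + ","
    CASE m.lcType == "L"
        lcFieldString = m.lcFieldString + IIF(m.luValue, "T", "F") + ","
    ENDCASE
ENDFOR
* trim the trailing comma once, instead of testing lnCount < lnFieldCount
lcFieldString = RTRIM(m.lcFieldString, 0, ",")
? m.lcFieldString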


Respectfully,
Vilhelm-Ion Praisach
Resita, Romania
 
Overall I can report I did a conversion from a ~400MB DBF to a ~600MB CSV (numbers often get longer) in 6 minutes, so the throughput is nice.
Some of these things are simply inherited from the MS code; I would do some things differently, too.

I already noticed the missing mdot prefix. It doesn't only play a performance role: in the part before the scan loop, you couldn't process tables with fields named like some of the variables without using the mdot prefix. That's an error inherited from MS, like several other things you address.
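To illustrate that point (hypothetical names, not the FAQ code): a field shadows a memory variable of the same name when it is read without the mdot prefix.
Code:
* Sketch of the name clash: reading without m. picks up the field,
* not the memory variable of the same name.
CREATE CURSOR demo (lcFieldString C(20))
INSERT INTO demo VALUES ("field text")
LOCAL lcFieldString
lcFieldString = "variable text"
? lcFieldString    && "field text" - the field wins without mdot
? m.lcFieldString  && "variable text" - the variable, as intended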

I already made a version creating individual script code for the table schema, which means no CASE statement is processed for each record during the scan of all the data; instead several commands with the field names are executed. But it even took longer, for some reason I have to investigate. And as I already said, I'll update once I have the code as used in production; I assume some bad behaviour or performance issue might come up and be addressed, so it's worthwhile waiting. Thanks for the input on what you see.
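A hypothetical sketch of that generated-code idea (my own example, not the actual version): build the per-record conversion expression once from the schema, then evaluate it for each record, so no CASE statement runs inside the scan.
Code:
* Sketch only: one prebuilt expression per table schema instead of a
* CASE statement per field and record.
CREATE CURSOR demo (cName C(10), nQty I, dWhen D)
INSERT INTO demo VALUES ("widget", 3, DATE())
LOCAL lcExpr
lcExpr = [ALLTRIM(cName) + "," + LTRIM(STR(nQty)) + "," + DTOC(dWhen)]
SCAN
    ? EVALUATE(m.lcExpr)   && evaluated per record, no CASE in the loop
ENDSCAN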

Bye, Olaf.
 
Mostly you're also right about Transform.

Transform is especially slow, as it internally first does another case statement on the parameter type to make the right conversion, even if you don't specify a format and/or mask.
I agree DTOC() and TTOC() are much better for D and T fields, and STR() is much better for I.

But the same idea doesn't apply well to double and float values; e.g. take the extreme 8.9884656743115E307:
Code:
ln = 8.9884656743115E307
? LTrim(Str(m.ln)) && 8.98E+307
? LTrim(Str(m.ln,40)) && 8.9884656743116400000000000000E+307 (I didn't count the 0s)
? Transform(m.ln) && 8.98846567431164E+307 <- perfect (as far as the accuracy goes)

Transform handles floats and doubles especially nicely. Using Str(ln,40) just demonstrates an extreme; 22 would work in this case, but in the general case it would leave many unwanted 0s for numbers with exponents.

Bye, Olaf.
 