
Exporting to CSV


PCHomepage (Programmer)
This isn't so much a VBA question as it is an Excel question, but maybe VBA is what's needed. I am trying to export data from Excel to CSV but it fails to enquote text fields and does not escape quotes that exist in the data. It also seems to truncate some data: for example, 1.0100 LOOKS like a number even though the column is actually general text, yet it exports as 1.01. Any ideas on how to get the required output? Thank you.
 
And what is the source of the data in Excel in the first place?

There are lots of issues around numbers mainly, and the solution is not always the same.

see for one case

and for one of the common solutions to formatting

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Hi,

YOU need to prepare the data accordingly.

The fields you want enquoted, YOU must enquote.

BTW, fields with a GENERAL Number Format may have numbers or text. There is a TEXT number format. Make the Number Format for your 1.01 column TEXT. Then you must 1) make a new column for a formula, 2) enter the formula =TEXT(CellRef,"#.000") where CellRef is the cell containing your number, and copy/paste this formula down thru all data rows. Then COPY that column and PASTE SPECIAL--VALUES into your number column now formatted as TEXT.

Finally you can SaveAs .csv text file. (Export???)
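As a rough worked example of that helper-column idea (the cell references and the number of decimal places are assumptions; the format string has to match however many trailing zeros the column needs):

[tt]
A2: 1.01
B2: =TEXT(A2,"0.0000")    --> returns the text 1.0100
[/tt]

Then COPY column B and PASTE SPECIAL--VALUES back into the TEXT-formatted column before saving.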
 
BTW, your number problem may not be with your "export" to a .csv.

It may be that you are opening the .csv file with Excel. I almost NEVER open a .csv file with Excel. Rather, I IMPORT any text file and use Excel's parsing feature to assign a discrete Number Format to each column.
 
The data was created in Excel by copy-and-pasting the OCRed content from a PDF scan of an old ('40s and '50s) automotive parts book. Due to the book's inconsistent formatting, each entry had to be made more or less manually and, in some cases, even re-typed where the original's quality was too poor to OCR properly. In other words, the data originated in Excel rather than being imported from CSV or any other format.

As for formatting, yes, I know about "#.000" (although I usually run Linux and don't use Excel often, I am no stranger or beginner with it), but the example I gave was just a single one and not all of the numbers have the same number of decimal places. That's why I created the column as text, thinking that it would export enquoted to CSV.

In fact, some fields do seem to be enquoted during the export while others do not. Of course, with many thousands of lines of data (and this is just the experimental beginning of only a few sections of the book), I need some way to automate the enquoting and escaping, but I have not used VB in at least a decade and don't even know where to begin.

The CSV is ultimately being imported into a MySQL database. Some fields also contain quotes in the data itself that must be escaped for proper database importing.
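Since the automation question is still open, here is a minimal VBA sketch of one way to do it: it writes the active sheet's used range to a CSV file with every field enclosed in double quotes and any embedded double quotes doubled (the RFC 4180 convention, which MySQL's LOAD DATA can be told to expect with FIELDS ENCLOSED BY '"'). The output path is an assumption; adjust it, and the escaping rule, to whatever the importer actually requires.

[tt]
Sub ExportQuotedCsv()
    ' Minimal sketch: write the active sheet's used range to a CSV file
    ' with EVERY field quoted and embedded double quotes doubled.
    Const OUT_FILE As String = "C:\temp\export.csv"   ' assumed path

    Dim rng As Range
    Dim r As Long, c As Long
    Dim f As Integer
    Dim fld As String, rec As String

    Set rng = ActiveSheet.UsedRange
    f = FreeFile
    Open OUT_FILE For Output As #f

    For r = 1 To rng.Rows.Count
        rec = ""
        For c = 1 To rng.Columns.Count
            ' .Text returns the displayed text, so a column formatted
            ' as TEXT keeps its trailing zeros (e.g. 1.0100)
            fld = rng.Cells(r, c).Text
            fld = Replace(fld, """", """""")   ' escape " by doubling it
            If c > 1 Then rec = rec & ","
            rec = rec & """" & fld & """"
        Next c
        Print #f, rec
    Next r

    Close #f
End Sub
[/tt]

Print # writes CRLF line endings, so a matching LOAD DATA INFILE would use FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n'.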
 
"That's why I created the column as text, thinking that it would export enquoted to CSV."

Have you opened the .csv file with a TEXT EDITOR like NotePad or Word? I venture a guess that you'll see that value just as you entered it, meaning that the "export" is not the problem; rather, it's using Excel to OPEN the text file. Just my suspicion.
 
I'm not using Excel to open the CSV but I did look at it in a text editor, which is how I know that there was truncation, only partial enquoting and no escaping.
 
So I ran a test from Excel.

I formatted column A as TEXT.

I entered 1.0100 in A1

I entered SkipVought into B1

I did a SaveAs with CSV as the Save as type

I opened the file in Notepad

Result:
[tt]
1.0100,SkipVought
[/tt]

BTW, simply changing a Number Format changes NOTHING!!! The underlying value remains UNCHANGED. To have leading or trailing ZEROS, they must be concatenated to your numeric string (remember, you have already changed the Number Format to TEXT prior to entering your numeric characters).
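A small VBA illustration of that point (the sheet and cell are assumptions): setting the format alone does nothing to a value that is already in the cell, but if the cell is formatted as TEXT before the entry is made, the trailing zeros are stored as part of the text:

[tt]
With ActiveSheet.Range("A1")
    .NumberFormat = "@"       ' TEXT format, set BEFORE the entry
    .Value = "1.0100"         ' stored as the literal string 1.0100
End With
Debug.Print ActiveSheet.Range("A1").Text   ' prints 1.0100
[/tt]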
 
A proper CSV generally has all text columns enquoted, so if Excel "knows" that a column is text and isn't doing that, it is not proper CSV, and I need a way to force the enquoting. To restate: perhaps it is technically proper CSV, but it is not the kind of CSV that will properly import into a database, especially when parts of the text itself contain quotes that are not escaped.
 
YOUR database importer may require quotes around text values, but that is definitely not a general feature of csv text files.

You will have to add whatever quotes and escape sequences are required for YOUR database.

My particular focus with my replies has been the numbers that must have trailing zeros. Is this still an issue?
 
The trailing zeros are not as big an issue as simply importing the data in general. Without the quotes and escapes, it gets all messed up, as databases (most of them that I am aware of) require these things for importing data, which I thought was the point of Excel being able to export to that format.
 

Several relatively common variations from the strict form specified by RFC 4180 are found in practice and may be supported by some software tools:

In locales where the comma character is used in place of a decimal point in numbers, the separator between fields/columns is often a semicolon.

The line break character may be CR or LF, not necessarily CRLF.

Some Unix-based applications may use a different escape mechanism for indicating that one of the separator characters occurs within a text value. The individual character is preceded by a backslash character rather than enclosing the entire string in double quotes.

Single quotes may be treated as equivalent to double-quotes for escaping (also known as "text-qualification").
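As a rough illustration of those two escaping styles (the field values are made up), the same record could appear either way:

[tt]
RFC 4180 style:  "1.0100","PISTON, 3-1/16"" BORE","see note ""A"""
Backslash style: 1.0100,PISTON\, 3-1/16\" BORE,see note \"A\"
[/tt]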
 
Thank you, yes, I've been down all those routes through the years, as this isn't the first time I've exported from Excel to CSV and then to a database. This is the first time, though, with a more recent version of Excel and all these problems, and Excel does not appear to have any options for the CSV exporting. It does it its way and that's it.

Many years ago I had an application that would import directly from Excel into any number of databases and would take care of these issues itself, but it appears to be defunct now, so perhaps I need to search for another, or unearth this old one, which I may still have somewhere, and see if it will run on Windows 8.1.

The data I am working with contains single and double quotes, commas, colons and semicolons (and possibly others), which is why these things need to be escaped. Anyway, I'll see if I can locate some other application to do the importing directly. Thank you again for your time.
 
A straight bit of C# coding (using OLEDB and the ACE driver) can extract all the data from Excel, and you can decide what needs to be formatted and how, and do all types of validation based on your own metadata. And yes, I am advising C# over VBA, as you can easily set it up to process metadata and create the required extracts on the fly, or "hardcode" each file type if you wish to have a quick and dirty process.

On my present projects we use this extensively to generate SQL INSERT statement scripts from a variety of Excel sources, and all the formatting, quoting, escaping, null handling and date converting is done in this one C# project (which unfortunately I can't share).

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
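The C# project above isn't shared, but for anyone who wants to stay in VBA, the same OLEDB/ACE idea can be sketched with late-bound ADO. The file path, sheet name and target table below are made-up placeholders, and the ACE provider must be installed (it comes with Office or the Access Database Engine redistributable):

[tt]
Sub ExtractViaAce()
    ' Minimal sketch: read a worksheet through the ACE OLEDB driver and
    ' turn each row into an INSERT statement.
    Dim cn As Object, rs As Object
    Dim sql As String, i As Long

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=C:\temp\parts.xlsx;" & _
            "Extended Properties=""Excel 12.0 Xml;HDR=YES"""

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT * FROM [Sheet1$]", cn

    Do Until rs.EOF
        sql = "INSERT INTO parts VALUES ("
        For i = 0 To rs.Fields.Count - 1
            If i > 0 Then sql = sql & ", "
            ' quote each value and double any embedded single quotes
            sql = sql & "'" & Replace(rs.Fields(i).Value & "", "'", "''") & "'"
        Next i
        sql = sql & ");"
        Debug.Print sql           ' or write the lines to a .sql script
        rs.MoveNext
    Loop

    rs.Close
    cn.Close
End Sub
[/tt]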
 
I finally remembered the name of the application I used to use years ago for importing directly from Excel and was pleased to find that it still exists in modern 32- and 64-bit form, and there are even Mac and Linux versions. In fact, it can export or import to and from many different databases, but I used only the MySQL version. I tried it and it imported flawlessly as far as I can tell! I'll not post its name unless asked, as I don't want to sound like an advertisement.
 
[blush]I sure booted my request!

Please don't leave us in a lurch with bated breath: no Thing or mint!
 
Okay, if you insist! It's called Navicat but it is commercial software and not cheap. The home version for MySQL-only starts at close to a hundred dollars and it goes up from there. If you want a commercial license, it doubles and apparently requires a separate license (and payment) for each OS you might have.

Since posting that I had used it, I've been looking through the data and see no flaws in the import at all: no truncation, no missed quotes or anything else amiss as far as I can tell. Apparently it can also import multiple tabs, but I had combined several into one of almost 2000 rows for this test, so had only one to import. Ultimately there will be maybe hundreds of thousands of rows, so I'll have to do it as individual tabs, but that's some time off, as the creation is time-consuming and tedious due to the original format not OCRing very well and the inconsistency of the original printed document, with headers all over the place. I can think of no way at all to automate the process, but even if there were one, it would mean proofing it line by line anyway.
 