Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Quotation marks prevent proper import of tab-delimited file

Status
Not open for further replies.

Gavona

Technical User
Aug 27, 2002
1,771
GB
I am manipulating files exported from SAP. SAP creates Tab-delimited files with an xls extension. If I open directly into Excel then the data is corrupted due to the inclusion of " marks in some of the original fields.

My current workaround is to open the file in Notepad, replace all " with a space and save the file. I think I need to do this with a large number of files as I can't predict which will contain this corrupting character.
(Some files are 40k rows long and I have not yet tested if these will exceed Notepad's capacity.)

Is there a way I can automate this process within my VBA code? Or a way in which I can import the files rather than open them, and thus get around the issue.

In talking to others I gather this is a common problem within the organisation, causing significant delays and wasted time. I would therefore like to share with others a generic process to clean the "Excel" files created by SAP. I imagine this would involve sharing a macro.

Your guidance would be much appreciated!

Gavin
 
how is it being corrupted? I just tested some data, like so:
Code:
Header1	H2	H3
"Test 1"	Test 2	"Test 3"

Opened as Tab delimeted, Text qualifier as "

No problems...
 
Have you tried replacing the file's .xls extention with .txt and then opening that with Excel?
 
>Some files are 40k rows long

It can be much quicker to 'load' text files using a QueryTable; If you are not familiar with them then (in Excel 2003speak) do;

Start Recording a Macro
Drop Data Menu
Select Import Data
Select Files of type "Text Files"
Walk the file selection dialog to the text file you want to import (load)
Work your way through the text import wizard, which will allow you to deal with the tabs and quotes.
The file is imported.
Stop Recording the Macro
Checkout the code produced.
 



Hi,

I would recommend NEVER opening a text file with Excel.

I would recommend IMPORTING text files into an existing workbook. You have MUCH better control over the fields using IMPORT!

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Gruu: It's weird. The contents of (part of/the whole) next record get concatenated into the record with the " in it. Blank row gets inserted into the spreadsheet (thus messing up code that expects good data design).

Hugh/Skip: I will try those ideas tomorrow (8:30pm in the UK right now). Thanks for the walk-through Hugh!






Gavin
 
Is it possible the file has real quotes? (open/close quotes as opposed to double tick marks) If so, you're in for a fun ride...
 
Gruuu: Anything is possible....and anything that might happen will happen at some point. Users have little restriction on what they enter in free text fields.

Why would quotes cause an issue if the file is tab delimited? And would the import approach help to control the way they are handled?

Gavin
 
Ah. I just spent a good 5 minutes trying as hard as I might to get the left and right double quotes to appear in a post. Tek-Tips apparently just does not allow this?

This should give you a very good idea of HOW MUCH OF A PAIN these are.

the " character (or double prime as it is called) is NOT THE SAME as the Left OR Right double quote character. For the example, I will use \\ to represent a left double quote, // for right double quote, and || to represent what the sane among us use for quotes.

Let's say you have a field like so:
Code:
[b]||[/b]Text Field[b]||[/b]
All is peachy. Excel finds the quote, reads text until it finds another quote, and caps it off. It includes any white space IN THE FIELD.

Next example:
Code:
[b]\\[/b]Text Field[b]//[/b]
All is NOT PEACHY.
Excel does not find a ||. It sees just another field. You end up with two data points: \\Text,Field//

Sinister example:
Code:
[b]||[/b]Text Field[b]//[/b]

Ok, we find a quote, we start reading text until we find another quote. Ok there is no other quote (before the end of the line, which will terminate the field also)
So you get:
Text Field// Data Data Data
as one field.
 
And to answer your second question? I'm not sure. I don't know if you can specify more than one text delimiter. If you could, that would be perfect. Perhaps something more likely is managing the EXPORT from SAP to clean up it's sloppy mess!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top