Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Cannot easily bulk insert from CSV file with double quotes

Status
Not open for further replies.

surfoman

Programmer
Jul 18, 2003
4
GB
I am trying to bulk insert from a comma separated CSV file containing double quotes.

Here is an example CSV file which illustrates what I am referring to:

---------------------------

ID,TextField1,TextField2
23,This is some text for this field,This is some other text
34,”This field, contains a comma and double quotes”,”So does this field, as you can see”
99,Sometimes it may not have double quotes,”Sometimes, it suddenly does”

---------------------------

As you can see from above, this is a perfectly acceptable CSV file but unfortunately I cannot control the consistency of the double quotes. The supplier of the data feed only appears to surround a field in double quotes if it contains the comma delimiter.

I have tried implementing a query using the MS SQL BULK INSERT query.

Here is an example:

BULK INSERT TempTable
FROM 'C:\Files\CSV\example.csv'
WITH
(
FORMATFILE = 'C:\Files\CSV\example-format.txt'
)

The problem I have here is the format file. I cannot get it to work on string fields in which some rows have the double quotes and other rows do not.

I managed to find an example on this forum here:

Although this works fine for fields surrounded by double quotes, as soon as you remove the double quotes from one of the fields the bulk insert fails.

So perhaps if anyone familiar with format files can help then I may have a solution.
 
I am familiar with format files. But it won't really help.

You see, when you use a format file, you are telling SQL that this is how the data is laid out. You can identify delimiters in your format file, but EVERY row must be consistent. Without consistency, you are doomed to failure.

I have 2 suggestions for you.

1. Go back to the person that has the source data and tell them to create a consistently formatted file, even if this means they use quotes around data that does not need quotes.

2. You could try loading the data in to Excel and then saving it with another format. You could use tab delimited or even Excel. If you save as excel, you won't be able to use bulk insert, but you could use OpenRowset instead.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hi George,

I realise now that I will need to pre-process the CSV file to insert the double quotes so the data is consistent as you say.

This whole process needs to be done automatically on a weekly scheduled basis so having to manualy take the file and convert is not practical.l

My solution is as follows:

1. Read the CSV file into memory
2. Apply a GREP command to surround all string fields which may contain commas with double quotes
3. Save the processed CSV file to disk
4. Apply the BULK INSERT query on the new file using a format file which strips the double quotes.
 
I think the biggest worry you have is that you may have a comma in a sentence that is not already in quotes. Even if you pre-process it you will wind up with an incorrect number of columns.

When you pre-process it you will need your program to alert you if that problem exists. Your example data shows 3 columns per line. I would check for that.


Simi
 
Hi simian,

This is why I am going to pre-process the CSV file so all the string fields which may contain a comma will be surrounded by double quotes.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top