Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

inserting utf-8 csv in sql server 1

Status
Not open for further replies.

fluppe689

Programmer
Jul 11, 2008
75
BE
Dear Experts,

I Had a problem with inserting a csv file that a got from a unix server in utf-8 into an sql server table with 30 columns all text

I tried is several times with a bulk insert instruction but that did not work

There were twoo problems : text was utf-8 and eof line was only LF
Column delimiter is ;

So i made this simple code for a class

this.tb_tekst.value = "Leads"
this.tb_tijd.value=time()

sql = "select * FROM " + m.y_db1 + "f_par "
retcode = SQLEXEC(handle,sql,"t_par")
IF retcode > 0
SELECT t_par
GO top
m.xxextdir = T_par.extdir
m.xxprogdir = T_par.progdir
SET defau TO &xxextdir
gnextern = ADIR(gaDatabase, '*.csv')
FOR nCount = 1 TO gnextern
m.bestand = "\\server02\data\merlijn\aanvragen\" + gaDatabase(nCount,1)

gnFileHandle = FOPEN(m.bestand)
nSize = FSEEK(gnFileHandle, 0, 2)
IF nSize <= 0
WAIT WINDOW "This file is empty!" NOWAIT
RETURN
ELSE
= FSEEK(gnFileHandle, 0, 0)
m.lf = CHR(10)
m.flag1 = 0
m.tel = 0
m.teller = 0
m.veldnr = 1
cString = FREAD(gnFileHandle, nSize)
SELECT test
APPEND blank
FOR m.i = 1 TO nSize
IF SUBSTR(cString,m.i,1) = m.lf
SELECT test
APPEND BLANK
m.veldnr = 1
endif
IF SUBSTR(cString,m.i,1) <> ";"

naam = "test.veld" +STR(m.veldnr,2,0)
m.zone = SUBSTR(cString,m.i,1)
naam= CHRTRAN(naam,' ','0')
replace &naam WITH &naam+STRCONV(sUBSTR(cString,m.i,1),11)
ELSE
m.veldnr=m.veldnr+1
ENDIF
endfor
ENDIF
= FCLOSE(gnFileHandle)
SELECT test
GO top
SCAN
IF ALLTRIM(UPPER(veld01)) = "TYPE"
LOOP
ENDIF
IF EMPTY(ALLTRIM(UPPER(veld01)))
LOOP
ENDIF


sql = "insert into " + m.y_db1 + "f_extern (veld1,veld2,veld3,veld4,veld5,veld6,veld7,veld8,veld9,veld10,veld11,veld12,veld13,veld14,veld15,veld16,veld17,veld18,veld19,veld20,veld21,veld22,veld23,veld24,veld25,veld26,veld27,veld28,veld29,veld30)"+CHR(13) +;
"values ("+;
"'" + CHRTRAN(test.veld01,"'","´") + "'," +;
"'" + CHRTRAN(test.veld02,"'","´") + "'," +;
"'" + CHRTRAN(test.veld03,"'","´") + "'," +;
"'" + CHRTRAN(test.veld04,"'","´") + "'," +;
"'" + CHRTRAN(test.veld05,"'","´") + "'," +;
"'" + CHRTRAN(test.veld06,"'","´") + "'," +;
"'" + CHRTRAN(test.veld07,"'","´") + "'," +;
"'" + CHRTRAN(test.veld08,"'","´") + "'," +;
"'" + CHRTRAN(test.veld09,"'","´") + "'," +;
"'" + CHRTRAN(test.veld10,"'","´") + "'," +;
"'" + CHRTRAN(test.veld11,"'","´") + "'," +;
"'" + CHRTRAN(test.veld12,"'","´") + "'," +;
"'" + CHRTRAN(test.veld13,"'","´") + "'," +;
"'" + CHRTRAN(test.veld14,"'","´") + "'," +;
"'" + CHRTRAN(test.veld15,"'","´") + "'," +;
"'" + CHRTRAN(test.veld16,"'","´") + "'," +;
"'" + CHRTRAN(test.veld17,"'","´") + "'," +;
"'" + CHRTRAN(test.veld18,"'","´") + "'," +;
"'" + CHRTRAN(test.veld19,"'","´") + "'," +;
"'" + CHRTRAN(test.veld20,"'","´") + "'," +;
"'" + CHRTRAN(test.veld21,"'","´") + "'," +;
"'" + CHRTRAN(test.veld22,"'","´") + "'," +;
"'" + CHRTRAN(test.veld23,"'","´") + "'," +;
"'" + CHRTRAN(test.veld24,"'","´") + "'," +;
"'" + CHRTRAN(test.veld25,"'","´") + "'," +;
"'" + CHRTRAN(test.veld26,"'","´") + "'," +;
"'" + CHRTRAN(test.veld27,"'","´") + "'," +;
"'" + CHRTRAN(test.veld28,"'","´") + "'," +;
"'" + CHRTRAN(test.veld29,"'","´") + "'," +;
"'" + CHRTRAN(test.veld30,"'","´") + "')"
retcode = SQLEXEC(handle,sql,"")
IF retcode > 0
SELECT test
DELETE NEXT 1
ELSE
= AERROR(aErrorArray) && Data from most recent error
wait window str(aErrorArray(1),8,0) + chr(13) +;
aErrorArray(2) + chr(13) +;
'insert f_extern'
return
ENDIF

ENDSCAN
DELETE FILE gaDatabase(nCount,1)
endfor
SET defau TO &xxprogdir
ELSE
= AERROR(aErrorArray) && Data from most recent error
wait window str(aErrorArray(1),8,0) + chr(13) +;
aErrorArray(2) + chr(13) +;
'02 SELECT T_PAR'
return
ENDIF
 
It's a known fact unix line feeds are \n only (newline or linefeed - chr(10), while windows uses \r\n (carriage retutn plus linefeed, chr(13)+chr(10))

You can use BULK INSERT
The ROWTERMINATOR = '\n' will solve that row terminator problem.

BULK INSERT only does not support codepage 65001 (UTF-8) as stated there, but if you specify RAW you can convert to Unicode, Ansi or whtever codepage the sql server database is in, afterwards.

Bye, Olaf.
 
Even if doing this in VFP, much less effort is needed:

Code:
* creating UTF-8 csv file with ; as field seperator (instread of ,) and LF only instead of CRFL:
lcFile = AddBS(GetEnv("TEMP"))+Sys(2015)+".txt"

lcLine = Strconv("abc;äöü;ßâáè;xyz"+Chr(13),9)
StrToFile(lcLine,lcFile,0)
StrToFile(lcLine,lcFile,1)
? lcFile," created"

* create import cursor
Create Cursor curImport (cText1 C(254), cText2 C(254), cText3 C(254), cText4 C(254))
* read in CSV
Append From (lcFile) TYPE DELIMITED WITH CHARACTER ";"

* convert to Ansi text (might not be needed, if SQL Server accepts UTF-8
Replace all ;
   ctext1 with Strconv(cText1,11),;
   ctext2 with Strconv(cText2,11),;
   ctext3 with Strconv(cText3,11),;
   ctext4 with Strconv(cText4,11);
   in curImport

*Upload to SQL Server:
oCA = CreateObject("mycursoradapter") && cursoradapter class created via Builder for the sql server table, updatable.
oCA.SelectCmd = "Select * FROM sqlservertable WHERE 1=0"
oCA.FillCursor()
Select (oCA.Alias)
Append From Dbf("curImport")
TableUpdate(2,.F.,oCA.Alias, laErrors)

Now admittedly part of the effort is moved into creating "mycursoradapter" via the cursoradapter builder and you might add some error handling in case the TableUpdate() call returns .F., but otherwise it's much less effeort.

Bye, Olaf.
 
Yes Olaf,

I did all of those things but the import as you discribe doesn't work because of some fields wre greater than 255 characters

Mt original instruction was :
SQL = "bulk insert " + m.y_db1 + "f_extern from " +CHR(13) +;
"'" + m.bestand + "' " +;
"with (CODEPAGE ='RAW', MAXERRORS = 100000,DATAFILETYPE = 'WIDECHAR',FIELDTERMINATOR = ';',ROWTERMINATOR ='\l')"
retcode = SQLEXEC(handle,sql,"")
IF retcode > 0
DELETE FILE gaDatabase(nCount,1)
ELSE
= AERROR(aErrorArray) && Data from most recent error
wait window str(aErrorArray(1),8,0) + chr(13) +;
aErrorArray(2) + chr(13) +;
'02 INSERT F_EXTERN'
return
ENDIF
 
Then perhaps take a look at this:


It imports CSV even with multiline texts (what VFP fails to do) and creates char but also memo fields.

It creates a fox2.6 dbf, but in your case that will suffice.

Want to still see some more recommendations in regard of your coding?

Code:
gnFileHandle = FOPEN(m.bestand)
nSize =  FSEEK(gnFileHandle, 0, 2)
= FSEEK(gnFileHandle, 0, 0)
cString = FREAD(gnFileHandle, nSize)
FCLOSE(gnFileHandle)

*--- can be done via
cString = FileToSTR(m.bestand)

Code:
 replace &naam WITH &naam+STRCONV(sUBSTR(cString,m.i,1),11)
This cental line of your loop adding to the current field char by char should be is awful slow. I'd collect the field value in a var, perhaps even splitting lines via ALINES(), and fields of a line too, with ALINES and ";" as the "line" seperator.

Besides being slow, if you really would have some double byte UTF-8 characters in the cString, STRCONV(,11) fails, if you feed it with the first and second byte. See for yourself:

Code:
*problem with UTF-8 to ANSI conversion byte by byte:
lcChar = strconv('á',9)
? len(lcChar)
? strconv(substr(lcChar,1,1),11)+Strconv(substr(lcChar,2,1),11)
? strconv(lcChar,11)

See? Your byte by byte conversion doesn't do it. You better even just convert the whole file to Ansi before processing it via:

Code:
STRTOFILE(STRCONV(FILETOSTR(m.bestand),11),'newcsvfile.txt')

or in case you stick with cString simply not create a new file but only do the conversion step in one go:

Code:
cString = STRCONV(FILETOSTR(m.bestand),11)

Bye, Olaf.
 
Oh, I still give you a star for sharing your code.

Bye, Olaf.
 
Ok Olaf,

I think i have still a lot to learn, even when it is almost to late for vfp


Wfg,

Filip
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top