
Bulk Insert Text File - Error with numeric data type 4


BxWill

Receiving a data type error upon attempting to load a text file into a SQL Server 2008 R2 table.

Specifically, the error message received is similar to the following:

Code:
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 6600, column 30 (DAILY_VALUE).
.
.
.

To attempt to resolve this:

Step 1: I verified that the data type for the column "DAILY_VALUE" is numeric - I specified it as "numeric(18,0)".

Step 2: Reviewed the actual code used for the bulk insert - it looks okay (a diagnostic variant of this load is sketched after these steps).

Code:
BULK INSERT dbo.Accounts
FROM 'C:\Data\Accounts.txt'
WITH ( FIELDTERMINATOR ='\t', FIRSTROW = 1 )

Step 3: Reviewed the actual text file using MS Excel; it appears that all of the data type conversion errors occur on records where the value in the "DAILY_VALUE" column is "0". My thought is that the numeric data type should handle values of "0".
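For reference, BULK INSERT also has an ERRORFILE option that would at least capture the rejected rows for review instead of aborting the load; a sketch of the same load with it added (the error-file path is just an example):

Code:
-- Same load as in step 2, but rejected rows are written to an error file and
-- up to 1000 bad rows are tolerated before the load fails.
BULK INSERT dbo.Accounts
FROM 'C:\Data\Accounts.txt'
WITH ( FIELDTERMINATOR = '\t',
       FIRSTROW = 1,
       MAXERRORS = 1000,
       ERRORFILE = 'C:\Data\Accounts_errors.txt' )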


Does anyone know how I can resolve the data type conversion errors? Thanks in advance!

 
First of all - never, ever use Excel to review a text file. Excel will do its own conversions, and what is in the file is not what you see in Excel.

So open the file with a good editor (a hex editor, Notepad++ or gvim), go to the affected line, and have a good look at the data on and after the affected row to see if there is anything that could cause the issue.



Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
You can also use the bcp utility to load the table from the command prompt (qualify the table with the database name if Accounts is not in your default database):
bcp dbo.Accounts in "C:\Data\Accounts.txt" -T -c


 
Did download and install gvim.

However, after reviewing the text file using gvim, there are no additional characters near the records that have zero in the
"Daily Value" field.

The process that I am using to load the 40-plus text files is the Bulk Insert method, run for each text file. I will definitely explore
other methods to load the text files, including SSIS and BCP. Manually using the Bulk Insert method is quite time-consuming!

The error remains!

I am still not able to load the text file containing zeroes in the "Daily Value" field.

I may have to change the data type of the field to nvarchar(max) or something similar just to get the table loaded.

Another idea that I contemplated is to split the table into two tables: 1) the first table would contain all records that have a valid numeric value in the "Daily Value" field, and 2) the second table would contain all records that have a zero in the "Daily Value" field, which I would resolve at a later time.

Any additional thoughts are appreciated.

I will try the BCP method...
 
Are you sure it is a zero and not the letter O?

One thing you can do is create a staging table with the column as VARCHAR(appropriate length). Load the data into the staging table and then review it to see what the data in the column looks like. Lastly, insert it from the staging table into the actual table.
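A minimal T-SQL sketch of that approach - in practice the staging table needs one varchar column for every field in the tab-delimited file; the column names other than DAILY_VALUE below are hypothetical:

Code:
-- Staging table: everything as varchar so the load cannot fail on data type
CREATE TABLE dbo.Staging_Accounts
(
    ACCOUNT_ID  varchar(255) NULL,   -- hypothetical column
    TRADE_DATE  varchar(255) NULL,   -- hypothetical column
    DAILY_VALUE varchar(255) NULL    -- the column that is failing conversion
);

BULK INSERT dbo.Staging_Accounts
FROM 'C:\Data\Accounts.txt'
WITH ( FIELDTERMINATOR = '\t', FIRSTROW = 1 );

-- Review the values that would not convert cleanly to numeric(18,0)
-- (this simple check assumes non-negative whole numbers)
SELECT DAILY_VALUE, COUNT(*) AS RecordCount
FROM dbo.Staging_Accounts
WHERE LTRIM(RTRIM(DAILY_VALUE)) LIKE '%[^0-9]%'
   OR LTRIM(RTRIM(DAILY_VALUE)) = ''
GROUP BY DAILY_VALUE;

Once the values look clean, the data can be inserted from the staging table into dbo.Accounts with a CAST to numeric(18,0) on DAILY_VALUE.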


-SQLBill

The following is part of my signature block and is only intended to be informational.
Posting advice: FAQ481-4875
 
Did review a relatively small sample of records within the last remaining text file and it appears that the zeroes are indeed zeroes in the "Daily Value" column.

Basically, I received this task approximately four weeks ago - to convert a multi-year data capture process from MS Access to SQL Server.

The data was not actually loaded into the MS Access databases; only the tables were set up in MS Access. The data is within the 40 text files - which do not have a column header - and those files are updated every other week!

Initially, I considered linking each text file to MS Access...

Then, I decided on the use of SQL Server...

Therefore, I used SSMA to set up the table structure in SQL Server 2008 R2 and modified the data types for the various fields within the 40 tables by highlighting each table, right-clicking, and selecting "Design" - successfully changing the data types. Then, I imported each text file into the respective table (using Bulk Insert) except one text file - a rather tedious process!

At this point, it appears that I should employ SSIS to create 40 staging tables and automate this process.

Note, this is only for one year - 2011! I still need to perform this for years 2012 and 2013! One option I have contemplated is to just
load all multi-year data into the same table instead of creating separate tables for 2011, 2012 and 2013. Of course, I would have
to differentiate the records by adding two fields - "DateUpdated" and "YearOfData."

Any additional insight as to the preferred setup and update of the multi-year text files in Sql Server is appreciated.

I may need to re-post in reference to the preferred setup...
 
In SSIS you can do that really simply... just put your files in separate folders for each year and stay consistent with the file names. Load each file into a load table in which all fields are defined as varchar(255) (just load the file into SQL Server, then run a validation step) and move only the valid records to the destination. In addition, you can set up an errors table, move the records with errors there, and add a field for the original file name. And you do not need to set up 40 load tables - you can reuse the same table in a loop.
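A rough T-SQL sketch of that validate-and-move step (the load table, destination table, errors table and column names below are hypothetical; only the problem column is shown):

Code:
-- Park the records that will not convert into an errors table, keeping the file name
INSERT INTO dbo.Accounts_Errors (FileName, DAILY_VALUE)
SELECT FileName, DAILY_VALUE
FROM dbo.LoadTable
WHERE LTRIM(RTRIM(DAILY_VALUE)) LIKE '%[^0-9]%'
   OR LTRIM(RTRIM(DAILY_VALUE)) = '';

-- Move only the valid records to the destination table
INSERT INTO dbo.Accounts (DAILY_VALUE)   -- plus the other columns
SELECT CAST(LTRIM(RTRIM(DAILY_VALUE)) AS numeric(18,0))
FROM dbo.LoadTable
WHERE LTRIM(RTRIM(DAILY_VALUE)) NOT LIKE '%[^0-9]%'
  AND LTRIM(RTRIM(DAILY_VALUE)) <> '';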
 
Okay. So, just to recap - SSIS can load all records, both numeric and non-numeric, into a load table ("staging table") with every column defined as varchar(255)?

I have placed all of the text files in the respective folders at - C:\Data\2011 ; C:\Data\2012 ; and C:\Data\2013

Now, each of the paths above contains 40 text files that should be loaded into the 40 SQL Server tables for the respective year, or into the same table that will contain multiple years' data - as long as I differentiate the records.

Therefore, I believe that I do indeed need 40 load tables if running this process concurrently.

On second thought, it appears that you are loading the tables consecutively - loading table 1, run validation step for table 1, move valid records to table 1,...load table 2, run validation step for table 2...

Is there a sample of code that better illustrates the process?

Will commence the intensive review of SSIS and appreciate any additional insight.
 
A few considerations with SSIS.

It's picky... very, very picky.

For example, you get a file with "abc,0 ,def". You define the second field as an int and, to your surprise, SSIS fails with conversion errors. Why? SSIS doesn't like the spaces after the numbers.
However, if you had "abc, 0,def" it would work fine.

So whenever you are processing files where some aspect of the data layout/format causes SSIS to fail, you need to resort to one (or a combination) of the options that allow you to deal with the data:
1 - Derived column
2 - Transformation script
3 - Load as raw data to SQL


Option 1 will allow you to convert data types, take substrings of the data to put it in a format acceptable to SSIS, remove spaces (which would solve the example above), and a few other things.
Option 2 will allow you to do everything you need... but requires you to know VB or C#.
Option 3 is for when it's easier to do things in SQL, especially when you have high volumes of errors or exceptions and it's faster to deal with them in SQL, as you can do set-based processing.

Note that for option 3 it is quite feasible to mix it with a data flow for improved performance in certain cases.
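For the trailing-space example above, option 3 boils down to a small set-based cleanup in SQL after the raw load (table and column names are hypothetical):

Code:
-- Strip stray spaces in the staging table; after this the value converts to
-- an int/numeric without complaint.
UPDATE dbo.LoadTable
SET SecondField = LTRIM(RTRIM(SecondField));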

As for the particular case of your yearly files, my approach would be as follows.



Code:
file id table -- support table
fileid int identity(1,1)
filepath varchar(2000)

work table used would have the following layout
recordid bigint identity(1,1)
fileid int
all required fields based on your file - all defined as varchar big enough to hold each individual field

Create a SSIS package with the following
loop container 1 - loop through all folders on the high level folder
   loop container 2 - loop through all files on the current folder
      (loop container would associate current file path with a variable for use below)

      insert new record onto file id table - retrieve allocated identity 
      data flow - load each file found onto the same table (table could vary also but no need)
        this data flow would insert into the work table all data records with associated file id
       
   end loop
end loop
execute sql step
  execute sql to process the loaded records and split them into the final tables after data validation is done.
  output invalid records onto a table with varchar fields for further analysis

The file id table above allows you to easily identify which file each record came from, for the situations where you have an error and may be required to ask for a new version of the file with the data corrected.
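A minimal T-SQL sketch of those two support tables (the data column in the work table is hypothetical - in practice add one varchar column per field in the file):

Code:
CREATE TABLE dbo.FileId            -- support table: one row per loaded file
(
    fileid   int IDENTITY(1,1) PRIMARY KEY,
    filepath varchar(2000) NOT NULL
);

CREATE TABLE dbo.WorkTable         -- raw load target, reused for every file
(
    recordid    bigint IDENTITY(1,1) PRIMARY KEY,
    fileid      int NOT NULL REFERENCES dbo.FileId (fileid),
    DAILY_VALUE varchar(255) NULL  -- hypothetical; repeat for each field in the file
);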



Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Thanks for the insight.

Will select Option 3 - loading raw data to SQL: initially into a staging table, validating the data, and then moving it to the errors table or the final destination table.

For the work table (final destination table), I will also add two fields at the end of the record - "DateUpdated" and "YearOfData."

Thought for the near future - The update of the records will be somewhat tricky considering that these are quite large files - sometimes a million records - and there is no indication as to the specific field on the record that was updated!

Therefore, I assume that this will definitely impact the initial setup of the final destination table - for example, the "recordid" field. Say I initially load 1,000,000 records into a table and then two weeks later receive an "updated" text file
with 1,000,000 records but no indication as to which fields on the records were updated. In that case, I believe the easiest approach is to load the "updated" text file into the staging table, compare the staging table to the work table, update the fields on the records that may have changed, and then populate the "DateUpdated" and "YearOfData" fields.

First things first, the focus will be on initially loading all of the files into the appropriate Sql Server tables.

Will continue the intensive review of SSIS, specifically the looping through all folders... and finalize the initial loading of the text files...



 
For the compare of records, you may wish to consider adding a hash field to each record while loading in SSIS - that way, comparisons at a later stage are a lot faster, as you only compare the hash field to see if a record changed.
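Roughly, in T-SQL terms (the hash could just as well be computed in a derived column or script during the load; the column names below are hypothetical, and HASHBYTES with MD5 is available in SQL Server 2008 R2):

Code:
-- Store a per-row hash of the data columns in the work table
ALTER TABLE dbo.WorkTable ADD RowHash varbinary(16) NULL;

UPDATE dbo.WorkTable
SET RowHash = HASHBYTES('MD5',
              ISNULL(ACCOUNT_ID, '') + '|' + ISNULL(DAILY_VALUE, ''));

-- When a refreshed file arrives, only the rows whose hash differs need attention
SELECT s.ACCOUNT_ID
FROM dbo.Staging_WorkTable AS s
JOIN dbo.WorkTable AS w
  ON w.ACCOUNT_ID = s.ACCOUNT_ID
WHERE HASHBYTES('MD5',
      ISNULL(s.ACCOUNT_ID, '') + '|' + ISNULL(s.DAILY_VALUE, '')) <> w.RowHash;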

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Still at this...

Initially, I had planned to store multi-year data (2011, 2012 and 2013) in the same table.

However, I have decided to store the multi-year data in separate tables with each table containing data for a specific year.

Just to reiterate, data is updated bi-weekly for all years.

What I have so far is the following:

1. I created the following folder structure

C:\SSIS\2011\Updated
C:\SSIS\2011\Updated\Archived
C:\SSIS\2012\Updated
C:\SSIS\2012\Updated\Archived
C:\SSIS\2013\Updated
C:\SSIS\2013\Updated\Archived

2. I defined the following variables

Variable Name-----------Data Type-------Value--------------Is Expression--Expression
varSourceFolder1--------String----------C:\SSIS\2011-------FALSE
varArchiveFolder1-------String-----------------------------TRUE-----------@[User::varSourceFolder1] + "\\Archived"
varArchivePath1---------String-----------------------------TRUE-----------@[User::varArchiveFolder1] + "\\" + @[User::varFileName1]
varFileName1------------String-----------------------------FALSE
varFilePath1------------String-----------------------------TRUE-----------@[User::varSourceFolder1] + "\\" + @[User::varFileName1]

Variable Name-----------Data Type-------Value--------------Is Expression--Expression
varSourceFolder2--------String----------C:\SSIS\2012-------FALSE
varArchiveFolder2-------String-----------------------------TRUE-----------@[User::varSourceFolder2] + "\\Archived"
varArchivePath2---------String-----------------------------TRUE-----------@[User::varArchiveFolder2] + "\\" + @[User::varFileName2]
varFileName2------------String-----------------------------FALSE
varFilePath2------------String-----------------------------TRUE-----------@[User::varSourceFolder2] + "\\" + @[User::varFileName2]
.
.
.

3. Created an OLE DB connection to the tempdb database (local.tempdb)
4. Created a flat file connection for any text file in the source folder (for 2011) and made it dynamic so I can use one connection to load
several files from the 2011 source folder

5. Created an Execute SQL Task to execute several SQL statements similar to the following:

if object_id('tempdb.dbo.Staging_Customers') is null
begin
create table tempdb.dbo.Staging_Customers
(
FilePath varchar(255)
,CustName varchar(255)
,CustPhone varchar(255)
,CustEmail varchar(255)
)
end

truncate table tempdb.dbo.Staging_Customers


if object_id('tempdb.dbo.Staging_Customer_Address') is null
begin
create table tempdb.dbo.Staging_Customer_Address
(
FilePath varchar(255)
,Address1 varchar(255)
,Address2 varchar(255)
,City varchar(255)
,State varchar(255)
,Zip varchar(255)
)
end

truncate table tempdb.dbo.Staging_Customer_Address
.
.
.

Questions/Comments:

1. Is it advisable to have an identity column on each table to uniquely identify a particular record? My understanding is that gaps can occur between identity values if there are frequent deletions on a table.

2. Not quite sure at this point about the code needed to create a file id table (beyond the layout sketched above).

3. Not quite sure at this point how to perform validation and how to insert new records into the file id table - a rough idea is sketched after this list.

4. Still need to program the loading of the text files from the remaining years - 2012 and 2013
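For questions 2 and 3, my rough (untested) idea is an Execute SQL Task inside the file loop that registers the current file and hands the identity back to the package - the file path would be mapped in as a parameter and the returned fileid mapped to an SSIS variable via a single-row result set (this assumes the FileId table sketched earlier in the thread):

Code:
INSERT INTO dbo.FileId (filepath)
VALUES (?);   -- ? = parameter mapped to the current file path variable

SELECT CAST(SCOPE_IDENTITY() AS int) AS fileid;   -- map to an SSIS int variable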

Thanks in advance for any additional insight.

 