
SQL Server 2012 - String or Binary Data Would Be Truncated 2


BxWill

Timely insight needed.

Last month, the load of data was successful.

This month, there is the error - "String or binary data would be truncated"

It appears that the error is attributable to several instances where columns in the text file that should contain lookup codes instead contain the actual lookup values.

For example, the state code should be only two characters, like "CA" instead of "California".

Therefore, I modified the sql script, as displayed below, to load only the records that have valid data.

I was about to experiment with "DATALENGTH" instead of "LEN" to resolve the error messages. However, after two hours without resolving the errors, some insight would be helpful.
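
For reference, the two functions measure different things - LEN ignores trailing spaces, while DATALENGTH counts bytes of storage:

Code:
SELECT LEN('CA  ')        AS len_result,        -- 2: trailing spaces ignored
       DATALENGTH('CA  ') AS datalength_result, -- 4: bytes, spaces included
       DATALENGTH(N'CA')  AS nvarchar_bytes;    -- 4: nvarchar uses two bytes per character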

(Eventually, I will return the records with erroneous data back to the source department. However, my priority is to load the valid records as soon as possible and worry about the invalid records later...)


So far, the revised sql script does not work due to the following additional errors:

1. Incorrect syntax near LTRIM
2. Incorrect syntax near 'dup'


Any insight as to what revisions to the sql script below are necessary to resolve the errors? Is the use of "len" with RTRIM and LTRIM the culprit?


Code:
INSERT INTO #validate_EquipmentData
		SELECT t1.Equipment_Code,
		CASE  
		  WHEN RTRIM(LTRIM(t1.[Account_Number])) = ''
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Account_Number]))
		END as [Account_Number],	
		RTRIM(LTRIM(t1.Sequence_Number)),
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Equipment_Flag]))) > 1
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Equipment_Flag]))
		END as RTRIM(LTRIM(t1.[Equipment_Flag])),
	
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[State_Code]))) > 2
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[State_Code]))
		END as RTRIM(LTRIM(t1.[State_Code])),
				
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Tracking_Number]))) > 15
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Tracking_Number]))
		END as RTRIM(LTRIM(t1.[Tracking_Number])),

		CASE
		   when  len(t1.[Sale_Date]) = 8
		   and isdate(t1.[Sale_Date]) = 1
		   and RTRIM(LTRIM(t1.[Sale_Date])) like '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]'
		   then convert(date,  t1.[Sale_Date], 112) 
		   else cast(null as date)
		end as [Sale_Date],

		CASE																	
		  WHEN ISNUMERIC(RTRIM(LTRIM(t1.[Sale_Amount]))) = 0
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Sale_Amount]))
		END as [Sale_Amount],

		CASE																   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9]' then RTRIM(LTRIM(t1.[Purchase_Zipcode]))
		   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' then substring(RTRIM(LTRIM(t1.[Purchase_Zipcode])),1 , 5)
		   else null
		end as t1.[Purchase_Zipcode],

		dup.count_dups
	    from #Bureau1 t1
		left outer join (SELECT [Equipment_Code]
					, [Account_Number]
					, [Sequence_Number]
					, count(*) as count_dups
				from #Bureau1
				group by [Equipment_Code]
					, [Account_Number]
					, [Sequence_Number]
					) dup
	on dup.Equipment_Code = t1.Equipment_Code
	and dup.Account_Number = t1.Account_Number
	and dup.Sequence_Number = t1.Sequence_Number
 
Please post the line number reported; it's cumbersome to read through all the places trim functions are used to find the erroneous usages.
Also, please reduce the overlong line.

I spot one error in line 13: [tt]as RTRIM(LTRIM(t1.[Equipment_Flag]))[/tt]; it happens in several places.
AS (the alias keyword) just needs the name of the result column you want; the expression itself already comes before the AS.
So when the syntax is written out as [tt]expression as columnname[/tt], take that as a literal syntax description,
not as a natural-language instruction to use expressions for column names. The column name should just be a normal name; the expression, or even the CASE statement, used to compute the value goes to the left of the AS, and to the right of it is only a name.
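
In other words, the flag column from the posted script should read:

Code:
CASE
  WHEN LEN(RTRIM(LTRIM(t1.[Equipment_Flag]))) > 1
       THEN NULL
  ELSE RTRIM(LTRIM(t1.[Equipment_Flag]))
END AS [Equipment_Flag]  -- the alias is a plain name, not an expression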

Bye, Olaf.





 
I used a text editor to review the text file (containing more than 800,000 records), and it appears that there are records with the data misaligned.

Specifically, the data appears to be shifted right, resulting in an "extra" column to the right for all the records that will error out.

Displayed below is an example of a record that will produce an error:

Column Name         Value Should Be     Actual Value
Equipment_Code      246B                246B
Account_Number      30986               30986
Equipment_Flag      fex                 (empty)
State_Code          Ca                  fex
Tracking_Number     44490               Ca
Sale_Date           20160212            44490
Sale_Amount         1000                20160212
Purchase_Zipcode    80717               1000
(extra column)                          80717

Of course, I can attempt to delete each row that has this type of error using the text editor. However, this is quite time consuming!

Any suggestions as to how to most effectively handle this type of situation?

I am thinking that if there were a table of all valid state codes, then flagging the state code as Null whenever it is any value other than one of the 50 states may be a way to nullify the record. However, a CASE statement comparing the value of the state code field with each of the 50 states would be somewhat prohibitive.
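
One way to avoid a 50-branch CASE is a small lookup table joined against the staging data - a minimal sketch, using a hypothetical #StateCodes table:

Code:
-- Hypothetical lookup table of valid state codes (populate with all 50 states)
CREATE TABLE #StateCodes (State_Code varchar(2) NOT NULL PRIMARY KEY);
INSERT INTO #StateCodes VALUES ('AL'), ('AK'), ('AZ'), ('CA'); -- ...and so on

-- NULL out any state code not found in the lookup
SELECT CASE WHEN sc.State_Code IS NULL THEN NULL
            ELSE RTRIM(LTRIM(t1.State_Code))
       END AS State_Code
FROM #Bureau1 t1
LEFT JOIN #StateCodes sc
       ON sc.State_Code = RTRIM(LTRIM(t1.State_Code));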

Or, is there a text editor that will allow for the deletion of these types of errors, so that the valid data can be loaded on a timely basis while the erroneous records are reviewed and resolved by the source department? I have reviewed the data in two text editors so far, and I cannot readily detect and delete the records containing this type of error.

Thanks in advance for any additional insight.




 
I was merely talking of a line in your posted code, nothing about rows of data.

In a thread where you force the width of the whole page to sizes most users won't be able to see on their display, you most likely won't get answers.

Besides that, when I asked about the line of code, I was asking for the line number reported in the error message you got.

Oh, and if this is not related at all and you are starting a new question, please don't hijack your own threads; simply start a new one.

Bye, Olaf.
 
The alias-related error is definitely one that I should have initially realized as an error.

But, I do not see an error with the field "t1.[Equipment_Flag]."

It appears that RTRIM is not needed when using "len."

If the Equipment_Flag field is a varchar field and I am trying to determine whether the length of the value of this field is more than 1 character, then I am using "len".

Is using "len" the only method to determine if the value of a lookup code field is outside the range of acceptable values?

It appears that a better validation on all lookup fields is to determine whether any value in that column is "outside" the universe of acceptable values, instead of just validating the length of the field.

The code worked before I included the validation for duplicate records in the text file and the validations for length and for trimming unwanted characters from the fields.


 
Ok. Now seeing the errors...

Thanks for the insight.
 
Now receiving the error, "Column name or number of supplied values does not match table definition."

Question - [Bold]When inserting into a temporary table (staging table 2) from another temporary table (staging table 1), must the data type be "varchar" on all fields in each of the temporary tables?[/Bold]

With over 100 fields in each temporary table, I wanted to ask before modifying the data types on staging table 2 to resolve the error.


Code:
Use CapitalEquipment


IF OBJECT_ID('tempdb..#Bureau1') IS NOT NULL
BEGIN
	DROP TABLE #Bureau1
END
GO
--select * from #Bureau1
CREATE TABLE [#Bureau1] (		
		[Equipment_Code]   	[varchar] (60)  NULL,
		[Account_Number]   	[varchar] (60)  NULL,
		[Sequence_Number]   	[varchar] (60)  NULL,
		[Equipment_Flag]   	[varchar] (60)  NULL,
		[State_Code]   		[varchar] (60)  NULL,
		[Tracking_Number]   	[varchar] (60)  NULL,
		[Sale_Date] 		[varchar] (60) 	NULL,
		[Sale_Amount]  		[varchar] (60) 	NULL,
		[Purchase_Zipcode]   	[varchar] (70)  NULL
)	
GO


BULK INSERT #Bureau1
   	FROM 'C:\Data\Equip_2016.txt'
	
   WITH
      (
         FIELDTERMINATOR = '\,',
		 ROWTERMINATOR = '\n',
		 FIRSTROW = 2
      );
GO



IF OBJECT_ID('CapitalEquipment..Equipment_2016') IS NOT NULL
BEGIN
	DROP TABLE Equipment_2016
END
GO

CREATE TABLE [Equipment_2016](		
		[Equipment_Code]   [varchar] (5)  NOT NULL,
		[Account_Number]   [varchar] (45)  NOT NULL,
		[Sequence_Number] int NOT NULL,                   
		[Equipment_Flag]   [varchar] (1)  NULL,
		[State_Code]   [varchar] (6)  NULL,
		[Tracking_Number]   [varchar] (15)  NULL,
		[Sale_Date] date  NULL,                        
		[Sale_Amount]   int  NULL,
		[Purchase_Zipcode] [varchar] (9)  NULL,
		Constraint PK_Equipment_2016 Primary Key ([Equipment_Code], [Account_Number], [Sequence_Number])
	
)
GO


IF OBJECT_ID('tempdb..#validate_EquipmentData') IS NOT NULL
BEGIN
	DROP TABLE #validate_EquipmentData
END
GO


CREATE TABLE #validate_EquipmentData (	
	[Equipment_Code] [varchar] (5 ) NOT NULL,
	[Account_Number] [varchar] (45 ) NULL,
	[Sequence_Number] int NOT NULL,
	[Equipment_Flag]  [varchar] (1)  NULL,
	[State_Code]   [varchar] (6)  NULL,
	[Tracking_Number]   [varchar] (15)  NULL,
	[Sale_Date] date  NULL,   
	[Sale_Amount]   int  NULL,                      
	[Purchase_Zipcode]   [varchar] (9)  NULL,
	[COUNT_DUPS] int
)
GO


INSERT INTO #validate_EquipmentData
		SELECT t1.Equipment_Code,
		CASE  
		  WHEN RTRIM(LTRIM(t1.[Account_Number])) = ''
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Account_Number]))
		END,	
		RTRIM(LTRIM(t1.Sequence_Number)),
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Equipment_Flag]))) > 1
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Equipment_Flag]))
		END,
	
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[State_Code]))) > 6
				THEN Null --Too long
		  ELSE RTRIM(LTRIM(t1.[State_Code]))
		END,
				
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Tracking_Number]))) > 15
				THEN Null --Too long
		  ELSE RTRIM(LTRIM(t1.[Tracking_Number]))
		END,

		CASE
		   when  len(t1.[Sale_Date]) = 8
		   and isdate(t1.[Sale_Date]) = 1
		   and RTRIM(LTRIM(t1.[Sale_Date])) like '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]'
		   then convert(date,  t1.[Sale_Date], 112) 
		   else cast(null as date)
		END,

		CASE																	
		  WHEN ISNUMERIC(RTRIM(LTRIM(t1.[Sale_Amount]))) = 0
				THEN Null --Non numeric data
		  ELSE RTRIM(LTRIM(t1.[Sale_Amount]))
		END,

		CASE																	
		   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9]' 
			then RTRIM(LTRIM(t1.[Purchase_Zipcode]))
		   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 
			then substring(RTRIM(LTRIM(t1.[Purchase_Zipcode])),1 , 5)
		   else null
		END,
		dup.count_dups
	    from #Bureau1 t1
		left outer join (SELECT [Equipment_Code]
					, [Account_Number]
					, [Sequence_Number]
					, count(*) as count_dups
				from #Bureau1
				group by [Equipment_Code]
					, [Account_Number]
					, [Sequence_Number]
					) dup
	on dup.Equipment_Code = t1.Equipment_Code
	and dup.Account_Number = t1.Account_Number
	and dup.Sequence_Number = t1.Sequence_Number
 
You have 2 inserts in your code - you did not state which one gave the error.
Please be more clear on this.
As the second insert has the same number of columns as the destination table, there are 2 options:

1 - it is the bulk insert that is failing
2 - you have run the same code already, then changed the table layout without first dropping the table. To test this, manually drop the temp table before running the code again.


Regarding your question at the top: no, they do not all need to be varchar - but the data type of each column being inserted needs to be either the same as, or implicitly convertible to, the destination data type.
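
A minimal sketch of that, using hypothetical #src and #dst tables and a digits-only filter of my own choosing:

Code:
CREATE TABLE #src (Amount varchar(10) NULL);
INSERT INTO #src VALUES ('1000'), ('20.5'), ('');

CREATE TABLE #dst (Amount int NULL);

-- Only '1000' survives the filter; the implicit varchar-to-int conversion handles it
INSERT INTO #dst
SELECT Amount
FROM #src
WHERE Amount NOT LIKE '%[^0-9]%'  -- digits only
  AND Amount <> '';               -- an empty string would implicitly convert to 0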




Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Your CREATE TABLE #validate_EquipmentData defines a table with 10 columns, and the INSERT following it also has 10. With such an insert you don't need to name your SELECT fields the same, so that's not the problematic code.
It must be elsewhere.

>the data type of each column being inserted needs to be either the same as, or implicitly convertible to, the destination data type.
I second that, and to extend that thought: as long as the bulk data you have in CSV form is unambiguous, your staging tables can already have the desired target type.
Integer columns are most often a positive case; dates are a negative case due to different formats and locales.

Every conversion, whether done by the bulk loading of the data or by an insert, will use implicit conversions where applicable, which is the limiting factor.
The notion of using varchar fields for everything just makes it easy, as all strings can be read in and then handled more individually than bulk insert allows.
If you must be suspicious about the quality of the initial bulk files, you read in integer columns as varchar to reveal, for example, some row containing
the scientific notation of a large number. This doesn't get reported by bulk insert; it might just disregard everything following an E until the next comma, and you'd
perhaps not even notice when checking for 0 or NULL.

Once you're through these initial checks and have either corrected or kicked out the wrong data, you can trust your checked data to convert. You don't need to wait with
the conversion until the last step of the staging process; you can convert as soon as you're ready. Most often you'd use CAST, CONVERT or even a stored procedure to do so,
but in some cases, such as a varchar you have checked to be all digits and under 8 characters long, you can even use the implicit conversion done by specifying an integer
target field with a varchar value in the VALUES list or the SELECT field list used for the insert.
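
For example, a quick check of this kind against the staging table (a sketch; the pattern flags any non-digit character, which includes the E of scientific notation):

Code:
-- Staging as varchar lets you spot values an int column would mangle or reject,
-- e.g. '1.2E5' passes ISNUMERIC() but is not a clean integer.
SELECT Sale_Amount
FROM #Bureau1
WHERE Sale_Amount LIKE '%[^0-9]%';  -- any non-digit character, including E, '.', '-'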

Bye, Olaf.
 
Could it be FIELDTERMINATOR = '\,'? If the field terminator is a comma, a backslash escape character isn't necessary; and if the '\,' is taken literally, the text file will be interpreted as one single column per row.
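
If so, the fix would be the plain comma:

Code:
BULK INSERT #Bureau1
   FROM 'C:\Data\Equip_2016.txt'
   WITH
      (
         FIELDTERMINATOR = ',',   -- plain comma, no backslash escape
         ROWTERMINATOR = '\n',
         FIRSTROW = 2
      );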

Bye, Olaf.
 
As previously stated, the source text data appears to have changed from prior months and is not of the quality expected.

Specifically, the Account_Number column contains several records with two different account numbers, such as 21134-21135, and other
records may have one account number (such as "21150") in the Equipment_Flag column and then "21155" in the column to the immediate right of the Equipment_Flag column.

In instances such as this, the remaining data for the record is shifted right and the record is not usable.

The text file is delimited by a comma.

So, for all other records with this issue, the data is not usable.

I am not definitively certain that this is the source of the error, but it definitely appears to be an issue.
 
Can you post here either a sample file or 2 of the rows that have the issues mentioned above?

A straight cut and copy from the csv file opened with a text editor.

The issue may be a different one, and the solution may depend on what is on those lines.

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
I ran the sql script using last month's data and the error persists. Therefore, it appears that this month's text file is not the source of the error.

I removed the validation for duplicate records and ran the following sql scripts to determine the columns and the number of columns in the two temporary tables,
#Bureau1 and #validate_EquipmentData.

Code:
SELECT * FROM tempdb.sys.columns
WHERE object_id = (SELECT object_id FROM tempdb.sys.objects WHERE NAME like '#Bureau%' AND Type = 'U')


SELECT * FROM tempdb.sys.columns
WHERE object_id = (SELECT object_id FROM tempdb.sys.objects WHERE NAME like '#validate_EquipmentData%' AND Type = 'U')
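
(A more direct form of the same check resolves the temp table via OBJECT_ID:)

Code:
-- Counts the columns of the current session's temp table directly
SELECT COUNT(*) AS column_count
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID('tempdb..#Bureau1');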

[Bold]The number of columns is the same.[/Bold]

So, there appears to be an issue specifically related to the structure of the "Insert into #validate_EquipmentData" section.

I ran the "Insert into #validate_EquipmentData" section without the validation for duplicate records, and the error persists.

Also, I copied the fields from the CREATE TABLE [#Bureau1] section and pasted them into the CREATE TABLE #validate_EquipmentData section, and the error still persists
when I run the Insert into #validate_EquipmentData section.

Review continues...
 
BxWill> Review continues

fredericofonseca> you have 2 inserts in your code - you did not state which one gave the error.

The error message you get from executing will tell you the line of the error. It's really (really, really) helpful information for finding an error.

Bye, Olaf.
 
You either have a changed file layout, or your file has columns that also contain commas - and if that is the case, you will never be able to do it with bulk insert natively.
See the linked article for some explanations.

In order to determine whether it is a case of a changed layout, opening the file in Excel is probably the easiest way.

If the file is a real CSV file - i.e. all columns with quotes enclosing the contents of the field, and quotes within the values of each column also escaped by a second quote - then Excel can also be used to determine whether you have commas within the data.


If the above is not true, then you can import the file into SQL Server as a single column, without splitting it into fields - e.g. define a table with a varchar(max) field and load with a delimiter you know is not in the file, or use the row delimiter as the column delimiter also.

Then, for each record loaded, count the number of commas in the record. Where the count does not equal the number of expected columns minus 1, you have found the offending row(s).
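
A sketch of that approach, with a hypothetical #RawLines table (it assumes the file contains no tab characters, so the default field terminator never fires):

Code:
CREATE TABLE #RawLines (Line varchar(max) NULL);

BULK INSERT #RawLines
   FROM 'C:\Data\Equip_2016.txt'
   WITH (ROWTERMINATOR = '\n');  -- no FIELDTERMINATOR: each whole line lands in one column

DECLARE @ExpectedColumns int = 9;  -- per the sample layout; adjust to the real file

-- A clean row with N columns contains exactly N - 1 commas
SELECT Line
FROM #RawLines
WHERE LEN(Line) - LEN(REPLACE(Line, ',', '')) <> @ExpectedColumns - 1;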

There are alternatives to using the bulk insert to address its limitations.

The code used in that article is at the end.





Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Started from the beginning - deleted the database and re-created all objects.

Also modified the sql script by deleting the second temporary table, #validation...

Now, with just 1 staging table, #Bureau1, is it possible to perform the validation for duplicate records within the text file?

Wouldn't setting up a constraint - a primary key - allow for the insert of only unique records into the database table?
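
A primary key by itself would make the whole INSERT fail as soon as one duplicate key appears. One option worth knowing (a sketch, with a hypothetical table name) is the IGNORE_DUP_KEY setting, which instead discards duplicate-key rows with a warning:

Code:
CREATE TABLE #Equipment_Unique (
	Equipment_Code  varchar(5)  NOT NULL,
	Account_Number  varchar(45) NOT NULL,
	Sequence_Number int         NOT NULL,
	CONSTRAINT PK_Equipment_Unique PRIMARY KEY (Equipment_Code, Account_Number, Sequence_Number)
		WITH (IGNORE_DUP_KEY = ON)
);

INSERT INTO #Equipment_Unique VALUES ('246B', '30986', 1), ('246B', '30986', 1);
-- "Duplicate key was ignored." -- only one row ends up in the table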



Upon running the Bulk Insert section, I receive the following;


(893,402 row(s) affected)


(1 row(s) affected)


Usually, I believe that I would just receive 1 line.


I reviewed the text file in MS Excel, and there are no double quotes around each field; it just appears that commas separate the fields.
Although I only display 9 columns of data, there are over 180 columns. So, this is a rather large text file - 893,402 records
and 185 columns! Due to confidentiality, I cannot display any additional fields.

It appears that the department that originated the text file has commas not only separating the fields but also within a few columns (fields).
There are quite a few records where the data extends up to 10 additional columns beyond the last column heading!

There has to be a more effective method by which I can readily identify records that have commas within the fields.

The suggestion below appears worth exploring. I am also exploring whether I can perform this using MS Excel and then pivoting the data...

(I would think that a text editor would allow for this functionality. The text editors that I have tried so far - Delimit and EmEditor - do not.)

[Bold]
"If the above is not true, then you can import the file into SQL Server as a single column, without splitting it into fields - e.g. define a table with
a varchar(max) field and load with a delimiter you know is not in the file, or use the row delimiter as the column delimiter also.

Then, for each record loaded, count the number of commas in the record. Where the count does not equal the number of expected columns minus 1,
you have found the offending row(s)."
[/Bold]

If I proceed to run the section "Insert into Equipment_2016", the error message is

Code:
Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.


Code:
Use CapitalEquipment


IF OBJECT_ID('tempdb..#Bureau1') IS NOT NULL
BEGIN
	DROP TABLE #Bureau1
END
GO


CREATE TABLE [#Bureau1] (		
		[Equipment_Code]   	[varchar] (60)  NULL,
		[Account_Number]   	[varchar] (60)  NULL,
		[Sequence_Number]   	[varchar] (60)  NULL,
		[Equipment_Flag]   	[varchar] (60)  NULL,
		[State_Code]   		[varchar] (60)  NULL,
		[Tracking_Number]   	[varchar] (60)  NULL,
		[Sale_Date] 		[varchar] (60) 	NULL,
		[Sale_Amount]  		[varchar] (60) 	NULL,
		[Purchase_Zipcode]   	[varchar] (70)  NULL
)	
GO



IF OBJECT_ID('CapitalEquipment..Equipment_2016') IS NOT NULL
BEGIN
	DROP TABLE Equipment_2016
END
GO

CREATE TABLE [Equipment_2016](		
		[Equipment_Code]   [varchar] (5)  NOT NULL,
		[Account_Number]   [varchar] (45)  NOT NULL,
		[Sequence_Number] int NOT NULL,                   
		[Equipment_Flag]   [varchar] (1)  NULL,
		[State_Code]   [varchar] (6)  NULL,
		[Tracking_Number]   [varchar] (15)  NULL,
		[Sale_Date] date  NULL,                        
		[Sale_Amount]   int  NULL,
		[Purchase_Zipcode] [varchar] (9)  NULL,
		Constraint PK_Equipment_2016 Primary Key ([Equipment_Code], [Account_Number], [Sequence_Number])
	
)
GO



BULK INSERT #Bureau1
   	FROM 'C:\Data\Equip_2016.txt'
	
   WITH
      (
         FIELDTERMINATOR = '\,',
		 ROWTERMINATOR = '\n',
		 FIRSTROW = 2
      );
GO




INSERT INTO Equipment_2016
		SELECT t1.Equipment_Code,
		CASE  
		  WHEN RTRIM(LTRIM(t1.[Account_Number])) = ''
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Account_Number]))
		END,	
		RTRIM(LTRIM(t1.Sequence_Number)),
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Equipment_Flag]))) > 1
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Equipment_Flag]))
		END,
	
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[State_Code]))) > 6
				THEN Null --Too long
		  ELSE RTRIM(LTRIM(t1.[State_Code]))
		END,
				
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Tracking_Number]))) > 15
				THEN Null --Too long
		  ELSE RTRIM(LTRIM(t1.[Tracking_Number]))
		END,

		CASE
		   when  len(t1.[Sale_Date]) = 8
		   and isdate(t1.[Sale_Date]) = 1
		   and RTRIM(LTRIM(t1.[Sale_Date])) like '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]'
		   then convert(date,  t1.[Sale_Date], 112) 
		   else cast(null as date)
		END,

		CASE																	
		  WHEN ISNUMERIC(RTRIM(LTRIM(t1.[Sale_Amount]))) = 0
				THEN Null --Non numeric data
		  ELSE RTRIM(LTRIM(t1.[Sale_Amount]))
		END,

		CASE																	
		   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9]' 
			then RTRIM(LTRIM(t1.[Purchase_Zipcode]))
		   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 
			then substring(RTRIM(LTRIM(t1.[Purchase_Zipcode])),1 , 5)
		   else null
		END
		from #Bureau1 t1
		where [Equipment_Code] is not null
		and [Account_Number] is not null
		and [Sequence_Number] is not null


Code:
POSSIBLY DELETE THE FOLLOWING SECTION


IF OBJECT_ID('tempdb..#validate_EquipmentData') IS NOT NULL
BEGIN
	DROP TABLE #validate_EquipmentData
END
GO


CREATE TABLE #validate_EquipmentData (	
	[Equipment_Code] [varchar] (5 ) NOT NULL,
	[Account_Number] [varchar] (45 ) NULL,
	[Sequence_Number] int NOT NULL,
	[Equipment_Flag]  [varchar] (1)  NULL,
	[State_Code]   [varchar] (6)  NULL,
	[Tracking_Number]   [varchar] (15)  NULL,
	[Sale_Date] date  NULL,   
	[Sale_Amount]   int  NULL,                      
	[Purchase_Zipcode]   [varchar] (9)  NULL,
	[COUNT_DUPS] int
)
GO


INSERT INTO #validate_EquipmentData
		SELECT t1.Equipment_Code,
		CASE  
		  WHEN RTRIM(LTRIM(t1.[Account_Number])) = ''
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Account_Number]))
		END,	
		RTRIM(LTRIM(t1.Sequence_Number)),
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Equipment_Flag]))) > 1
				THEN Null 
		  ELSE RTRIM(LTRIM(t1.[Equipment_Flag]))
		END,
	
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[State_Code]))) > 6
				THEN Null --Too long
		  ELSE RTRIM(LTRIM(t1.[State_Code]))
		END,
				
		CASE																	
		  WHEN len(RTRIM(LTRIM(t1.[Tracking_Number]))) > 15
				THEN Null --Too long
		  ELSE RTRIM(LTRIM(t1.[Tracking_Number]))
		END,

		CASE
		   when  len(t1.[Sale_Date]) = 8
		   and isdate(t1.[Sale_Date]) = 1
		   and RTRIM(LTRIM(t1.[Sale_Date])) like '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]'
		   then convert(date,  t1.[Sale_Date], 112) 
		   else cast(null as date)
		END,

		CASE																	
		  WHEN ISNUMERIC(RTRIM(LTRIM(t1.[Sale_Amount]))) = 0
				THEN Null --Non numeric data
		  ELSE RTRIM(LTRIM(t1.[Sale_Amount]))
		END,

		CASE																	
		   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9]' 
			then RTRIM(LTRIM(t1.[Purchase_Zipcode]))
		   When RTRIM(LTRIM(t1.[Purchase_Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' 
			then substring(RTRIM(LTRIM(t1.[Purchase_Zipcode])),1 , 5)
		   else null
		END,
		dup.count_dups
	    from #Bureau1 t1
		left outer join (SELECT [Equipment_Code]
					, [Account_Number]
					, [Sequence_Number]
					, count(*) as count_dups
				from #Bureau1
				group by [Equipment_Code]
					, [Account_Number]
					, [Sequence_Number]
					) dup
	on dup.Equipment_Code = t1.Equipment_Code
	and dup.Account_Number = t1.Account_Number
	and dup.Sequence_Number = t1.Sequence_Number


 
BxWill,

As I said before, with plain bulk insert that is never going to work.
And now that you have said the business confirmed that some of the values contain ",", that just makes it final for your current approach.

Regarding the link I supplied before, I am sorry to say that the code does not seem to work. I'll try and contact Phill about it.

As another option, you can use the following, which may or may not work with your files. One of the major requirements is that the file is a real CSV file.
By this I mean, as I stated before, that all columns that contain "," are delimited by double quotes, and if any of the columns also contain a double quote, that it is also escaped by another double quote.
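
For instance, a conforming line might look like this (made-up values); the embedded comma and the embedded quotes only survive because of the enclosing double quotes:

Code:
246B,"30986,30987","a ""fex"" flag",CA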


Load the attached function into your database.

Execute it for your case as follows. DO NOT change the following code except to change the location of the file.
Once you are happy, you may change the name of the temp table below if you wish.
This also assumes that your csv file does contain a header row.

For big files this function may be slow. Possibly the only alternatives will be either a CLR function, doing this outside T-SQL with SSIS or a powershell/c# script,
or tsql code similar to that in the attached function.

Code:
   if object_id('tempdb..#CSVContents','U') is not null drop table #CSVContents
   create table #CSVContents (_RowNo_ int identity(1,1))
   
   exec dbo.usp_ProcessCSVFile
      @CSVFullyQualifiedFileName='C:\Data\Equip_2016.txt'
     ,@HeaderRowExists=1
     ,@DestinationTable='#CSVContents'

	 select * from #CSVContents

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 