TSQL - Bulk Insert - Composite Primary Keys - Cannot insert the value NULL into column 2

BxWill

Using T-SQL with BULK INSERT to create and load a table in SQL Server 2014 via SQL Server Management Studio.

Encountered the following error message upon specifying "NOT NULL" in the CREATE TABLE statement for a composite primary key comprised of three fields: "JurisdictionCode", "AccountNo", "SequenceNbr".

Code:
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'AccountNo', table 'EquipmentLocation.dbo.Equipment_2016'; column does not allow nulls. INSERT fails.
The statement has been terminated.

[Bold]
What is the preferred method to set up a composite primary key during a bulk insert?[/Bold]

The three fields (JurisdictionCode, AccountNo, SequenceNbr) uniquely identify a record and are primarily used to join to another table, "EquipmentTax", that has the same composite primary key. Of course, there are additional lookup tables as well.
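
For illustration, the join on that composite key would look something like this ("TaxAmount" is an assumed EquipmentTax column, used only for the example):

Code:
-- Sketch of the intended join on the composite key; TaxAmount is hypothetical.
SELECT e.JurisdictionCode, e.AccountNo, e.SequenceNbr, t.TaxAmount
FROM Equipment_2016 AS e
JOIN EquipmentTax AS t
    ON  t.JurisdictionCode = e.JurisdictionCode
    AND t.AccountNo        = e.AccountNo
    AND t.SequenceNbr      = e.SequenceNbr;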

One may argue that all tables should have auto-generated PKs. It appears that if the primary key(s) change, then an auto-generated identity column may have value... However, I have already designated the three fields as a composite primary key...

[Bold]
So, would a solution be to add an identity column and set it as the primary key?[/Bold]

For example, of the 20,000 records that I receive during the month of June 2017, if 1,000 records have a nonsensical value in any one of the primary key fields, those 1,000 records will not be allowed into my table, "Equipment_2017", because there will be a NULL in one of the primary key fields.

As this is the first SQL Server database that I have set up, my objective is to set up the database as correctly and efficiently as possible from the start.

[Bold]What modifications to the code below are necessary to accomplish the objective?[/Bold]

Code:
USE master ;  
GO  
DROP DATABASE EquipmentLocation ;  
GO  

CREATE DATABASE EquipmentLocation
go
 

Use EquipmentLocation

Go	

CREATE TABLE #EquipmentData1 (	
	[JurisdictionCode] [varchar] (60 ) NULL,
	[AccountNo] [varchar] (60 ) NULL,
	[SequenceNbr] [varchar] (60 ) NULL,
	[Zipcode] [varchar] (60 ) NULL,
	[SaleAmount] [varchar] (60 ) NULL,
	[SaleDate] [varchar] (60 ) NULL,
)	

CREATE TABLE [Equipment_2017](		
		[JurisdictionCode] [varchar] (5 ) NOT NULL,   --<<<<  Changed from "NULL" to "NOT NULL" to create composite primary key
		[AccountNo] [varchar] (45 ) NOT NULL,      --<<<<  Changed from "NULL" to "NOT NULL" to create composite primary key
		[SequenceNbr] int NOT NULL,              --<<<<  Changed from "NULL" to "NOT NULL" to create composite primary key
		[Zipcode] [varchar] (9) NULL, 
		[SaleAmount] int NULL,
		[SaleDate] date NULL, ---format is (YYYYMMDD)
		PRIMARY KEY ([JurisdictionCode], [AccountNo], [SequenceNbr])	
	)

BULK INSERT #EquipmentData1
   	FROM 'C:\EquipmentLocation\Data\201706EquipmentData.txt'    
   WITH
      (
         FIELDTERMINATOR = '\t',
		 ROWTERMINATOR = '\n',
		 FIRSTROW = 2
	 )

Select * from #EquipmentData1

insert into Equipment_2017
select 
RTRIM(LTRIM([JurisdictionCode])),
RTRIM(LTRIM([AccountNo])),
RTRIM(LTRIM([SequenceNbr])),

CASE																	
   When rtrim(ltrim([Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9]' then rtrim(ltrim([Zipcode]))
   When rtrim(ltrim([Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' then substring(rtrim(ltrim([Zipcode])),1 , 5)
   else null
end as [Zipcode],

CASE																	
  WHEN ISNUMERIC(RTRIM(LTRIM([SaleAmount]))) = 0
        THEN Null 
  ELSE RTRIM(LTRIM([SaleAmount]))
END as [SaleAmount],

CASE
   when  len([SaleDate]) = 8
   and isdate([SaleDate]) = 1
   and rtrim(ltrim([SaleDate])) like '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]' -- loose YYYYMMDD shape; isdate() above does the strict check
   then convert(date,[SaleDate], 112)
   else cast(null as date)
end as [SaleDate]
from #EquipmentData1
where 
[SaleDate] between convert(date, '2017-05-01') and convert(date, '2017-05-31')
and RTRIM(LTRIM([SaleAmount])) is not null
OR RTRIM(LTRIM([SaleDate])) is not null 
OR RTRIM(LTRIM([SequenceNbr])) is not null

Select * from Equipment_2017





 
Change the OR's to AND's and only valid records will be inserted - the others will not be selected.
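
In other words, a sketch of the corrected filter (AND binds tighter than OR, so the original mix silently changed the meaning):

Code:
WHERE [SaleDate] BETWEEN CONVERT(date, '2017-05-01') AND CONVERT(date, '2017-05-31')
  AND RTRIM(LTRIM([SaleAmount])) IS NOT NULL
  AND RTRIM(LTRIM([SaleDate])) IS NOT NULL
  AND RTRIM(LTRIM([SequenceNbr])) IS NOT NULL
-- Note: RTRIM/LTRIM never turn an empty string into NULL; a test like
-- NULLIF(RTRIM(LTRIM(col)), '') IS NOT NULL would reject blanks as well.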

Also, your ISNUMERIC check can in some cases return 1 for strings that are not valid numbers.
See the following links for some examples and ways around it.
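
A few well-known cases, easy to verify yourself:

Code:
-- ISNUMERIC returns 1 for strings that will not convert to int or decimal:
SELECT ISNUMERIC('$')   AS currency_sign,  -- 1, yet CAST('$' AS int) fails
       ISNUMERIC(',')   AS comma_only,     -- 1
       ISNUMERIC('1e4') AS sci_notation;   -- 1, a valid float but not a valid int literal

-- On SQL Server 2012 and later, TRY_CONVERT is the safer test:
SELECT TRY_CONVERT(int, '$');  -- returns NULL instead of raising an error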



As another example, in input files I process, both of the following strings are valid and represent the same value:
31,234.00
31.234,00

Both equate to 31234.00, a decimal with 2 decimal places.

My advice for validating numbers is to write a UDF that checks whether there are any invalid characters in the string and fails if so.
Such a function can also, depending on how numbers are written in your country, strip the likes of currency signs and thousands separators, and move a sign from the right side to the left side of the string; e.g. '1-' fails to convert to int, but '-1' is valid, and a lot of software does put the sign on the right side.
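
A minimal sketch of such a UDF, assuming US-style separators and a two-decimal target type (the name and cleanup rules are illustrative, not a finished implementation):

Code:
CREATE FUNCTION dbo.ufn_TryCleanNumber (@s varchar(60))
RETURNS decimal(18,2)
AS
BEGIN
    SET @s = RTRIM(LTRIM(@s));
    -- Move a trailing sign to the front: '1-' fails to convert, '-1' is valid.
    IF RIGHT(@s, 1) = '-'
        SET @s = '-' + LEFT(@s, LEN(@s) - 1);
    -- Strip currency signs and thousands separators (US-style assumption).
    SET @s = REPLACE(REPLACE(@s, '$', ''), ',', '');
    -- Fail (return NULL) if anything but digits, signs, or a decimal point remains.
    IF @s = '' OR @s LIKE '%[^0-9.+-]%'
        RETURN NULL;
    RETURN TRY_CONVERT(decimal(18,2), @s);
END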
 
Will review the links.

Just to ensure that I understand: if I change the "Or" to "And", then wouldn't every record that is loaded be free of NULLs in those fields? Is the implication that "valid records" are records without a "NULL" in any of the fields?

By specifying the composite primary key, if any one of the three fields has a NULL, then the entire record will not load; but there may be other records that have all three composite primary key fields populated and a "NULL" in one or more of the other fields. By changing the ORs to ANDs, that will no longer be the case. Is this correct?

Is it not a good idea to display "NULL" in the record for one or more fields if the data is not available or is nonsensical?

I do not mean to be redundant, but I want to ensure that I understand this fully.

I did make the change and now receive the error below:

Code:
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK__Equipmen__7AE3994FB824F069'. Cannot insert duplicate key in object 'dbo.Equipment_2017'. The duplicate key value is (8432142, 2.0011E+15, 1).
The statement has been terminated.

It appears that there are one or more records with the same composite primary key. Or, maybe one or more records do not have all of the composite primary key fields populated?
 
>It appears that there are one or more records with the same composite primary key. Or, maybe one or more records do not have all of the composite primary key fields populated?

As the message shows a specific tuple of values, it is that tuple of values which already exists. Nothing more, nothing less. 2.0011E+15 looks suspiciously like a sequence number that has become too high, so that neighboring numbers make no difference anymore. Your primary key definition is [tt]PRIMARY KEY ([JurisdictionCode], [AccountNo], [SequenceNbr])[/tt], so 2.0011E+15 corresponds to the sequence number. It is defined as [varchar] (60 ), but should contain a number. If you get that high in numbers, you'll need bigint.

Whatever your sequence number source is that increments by +1 each step, it has here been converted to varchar in scientific notation, with E+15 = 10^15. The precision is only 0.0001E+15 (2.0011 has 4 decimals), which means the numbers (in non-scientific notation) 2001050000000000 to 2001149999999999 all convert to the same string 2.0011E+15; this whole range of numbers always appears as the identical string. So even if the compound keys all differed in SequenceNbr, they now have to differ in JurisdictionCode and AccountNo to be unique.

You failed to choose the right data type for the sequence number: it should be a bigint, which covers up to about E+18. I don't know whether you had the idea that choosing varchar(60) gets you the range of 60-digit numbers, but that's not how it works.
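
For reference, bigint tops out at 9,223,372,036,854,775,807 (roughly 9.2E+18), so sequence numbers in the E+15 range fit comfortably:

Code:
DECLARE @seq bigint = 2001050000000000;
SELECT @seq + 1 AS NextSeq;  -- exact arithmetic, no scientific notation, no collisions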

Learn your data types!!!

Bye, Olaf.
 
Using the following SQL to determine duplicate rows.

Code:
SELECT 
    [JurisdictionCode], [AccountNo], [SequenceNbr] 
FROM (
    
    SELECT [JurisdictionCode], [AccountNo], [SequenceNbr] 
    FROM #EquipmentData1
    UNION ALL
    SELECT [JurisdictionCode], [AccountNo], [SequenceNbr] 
    FROM #EquipmentData1
) x
GROUP BY 
    [JurisdictionCode], [AccountNo], [SequenceNbr]
HAVING COUNT(1) > 1
order by [JurisdictionCode], [AccountNo], [SequenceNbr]

The query returns approximately 36,000 records, similar to the result below.

The AccountNo column contains some data that is in scientific notation...

Why does the column display the values in scientific notation?

The AccountNo data does indeed begin with "1", not "2" as I indicated in the prior posting.

Note, the data type for column "AccountNo" IS VARCHAR.


Code:
JurisdictionCode  AccountNo            SequenceNbr
42145             NULL                 0
42145             1.0113E+15           1
42145             1.0011E+15           1
42145             1.005E+14            1
42145             1.006E+14            1
42145             1.01E+14             1
42145             1.1012E+15           1
42145             1.2012E+15           1
43320             N24000100130031000   1
43320             N24000100130031300   1
43320             N24000100130025900   1
43320             N242000100130018200  1
43320             N242000100130023500  1
50224             V84 14101 0001       1
50224             V84 14102 0003       1
50224             V84 14102 0013       1

It appears that I need to validate that no records with duplicate composite primary keys are within the temp table. Or, can duplicate records exist in the temporary (staging) table as long as duplicate records are not inserted into the production table?
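
For what it's worth, one way to tolerate duplicates in staging but insert only one row per key would be ROW_NUMBER() (a sketch; which duplicate survives, here the latest SaleDate, is an arbitrary choice):

Code:
;WITH ranked AS (
    SELECT [JurisdictionCode], [AccountNo], [SequenceNbr],
           [Zipcode], [SaleAmount], [SaleDate],
           ROW_NUMBER() OVER (
               PARTITION BY [JurisdictionCode], [AccountNo], [SequenceNbr]
               ORDER BY [SaleDate] DESC) AS rn
    FROM #EquipmentData1
)
SELECT [JurisdictionCode], [AccountNo], [SequenceNbr],
       [Zipcode], [SaleAmount], [SaleDate]
FROM ranked
WHERE rn = 1;  -- feed this into the INSERT instead of the raw staging table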


As far as a resolution to the error, I am still reviewing...
 
First, sorry for getting AccountNo and SequenceNbr in the wrong order, but [varchar] (60 ) is the type for both of them, and it doesn't feel right for a sequence number.

Your latest code will make every tuple of values appear more than once; that's what UNION ALL does, it makes every count at least 2.
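
A duplicate check against the single staging table, without the self-UNION, would be something like:

Code:
SELECT [JurisdictionCode], [AccountNo], [SequenceNbr], COUNT(*) AS Cnt
FROM #EquipmentData1
GROUP BY [JurisdictionCode], [AccountNo], [SequenceNbr]
HAVING COUNT(*) > 1
ORDER BY [JurisdictionCode], [AccountNo], [SequenceNbr];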

It is the account numbers in the E+15 range that seem to come from an Excel cell formatted as "general" or "number" when it should be text. As I showed above, a precision in the range of 0.0001E+15 = 1E+11 makes a range of 100 billion account numbers equal and thus creates duplicate keys.

In some step of the processing you have a conversion problem. One arises if, for example, you put 2001050000000000 and 2001050000000001 into an Excel cell, which then displays 2.0011E+15 for both of them, spits that out into CSV, and you read this in as 2.0011E+15 while originally two different account numbers were meant.

The data you show now tells us that account numbers can be alphanumeric, too. The long numbers alone make this a case for storing them in character-type fields and keeping them that way. I assume that in at least one step of the processing the account number is interpreted as a numeric value, and then a conversion from the original long number to scientific notation kills the uniqueness.

Bye, Olaf.
 
Did get the files cleaned up. Now to resume...

Recap: I am responsible for loading multiple monthly text files of equipment sales data into the Sql Server 2014 database.

Another table within the database will be "Capital Tax", which has the field "AccountNo." It appears that there will be a one-to-many relationship between the "Capital Tax" table and the yearly Equipment tables.

In other words, there are several records within the Equipment tables ("Equipment_2015", "Equipment_2016" and "Equipment_2017") that have the same values for the fields - [JurisdictionCode], [AccountNo], [SequenceNbr] but a different value for the field "SaleDate." Of course, this implies that the equipment was sold and resold one or more times during the year.

[Bold]
Therefore, it appears that for the Capital Tax table the primary key should be "AccountNo", but for the Equipment tables the primary key should be an auto-incrementing id field, with [JurisdictionCode], [AccountNo], [SequenceNbr], and [SaleDate] serving as the composite foreign key.[/Bold]

I was not successful in trying to set up the auto-incrementing id field on the Equipment tables. Some have advised that the id field needs to be set up in the text file using a text editor. Others have advised that an auto-incrementing id field does not have to be set up in each individual text file, but that the field can be "auto-populated" upon the bulk insert. ???

[Bold]
Any insight as to the preferred method to insert an auto-incrementing id into a SQL Server table when the text file does not contain the id field?[/Bold]
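
One common pattern, since the data already passes through a staging table: put an IDENTITY column on the target table and list the target columns explicitly in the INSERT, so the id populates itself and never has to exist in the text file. A sketch (the UNIQUE constraint reflecting the four-column key above is an assumption):

Code:
CREATE TABLE [Equipment_2017](
    [EquipmentId] int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [JurisdictionCode] [varchar] (5) NOT NULL,
    [AccountNo] [varchar] (45) NOT NULL,
    [SequenceNbr] int NOT NULL,
    [Zipcode] [varchar] (9) NULL,
    [SaleAmount] int NULL,
    [SaleDate] date NULL,
    CONSTRAINT UQ_Equipment_2017 UNIQUE
        ([JurisdictionCode], [AccountNo], [SequenceNbr], [SaleDate])
)

INSERT INTO Equipment_2017
    ([JurisdictionCode], [AccountNo], [SequenceNbr], [Zipcode], [SaleAmount], [SaleDate])
SELECT RTRIM(LTRIM([JurisdictionCode])),
       RTRIM(LTRIM([AccountNo])),
       RTRIM(LTRIM([SequenceNbr])),
       [Zipcode], [SaleAmount], [SaleDate]   -- apply the same cleansing CASEs as earlier
FROM #EquipmentData1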

In the meantime, I will proceed to complete the other portion of the SQL script.


Question - Currently, I am using multiple temporary tables (#EquipmentData1, #EquipmentData2, #EquipmentData3, etc.) to load the monthly data. For example, I use #EquipmentData1 to load data from 201701_EquipmentData.txt, then I plan to use #EquipmentData2 to load data from 201702_EquipmentData.txt, and so on.


[Bold]
Considering that the structure of each temporary table is the same, any thought on using the same temporary table each time - load into the SQL Server table, then empty the temporary table and load the next month's text file, and so on? (See the sketch after the file list below.)[/Bold]

(Note, this issue arises because I have a pre-existing set of monthly text files to load)

For example, I have the following text files to load;

'C:\EquipmentLocation\Data\'

201701_EquipmentData.txt
201702_EquipmentData.txt
201703_EquipmentData.txt
201704_EquipmentData.txt
201705_EquipmentData.txt
201706_EquipmentData.txt
201707_EquipmentData.txt
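
A sketch of that reuse idea: keep one staging table, truncate it between loads, and build each BULK INSERT statement dynamically, since BULK INSERT does not accept a variable directly for the file name (the file list is the one above; everything else is illustrative):

Code:
DECLARE @files table (fname varchar(260));
INSERT INTO @files VALUES
    ('201701_EquipmentData.txt'), ('201702_EquipmentData.txt'),
    ('201703_EquipmentData.txt'), ('201704_EquipmentData.txt'),
    ('201705_EquipmentData.txt'), ('201706_EquipmentData.txt'),
    ('201707_EquipmentData.txt');

DECLARE @file varchar(260), @sql nvarchar(max);
DECLARE file_cursor CURSOR FOR SELECT fname FROM @files;
OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @file;
WHILE @@FETCH_STATUS = 0
BEGIN
    TRUNCATE TABLE #EquipmentData1;  -- empty the staging table between loads

    SET @sql = N'BULK INSERT #EquipmentData1
        FROM ''C:\EquipmentLocation\Data\' + @file + N'''
        WITH (FIELDTERMINATOR = ''\t'', ROWTERMINATOR = ''\n'', FIRSTROW = 2);';
    EXEC (@sql);

    -- ...then run the same INSERT INTO Equipment_2017 ... SELECT ... FROM #EquipmentData1

    FETCH NEXT FROM file_cursor INTO @file;
END
CLOSE file_cursor;
DEALLOCATE file_cursor;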


Code:
USE master ;  
GO  
DROP DATABASE EquipmentLocation ;  
GO  

CREATE DATABASE EquipmentLocation
go
 

Use EquipmentLocation

Go	

CREATE TABLE #EquipmentData1 (	
	[JurisdictionCode] [varchar] (60 ) NULL,
	[AccountNo] [varchar] (60 ) NULL,
	[SequenceNbr] [varchar] (60 ) NULL,
	[Zipcode] [varchar] (60 ) NULL,
	[SaleAmount] [varchar] (60 ) NULL,
	[SaleDate] [varchar] (60 ) NULL,
)	

CREATE TABLE [Equipment_2017](		
		[JurisdictionCode] [varchar] (5 ) NOT NULL,   --<<<<  Changed from "NULL" to "NOT NULL" to create composite primary key
		[AccountNo] [varchar] (45 ) NOT NULL,      --<<<<  Changed from "NULL" to "NOT NULL" to create composite primary key
		[SequenceNbr] int NOT NULL,              --<<<<  Changed from "NULL" to "NOT NULL" to create composite primary key
		[Zipcode] [varchar] (9) NULL, 
		[SaleAmount] int NULL,
		[SaleDate] date NULL, ---format is (YYYYMMDD)
		PRIMARY KEY ([JurisdictionCode], [AccountNo], [SequenceNbr])	
	)

BULK INSERT #EquipmentData1
   	FROM 'C:\EquipmentLocation\Data\201701_EquipmentData.txt'    
   WITH
      (
         FIELDTERMINATOR = '\t',
		 ROWTERMINATOR = '\n',
		 FIRSTROW = 2
	 )

Select * from #EquipmentData1

insert into Equipment_2017
select 
RTRIM(LTRIM([JurisdictionCode])),
RTRIM(LTRIM([AccountNo])),
RTRIM(LTRIM([SequenceNbr])),

CASE																	
   When rtrim(ltrim([Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9]' then rtrim(ltrim([Zipcode]))
   When rtrim(ltrim([Zipcode])) LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' then substring(rtrim(ltrim([Zipcode])),1 , 5)
   else null
end as [Zipcode],

CASE																	
  WHEN ISNUMERIC(RTRIM(LTRIM([SaleAmount]))) = 0
        THEN Null 
  ELSE RTRIM(LTRIM([SaleAmount]))
END as [SaleAmount],

CASE
   when  len([SaleDate]) = 8
   and isdate([SaleDate]) = 1
   and rtrim(ltrim([SaleDate])) like '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]' -- loose YYYYMMDD shape; isdate() above does the strict check
   then convert(date,[SaleDate], 112)
   else cast(null as date)
end as [SaleDate]
from #EquipmentData1
where 
[SaleDate] between convert(date, '2017-01-01') and convert(date, '2017-12-31')
and RTRIM(LTRIM([SaleAmount])) is not null
and RTRIM(LTRIM([SaleDate])) is not null 
and RTRIM(LTRIM([SequenceNbr])) is not null
 