Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SSIS import rows or whole table- primary key issues

Status
Not open for further replies.

glgcag1991

Programmer
Oct 15, 2007
72
US
I have been given the task of converting several Access databases to SQL Server 2005- I'm doing it in stages so I have the need to import Access data into the SQL Server hourly for the users that will consume it read-only for now.

What is the best process for doing the hourly import:

1. Complete table import (if so, how does the DROP TABLE and CREATE TABLE impact performance if I have heavy usage of that table during the hourly import?)

2. Compare Access table with SQL Server table and only import new records. (If so, with primary keys I will have to drop the primary key constraint on the tables, import the new records, then add the constraint back, right? Again, how will that impact heavy usage during the hourly import?)

I have looked a bit at Linked Servers but I need to do some transformations in the hourly SSIS job so that the data fits into the different architecture I have between the Access and SQL Server databases.

I appreciate the help!



 
You are always better off importing into staging tables rather than trying to dump straight into your live tables. Unless your PK's are user aware (i.e. values they will recognise and search under), drop them and create new ones.

After you've dumped into a staging table, the choice as to whether you drop/recreate the live table or add new rows, will depend on the volume of data you have and the user access requirements. Personally, I wouldn't be buggering about with a live db during the day anyway, but if you must you need to weigh up the impact on the indexes. You could always drop the indexes, dump the new rows in and recreate the indexes. Again though, live system... hmmm... rather you than me.

Tell them you need a dev box and a test box.
 
In the SSIS world you are not always better off staging into a set of tables and the processing from there. The main performance area for SSIS is it's abilties to process data in memory very quickly removing the boundaries of the RDBMS world.

You Base data flow would:
1) Datasource - This would be the table in access you are importing.

2) Lookup Task - This task you would lookup your PK column(s) from your destination table. Join the Dataflow to the lookup using your PK columns and insert a single column from the SQL table. Configure the error handeling to ignore unmatches columns.

3) Conditional Split Task - Using IsNull(SQLColumn) you can split the data into a flow that contains only the columns where the SQL Column is Null meaning it does not exist in your destination.

4) Data Destination - The table in the SQL DB where you are inserting your data.


There are times when first staging the data to a set of tables or even another DB is of benefit. If you have millions of rows, the source is a very busy server or complex joins are required and not supported by indexes. These would be the 3 main times you may consider staging locally before processing. Remember you want to ETL not ELTL.
 
Also there is a SSIS forum that focuses on SSIS issues.

forum1555
 
Thank you MDXer for the process. That is exactly what I'd like to do, but since the table in SQL has a primary key constraint on it, how would I insert new records from the Access database into the SQL table? Is there a way to set up the SQL Server table with the primary key yet still allow inserts on the table?
 
if you want to by pass the PK why have it?

If you filter out the records that violate the PK then this shouldn't even be an issue.
 
I guess what I mean is if I have a Project table and I want to update the SQL Server with any new projects that have been added in the Access database, the Access database is generating the ProjectID since it's an autonumber PK field, but the PK field in the SQL Server won't allow those numbers to be inserted.

I suppose I could create the Project table in SQL Server without the PK constraint, which would then allow me to add new records to the table and it will still function as a primary key for all intents and purposes as far as my joins go . . .

Does that make sense?
 
do you mean that in the SQL table you defined the project column as being an identity column as well so you can't insert your access values?


if so just remove the identity property from the column.
 
So, to clarify- the identity property of my ID field is the only thing preventing me from inserting new rows into the table? I can keep the primary key constraint on the ID field and it will still allow inserts?

 
Personally, I would have closed the database down and imported all the records in single user mode. This way you could set indentity insert to on and use the existing pks. If the system is in prod, your best best is to add a column for the old pk to the tables and let SQL Server create a new pk. Then when you add the records for the child tables, join to the parent table on the old id for the select to get the records to insert but put the new id in the table when you do the insert.

"NOTHING is more important in a database than integrity." ESquared
 
Placing the database into single user mode Hourly seems a bit much. It really depends on the intended use for the system. If it is a reporting system then most if not all of the referential integrity should be handled within the ETL and then the best implementation would be a star schema where identity PKs are more of a dimensional use.
 
I think I found the solution: after comparing the Access and SQL tables and determining the new rows that need to be updated to the SQL Server, set IDENTITY INSERT to ON for the table I'm inserting the new records into- once done, turn it to OFF and I should be good!

Here's a link to the suggestion I found:

 
Bad bad idea if the table is being accessed by anyone other than you. Fine if you are in single user mode on the target database.

MDXer, I wasn't suggesting doing this hourly, but doing the whole process once with data entry shut down for the duration of the conversion. This is the only way to guarantee, you don;t have data problmes inthe resulting data or user issues when you do such things as setting the identity insert on (which now means no one can enter records whil this process is running as all other inserts will be broken because they don't include the identity column).

"NOTHING is more important in a database than integrity." ESquared
 
SQLSister,

But since this is a read-only database as far as users are concerned, won't the IDENTITY INSERT be the perfect solution since the package I'm running hourly is the only updating on the database?
 
glgcag1991 said:
But since this is a read-only database as far as users are concerned, won't the IDENTITY INSERT be the perfect solution since the package I'm running hourly is the only updating on the database?

So why even have the identity property on the column? You can create the column as it is on your source DB but not make it an identity column. this removes the needless step of setting identity insert on and then back off.

SQLSister said:
which now means no one can enter records whil this process is running as all other inserts will be broken because they don't include the identity column

In a well designed and efficient package you would probably have this issue anyways as you would enable the Table Lock property for the fast load
 
MDXer,

Point well taken- I'll just set the PK fields to Int NOT NULL and do it as you suggest. I think I'm just trying to find a way to update IDENTITY fields on the fly as I imagine it's necessary at some point (as is mentioned in the article link I posted). I guess it's much more complex to do than I thought it was going to be, at least on an updateable database, when taking into consideration possible inserts and updates concurrent with inserting into an IDENTITY field.
 
On a database where you have other elemens Inserting and Modifying data then you would want to do everything in your power to find a way to not have to do an identity insert especially on a frequency as you described. if you have other users inserting records to this table and you do an identity insert you run the risk of something failing if you are inserting a record with an id that was just inserted by a previous process.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top