Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SSIS Best Way To Load Huge Tables

Status
Not open for further replies.

leiela

Programmer
Dec 14, 2007
15
GB
Ok im reasonably new to SSIS and i was wondering if you could help.

I have a large database from which i need to pull large amounts of data. Row's numbering in the several million easily.

I need to pull the data out, aggrigate and transform it and pump it into a new database. I need this to run on an overnight load and due to amount of changing data it seems most reasonable to truncate and re-load each evening rather than performing a merge.

The problem is that i have 16 of these tables to load, running each one as a piece of SQL takes about 40mins to an hour each on the server we have.

First Question. As all these aggrigated tables use SQL pulling data from the same tables im i correct in assuming its a bad idea to try and run them in parallel??

Secondly, My concern is that if one of these 16 tables fail's i can't afford to re-start the whole lot in the morning when i recive the notification because that will cause an outage on the entire datawarehouse for an entire day because it takes so long to load. That is unacceptable for the the business.

Ideally what im looking for is a solution which will allow me to run all tasks/packages in such a way that it will allow the rest to load even if one fail's.

Is there a way to get the task to continue on failure? just sending an alert to let me know so i can run that task seperately rather than failing the entire package.
 
Yes, you can change the on failure properties of the data transfer tasks so that they don't fail the entire package.

If you have enterprise edition on the source system running them in parallel isn't an issue as the engine can do some cool stuff in the back ground to save resources. If the source is standard edition it isn't so bad to do either, especially if the source table fits into memory.

If you are pulling from the same table for all 6 feeds, why not have a single source, then multi-cast that source into the 6 different transformations, then write the data out to the 6 destination tables. This was you only read the source once.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / SQL 2005 BI / SQL 2008 DBA / SQL 2008 DBD / SQL 2008 BI / MWSS 3.0: Configuration / MOSS 2007: Configuration)
MCITP (SQL 2005 DBA / SQL 2008 DBA / SQL 2005 DBD / SQL 2008 DBD / SQL 2005 BI / SQL 2008 BI)
MCM (SQL 2008)
MVP

My Blog
 

Found the failure property and it works wonderfully thanks.

Yeah we are running enterprise so i'll try having them run in parallel and see what happens.

However as for using one source item, they are all coming from the same database/collection of tables but each feed implements a SQL statement .. One feed might pull data from tables A and B while another might use B,C and D for example.

The SSIS package is being used to take the data from a relational Database and transform it into a de-normalised, starschema for data warehousing with Facts and Dimensions.
 
It might help if you look at this on a table by table basis. Instead of doing complex joins against the source system database, grab the deltas from each individual table you need on the source system. So you would grab new or changed rows from table A, new or changed rows from table B, etc. You can store them in a persistent staging or ODS (depending on what terminology you use). You could then build your dimension and fact tables from your persistent copies. These persistent copies would either reside on the data warehouse server itself, or the ETL server.
 
I see a few different ways that the data flows could be changed to optimize performance. It'll just be a matter of finding the right one for your setup.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / SQL 2005 BI / SQL 2008 DBA / SQL 2008 DBD / SQL 2008 BI / MWSS 3.0: Configuration / MOSS 2007: Configuration)
MCITP (SQL 2005 DBA / SQL 2008 DBA / SQL 2005 DBD / SQL 2008 DBD / SQL 2005 BI / SQL 2008 BI)
MCM (SQL 2008)
MVP

My Blog
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top