Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SSIS Insert New Data Only to Backup Tables 1

Status
Not open for further replies.

kevinc88

IS-IT--Management
Nov 3, 2015
2
US
Hello,

I am new to using SSIS and have been tasked with creating a package to be a backup for some tables. Basically I have a total of 6 tables which have data in them that I need to create copies of in another database. I have created the 6 identical tables in the other database but was curious if someone could explain what SSIS tools I would need to use to create a package that would allow me to copy the NEWLY inserted rows from the original 6 tables to the newly created 6 tables for my backup in the other database. I have stumbled my way through having SSIS copy all the rows over but that will become cumbersome in the future as 220,000 rows will quickly turn into millions within the next few years. So I don't want to copy that many rows everyday when most of the data is already going to be inside of the tables already.

The original 6 tables are updated constantly all day and I just need to be able to run an SSIS package once per day to move that data into the backup tables, but again I just want to try and move only the new rows and not copy the entire table over again.

Any help would be appreciated.

I am using SSIS 2012.

Best,

Kevin C
 
several factors get into place on a situation like this.

first question - will the backup tables be on a database on the same instance, and if so should have them on a different instance be an option to this design?

If the answer to the first question is no to the first point or yes to the second point then you have 2 options
1 - CDC or CT (data capture or change tracking) or custom trigger (this last one required a unique way of identifying rows)
2 - copy full tables

if the answer to the first question is Yes and No respectively then you may have another option if all tables have a way of uniquely identifying the rows. This option is to join both live and archive tables on their unique keys and copy over the new records based on this criteria - with high volumes this will eventually get slower with time

and finally if all tables have a "insertedOnDate" column that only gets updated on then this can also be used as a way to limit data transfer only to new rows.

And note that nothing of this is SSIS - it is all data selection.

All options can be done with a SSIS dataflow task - the join option can also be done with a straight insert into the archive table.
And for the purists I am not considering a linked server as an option for this as high volumes would make it quite slow

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
I did read up on CDC but I don't think my boss will like that, just a hunch. One concern I do have, about copying the full table every time is that the program that is constantly populating the original 6 tables has had the data go missing whenever the program had an update. So that is why I am creating the backup. So if I am copying the tables over that won't work because if the table has missing data and I copy it would mean that my 6 backup tables will have missing data. So it sounds like I would just need to make an argument for the CDC, which I can just add the changes that are collected in the CDC to my backup table right? I know how it does keep a record of inserts, updates, and deletes but nothing I read on it talks about adding the inserts,updates, and deletes to the table. Sounded more like an auditing tool to see when something was altered and how it was altered.

Thanks again.
 
CDC will keep for a period of time a full copy of each record inserted, deleted and updated (before and after image).
so when using CDC one copies the data directly from the CDC version of the table. This is the one to use if you need the original record inserted into the table before any update is done to that record.

CT will only keep a list of the key fields and metadata about which fields changed within the record.

see for details.

It can be used as an auditing tool but it is commonly used for incremental updates of other tables which is precisely what you are after.


the CDC tables are created on the system tables within your database, with schema CDC.
sample would be
dbo.mytable
would have a cdc table created as
cdc.dbo_mytable_ct
with some extra columns used by CDC

it does require the enterprise or developer editions.
see
using CDC you would normally have a configuration table that contains the last LSN that was extracted by each table.
you then extract the data from CDC using one of their table functions supplying the stored LSN as the start point of extraction, and the current LSN as the end point.

quite easy to use really





Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Hi Kevin

Try this
Lets call original data the Source, and the Backup will be the Destination

I believe that your Source have Primary keys. After creating the tables on the destination, open you SSIS project.

Pull in a Data Flow Task per each table, and have your OLE DB Source and Destination. In the OLE DB Source, have a query that references both your source and destination tables

e.g.
Code:
SELECT Column
     FROM SOURCE.dbo.TABLEA A
     
     WHERE  A.PRIMARY_KEY NOT IN (SELECT B.PRIMARY_KEY FROM DESTINATION.dbo.TABLEA B WHERE B.PRIMARY_KEY IS NOT NULL)

This will return only the new data from your source table.

Then Join OLE DB Source to you OLE DB Destination and make sure your columns are mapped accordingly.

Remember that the above will only work if you do not want the updated records from your source to be be updated in the destination tables.
If you want data to be updated, you need to use a Slow Changing Dimension


Regards
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top