Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Bulk data Insert with validation

Status
Not open for further replies.

rk68

Programmer
Jul 15, 2003
171
IN
Hi,
This is related to a website developed by our tech team, where in the user will be inserting text data using SQL Loader into the temporary table. Post this the data from temporary table needs to be inserted into live table with data validations (e.g. checks like field length, date format, numeric/text check etc.)
The data will be around 2-3 million & would be an ongoing process.

Is there a fastest way to insert the data as normal insert would take hours?
Been thru various forums but couldn't get a concrete solution.

TIA,
Raj
 
The initial table for loading should not have any constraints. That is, no primary key, no referential integrity, etc. Do all the validation during the process that inserts rows from the loading table to the production table. Or, build the table with no constraints, then put the constraints on the table to identify problems, then load to the production table. It is the constraints that will slow the insert process.

==================================
advanced cognitive capabilities and other marketing buzzwords explained with sarcastic simplicity


 
Thanks Johnherman.

The initial will be the temporary table where the user will be uploading data (csv/text) thru an application. Post this the system will be inserting the data into the production table & before inserting will carry out the data validation.
While doing so, it takes lot of time for insert. How can I minimize that time.
 
That may depend on what you mean by "normal insert"

If you have your TEMPtable matching your PRODUCTIONtable (same fields, same order of fields, same types, etc.), I would try just:[tt]
INSERT INTO PRODUCTIONtable
SELECT * FROM TEMPtable[/tt]



---- Andy

There is a great need for a sarcasm font.
 
Andy, by Normal Insert what I meant is the same what what you mentioned in your thread but that would take hours to insert.
This insert process will be repeated every 2-3 days and the user won't wait for hours.
So was just checking alternate methods.
 
around 2-3 million" records "every 2-3 days" - I would try to process it in a Stored Procedure, and Commit 1000 records at the time.


---- Andy

There is a great need for a sarcasm font.
 
The class of tool you're looking for is called an ETL tool. That is, Extract-Transform-Load. It's purpose is to pull data from somewhere (E), transform and validate it (T), and load it to someplace (L). This is the industrial strength solution to your problem.

One I have experience with is Informatica. With Informatica you could eliminate the temp table load step you're doing and have Informatica do it all, including cleaning and validating fields. It can still go into a temp table for further validating against other data, but it can start you out with much cleaner data and cut down on manual intervention.

Just Google "etl tools" and start from there.


 
I agree that you want to COMMIT every 1000 to 10000 rows as Andy states. This can be done by the process that loads from the temporary table to the permanent table. A couple of other ideas. If this is the only update process for the master table, make a copy of the master table calling it _NEW. Update the 2nd copy of the master table, the _NEW version. Then, at a pre-arranged time, take down the old production table and rename it as _OLD. Rename the _NEW production table as the new master table.

Another technique is to drop all indexes on the production table before the INSERT process, then build the indexes anew after the INSERT finishes. Indexes and referential integrity checks can really slow down your update process. A lot depends on how much downtime you can tolerate with the production system. If none, then you are probably best with a production ETL system like Informatica, Data Stage, BODS, etc as SamBones suggests.

==================================
advanced cognitive capabilities and other marketing buzzwords explained with sarcastic simplicity


 
Thanks to all of you.
I had thought of all those solutions you all have written about.
This being client data, I won't have the option of downtime.
Finally, I think is using the Stored procedure using counter to commit records at regular interval.
 
If there is never any downtime for the client database, then you must accept that you will have contention with user activity while doing your update. You don't appear to have any option.

==================================
advanced cognitive capabilities and other marketing buzzwords explained with sarcastic simplicity


 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top