
Find and Delete All Orphan Child records in database?


carnahantom

Technical User
Apr 2, 2014
I am working in a transactional SQL Server 2012 development environment where I am writing a script to "shrink" (remove excess records from) the large tables in backup copies of our Production database. The tables have record counts in the multi-million range. Referential integrity is not a big issue for this process since it is in a development environment using test data. However, I do need to follow up afterwards and eliminate orphans.

To increase performance and keep from filling up the transaction log, I am using the following sequence of steps (sketched after the list):
[ol 1]
[li] Copy records to be retained from the original table to a new non-permanent table[/li]
[li] Disable FKs and Triggers[/li]
[li] Truncate the original table [/li]
[li] Insert records from the non-permanent table into the original table[/li]
[li] Drop the non-permanent table[/li]
[li] Re-enable FK's and Triggers[/li]
[/ol]
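
For reference, a minimal sketch of that sequence. The names are hypothetical: dbo.BigTable is the table being shrunk, KeepCondition is the filter for the rows to retain, and dbo.BigTable_keep is the staging table.

SQL:
-- 1. copy the rows to be retained into a staging table
SELECT *
INTO   dbo.BigTable_keep
FROM   dbo.BigTable
WHERE  KeepCondition;

-- 2. disable FKs and triggers defined on the table
ALTER TABLE dbo.BigTable NOCHECK CONSTRAINT ALL;
ALTER TABLE dbo.BigTable DISABLE TRIGGER ALL;

-- 3. truncate the original table
-- (TRUNCATE still fails if other tables reference dbo.BigTable with FKs, even disabled ones;
--  those incoming FKs have to be dropped, or a batched DELETE used instead)
TRUNCATE TABLE dbo.BigTable;

-- 4. put the retained rows back
-- (an IDENTITY column would need SET IDENTITY_INSERT dbo.BigTable ON plus an explicit column list here)
INSERT INTO dbo.BigTable
SELECT * FROM dbo.BigTable_keep;

-- 5. drop the staging table
DROP TABLE dbo.BigTable_keep;

-- 6. re-enable FKs and triggers; WITH CHECK revalidates the rows so the constraints stay trusted
ALTER TABLE dbo.BigTable ENABLE TRIGGER ALL;
ALTER TABLE dbo.BigTable WITH CHECK CHECK CONSTRAINT ALL;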

The problem is I end up with a lot of orphans (expected, since I am bypassing DRI).

Question: is there a generic way that I can delete the orphans using a batch process?

The ideal would be for the process to find and delete all orphaned records in the database, but I would settle for feeding the process the names of the tables where I know there are orphans. What I am trying to avoid is writing a separate query for every table, which would mean a massive amount of code.

Any suggestions would be greatly appreciated.



Tom
 
I'm going to make an assumption that your definition of an orphan record is one whose parent record no longer exists, although I believe this approach would also work for other foreign key relationships. Create a copy of the table you want to check for orphans. If there are space restrictions, you only need the columns associated with the cursor (see below). Add an additional column (or reuse an existing one) to denote "orphan". Create a cursor to select the primary key, parent key, and orphan indicator. Process the cursor, doing a lookup on each parent key. If the parent does not exist, update the orphan indicator in the child record; otherwise, fetch the next record. When you have finished, your copy of the table will have the orphans identified.
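
A rough sketch of that cursor approach, using hypothetical names: dbo.ChildCopy is the working copy of the child table, ChildID is its primary key, ParentID is the parent key, dbo.Parent is the parent table, and IsOrphan is the flag column added to the copy.

SQL:
DECLARE @ChildID int, @ParentID int;

DECLARE orphan_check CURSOR LOCAL FAST_FORWARD FOR
    SELECT ChildID, ParentID
    FROM   dbo.ChildCopy;

OPEN orphan_check;
FETCH NEXT FROM orphan_check INTO @ChildID, @ParentID;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- look up the parent; if it is gone, flag the child row as an orphan
    IF NOT EXISTS (SELECT 1 FROM dbo.Parent WHERE ParentID = @ParentID)
        UPDATE dbo.ChildCopy SET IsOrphan = 1 WHERE ChildID = @ChildID;

    FETCH NEXT FROM orphan_check INTO @ChildID, @ParentID;
END;

CLOSE orphan_check;
DEALLOCATE orphan_check;

On multi-million-row tables, the same check can usually be done set-based in a single UPDATE with a NOT EXISTS subquery, which tends to be much faster than a row-by-row cursor.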

==================================
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright


 
You can delete the records in batches, each in its own transaction; that way the active log only has to hold one batch at a time, which speeds things up and keeps the log file small. I use that method for deleting millions of records from tables with billions of records. If the records are not long, you can set up a loop that deletes records inside a transaction and commits (releases) the transaction every million records or so.
 
GK53 -- I am not quite sure what you are suggesting. I have been trying to stay away from explicit transactions. What you are suggesting is interesting, but I don't see how it would speed things up and keep the log size small. The tables I have to shrink are large and have triggers and lots of FKs that reference them. Perhaps 75% of the records need to be deleted. Time is a factor too.

Could you elaborate on what you had in mind?

Tom
 
The code would be something like this:

SQL:
declare @Records as decimal(19,4), @BatchSize as int, @BatchCount as int, @CurrentBatch as int

-- decimal here keeps the batch-count division below from truncating
set @BatchSize = 1000000

-- count the rows that qualify for deletion (conditionForDelete is your filter)
set @Records = (select count(*) from yourtable with (nolock) where conditionForDelete)

-- number of batches, rounded up
select @BatchCount = ceiling(@Records / @BatchSize)

set @CurrentBatch = 1

while @CurrentBatch <= @BatchCount
begin
       -- each batch is its own transaction, which keeps any single transaction (and the active log) small
       begin transaction
       delete top (@BatchSize) from yourtable
       where conditionForDelete
       commit transaction

       set @CurrentBatch = @CurrentBatch + 1
end
You can disable triggers before executing the code.
As long as you are not violating any FKs, it will work.
 
carnahantom,

Whether or not you explicitly use transactions, transactions occur anyway. This is true regardless of any database settings. By controlling the size of the transaction, you can minimize the transaction log size.

Transactions are important because it is how SQL Server prevents data corruption. Consider, for a moment, what would happen if there were no such thing as transactions. Now delete a million rows from a table. While the delete is occurring, pull the plug on the server. Without transactions, your database would be corrupt. With transactions, SQL Server would recognize that a pending operation was occurring and would roll it back when the server is restarted. This is why you cannot really do anything without transactions.

There are 2 schools of thought regarding massive deletes. You can move the data you want to keep into a temp table, drop the original table, and then rename the temp table to the same name as the original table. Pro: super fast. Con: before dropping the table, you need to drop the foreign keys that reference it; after renaming the table, you need to recreate the foreign keys, indexes, and triggers.
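
A minimal sketch of that copy-drop-rename approach, with hypothetical names (dbo.BigTable is the original table, KeepCondition is the filter for rows to retain, and dbo.BigTable_keep is the staging copy):

SQL:
-- copy only the rows you want to keep
SELECT *
INTO   dbo.BigTable_keep
FROM   dbo.BigTable
WHERE  KeepCondition;

-- any foreign keys referencing dbo.BigTable must be dropped before this step
DROP TABLE dbo.BigTable;

-- give the staging table the original name, then recreate FKs, indexes, and triggers
EXEC sp_rename 'dbo.BigTable_keep', 'BigTable';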

Alternatively, you can delete your data in smaller batches (as gk53 is suggesting). Pro: you don't need to modify the structure of your database. Con: slightly slower than the copy-drop-rename method.

Personally, I prefer the 2nd method (the one that gk53 is suggesting) because database structures have a tendency to change over time. For example, suppose you write a copy-drop-rename script that includes creating the 4 indexes that exist on the table today. 6 months from now, you (or someone else) adds another index to improve performance. 2 months after that, you re-run the copy-drop-rename script, which only puts the original 4 indexes back and misses the new one that was created to solve some other performance problem. Basically, it's safer (in my opinion) to delete in batches, even if it takes a little longer to complete.





-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hi George,

The table I am trying to shrink has a trigger, many FKs pointing to it, and worse, a bunch of schema-bound views that prevent me from dropping it. I am going to try Option #2 to see how it does.

Thanks George and GK53!

Tom
 
GK53,

For other tables that I needed to shrink, I:

[ol 1]
[li]copied the records to retain to a temporary table (SELECT * INTO tempTable FROM ...) [/li]
[li]truncated the original table[/li]
[li]inserted the retained records from the temp table into the original[/li]
[li]dropped the temp table[/li]
[/ol]

That worked rather well for most of the tables I was shrinking. I was able to shrink a table that was 24 million records in under 2 minutes.

That technique will not work on this table due to a trigger, 10 FKs referencing it, and 10+ schema-bound views, which kept me from truncating it. While it was not hard to deal with the trigger and FKs, dropping the views and recreating them was complicated. For one thing, we don't own all the views.

I tried your method (above). My conditions required the deletion of 31 million records from a 36-million-record table. It took 39 minutes to run. Can you think of any way to speed it up?

Thanks,

Tom
 
I was looking for ways to speed up the script that GK53 suggested. I had someone here suggest that I use the following code to speed up my overall process:

Code:
-- disable all check/FK constraints and all triggers, database-wide
-- (single quotes so the commands do not depend on the QUOTED_IDENTIFIER setting)
exec sp_msforeachtable @command1 = 'alter table ? nocheck constraint all';
exec sp_msforeachtable @command1 = 'alter table ? disable trigger all';

< DELETE RECORDS for multiple tables here >

-- re-enable them afterwards
exec sp_msforeachtable @command1 = 'alter table ? enable trigger all';
exec sp_msforeachtable @command1 = 'alter table ? check constraint all';

I realized that disabling/re-enabling all triggers might not be a good idea since they are often used to maintain integrity, so I looked at selectively disabling triggers on some tables instead.

The trigger on my "problem" table does not contain any CASCADE DELETEs or other critical operations, so disabling it would help and would not cause a problem. With just the trigger disabled, throughput went from 900,000 records per minute to 4,000,000 records per minute, which cut the runtime from 34 minutes to 8 minutes. That helped.
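
Selectively disabling a single trigger looks roughly like this (trg_BigTable_audit and dbo.BigTable are hypothetical names):

Code:
-- turn the one trigger off for the duration of the delete
DISABLE TRIGGER trg_BigTable_audit ON dbo.BigTable;

-- ... run the batched delete here ...

-- turn it back on afterwards
ENABLE TRIGGER trg_BigTable_audit ON dbo.BigTable;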

What I don't know is whether disabling/re-enabling constraints would cause a problem. I am thinking that since they primarily affect inserts and updates, it probably would not affect referential integrity.

Does anyone have a different idea on that?


Tom
 
If you follow this route, you should understand that constraints and keys can become "untrusted". If they are untrusted, SQL Server will not use them to improve query performance. I strongly encourage you to read this:
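
A quick way to see which constraints have gone untrusted, and to revalidate them, is roughly the following (dbo.BigTable is a hypothetical table name):

SQL:
-- constraints SQL Server currently treats as untrusted
SELECT name AS constraint_name, 'FK' AS kind
FROM   sys.foreign_keys
WHERE  is_not_trusted = 1
UNION ALL
SELECT name, 'CHECK'
FROM   sys.check_constraints
WHERE  is_not_trusted = 1;

-- re-enabling WITH CHECK revalidates the existing rows and restores trusted status
ALTER TABLE dbo.BigTable WITH CHECK CHECK CONSTRAINT ALL;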




-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 