speed up delete where value not in multiple tables

jhaganjr · May 23, 2008

Hello Amazingly Smart People,

I did a search and found the answer to my question. However, the answer is excessively slow. I'm wondering if you know a better way.

Need: To delete records from tblA when the key value of the records does not appear in several related tables.

Solution:
DELETE * FROM tblA
WHERE value Not In (SELECT value FROM tblBSub)
AND value Not In (SELECT value FROM tblCSub)
AND value Not In (SELECT value FROM tblDSub)
AND value Not In (SELECT value FROM tblESub)
AND value Not In (SELECT value FROM tblFSub);

I'm dealing with only thousands and tens of thousands of records, but this takes many minutes to run. I haven't had the patience to wait beyond 5 or 6 minutes for it to complete. But I know it works by testing it on small test data sets first.
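You can check which rows the original query removes without touching live data. Here is a minimal sketch using Python's built-in sqlite3 module. It assumes an integer value field and made-up sample data: tblA holds 1 to 10, and the sub-tables reference 1, 2, 3, 5 and 8. SQLite is not Jet, so this says nothing about Access timings. It only reproduces the logic of the query.

```python
import sqlite3

SUB_TABLES = ["tblBSub", "tblCSub", "tblDSub", "tblESub", "tblFSub"]


def build_db():
    # Hypothetical sample data: tblA holds 1..10; the sub-tables reference 1, 2, 3, 5 and 8.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (value INTEGER)")
    conn.executemany("INSERT INTO tblA VALUES (?)", [(v,) for v in range(1, 11)])
    refs = {"tblBSub": [1, 2], "tblCSub": [2, 3], "tblDSub": [5], "tblESub": [], "tblFSub": [8, 8]}
    for name, values in refs.items():
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    return conn


def remaining(conn):
    # Values left in tblA, sorted for easy comparison.
    return [row[0] for row in conn.execute("SELECT value FROM tblA ORDER BY value")]


def delete_not_in(conn):
    # One uncorrelated NOT IN subquery per sub-table, as in the original query.
    # SQLite spells Access's "DELETE * FROM" as plain "DELETE FROM".
    where = " AND ".join(f"value NOT IN (SELECT value FROM {t})" for t in SUB_TABLES)
    cur = conn.execute(f"DELETE FROM tblA WHERE {where}")
    conn.commit()
    return cur.rowcount


conn = build_db()
print(delete_not_in(conn), remaining(conn))  # 5 [1, 2, 3, 5, 8]
```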

Another solution is to use a union query to create a single list of "value" from all the sub-tables. The union query runs very fast on its own. But when I place it into the Not In statement above the Delete query runs just as slow.
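The union variant can be sketched the same way. This SQLite illustration rests on assumptions: integer values and made-up data, where tblA holds 1 to 10 and the sub-tables reference 1, 2, 3, 5 and 8. It only shows that a single NOT IN against the UNION removes the same rows as the five separate NOT IN tests. In Access the union would usually be a saved query, but here it is written inline.

```python
import sqlite3

SUB_TABLES = ["tblBSub", "tblCSub", "tblDSub", "tblESub", "tblFSub"]


def build_db():
    # Hypothetical sample data: tblA holds 1..10; the sub-tables reference 1, 2, 3, 5 and 8.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (value INTEGER)")
    conn.executemany("INSERT INTO tblA VALUES (?)", [(v,) for v in range(1, 11)])
    refs = {"tblBSub": [1, 2], "tblCSub": [2, 3], "tblDSub": [5], "tblESub": [], "tblFSub": [8, 8]}
    for name, values in refs.items():
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    return conn


def remaining(conn):
    # Values left in tblA, sorted for easy comparison.
    return [row[0] for row in conn.execute("SELECT value FROM tblA ORDER BY value")]


def delete_not_in_union(conn):
    # One list of every sub-table value, built with UNION (which also drops duplicates).
    union_sql = " UNION ".join(f"SELECT value FROM {t}" for t in SUB_TABLES)
    cur = conn.execute(f"DELETE FROM tblA WHERE value NOT IN ({union_sql})")
    conn.commit()
    return cur.rowcount


conn = build_db()
print(delete_not_in_union(conn), remaining(conn))  # 5 [1, 2, 3, 5, 8]
```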

Another solution is to update a True/False field in tblA first, then delete records from tblA based on that field.

This runs MUCH faster - in a matter of 20 seconds or so - but it is a two-step process ... first the update, then the delete. I'm guessing the first two solutions are slower because using the IN approach may eliminate the benefits of indexing??
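Joe doesn't show his update query, so this is only one plausible way to do the flag-then-delete approach. It is sketched in SQLite with a hypothetical flag column named keep and made-up data: tblA holds 1 to 10, and the sub-tables reference 1, 2, 3, 5 and 8. Each update uses a positive IN test to set the flag, and the delete then only has to look at the flag.

```python
import sqlite3

SUB_TABLES = ["tblBSub", "tblCSub", "tblDSub", "tblESub", "tblFSub"]


def build_db():
    # Hypothetical sample data: tblA holds 1..10; the sub-tables reference 1, 2, 3, 5 and 8.
    # "keep" is a made-up name for the True/False field (0 = False, 1 = True).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (value INTEGER, keep INTEGER NOT NULL DEFAULT 0)")
    conn.executemany("INSERT INTO tblA (value) VALUES (?)", [(v,) for v in range(1, 11)])
    refs = {"tblBSub": [1, 2], "tblCSub": [2, 3], "tblDSub": [5], "tblESub": [], "tblFSub": [8, 8]}
    for name, values in refs.items():
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    return conn


def remaining(conn):
    # Values left in tblA, sorted for easy comparison.
    return [row[0] for row in conn.execute("SELECT value FROM tblA ORDER BY value")]


def delete_via_flag(conn):
    # Step 1a: clear the flag on every row.
    conn.execute("UPDATE tblA SET keep = 0")
    # Step 1b: set the flag on rows whose value appears in at least one sub-table.
    for t in SUB_TABLES:
        conn.execute(f"UPDATE tblA SET keep = 1 WHERE value IN (SELECT value FROM {t})")
    # Step 2: delete the rows that were never flagged.
    cur = conn.execute("DELETE FROM tblA WHERE keep = 0")
    conn.commit()
    return cur.rowcount


conn = build_db()
print(delete_via_flag(conn), remaining(conn))  # 5 [1, 2, 3, 5, 8]
```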

The update, then delete solution is not a problem, per se, but I'm wondering if anyone knows how to speed up the delete query itself. Is there better syntax that executes faster?

Thanks,
Joe

TheAceMan1 · May 23, 2008

How are ya jhaganjr . . .

The best I can think of is to cut down the number of records returned by each subQuery so you're not parsing through entire tables! Example:

Code:

DELETE tblA.Value FROM tblA
WHERE (tblA.Value Not In (SELECT tblBSub.value FROM tblBSub
                          WHERE (tblA.Value=tblBSub.Value))) AND
      (tblA.Value Not In (SELECT tblCSub.value FROM tblCSub
                          WHERE (tblA.Value=tblCSub.Value))) AND
      (tblA.Value Not In (SELECT tblDSub.value FROM tblDSub
                          WHERE (tblA.Value=tblDSub.Value))) AND
      (tblA.Value Not In (SELECT tblESub.value FROM tblESub
                          WHERE (tblA.Value=tblESub.Value))) AND
      (tblA.Value Not In (SELECT tblFSub.value FROM tblFSub
                          WHERE (tblA.Value=tblFSub.Value)));
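To see that the correlated version deletes exactly the same rows as the original, here is a small SQLite sketch. It assumes integer values and made-up data, where tblA holds 1 to 10 and the sub-tables reference 1, 2, 3, 5 and 8. Each subquery is restricted to rows whose value equals the current tblA.value, so it returns either the matching rows or nothing.

```python
import sqlite3

SUB_TABLES = ["tblBSub", "tblCSub", "tblDSub", "tblESub", "tblFSub"]


def build_db():
    # Hypothetical sample data: tblA holds 1..10; the sub-tables reference 1, 2, 3, 5 and 8.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (value INTEGER)")
    conn.executemany("INSERT INTO tblA VALUES (?)", [(v,) for v in range(1, 11)])
    refs = {"tblBSub": [1, 2], "tblCSub": [2, 3], "tblDSub": [5], "tblESub": [], "tblFSub": [8, 8]}
    for name, values in refs.items():
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    return conn


def remaining(conn):
    # Values left in tblA, sorted for easy comparison.
    return [row[0] for row in conn.execute("SELECT value FROM tblA ORDER BY value")]


def delete_correlated(conn):
    # Each subquery is correlated on tblA.value, so it returns only the matching
    # rows of that sub-table (often none) instead of the whole table.
    where = " AND ".join(
        f"tblA.value NOT IN (SELECT {t}.value FROM {t} WHERE tblA.value = {t}.value)"
        for t in SUB_TABLES
    )
    cur = conn.execute(f"DELETE FROM tblA WHERE {where}")
    conn.commit()
    return cur.rowcount


# Compare against the original, uncorrelated form on identical data.
plain = build_db()
plain.execute(
    "DELETE FROM tblA WHERE "
    + " AND ".join(f"value NOT IN (SELECT value FROM {t})" for t in SUB_TABLES)
)
plain.commit()
correlated = build_db()
delete_correlated(correlated)
print(remaining(plain) == remaining(correlated), remaining(correlated))  # True [1, 2, 3, 5, 8]
```

In SQLite and most other engines the same test is more often written as NOT EXISTS (SELECT 1 FROM tblBSub WHERE tblBSub.value = tblA.value). Both forms select the same rows.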

Your Thoughts? . . .

See Ya! . . . . . .

Be sure to see thread181-473997
Also faq181-2886

jhaganjr · May 23, 2008

My thoughts WERE (when I first read it) ...

I understand the goal, but how does what you wrote achieve it?

Still, I plugged it into my live data, and it works. Thanks a million!

I'm still trying to get my amateur brain around why it works. It seems like a "double-negative" ... looking for something that's NOT in a table, but using a WHERE clause that specifies the values from each table are the same.

I keep reading it and trying to imagine the data sub-sets, but I just don't get it.

No matter. It works. Plus it's lightning fast.

Thanks, again!

Joe

Golom · May 23, 2008

This is multi-step but you should be able to do it in far less than 20 minutes

Code:

Public Sub DeleteFromTblA()
    ' Requires a reference to the DAO object library (for dbFailOnError).
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Create an indexed temporary table.
    ' text(255) assumes Value is a Text field; match your real data type.
    db.Execute "CREATE TABLE Temp (Value text(255), " & _
               "CONSTRAINT [Primary] PRIMARY KEY (Value));", dbFailOnError

    ' Insert all the values into it. UNION removes duplicates, and Nulls
    ' are skipped because a primary key cannot hold them.
    db.Execute "INSERT INTO Temp (Value) " & _
               "SELECT Value FROM " & _
               "(" & _
               "      SELECT value FROM tblBSub " & _
               "UNION SELECT value FROM tblCSub " & _
               "UNION SELECT value FROM tblDSub " & _
               "UNION SELECT value FROM tblESub " & _
               "UNION SELECT value FROM tblFSub " & _
               ") " & _
               "WHERE Value IS NOT NULL;", dbFailOnError

    ' Create another indexed temporary table.
    db.Execute "CREATE TABLE Temp1 (Value text(255), " & _
               "CONSTRAINT [Primary] PRIMARY KEY (Value));", dbFailOnError

    ' Insert everything that's only in tblA into Temp1.
    ' tblA.Value must be qualified: both tables in the join have a Value field.
    db.Execute "INSERT INTO Temp1 (Value) " & _
               "SELECT DISTINCT tblA.Value " & _
               "FROM tblA LEFT JOIN Temp " & _
               "    ON tblA.Value = Temp.Value " & _
               "WHERE Temp.Value IS NULL " & _
               "AND tblA.Value IS NOT NULL;", dbFailOnError

    ' Now delete everything in that table from tblA.
    db.Execute "DELETE * FROM tblA " & _
               "WHERE tblA.Value IN (SELECT Value FROM Temp1);", dbFailOnError

    ' Clean up your mess.
    db.Execute "DROP TABLE Temp;", dbFailOnError
    db.Execute "DROP TABLE Temp1;", dbFailOnError

    Set db = Nothing
End Sub

This will (I hope) be able to use indexes and, despite the multiple steps, should be faster than the full-table scans that you are having to endure now.
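Golom's steps can be mirrored in SQLite to check the logic. This sketch rests on assumptions: integer values and made-up data, where tblA holds 1 to 10 and the sub-tables reference 1, 2, 3, 5 and 8. SQLite TEMP tables stand in for the Temp and Temp1 tables that the VBA creates and drops.

```python
import sqlite3

SUB_TABLES = ["tblBSub", "tblCSub", "tblDSub", "tblESub", "tblFSub"]


def build_db():
    # Hypothetical sample data: tblA holds 1..10; the sub-tables reference 1, 2, 3, 5 and 8.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (value INTEGER)")
    conn.executemany("INSERT INTO tblA VALUES (?)", [(v,) for v in range(1, 11)])
    refs = {"tblBSub": [1, 2], "tblCSub": [2, 3], "tblDSub": [5], "tblESub": [], "tblFSub": [8, 8]}
    for name, values in refs.items():
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    return conn


def remaining(conn):
    # Values left in tblA, sorted for easy comparison.
    return [row[0] for row in conn.execute("SELECT value FROM tblA ORDER BY value")]


def delete_via_temp_tables(conn):
    # Step 1: a primary-keyed (hence indexed) table of every value used by any sub-table.
    conn.execute("CREATE TEMP TABLE Temp (value INTEGER NOT NULL PRIMARY KEY)")
    union_sql = " UNION ".join(f"SELECT value FROM {t}" for t in SUB_TABLES)
    conn.execute(f"INSERT INTO Temp (value) SELECT value FROM ({union_sql}) WHERE value IS NOT NULL")
    # Step 2: the tblA values with no match in Temp (a LEFT JOIN anti-join).
    conn.execute("CREATE TEMP TABLE Temp1 (value INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "INSERT INTO Temp1 (value) SELECT DISTINCT tblA.value FROM tblA "
        "LEFT JOIN Temp ON tblA.value = Temp.value "
        "WHERE Temp.value IS NULL AND tblA.value IS NOT NULL"
    )
    # Step 3: delete those values from tblA.
    cur = conn.execute("DELETE FROM tblA WHERE value IN (SELECT value FROM Temp1)")
    deleted = cur.rowcount
    # Step 4: clean up.
    conn.execute("DROP TABLE Temp")
    conn.execute("DROP TABLE Temp1")
    conn.commit()
    return deleted


conn = build_db()
print(delete_via_temp_tables(conn), remaining(conn))  # 5 [1, 2, 3, 5, 8]
```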

TheAceMan1 · May 23, 2008

jhaganjr . . .

I'm at work and will explain when I get home. However, you may want to try a union query with the new SQL (like before). Should get even faster! . . .
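One reading of this suggestion is a correlated NOT IN against a single saved union query. In the SQLite sketch below, a view plays the part of the saved query. Its name, qryAllSubValues, is hypothetical, as are the integer values and the sample data (tblA holds 1 to 10; the sub-tables reference 1, 2, 3, 5 and 8). Whether this beats five correlated subqueries depends on whether the engine can push the filter into each branch of the union, so time it on real data.

```python
import sqlite3

SUB_TABLES = ["tblBSub", "tblCSub", "tblDSub", "tblESub", "tblFSub"]


def build_db():
    # Hypothetical sample data: tblA holds 1..10; the sub-tables reference 1, 2, 3, 5 and 8.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (value INTEGER)")
    conn.executemany("INSERT INTO tblA VALUES (?)", [(v,) for v in range(1, 11)])
    refs = {"tblBSub": [1, 2], "tblCSub": [2, 3], "tblDSub": [5], "tblESub": [], "tblFSub": [8, 8]}
    for name, values in refs.items():
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    return conn


def remaining(conn):
    # Values left in tblA, sorted for easy comparison.
    return [row[0] for row in conn.execute("SELECT value FROM tblA ORDER BY value")]


def delete_correlated_union(conn):
    # The view stands in for a saved Access union query (hypothetical name).
    union_sql = " UNION ".join(f"SELECT value FROM {t}" for t in SUB_TABLES)
    conn.execute(f"CREATE VIEW qryAllSubValues AS {union_sql}")
    # One correlated subquery against the combined list.
    cur = conn.execute(
        "DELETE FROM tblA WHERE tblA.value NOT IN "
        "(SELECT q.value FROM qryAllSubValues AS q WHERE q.value = tblA.value)"
    )
    conn.commit()
    return cur.rowcount


conn = build_db()
print(delete_correlated_union(conn), remaining(conn))  # 5 [1, 2, 3, 5, 8]
```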

See Ya! . . . . . .

jhaganjr · May 23, 2008

Golom,

Thanks for the code. I like it, and will be able to use it in other settings.

Joe

TheAceMan1 · May 24, 2008

jhaganjr . . .

Sorry to get back so late! To explain:

jhaganjr said:
 . . . It seems like a "double-negative" ... looking for something that's NOT in a table, but using a WHERE clause that specifies the values from each table are the same

The key is in the number of records returned by each subQuery and available for searching by the Not In clause. Consider . . .

Code:

SELECT value
FROM tblBSub

. . . this subQuery returns all records in the table for the field 'Value', which, according to you, could be thousands to tens of thousands of records. This is your time killer for the search, especially considering you have to search thousands to tens of thousands of records for each subQuery!

Now consider . . .

Code:

SELECT tblBSub.Value
FROM tblBSub
WHERE (tblA.Value=tblBSub.Value)

. . . we have the same subQuery, only now we've set criteria to match what we're looking for. The subQuery now only returns those records where 'Value' exists! This gives us a much smaller recordcount for searching: a few records to search through, or none if 'Value' is not found. Remember, we're talking about the records returned by the subQuery for searching by the NOT IN clause! When no records are returned, NOT IN is True because there are no records to search. Hence your fastest execution occurs when 'Value' doesn't exist in any of the subQueries.
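The NOT IN behaviour described above can be checked directly. This SQLite sketch uses a made-up tblBSub holding 1, 2 and a Null, with literal values in place of tblA.Value. It also shows a side benefit of the correlated form. In SQLite, as in standard SQL, an uncorrelated NOT IN over a list that contains a Null is never True. So if any sub-table held a Null, the original query would delete nothing, while the correlated subquery never returns the Null row. Whether Jet treats Nulls identically is worth confirming on your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblBSub (value INTEGER)")
# Hypothetical contents, including a Null.
conn.executemany("INSERT INTO tblBSub VALUES (?)", [(1,), (2,), (None,)])


def scalar(sql):
    return conn.execute(sql).fetchone()[0]


# Correlated form, value absent (4): the subquery returns no rows,
# and "x NOT IN (empty set)" is True, which SQLite reports as 1.
absent_correlated = scalar("SELECT 4 NOT IN (SELECT value FROM tblBSub WHERE value = 4)")

# Correlated form, value present (1): the subquery returns 1, so NOT IN is False (0).
present_correlated = scalar("SELECT 1 NOT IN (SELECT value FROM tblBSub WHERE value = 1)")

# Uncorrelated form, value absent (4): the Null in the list makes the result
# unknown (None), and a WHERE clause treats unknown as "don't delete".
absent_uncorrelated = scalar("SELECT 4 NOT IN (SELECT value FROM tblBSub)")

print(absent_correlated, present_correlated, absent_uncorrelated)  # 1 0 None
```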

I hope the above is a little better insight for you.

If you'd like to crank out more speed, optimize a little by doing the following:

1. In succession, open each table in design view, then:
   a. Select the Value field.
   b. On the General tab at the bottom left, check the Indexed property. If it says No, change it to Yes (Duplicates OK).
   c. Save and close the table.
2. For each subQuery:

   Code:

   Change: SELECT TableName.Value
   To    : SELECT TOP 1 TableName.Value

   For those situations where more than one record is found in a subQuery, the above makes sure only one record is returned! After all, for each subQuery, we only need to know whether 'Value' occurs at least once.
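Both tips translate to SQLite roughly as follows: CREATE INDEX stands in for Indexed = Yes (Duplicates OK), and LIMIT 1 stands in for TOP 1. This sketch assumes integer values, made-up data (tblA holds 1 to 10; the sub-tables reference 1, 2, 3, 5 and 8), and hypothetical index names.

```python
import sqlite3

SUB_TABLES = ["tblBSub", "tblCSub", "tblDSub", "tblESub", "tblFSub"]


def build_db():
    # Hypothetical sample data: tblA holds 1..10; the sub-tables reference 1, 2, 3, 5 and 8.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (value INTEGER)")
    conn.executemany("INSERT INTO tblA VALUES (?)", [(v,) for v in range(1, 11)])
    refs = {"tblBSub": [1, 2], "tblCSub": [2, 3], "tblDSub": [5], "tblESub": [], "tblFSub": [8, 8]}
    for name, values in refs.items():
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    return conn


def remaining(conn):
    # Values left in tblA, sorted for easy comparison.
    return [row[0] for row in conn.execute("SELECT value FROM tblA ORDER BY value")]


def add_indexes(conn):
    # Non-unique index on every value field; the idx_<table>_value names are hypothetical.
    for t in ["tblA"] + SUB_TABLES:
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{t}_value ON {t} (value)")


def delete_correlated_top1(conn):
    # LIMIT 1 plays the role of Jet's TOP 1: each subquery stops after the first match.
    where = " AND ".join(
        f"tblA.value NOT IN (SELECT {t}.value FROM {t} WHERE tblA.value = {t}.value LIMIT 1)"
        for t in SUB_TABLES
    )
    cur = conn.execute(f"DELETE FROM tblA WHERE {where}")
    conn.commit()
    return cur.rowcount


conn = build_db()
add_indexes(conn)
print(delete_correlated_top1(conn), remaining(conn))  # 5 [1, 2, 3, 5, 8]
```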

Special Note: The first time you run any query after editing it, Jet optimizes it, making the first run the slowest. Thereafter the query runs in its optimized (fastest) state until you edit it again.

Your Thoughts? . . .

See Ya! . . . . . .

jhaganjr · May 26, 2008

Your explanations make perfect sense. And I know from putting your solution into practice it works fantastically.

My indexes were already in place, but I did add the "Top 1" syntax to speed things up even further.

Thanks, again. I learned a lot from your posts!

Joe
