Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

[ODBC SQL Server Driver] Time0ut expired - WIRED problem!

Status
Not open for further replies.

JoBlink

Technical User
May 28, 2004
38
0
0
US
SUMMARY

Query works on a 1 Mil row table but gives above error on 10 mil. row table.


QUERY
___________________________________________________________

DELETE
FROM [1]
WHERE (Name = ANY
(SELECT Name
FROM [1]
GROUP BY Name
HAVING (COUNT(Name) > 100)))
___________________________________________________________


WIRED PART

Replace DELETE with SELECT * or SELECT Phone or SELECT Name - and it works!
But try SELECT COUNT(*) as expr1 or COUNT (Phone)as expr1 and above error pops up again.


ABOUT

1) Table has Name-Address-Phone data, all varchar 255.
2) INDEXED on Name and State
3) Queries executed through Management Console, so there is NO APPLICATION CONNECTION

4) Environment: SQL Enterprise 2000 on Windows XP Pro
5) Hardware: 4 GB Ram on Intel Core 2 Duo CPU - should be enough!


Thanks for reading!
 
There are a lot of reasons why deletes can be slower than selects. Consider that there could be a lot of transaction log activity, possibly even increasing the size of the transaction log to accommodate a large delete. Also consider that SQL must delete rows from the indexes as well as the table. And finally, there could be a poorly performing delete trigger on the table. All of these things can stack up and cause horrible performance for deletes.

You will likely get better performance if you delete your data in batches. I suggest you read this: faq183-3141

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Is [1] a table name? Sounds a bit strange to me.

Try

Code:
DELETE from myTable T
where exists (select 1 from myTable T1 
where T.Name = T1.Name
group by Name having count(Name) > 100)

Not sure if this will perform better than your version.


PluralSight Learning Library
 
Let me clarify.
This Sub-query works, and I want to delete all of those records returned by it:

(SELECT Name
FROM [1]
GROUP BY Name
HAVING (COUNT(Name) > 100)))


Is there a better way to delete those ?

------------------------------------------------
Gmmastros,

this is a static table, just addresses, with no transaction logs.

are you suggesting to remove indexes to speed up query?

Also, doing it in batches MAY defeat the purpose - which is to remove duplicates if they count over 100.
If I do a batch of 50000 or whatever, as you suggested, one batch may catch 80 duplicates, and another batch will catch 90 of the same, and both batches will leave those records in, even though the total is 80+90=170 - and they should be deleted.


------------------------------------------
Markros, yes, [1] is a table name. :)
I tried your query, but substituted DELETE with SELECT Count (*) - it returned ALL records, which is wrong.

I said earlier that SELECT COUNT(*) fails, but only when using SELECT (Name = ANY (SELECT---- construct .
EXISTS returns EVERYTHING!
---------------------------------------------------

Thank you again!
 
When posting code on this forum, you should use the code blocks (simply because it makes it prettier). Ex:

[ignore]
Code:
Put your code here
[/ignore]

I understand what you are saying about the batches. It's critically important that you get this right because the data needs to be protected against corruption. That being said, do you think this would work?

Code:
DELETE
FROM         [1]
WHERE     (Name = ANY
                        (SELECT     [!]Top 100[/!] Name
                         FROM          [1]
                         GROUP BY Name
                         HAVING      (COUNT(Name) > 100)))

Assuming the code shown above, you could capture the row count after the delete and loop until the rowcount = 0.

Since this is deleting data, I strongly encourage you to verify that you have a good backup prior to running this code.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Another variation of the same theme:
Code:
select T.* from [1] T inner join (select Name from [1]
group by [Name] having count([Name]) > 100) X
on T.Name = X.Name
-- you can substitute select T.* with select count(*) to check how many complete duplicates you have

I also believe that my version of that query should have worked the same way.

One more version (although I don't like IN syntax that much):
Code:
select T.* from [1] T where [Name] IN (select [Name]
from [1] group by [Name] having count([Name])>100)

We can get the result of this select into a temp table and then delete records in batches based on ID from temp table.

PluralSight Learning Library
 
OK, this is really frustrating!

GMMASTROS, I like your suggestion to limit queries to a few TOP results. I verified your suggestion with SELECT statement, and it returns just the records that need to be erased .
HOWEVER!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

when I am ready to delete, your code gives error message that is in the subject line of this post. Please note, I am only deleting top 10, and it comes to 1010 records!

-----------------------------------------------------

MARKROS, the same problem is with your last suggested code

Code:
DELETE FROM [1]
WHERE     (Name IN
                          (SELECT     TOP 10 Name
                            FROM          [1]
                            GROUP BY Name
                            HAVING      (COUNT(Name) > 100)))

----------------------------------------------------------
All the suggestions I got from you work well on small tables of about 1 million records.

On a 10 million row STATIC(!) table, your queries (and my) return select statements - nice!

The problem, as I said, is that a DELETE command on the same query gives [ODBC SQL Server Driver] Timeout ERROR

FYI, I do have indexes on Name- Desc, State and few other fields. Would you suggest removing all indexes?


Thank you again!
 
Can you run the following 2 commands in a query window and post the results?

Code:
exec sp_helptrigger [1]
exec sp_helpindex [1]

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
FYI, I do have indexes on Name- Desc, State and few other fields. Would you suggest removing all indexes?

Dropping indexes, doing stuff, and then recreating indexes seems.... somehow.... dirty (to me). I would only recommend this as a last resort.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
MARKROS,

I don't have primary key, but I will look into the article you suggested, thanks!

-------------------------------------------------------------

GMMASTROS,

I have no triggers, so the first command returned blank.

Here is second:
Code:
index_name	index_description	       index_keys
State	nonclustered located on PRIMARY	Mailing_State, Name(-), Volume, Code1(-)
 
JoBlink,

Am I interpreting your response correctly? The second query only returned one row? If this is the case, you will probably see an improvement if you put an index on the name column.

Code:
Create index idx_Name On [1](name)

With 10 million rows, it may take a little while to create the index, but then the query is likely to be much faster to run.

Please let us know.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
OK, I got it working, but there was a price to pay!

I removed all indexes except for the Name index.

Modified code suggestions by GMMASTROS: instead of "TOP 100", which FAILS, I tried "TOP 10" - FAILS, then "TOP 5" - WORKS!

So I am moving along at a snails pace.
My boss promised to fire me if this project not finished by Friday. I will be living in the office until then :)
 
Of the 10 million rows in this table, how many do you expect will be deleted, and how many do you expect to remain?

I ask because if most of the rows will be deleted, it may be better to copy the rows you want to keep in to another table, delete all the rows (using the truncate table command) and then copying the "kept" rows back in to the table.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Out of 10 million rows, 1,2 million should be deleted.

Can you please suggest a way to loop this code until subquery result is ZERO - no duplicate exceeds count 100.

Each DELETE removes about 6000 records - that's 200 loops!

Code:
DELETE FROM [1]
WHERE     (Name IN
                          (SELECT     TOP 5 Name
                            FROM          [1]
                            GROUP BY Name
                            HAVING      (COUNT(Name) > 100)))

THANK YOU
 
This should do the trick:

Code:
Declare @DeleteCount Int

Set @DeleteCount = 1

While @DeleteCount > 0
  Begin

    DELETE FROM [1]
    WHERE     (Name IN
                          (SELECT     TOP 5 Name
                            FROM          [1]
                            GROUP BY Name
                            HAVING      (COUNT(Name) > 100)))

    Set @DeleteCount = @@RowCount
  End

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top