Group By Optimize Problem

mark1110 · Feb 24, 2010

I am trying to delete duplicate records from a table (tempfile). I populate the file using the following sql statement:

INSERT INTO tempfile
(counter, ssn, encrypt_ssn, field1, field2,...field50)
select count(encrypt_ssn) as counter
,ssn
,encrypt_ssn
,field1
,field2
...
,field50
from permfile
group by encrypt_ssn

DELETE FROM tempfile
WHERE counter > 1 AND field1 = ""

The ... indicates there are 50 fields total (field3, field4, field5, etc). I have about 100,000 records and it is taking a long time to do. Is there a faster way of doing this? The tempfile is not indexed but the permfile has ssn and field4 as the primary key.

Thanks

Mark

r937 · Feb 24, 2010

this is mysql, yes?

perhaps you will get better answers in that forum

in ANSI SQL (the forum you posted in) your GROUP BY clause is illegal

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon

mark1110 · Feb 24, 2010

I am using Sybase 11.9. The group by works it is just slow. Should I try doing a SELECT INTO before doing an INSERT INTO?

JarlH · Feb 24, 2010

This is somewhat confusing. The general GROUP BY rule is:
If a GROUP BY clause is specified, each column reference in the SELECT list must either identify a grouping column or be the argument of a set function.

You have lots of columns not fulfilling that requirement.

What exactly are you trying to do?

mark1110 · Feb 25, 2010

Hi,

I have a table with 50 fields and 100,000 records. I want to get rid of all duplicate records based on the encrypt_ssn field only which is why I do a group by on that field only, the other fields I don't care about. Once I find the records where the counter is greater than 1 I delete all records in which field1 is blank.

I hope that explains it a little better. If I am doing this incorrectly, please tell me and I will redo my sql code.

Thanks,

Mark

r937 · Feb 25, 2010

i am concerned about the "other fields i don't care about" remark

encrypt_ssn other fields
123 NULL NULL NULL NULL NULL NULL NULL 9
123 NULL NULL NULL NULL 60 NULL NULL NULL
123 125 400 815 10 52 187 204 37
123 NULL NULL 937 NULL NULL NULL NULL NULL

which row would you like to keep?

your way, you could get any one of them

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon

r937 · Feb 25, 2010

oops i overlooked the "where field 1 is blank" part

sorry

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon

mark1110 · Feb 25, 2010

If you can see a better way to do the query I am open to suggestions. What I was thinking was something like this:

select count(encrypt_ssn) as counter ,encrypt_ssn
INTO temp
from permfile
group by encrypt_ssn

DELETE permfile
FROM permfile, temp
WHERE permfile.encrypt_ssn = temp.encrypt_ssn
AND temp.counter > 1 AND field1 = ""

Thanks,

Mark

JarlH · Feb 28, 2010

I suppose that would make it, but the syntax is far from ANSI SQL compliant.

The ANSI SQL way looks something like:
[tt]INSERT INTO temp (column-list) SELECT count(encrypt_ssn) ...

DELETE FROM permfile WHERE encrypt_ssn IN (SELECT encrypt_ssn FROM permfile, temp WHERE ...)[/tt]

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Group By Optimize Problem

mark1110

Programmer

r937

Technical User

mark1110

Programmer

JarlH

Programmer

mark1110

Programmer

r937

Technical User

r937

Technical User

mark1110

Programmer

JarlH

Programmer

Similar threads

Part and Inventory Search

Sponsor