Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Group By Optimize Problem

Status
Not open for further replies.

mark1110

Programmer
Apr 20, 2005
85
US
I am trying to delete duplicate records from a table (tempfile). I populate the file using the following sql statement:

INSERT INTO tempfile
(counter, ssn, encrypt_ssn, field1, field2,...field50)
select count(encrypt_ssn) as counter
,ssn
,encrypt_ssn
,field1
,field2
...
,field50
from permfile
group by encrypt_ssn

DELETE FROM tempfile
WHERE counter > 1 AND field1 = ""

The ... indicates there are 50 fields total (field3, field4, field5, etc). I have about 100,000 records and it is taking a long time to do. Is there a faster way of doing this? The tempfile is not indexed but the permfile has ssn and field4 as the primary key.

Thanks


Mark


 
this is mysql, yes?

perhaps you will get better answers in that forum

in ANSI SQL (the forum you posted in) your GROUP BY clause is illegal

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
I am using Sybase 11.9. The group by works it is just slow. Should I try doing a SELECT INTO before doing an INSERT INTO?
 
This is somewhat confusing. The general GROUP BY rule is:
If a GROUP BY clause is specified, each column reference in the SELECT list must either identify a grouping column or be the argument of a set function.

You have lots of columns not fulfilling that requirement.

What exactly are you trying to do?
 
Hi,

I have a table with 50 fields and 100,000 records. I want to get rid of all duplicate records based on the encrypt_ssn field only which is why I do a group by on that field only, the other fields I don't care about. Once I find the records where the counter is greater than 1 I delete all records in which field1 is blank.

I hope that explains it a little better. If I am doing this incorrectly, please tell me and I will redo my sql code.

Thanks,

Mark
 
i am concerned about the "other fields i don't care about" remark

encrypt_ssn other fields
123 NULL NULL NULL NULL NULL NULL NULL 9
123 NULL NULL NULL NULL 60 NULL NULL NULL
123 125 400 815 10 52 187 204 37
123 NULL NULL 937 NULL NULL NULL NULL NULL

which row would you like to keep?

your way, you could get any one of them


r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
If you can see a better way to do the query I am open to suggestions. What I was thinking was something like this:

select count(encrypt_ssn) as counter ,encrypt_ssn
INTO temp
from permfile
group by encrypt_ssn

DELETE permfile
FROM permfile, temp
WHERE permfile.encrypt_ssn = temp.encrypt_ssn
AND temp.counter > 1 AND field1 = ""

Thanks,

Mark
 
I suppose that would make it, but the syntax is far from ANSI SQL compliant.

The ANSI SQL way looks something like:
[tt]INSERT INTO temp (column-list) SELECT count(encrypt_ssn) ...

DELETE FROM permfile WHERE encrypt_ssn IN (SELECT encrypt_ssn FROM permfile, temp WHERE ...)[/tt]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top