Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

"clean up" table (update?)

Status
Not open for further replies.

aas1611

Programmer
Dec 14, 2001
184
DE
Hi,

another problem comes up. I want to "clean up" a table, in a way that there is no duplicate data in it. But I think it's more complicated than it sounds. At least to me.

Suppose I have the following table T1 whose contain is like this:

F1 F2 F3 F4 F5 F6
---------------------------------------------------
Row1: 2005-08-31 17:00:00 111 222 00 88
Row2: 2005-09-01 17:00:00 555 666 00 99
Row3: 2005-09-02 17:00:00 555 666 00 88

The key here is Fields F3, F4, F5. So is the logic:
when there are two or more rows that have the same F3 and F4 and F5, update this table with the newest date. When the combination of F3, F4, F5 is unique, the record stays.

So, the table above is supposed to look like this:
2005-08-31 17:00:00 111 222 00 88
2005-09-02 17:00:00 555 666 00 88

Row2 is deleted (not selected).

I've tried with several methods, but somehow I got stuck. Can it be done without creating an additional table? (Well, don't really care actually if several more tables have to be created). Can it be done at all?

Thanks guys for any clues!
 
If you GROUP BY f3, f4, and f5 do you get the correct number of rows that you wish to have in the clean table?
Code:
SELECT f3, f4, f5,
       MAX(f1) AS "newest_date",
       COUNT(*) AS "duplicates"
FROM T1
GROUP BY f3, f4, f5

Probably you do not know at this point how many rows you should have in the clean table. No matter. Possibly you have a ballpark idea.

The query above may identify which rows have duplicates, any COUNT(*) > 1 indicates combinations of f3, f4, and f5 that have more than one row. MAX(f1) is the latest date from those rows. You could look at a few of them and confirm that the query identifies the f3,f4,f5 combinations with problems.

After you feel confident that these are the rows that need to be changed or deleted look at the rows returned by the following.
Code:
SELECT T1.*
FROM T1
JOIN (
       SELECT f3, f4, f5,
              MAX(f1) AS "newest_date",
              COUNT(*) AS "duplicates"
       FROM T1
       GROUP BY f3, f4, f5
      ) d ON d.f3 = T1.f3
           AND d.f4 = T1.f4
           AND d.f5 = T1.f5
           AND d.newest_date = T1.f1

Are these exactly the rows that you wish to keep in the clean table?

If yes, then the easiest thing might be to create a new table and load it with clean data by INSERTing the results of the second query.

How does that sound?

 
Hi rac2,

thanks for your clue. The code you gave me seems about right. I will give it a try on Monday when I get back on my office. I'll let you know whether it works or not.

Thanks!
 
Hi,

I had to change the code a little bit for the data that has same date but different time, but it works!

Thanks again.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top