Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need help with deleting duplicates in Access2k

Status
Not open for further replies.

pompeyjon

MIS
Mar 12, 2003
16
GB
Hi Folks,

I have the following query:

SELECT First([APCGeneral].[PATID]) AS [PATID Field],
First([APCGeneral].[SPELLNO]) AS [SPELLNO Field],
First([APCGeneral].[CONSEPNO]) AS [CONSEPNO Field],
First([APCGeneral].[STARTDT]) AS [STARTDT Field],
Count([APCGeneral].[PATID]) AS NumberOfDups
FROM APCGeneral
GROUP BY [APCGeneral].[PATID], [APCGeneral].[SPELLNO],
[APCGeneral].[CONSEPNO], [APCGeneral].[STARTDT]
HAVING (((Count([APCGeneral].[PATID]))>1) AND
((Count([APCGeneral].[STARTDT]))>1));

This gives a result as follows:

PATID SPELLNO CONSEPNO STARTDT NUMBEROFDUPS
(2135 ROWS)

What I actually want is as follows:

DATETIMEY PATID SPELLNO CONSEPNO STARTDT
(4395 ROWS)
i.e. a complete record for each duplicate

Where DATETIMEY is another column in the record that does not contain duplicate information (as IS the case over the other 4 columns). I need the records ordered by PATID and DATETIMEY and need to retain the record in each set of duplicates with the highest value for DATETIMEY.

Any and all help appreciated.

PompeyJon
 
Sure:

This is what I'm currently getting with the existing SQL:

PATID SPELLNO CONSEPNO STARTDT NUMBEROFDUPS
000031 123 01 20040401 4
000056 456 01 20040402 2

What I want is

DATETIMEY PATID SPELLNO CONSEPNO STARTDT
20040501 000031 123 01 20040401
20040501 000031 123 01 20040401
20040501 000031 123 01 20040401
20040601 000031 123 01 20040401
20040501 000056 456 01 20040402
20040601 000056 456 01 20040402

In the above sample data, I want to be able to retain records 4 and 6 and junk the rest.

Thanks.
 
something like this:

Code:
SELECT A1.DATETIMEY, A1.[PATID], A1.[SPELLNO], A1.[CONSEPNO], A1.[STARTDT] 
FROM APCGeneral A1
WHERE A1.DATETIMEY = (SELECT MAX(DATETIMEY) FROM APCGeneral A2 WHERE A1.PATID = A2.PATID);

Leslie
 
Thanks - that seems to select the correct records. I guess the easiest thing to do would be to use that SQL as the basis for a MAKE TABLE query? Or would it be possible to use the above SQL in a DELETE query (e.g. DELETE records where NOT IN the recordset)?

PompeyJon
 
I want to retain the records selected in the query you suggested and delete all other duplicates.

I also obviously need to retain all unduplicated records in the table.

PompeyJon
 
Leslie - your query works for the sample data because the duplicate records have the earlier DATETIMEY. If all the duplicates have an earlier DATETIMEY, then you can run a delete query:

Delete From APCGeneral
Where DATETIMEY < (Select Max(a.DateTimeY) from APCGeneral as a where a.PATID=APCGeneral.PatID)

pompeyjon - do you have duplicate PATIDs with the most recent DATETIMEY as well?

This select query will give you one record per PATID assuming the SPELLNO, CONSEPNO, and STARTDT are the same for a PATID (you can adjust the query if they aren't).

SELECT A1.[PATID], Max(A1.DATETIMEY) as DATETIMEY,
A1.[SPELLNO], A1.[CONSEPNO], A1.[STARTDT]
FROM APCGeneral A1
Group By A1.[PATID], A1.[SPELLNO], A1.[CONSEPNO], A1.[STARTDT];

If you want to get rid of duplicates and just keep the most recent DATETIMEY, you could change this to a make-table query, delete the original records, and load the new records from the new table to the original table.
 
JonFer,

Thanks for your suggestions. I tried your SELECT query and it gave me the correct number of records (15356). However, when I expanded the query to include the rest of the columns in the table, the record count went up to 15566.

Have you got any ideas as to why this happened?

PompeyJon
 
Well I made it work, using a bit of a fudge admittedly. I altered the table structure to add an Autonumber field based on a replication id.

Then I amended JonFer's query above to:

SELECT Max(A1.ID), A1.[PATID], Max(A1.DATETIMEY) as DATETIMEY,A1.[SPELLNO], A1.[CONSEPNO], A1.[STARTDT]
FROM APCGeneral A1
Group By A1.[PATID], A1.[SPELLNO], A1.[CONSEPNO], A1.[STARTDT];

This gave me the requisite 15356 records. Then I wrote a new Make Table query, joining this recordset back to the original table and wrote the resulting recordset out to a new table.

Now I just need to work out how to automate this on a monthly basis!

Thanks everyone!

PompeyJon
 
why don't you just update the DateTimeY each time instead of inserting a new record? Then you don't have to do this process.

leslie
 
Thing is the DateTimeY field is how I determine which records I want to keep. DateTimeY contains the date that the record was generated. I need to keep later records and delete older records that are meant to be replaced.

The model may be rubbish but unfortunately that's what I have to work with.

Regards,

PompeyJon
 
but if you only want to keep the latest one, then just replace the DateTimeY field each time.

For instance you have:


DATETIMEY PATID SPELLNO CONSEPNO STARTDT
20040501 000031 123 01 20040401
20040601 000056 456 01 20040402


PatID 31 returns on 20040615

then you would do:

UPDATE APCGeneral SET DATETIMEY = 20040615 WHERE PATID = 000031

now all you have is the latest date. Say person 000031 comes back on 20040723

Then you could update again:

UPDATE APCGeneral SET DATETIMEY = 20040723 WHERE PATID = 000031

That way the only record is the latest one.



Leslie
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top