Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Table Of Grouped Duplicate Records. Statistics needed

Status
Not open for further replies.

Felix18807

Programmer
Jun 24, 2011
39
GB
I have a query which groups together duplicate accounts within a database. It has the following fields

GroupID, AccountIDs, AccountCreatedDate, AddressPopulation

(GroupID indicates thats the AccountIDs belong together)

What I want to be able to produce is

1) A Count of AccountIDs within a GroupID Where AccountCreatedDate is the most recent (max) within that group AND where AddressPopulation is the maximum within that group.

AND

2) A Count as above where the most recent is NOT the most populated..

The goal being to prove/disprove if the most recently produced accounts contain the largest amount of address information for the purposes of merging accounts.

I hope this makes sense...

I am either being silly and missing something basic or this involves subqueries which are not my forte

Many thanks for reading this.
 

It would be helpful if you would show a sample of your data showing the problem you want to address, and the outcome you are hoping for. A picture is worth a 1000 words of explaining....

Have fun.

---- Andy
 
I'm having trouble making sense of it myself.

Here is some sample data

GroupID AccountIDs AccountCreatedDate AddressPopulation
1 51223 06/09/2010 20:04:24 1
1 51600 04/11/2011 12:14:28 4
1 55460 04/11/2011 13:14:28 5
2 73353 04/11/2011 12:14:28 4
2 131134 02/09/2011 11:14:21 4
3 130881 04/11/2010 12:14:28 5
3 131480 04/11/2011 12:14:28 2
4 12331 01/01/2009 10:21:00 5
4 70741 04/11/2011 12:14:28 3
4 81433 04/11/2011 12:14:28 2
4 119614 04/11/2011 12:14:28 1
5 53359 02/03/2009 11:10:10 4
5 53366 06/09/2010 20:04:24 3
6 11138 07/10/2011 13:04:07 4
6 80751 04/11/2011 12:14:28 5
6 82076 06/09/2010 20:04:24 3
7 130974 04/11/2011 12:14:28 5
7 131162 04/11/2011 12:14:28 4

Groups 1,2,3,6 and 7 have the largest address population against the most recent date and should therefore be counted

The result should look something like this
Count of Max Pop AND Max Date = 5
Count where above isnt true = 2

I hope this helps to clarify my dilema.

My latest attempt is here (some of the fieldnames etc are slightly different as this is a work in progress and I tried to simplify the nature of the issue)
Code:
 SELECT DISTINCT [Results].GroupID, Last([Results].AccountIDs) AS SelectedAccountID
FROM (SELECT [Group Criteria] & " - " & [Group Postcode] AS GroupID, Max([Results].AddressPopulation) AS MaxPop FROM [Results] GROUP BY [Group Criteria] & " - " & [Group Postcode])  AS [Parsed Duplicates MaxPop] INNER JOIN ((SELECT [Group Criteria] & " - " & [Group Postcode] AS GroupID, Max([Results].[Account Modified]) AS LastDate FROM [Results] GROUP BY [Group Criteria] & " - " & [Group Postcode])  AS [Parsed Duplicates LastDate] INNER JOIN [Results] ON ([Parsed Duplicates LastDate].GroupID = [Results].GroupID) AND ([Parsed Duplicates LastDate].LastDate = [Results].[Account Modified])) ON ([Parsed Duplicates MaxPop].GroupID = [Results].GroupID) AND ([Parsed Duplicates MaxPop].MaxPop = [Results].AddressPopulation)
GROUP BY [Results].GroupID
ORDER BY [Results].GroupID;
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top