Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SELECT TOP 1 to return multiple records

Status
Not open for further replies.

grnzbra

Programmer
Mar 12, 2002
1,273
US
I have a table of people that has a unique primary key for each record. However, due to radical differences in written language, people can appear several times with different spellings of the same name. There is another foreign key field for which each person has a unique value even though the person has more than one record. I cannot use the primary key.

What I need to do is select the most recent record for each person based on the foreign key and the date that the record was modified. A TOP 1 clause would return just one record. What I need is a select statement that returns the TOP 1 for each person in the table. I could do this with a cursor, but this is already inside of a cursor - I'd like to avoid nested cursors.

I vaguely remember seeing a solution for this (perhaps even here) but that was a long time ago.

Does anyone know how this is done?
 
Could you present a sample of your data, something like:

[pre]
PK_Field FK_Field LastName Date_Mod
1 22 Brown 1/1/2001
2 23 Brownie 5/5/2005
3 34 Brwney 6/6/2006
4 78 XYZ 9/9/2008
[/pre]
and what outcome you would expect based on what rule(s)? (Which records with different spellings would make the same last name?)

Have fun.

---- Andy

There is a great need for a sarcasm font.
 
OK, I don't know, if I got a correct understanding, but I designed simple sample data about two persons each having 2 entries in a names (history) table.

For the query to pick the latest name entered for each person it's essential the enteredAt datetime differs, so I cheated creating sample data with GetDate()+X. Computers are too fast today, so successive calls of GetDate() might change in split seconds somewhere at decimal places 5 or 6, but before the precision of the datetime type.

That aside here's sample code and the CTE to use:

Code:
Declare @names as table (nameID int identity(1,1), name nchar(50), personID int, EnteredAt datetime);

Insert Into @names values 
('Grean Zebra' ,1,getdate())  ,('Green Zebra' ,1,getdate()+1),
('Olaf Doshkey',2,getdate()+2),('Olaf Doschke',2,getdate()+3);

with latestnamerecords as(
select personID, Max(EnteredAt) as LatestNameAt from @names Group By personID)

Select n.* from @names n inner join latestnamerecords lnr
on n.personID = lnr.personID AND n.EnteredAt = lnr.LatestNameAt
order by personID

Obviously the CTE query could also be written as a simpler sub query, but I think it makes things better readable and understandable. First step is to pick out max(enteredAt) dates per personID, which is a foreign key to a not implemented persons table holding other non moving data of a person, perhaps. Anyway, all entered/modified versions of names are in the @names table, and the row with the max(enteredAt) date per personID is the latest name record of that person. So the whole row of the max(enteredAt) date per personID is of interest. And that's what the final query retrieves with an inner join of the latestnamerecords via matching both personID and the datetime.

By the way, SQL 2016 has a feature you will like about such historical data: temporal tables, for an introduction see
Besides some other things it introduces some new query clauses like querying data AS OF a certain datetime, back historically or of course also usable AS OF NOW via specifying AS OF GetDate().

Bye, Olaf.
 
Visually:

latestnames_cdrtrp.png


The result of latest names (above) in comparison with all names (below). Only rows 2 and 4 are picked, as needed.

Bye, Olaf.
 
Olaf,
grnzbra said: "people can appear several times with different spellings of the same name"
Your example shows people with the same spellings of the names.
That's why I asked for example of data and the rules, especially what makes "different spellings of the same name" - the 'same name'.... [ponder]

Have fun.

---- Andy

There is a great need for a sarcasm font.
 
Look closer, the names change from wrong to correct spellings.

Grean -> Green
Doshkey -> Doschke

Bye, Olaf.
 
Oh, I see.
You then assumed "each person has a unique value even though the person has more than one record" with your personID, and I hope you are right.

"each person has a unique value" could also be interpreted as: :-( [pre]
nameID Name PersonID
1 17
2 19
3 27
4 33[/pre]
That's why I always ask for an example of data and don't assume anything, but that's just me :)

Have fun.

---- Andy

There is a great need for a sarcasm font.
 
It would be really nice to see sample data to give a solution fitting the actual data. But since it's about the idea to get a top 1 record per group of records, you can take it from there and adapt it to another situation, too.

The idea of temporal tables I pointed to has an even stricter outset, it asks of two datetime2 columns, a starttime and endtime of the validity of a record, but it separates the current records in one major table and all previous records in a history table and thus getting the latest data is even simpler than querying AS OF NOW, you simply query the data from the main table.

So, whenever you have the chance to go for SQL2016, grnzbra, you should go for that feature, as it's really about the topic of keeping all data history and still have an easy way to act on current data. The nice thing about this feature compared to an older history keeping with CDC (change data capture) is, that it's a feature even going into the free Express version of MSSQL 2016. And on top of that, the history table is maintained from the system automatically, even if you alter the main table. So this surely will be a goal to migrate any self rolled history data keeping done via triggers, if not CDC, so far. It has some weak spots, but especially querying this data is much easier and should outperform every self done history mechanisms especially in getting data AS OF a certain datetime on the past.

I did something like that myself in about 2010 and had a T_data for current data and H_data for historical states. To get the row as of some datetime I did [tt]SELECT * FROM t_data UNION SELECT * FROM h_data HAVING adatetime >= validfromDatetTime AND adatetime < validtoDateTime[/tt], which can of course be optimized via indexes on these datetime columns, but it's nevertheless a not ideal brute force attack on the problem.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top