Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Return unique rows

Status
Not open for further replies.

micang

Technical User
Aug 9, 2006
626
US
SQL 2008 R2

Hi All,

If I have a query like so:

select contractno, Address1,dateJoined
from tableTest
where datejoined between 2012-12-01 and 2015-12-31

and my results are:

Contractno,address1,datejoined
387, johnroad, 2013-06-01
267, bobroad, 2015-12-01
267, bobroad, 2015-12-01
578, ellenroad, 2012-12-24 (edit - this date was wrong)

Columns Contractno,address1,datejoined should be unique, but because of poor data, the result has duplicates (rows,2 and 3). There is another field in the table "DatelastTrans" - which is a datefield. In the event of a duplicate, I would like to pull out the max of "DatelastTrans" - this would then give me unique results.

How can I write a query where if there are duplicates, then look at datelasttrans and get the max record?

So my expected result would be:

Contractno,address1,datejoined
387, johnroad, 2013-06-01
267, bobroad, 2015-12-01
578, ellenroad, 2012-12-24 (edit - this date was wrong)

I am not sure if I can do this in one query?

Thanks
Michael
 
First of all:
Why you have more than one record for one ContractNo?
Code:
select DISTINCT contractno
               ,Address1
               ,dateJoined
from tableTest
where datejoined between 2012-12-01 and 2015-12-31

Borislav Borissov
VFP9 SP2, SQL Server
 
Hi bborissov,

You are correct, my sample data should read:

Contractno,address1,datejoined
387, johnroad, 2013-06-01
267, bobroad, 2015-12-01
267, yellowroad, 2015-12-01
578, ellenroad, 2012-11-24

And the actual column that causes the duplicate is contractno - the end result has to have unique contractno's. I basically need to choose one of the addresses and I need to use the max DatelastTrans to get a unique address if there are more than 1.

Thanks
Michael
 
DISTINCT is an easy solution in this case. I would wonder, why 2012-11-24 falls in the range of 2012-12-01 and 2015-12-31.

Here's a sample determining the max(datLastTrans) as LatestTransferDate and picking the record of that dates per contractno

Code:
Declare @tableTest as Table (contractno int, Address1 varchar(20), dateJoined datetime, dateLastTrans datetime);

insert into @tableTest Values
 (387,'johnroad' ,'20130601','20160201')
,(578,'ellenroad','20121124','20160209')
,(267,'b0broad'  ,'20151201','20160210')
,(267,'bobroad'  ,'20151201','20160211');

with LatestTransfer as
(
  select contractno, max(dateLastTrans) as LatestLastTransDate from @tableTest group by contractno
)

Select tt.* From @tableTest tt
inner join LatestTransfer lt on lt.contractno = tt.contractno 
and  lt.LatestLastTransDate = tt.DateLastTrans 
Where tt.datejoined between '20121101' and '20151231'

It seems you're querying on a staging table instead of the real contract table with just the latest data merged into it, where contractno should be a primary key (unique). That would simplify everything as you wouldn't have older data in it. Data history is a good way to be able to revert to older states and see changes/corrections made. But why query in history data, if you want to get the latest state? I'd design a table structure, which only holds latest data and a history table with all states separate, because historic record states are special case queries and wanting the current state of data is the norm. The norm query shouldn't get so complicated. Just think a step further, when you want to join other data.

Notice: I used a date range from November 2012 to end of 2015, so contract 578 does fall into that time span.

The slight advantage over DISTINCT: You can fetch the real latest dateLastTrans, as you can see for contract 267 with b0broad corrected to bobroad in the 11th February transfer.

And the Big advantage over DISTINCT: If there are any changes/corrections in the Address1 or dateJoined, you get the latest state of the data, otherwise you still get all different states of data to a contract, ie Select distinct tt.contractno, tt.Address1, tt.dateJoined From @tableTest tt still gives both states of contract 267.

Bye, Olaf.
 
Olaf, thank you very much, that query works - I learnt something new, appreciate it.

Looking further at the data, it's not normalized and now I see that there are records that have a duplicate dateLastTrans, I will need to reference a second max column - does this complicate things or can it easily be added?

Thanks

Michael

 
Well, you can take the other max value in the LatestTranser CTE and you add it to the join condition. The main idea is to have enough info to pick the one record per contractno you really want picked.
If you end up in double rows even considering further columns a last resort is adding rownumbers within the LastTranser records and pick Rownum=1.

Bye, Olaf.
 
thank you Olaf, that makes complete sense, appreciate your time with this.

Michael
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top