Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing Duplicates 1

Status
Not open for further replies.

MarkZK

Technical User
Jul 13, 2006
202
GB
Hi all,

I'm trying to reduce the size of a table by only returning unique rows based on name, IP, Useragent.

for example, I have something like,

Code:
RowID		Name		IP			UA								Date
----------------------------------------------------------------------------------------------------------------------------------------------
 1		bob  		127.0.0.1      		Opera/9.60 (Windows NT 5.1; U; ru)                     		11/14/2008 10:18:10 AM
 2		tom  		255.255.255.255		Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)		11/12/2008 11:10:50 AM
 3		tommy		255.255.255.255		Opera/9.51 (Windows NT 5.1; U; cs)               		11/11/2008 8:15:31 AM
 4		bob  		127.0.0.1      		Opera/9.60 (Windows NT 5.1; U; ru)  	           		11/10/2008 9:24:40 AM
 5		tom  		255.255.255.255		Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)		11/9/2008 10:27:04 AM

and what I want to do is (somehow) compare each row with each other (probably not efficient, as there's 23,000+ rows) to check for changes to any of the main three columns. The example would return,

Code:
RowID		Name		IP			UA								Date
----------------------------------------------------------------------------------------------------------------------------------------------
 1		bob  		127.0.0.1      		Opera/9.60 (Windows NT 5.1; U; ru)				11/14/2008 10:18:10 AM
 2		tom  		255.255.255.255		Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)		11/12/2008 11:10:50 AM
 3		tommy		255.255.255.255		Opera/9.51 (Windows NT 5.1; U; cs)				11/11/2008 8:15:31 AM

as, "bob" has two rows that match all of the three main columns (name,IP,UA) the older row can be removed, but because "tom" used the name "tommy" and a different browser only the one row was removed that matched all three columns.

I hope that makes sense, as I say, I only "want" to do this, not "need", so if it's too tricky then please feel free to ignore this post.

Thanks
 
Joining the table back to itself and grouping by the fields you want should do it.

Code:
select dbo.IPTable.rowid, dbo.IPTable.[Name], dbo.IPTable.IP, dbo.IPTable.UA, dbo.IPTable.Date
from dbo.IPTable
inner join ( select min(rowid) min_rowid from dbo.IPTable group by [Name], [IP], [UA] ) min_IPTable ON dbo.IPTable.rowid = min_IPTable.min_rowid

There might be a more efficient way to do this but this should give you a start.
 
Hi, EKOnerhime

Thanks for that, I'm fairly sure that works :) .... well, the amount of rows is about half and I can't see any that match on all three columns, I just can't understand where in that statement it was able to know not to put any matching rows in the new table, how does it compare ?..... sorry I'm a noob


Thank you very much.
 
The first part of the statement does a standard select and would normally give you all the records.

Code:
select dbo.IPTable.rowid, dbo.IPTable.[Name], dbo.IPTable.IP, dbo.IPTable.UA, dbo.IPTable.Date
from dbo.IPTable

The second part of the statement finds the minimum rowId for each unique combination of: Name, UP and UA. As the date is excluded from the seconds part, we can easily band these values together.

Code:
select min(rowid) min_rowid from dbo.IPTable group by [Name], [IP], [UA]

Finally, we join our first statement to our second statement using an inner join on the min rowid from our second statement to the rowid in our first statement.

What we're saying is, return all records from my table where the record is the first record to appear for the unique combination of: Name, UP and UA.

The second statement is a subquery, you can find out more about them in BOL.

More reading:
- 2008
- 2005
- 2000

HTH
 
Thanks again, EKOnerhime, very good of you to take the time out.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top