Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Return rows ignoring duplicates over a particular field 1

Status
Not open for further replies.

janinja

Technical User
Nov 28, 2000
18
NZ
Hello everyone,

I've written the following query which worked fine until I added the Group By clause.

SELECT DISTINCT Email,FormName,Timestamp,FirstName,LastName,Street,Town,City,County,Postcode,DayPhone,Newsletter,MoreInfo,Other FROM #TName#
WHERE FormName = '#CName#'
GROUP BY Email
ORDER BY FirstName

After adding the Group By clause, I'm being given the following error:

Microsoft][ODBC SQL Server Driver][SQL Server]Column 'table.FormName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Can anyone help? I've tried a million different ways of writing the query and I can't get it to work.

love ninja :)

 
If you are just trying to return distinct rows, remove the entire GROUP BY clause.

If this isn't what you want, can you explain the result you would like to see?
 
I'm returning entries for a competition - some users have entered more than once so I'm trying just to return one row per email address.

I tried using distinct, but it applied the 'distinct' over all the columns in the row, instead of just the email column. That could have worked - but because I have a timestamp column, it meant that duplicates were still being returned because that element in the duplicate entry was different for each entry.

ninja :)
 
Do you really need to return all the columns? If you just want the distinct email addresses use:

Code:
SELECT DISTINCT email
FROM tablename
WHERE formname = 'cname'
 
I need to return them all unfortunately. The other info is needed to be sent on to the companies that sponsor the comps.
 
Are you saying that you could have two rows with the same email but different addresses? (If so, how do you know which address to take?)

Or is it only the timestamp column that is throwing the DISTINCT out? If so, can you just remove this column?
 
If I removed the timestamp column it would work fine. But I need the timestamp info too unfortunately. I think I just have to accept that DISTINCT can't run over one column in a list and find another way of doing it (sigh)

ninja :)
 
Try using the MIN() function something like this:

Code:
SELECT Email, FormName, MIN(Timestamp), FirstName, LastName, Street, Town, City, County, Postcode, DayPhone, Newsletter, MoreInfo, Other
FROM #TName#
WHERE FormName = '#CName#'
GROUP BY Email, FormName, FirstName, LastName, Street, Town, City, County, Postcode, DayPhone, Newsletter, MoreInfo, Other
ORDER BY FirstName

This will give you the same rows as if you did DISTINCT without the timestamp column, but include the earliest timestamp for each row too.
 
Thank you for that, but I can't get it to work unfortunately. It doesn't seem to include the timestamp column because I'm getting this message returned:

The column TIMESTAMP is not present in the query named QUERY. It is likely that you have misspelled the name of the column.
 
What is the data type of the Timestamp column? If it is datetime then I can't see a problem with my query (assuming the query you first posted returns some results?).

If it is actually timestamp data type then you can't use MIN() on it. But I don't see why you would need to include that column in this case as it would be meaningless data (ie not actually a date/time).
 
The data type of the timestamp column is datetime. I need to collect the data from that column so we know exactly when people entered the comp.

When I tried your query, it wouldn't output the timestamp data further down the page:

"The column TIMESTAMP is not present in the query named ENTRIES. It is likely that you have misspelled the name of the column."

I tried changing the query around a bit as well, moving the MIN(Timestamp) to the beginning of the list and removing the comma after it, and adding Timestamp to the group by clause but none of these got rid of the error. I'll continue to investigate today and if I get any result I'll post it up here.

ninja :)
 
James - I got your query to work by writing the MIN(TimeStamp)in the select as:

MIN(TimeStamp) AS TimeStamp

And then I added the distinct keyword to the query and it didn't fall over so I presume it's working OK now. I have also seen no more duplicates!

Here's the finished and working query:

SELECT DISTINCT MIN(TimeStamp) AS TimeStamp, Email, FormName, FirstName, LastName, Street, Town, City, County, Postcode, DayPhone, Newsletter, MoreInfo, Other
FROM Table
WHERE FormName = 'FormName'
GROUP BY Email, FormName, FirstName, LastName, Street, Town, City, County, Postcode, DayPhone, Newsletter, MoreInfo, Other
ORDER BY FirstName

thanks James for pointing me in the right direction

ninja : )
 
There are times when you want to run a SELECT DISTINCT query in order to get
back a unique list on one (or more) of the table columns. You don't want any repeats
in the select list column(s), but you need to have all the fields in the records returned.

I think that this problem has been discussed in detail in another post. Unfortunately,
the search engine is down and I cannot find the exact reference for you. This clever
idea came from the Tek-Tips forum on Oct 23, 2002. A programmer who calls himself
"NVSbe" submitted the following solution:

SELECT * FROM #Tname#
WHERE myId IN (SELECT MAX(myId)
FROM #Tname#
GROUP BY eMail)
ORDER BY firstname

The magic takes place in the inner query. The inner query groups the results on
the eMail address and uses the MAX() function to pull only ONE record out of the
group of repeat eMail addresses. That provides the uniqueness we are looking for
on the eMail address field. You can use either MIN() or MAX() to return only one
record.

The list of myId field values is then used by the outer query to select everything
from the reduced set of records.

Obviously, this whole idea falls apart if you do not have an ID field in your records.

 
If you want only one record per Email and that the Timestamp is maximum try this:

SELECT Email,FormName,Timestamp,FirstName,LastName,Street,Town,City,County,Postcode,DayPhone,Newsletter,MoreInfo,Other
FROM #TName# X
WHERE FormName = '#CName#'
and X.TimeStamp=(select max(TimeStamp) from #TName# where Email=X.Email)
ORDER BY FirstName
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top