Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SELECT DISTINCT on only one column

Status
Not open for further replies.

jimoblak

Instructor
Oct 23, 2001
3,620
0
0
US
I'm clueless. How can I make a DISTINCT SELECTion on just one column while other columns may have duplicates?

- - picklefish - -
 
You can't.

What are you trying to accomplish?

Describe some sample data and the expected result.
 
ID|Company|Person|Email
1|Peter Products|John Doe|info@peterprod.com
2|Peter Products|Joe Cool|info@peterprod.com
3|PaulCorp, Division of Peter Products|Bob Smith|info@peterprod.com
4|Peter Products|Rob Smith|rsmith@peterprod.com

The sample above is a rough example of the data I have. I want to find distinct instances of the 'Email' address. I do not need to send three emails to info@peterprod.com.

Wouldn't a DISTINCT SELECT on all columns only return row 1 since the Email is the same in row 1,2,3 and the Company is the same in 1,2,4?

I need the Email from rows 1 and 4 returned.

- - picklefish - -
 
Distinct is always applied to all columns in a record.

In your sample, the company name and the email adress always coincide. Is this the case for all your data?

if so

Code:
select company,min(person),email
from t
group by company,email

If not, you need subqueries which is not support in any
[bold] production [/bold] version of Mysql. It can be accomplished by using multiple queries and temporary tables.
 
It appears that I am simply an idiot...

I was distinct selecting more data than I needed:
SELECT DISTINCT * FROM mytable; //DOESN'T WORK

SELECT DISTINCT Email FROM mytable; //DOES WORK

I really only needed the email addresses for my mailing list so trying to select the other stuff was pointless.

RE: my data? No, nothing coincides. It's a mess.

Thanks

- - picklefish - -
 
If you are using a relational DB you must see that a table is mostly 2-dimensional: these two dimensions can be regarded as horizontal and vertical.

A SELECT-Statement will give you any result you want and you can restrict it horizontically as well as vertically. Each restriction is done by the SELECT-Statement in a logical clear way:

First restriction is the horizontical restriction. The horizontical dimension of a table is also called COLUMNS. In your case it's the ID, Company, Person and Email. So, if you want Email and Email only you have to say:

SELECT email

There is no need to name another (or all) column in the SELECT-Section if you don't want to see it.

The next step in a SELECT is to give the source or the Table-Name

SELECT email
FROM table

The next step is the vertical restriction, done by comparison in the WHERE-Clause

SELECT email
FROM table
WHERE ...

The where-Clause is not really necessary in your case, but it is useful and completes the logic of a relational Database.

In your case, only the

SELECT email
FROM table

is necessary. The DISTINCT-Command in the SELECT-Clause will eliminate duplicates, and so is a vertical restriction, but not an explicite comparison like it is in the where-clause.

SELECT DISTINCT email
FROM table


Please note this. I think, your problem is that you didn't understand exactly the way a relational database should be handled. It's not important if any column has duplicate data, and it's also not important if any row cantains duplicate data. The important thing is, that you must have a restriction plan of which data you want to see, and that a relational database always can show you this data if you put the right SQL-Statement.

The words "horizontal" and "vertical" as substitutes for "columns" and "rows" are, on first view, a little unusual, but they will show you the way a relational database works in a more geometrical way.

Thomas
 
> I do not need to send three emails to info@peterprod.com.

welcome to the wonderful world of databases

which one of the three would you like to send it to?

you need some kind of condition, on one or more of the remaining columns, to be able to pick out a single row from any group of more than one row with the same email

the condition is often something like "last one entered" or "with the most sales" or something

i've seen people use nonsensical but otherwise functional conditions like "lowest alphabetical name"

in your case, this would give you 3 instead of 1, because Bob comes before Joe and John, plus of course 4

select email, min(person)
from yourtable
group by email

if you want the company too, you could use

select email, min(person), min(company)

but that might give you Bob with the wrong company (except it doesn't in this instance, by coincidence)

you could also try

select email, min(pad(company)||person))

so that at least you get the right person with the right company, but now this gives you Bob and Joe, plus of course Rob

note the pad function, which i have invented here, pads the company on the right with spaces before concatenating the person, so that wherever this is going (mail merge?) can substring it

rudy
 
Thanks everyone for your help.

The sample I offered was actually a distilled version of a very lengthy SELECT statement that queried several tables. As I whittled it down to its essence for the example above, I realized its unnecessary complexity.

I've been in the bad habit of selecting data used in the WHERE condition even though I only needed to select a single column for my output. Although I am still pulling data from several tables, I now realize that I only needed to SELECT the email address.

- - picklefish - -
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top