ID|Company|Person|Email
1|Peter Products|John Doe|info@peterprod.com
2|Peter Products|Joe Cool|info@peterprod.com
3|PaulCorp, Division of Peter Products|Bob Smith|info@peterprod.com
4|Peter Products|Rob Smith|rsmith@peterprod.com
The sample above is a rough example of the data I have. I want to find distinct instances of the 'Email' address. I do not need to send three emails to info@peterprod.com.
Wouldn't a DISTINCT SELECT on all columns only return row 1 since the Email is the same in row 1,2,3 and the Company is the same in 1,2,4?
Distinct is always applied to all columns in a record.
In your sample, the company name and the email adress always coincide. Is this the case for all your data?
if so
Code:
select company,min(person),email
from t
group by company,email
If not, you need subqueries which is not support in any
[bold] production [/bold] version of Mysql. It can be accomplished by using multiple queries and temporary tables.
If you are using a relational DB you must see that a table is mostly 2-dimensional: these two dimensions can be regarded as horizontal and vertical.
A SELECT-Statement will give you any result you want and you can restrict it horizontically as well as vertically. Each restriction is done by the SELECT-Statement in a logical clear way:
First restriction is the horizontical restriction. The horizontical dimension of a table is also called COLUMNS. In your case it's the ID, Company, Person and Email. So, if you want Email and Email only you have to say:
SELECT email
There is no need to name another (or all) column in the SELECT-Section if you don't want to see it.
The next step in a SELECT is to give the source or the Table-Name
SELECT email
FROM table
The next step is the vertical restriction, done by comparison in the WHERE-Clause
SELECT email
FROM table
WHERE ...
The where-Clause is not really necessary in your case, but it is useful and completes the logic of a relational Database.
In your case, only the
SELECT email
FROM table
is necessary. The DISTINCT-Command in the SELECT-Clause will eliminate duplicates, and so is a vertical restriction, but not an explicite comparison like it is in the where-clause.
SELECT DISTINCT email
FROM table
Please note this. I think, your problem is that you didn't understand exactly the way a relational database should be handled. It's not important if any column has duplicate data, and it's also not important if any row cantains duplicate data. The important thing is, that you must have a restriction plan of which data you want to see, and that a relational database always can show you this data if you put the right SQL-Statement.
The words "horizontal" and "vertical" as substitutes for "columns" and "rows" are, on first view, a little unusual, but they will show you the way a relational database works in a more geometrical way.
> I do not need to send three emails to info@peterprod.com.
welcome to the wonderful world of databases
which one of the three would you like to send it to?
you need some kind of condition, on one or more of the remaining columns, to be able to pick out a single row from any group of more than one row with the same email
the condition is often something like "last one entered" or "with the most sales" or something
i've seen people use nonsensical but otherwise functional conditions like "lowest alphabetical name"
in your case, this would give you 3 instead of 1, because Bob comes before Joe and John, plus of course 4
select email, min(person)
from yourtable
group by email
if you want the company too, you could use
select email, min(person), min(company)
but that might give you Bob with the wrong company (except it doesn't in this instance, by coincidence)
you could also try
select email, min(pad(company)||person))
so that at least you get the right person with the right company, but now this gives you Bob and Joe, plus of course Rob
note the pad function, which i have invented here, pads the company on the right with spaces before concatenating the person, so that wherever this is going (mail merge?) can substring it
The sample I offered was actually a distilled version of a very lengthy SELECT statement that queried several tables. As I whittled it down to its essence for the example above, I realized its unnecessary complexity.
I've been in the bad habit of selecting data used in the WHERE condition even though I only needed to select a single column for my output. Although I am still pulling data from several tables, I now realize that I only needed to SELECT the email address.
This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
By continuing to use this site, you are consenting to our use of cookies.