Group by v. Distinct

jdgonzalez · Mar 30, 2005

I'm hoping someone can answer this for me. Which is worse to use: group by or distinct? I took a peek at the execution plans for both and they are identical. In this example, group by and distinct serve the same purpose, to remove dupes from a dataset. Does SQL Server know this and just apply the same logic internally?

It's just something I thought of earlier today that I realized I didn't know the answer to.

SQLBill · Mar 30, 2005

GROUP BY and DISTINCT do not do the same thing. They do not remove duplicates from a dataset.

GROUP BY does what it says, it GROUPS the result in the order of the BY list.

SELECT col1, col2
FROM tablename
GROUP BY col1

That command will return EVERY ROW from tablename. GROUP BY "divides a table into groups" (from the BOL).

DISTINCT returns unique rows based on the columns being selected.

col1 col2
a 1
a 2
b 1

SELECT DISTINCT col1, col2
FROM tablename

will return:
col1 col2
a 1
a 2
b 1

because the 'combinations' are unique (DISTINCT).

SELECT DISTINCT col1
FROM tablename

will return:

col1
a
b

-SQLBill

Posting advice: FAQ481-4875

vongrunt · Mar 30, 2005

If GROUP BY has nothing to aggregate (COUNT, SUM, blah) then both methods are identical.

------
heisenbug: A bug that disappears or alters its behavior when one attempts to probe or isolate it
schroedinbug: A bug that doesn't appear until someone reads source code and realizes it never should have worked, at which point the program promptly stops working for everybody until fixed.

jdgonzalez · Mar 30, 2005

Thanks vongrunt... That's what I meant to say.

SQLBill · Mar 30, 2005

Just ignore my brief stu****dity. I goofed. Call it lack of sleep or whatever. My post was completely wrong.

-SQLBill

Posting advice: FAQ481-4875

jdgonzalez · Mar 31, 2005

anyone?

vongrunt · Mar 31, 2005

Anyone what?

This code:

Code:

SELECT col1, col2
FROM tablename
GROUP BY col1[b], col2[/b]

... and this:

Code:

SELECT DISTINCT col1, col2
FROM tablename

... do the same thing. The only eventual difference is order of final results (no ORDER BY clause - no explicit sort).

Does SQL Server know this? Indirectly, yes. There are two families of internal operators capable of performing distinction: Distinct and Stream Aggregate. First can only remove dupes, second remove dupes and stream aggregates to other operators like Compute Scalar. If there is nothing to aggregate, optimizer is free to choose any of these operators - and exec plans become identical.

Btw. DISTINCT is preferred way because of clear semantics (DISTINCT = don't wanna dupes, GROUP BY = wtf?) and no redundant syntax (col1, col2 must be typed only once).

------
heisenbug: A bug that disappears or alters its behavior when one attempts to probe or isolate it
schroedinbug: A bug that doesn't appear until someone reads source code and realizes it never should have worked, at which point the program promptly stops working for everybody until fixed.

jdgonzalez · Mar 31, 2005

So what you're saying is that if you're using either group by or distinct to get distinct rows, use 'distinct' as a best practice. From a performance standpoint, SQL server doesn't care. Correct?

vongrunt · Mar 31, 2005

From my experience - to get only distinct rows - yes. If there is any exception, I haven't found it yet.

------
heisenbug: A bug that disappears or alters its behavior when one attempts to probe or isolate it
schroedinbug: A bug that doesn't appear until someone reads source code and realizes it never should have worked, at which point the program promptly stops working for everybody until fixed.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Group by v. Distinct

jdgonzalez

Programmer

SQLBill

MIS

vongrunt

Programmer

jdgonzalez

Programmer

SQLBill

MIS

jdgonzalez

Programmer

vongrunt

Programmer

jdgonzalez

Programmer

vongrunt

Programmer

Similar threads

Part and Inventory Search

Sponsor