Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Group by v. Distinct

Status
Not open for further replies.

jdgonzalez

Programmer
May 11, 2004
72
US
I'm hoping someone can answer this for me. Which is worse to use: group by or distinct? I took a peek at the execution plans for both and they are identical. In this example, group by and distinct serve the same purpose, to remove dupes from a dataset. Does SQL Server know this and just apply the same logic internally?

It's just something I thought of earlier today that I realized I didn't know the answer to.
 
GROUP BY and DISTINCT do not do the same thing. They do not remove duplicates from a dataset.

GROUP BY does what it says, it GROUPS the result in the order of the BY list.

SELECT col1, col2
FROM tablename
GROUP BY col1

That command will return EVERY ROW from tablename. GROUP BY "divides a table into groups" (from the BOL).

DISTINCT returns unique rows based on the columns being selected.

col1 col2
a 1
a 2
b 1

SELECT DISTINCT col1, col2
FROM tablename

will return:
col1 col2
a 1
a 2
b 1

because the 'combinations' are unique (DISTINCT).

SELECT DISTINCT col1
FROM tablename

will return:

col1
a
b

-SQLBill


Posting advice: FAQ481-4875
 
If GROUP BY has nothing to aggregate (COUNT, SUM, blah) then both methods are identical.

------
heisenbug: A bug that disappears or alters its behavior when one attempts to probe or isolate it
schroedinbug: A bug that doesn't appear until someone reads source code and realizes it never should have worked, at which point the program promptly stops working for everybody until fixed.
 
Just ignore my brief stu****dity. I goofed. Call it lack of sleep or whatever. My post was completely wrong.

-SQLBill

Posting advice: FAQ481-4875
 
Anyone what? :)

This code:
Code:
SELECT col1, col2
FROM tablename
GROUP BY col1[b], col2[/b]
... and this:
Code:
SELECT DISTINCT col1, col2
FROM tablename
... do the same thing. The only eventual difference is order of final results (no ORDER BY clause - no explicit sort).

Does SQL Server know this? Indirectly, yes. There are two families of internal operators capable of performing distinction: Distinct and Stream Aggregate. First can only remove dupes, second remove dupes and stream aggregates to other operators like Compute Scalar. If there is nothing to aggregate, optimizer is free to choose any of these operators - and exec plans become identical.

Btw. DISTINCT is preferred way because of clear semantics (DISTINCT = don't wanna dupes, GROUP BY = wtf?) and no redundant syntax (col1, col2 must be typed only once).

------
heisenbug: A bug that disappears or alters its behavior when one attempts to probe or isolate it
schroedinbug: A bug that doesn't appear until someone reads source code and realizes it never should have worked, at which point the program promptly stops working for everybody until fixed.
 
So what you're saying is that if you're using either group by or distinct to get distinct rows, use 'distinct' as a best practice. From a performance standpoint, SQL server doesn't care. Correct?
 
From my experience - to get only distinct rows - yes. If there is any exception, I haven't found it yet.

------
heisenbug: A bug that disappears or alters its behavior when one attempts to probe or isolate it
schroedinbug: A bug that doesn't appear until someone reads source code and realizes it never should have worked, at which point the program promptly stops working for everybody until fixed.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top