Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Coalesce vs. IF 4

Status
Not open for further replies.

sparkbyte

Technical User
Sep 20, 2002
879
US
Wis it that Coalesce takes so much longer than 3 if statements?

IF Example
Code:
USE [MailroomTracking]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- =============================================
-- Author:		John Fuhrman
-- Create date: 2010/06/22
-- Description:	QC Report
-- =============================================

SET NOCOUNT ON;

Declare 
	@Enter_Starting_Date DateTime, 
	@Enter_Ending_Date DateTime,
	@Enter_Employee_Last_Name VarChar(50)

Set @Enter_Starting_Date = Null
Set @Enter_Ending_Date = Null
Set @Enter_Employee_Last_Name = Null

	IF 	IsDate(@Enter_Starting_Date) <> 1 
		 Set @Enter_Starting_Date = Convert(varchar,Getdate(),101)

	IF 	IsDate(@Enter_Ending_Date) <> 1 
		Set @Enter_Ending_Date = Convert(varchar,Getdate(),101)

	IF @Enter_Employee_Last_Name IS NULL
		SET @Enter_Employee_Last_Name =  '%'

SELECT     TOP (100) PERCENT 
		dbo.tblTrackingTable.Tracking_ID, 
		EmployeeFullName = Case
				When EmployeeMI = '' Then EmployeeLN + ', ' + EmployeeFN
				When EmployeeMI Is Null Then EmployeeLN + ', ' + EmployeeFN
				When EmployeeMI <> '' Then EmployeeLN + ', ' + EmployeeFN + ' ' + EmployeeMI
			End,
		dbo.tblEmployee.EmployeeFN, 
		dbo.tblEmployee.EmployeeLN, 
		dbo.tblTrackingTable.EmployeeID, 
		dbo.tblTrackingTable.MachineName, 
		UPPER(dbo.tblTrackingTable.BoxNumber) AS BoxNumber, 
		UPPER(dbo.tblTrackingTable.FileNumber) AS FileNumber, 
		dbo.tblTrackingTable.TrackingDate

FROM         dbo.tblTrackingTable FULL OUTER JOIN
                      dbo.tblEmployee ON dbo.tblTrackingTable.EmployeeID = dbo.tblEmployee.EmployeeID

WHERE     
(dbo.tblTrackingTable.TrackingDate BETWEEN 
	DATEADD(dd, DATEDIFF(dd, 0, @Enter_Starting_Date), 0) 
	And DATEADD(ss, - 1, DATEADD(dd, DATEDIFF(dd, 0, @Enter_Ending_Date), 1)))
		AND	(dbo.tblEmployee.EmployeeLN Like @Enter_Employee_Last_Name + '%')
		AND	(dbo.tblTrackingTable.EmployeeID IS NOT NULL) 
		AND (dbo.tblTrackingTable.FileNumber <> '') 
		AND (dbo.tblTrackingTable.BoxNumber <> '') 
		AND (dbo.tblTrackingTable.FileNumber <> '.BOX.END.') 
		AND (dbo.tblTrackingTable.TrackingDate IS NOT NULL)

ORDER BY dbo.tblTrackingTable.TrackingDate, BoxNumber

COALESCE Example:
Code:
--Select 
--	Coalesce(Null, DATEADD(dd, DATEDIFF(dd, 0, Getdate()), 0)), -- Beginning of Day
--	Coalesce(Null, DATEADD(ss, - 1, DATEADD(dd, DATEDIFF(dd, 0, Getdate()), 1))) -- Ending of Day
--	

Declare 
	@Enter_Starting_Date DateTime, 
	@Enter_Ending_Date DateTime,
	@Enter_Employee_Last_Name VarChar(50)

Set @Enter_Starting_Date = Null
Set @Enter_Ending_Date = Null
Set @Enter_Employee_Last_Name = Null

SELECT     TOP (100) PERCENT 
		dbo.tblTrackingTable.Tracking_ID, 
		EmployeeFullName = Case
				When EmployeeMI = '' Then EmployeeLN + ', ' + EmployeeFN
				When EmployeeMI Is Null Then EmployeeLN + ', ' + EmployeeFN
				When EmployeeMI <> '' Then EmployeeLN + ', ' + EmployeeFN + ' ' + EmployeeMI
			End,
		dbo.tblEmployee.EmployeeFN, 
		dbo.tblEmployee.EmployeeLN, 
		dbo.tblTrackingTable.EmployeeID, 
		dbo.tblTrackingTable.MachineName, 
		UPPER(dbo.tblTrackingTable.BoxNumber) AS BoxNumber, 
		UPPER(dbo.tblTrackingTable.FileNumber) AS FileNumber, 
		dbo.tblTrackingTable.TrackingDate

FROM         dbo.tblTrackingTable FULL OUTER JOIN
                      dbo.tblEmployee ON dbo.tblTrackingTable.EmployeeID = dbo.tblEmployee.EmployeeID

WHERE     
(dbo.tblTrackingTable.TrackingDate 
		BETWEEN Coalesce(@Enter_Starting_Date, DATEADD(dd, DATEDIFF(dd, 0, Getdate()), 0)) -- Beginning of Day
		And Coalesce(@Enter_Ending_Date, DATEADD(ss, - 1, DATEADD(dd, DATEDIFF(dd, 0, Getdate()), 1))) -- Ending of Day
		
		AND	(dbo.tblEmployee.EmployeeLN Like Coalesce(@Enter_Employee_Last_Name + '%','%'))
		AND	(dbo.tblTrackingTable.EmployeeID IS NOT NULL) 
		AND (dbo.tblTrackingTable.FileNumber <> '') 
		AND (dbo.tblTrackingTable.BoxNumber <> '') 
		AND (dbo.tblTrackingTable.FileNumber <> '.BOX.END.') 
		AND (dbo.tblTrackingTable.TrackingDate IS NOT NULL)
)

ORDER BY dbo.tblTrackingTable.EmployeeID, dbo.tblTrackingTable.TrackingDate, BoxNumber


[red]Stats Compare[/red]

Major differences are in red.

COALESCE
Client Execution Time 16:40:18 16:40:09 16:36:59
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements 0 0 0 0.0000
Rows affected by INSERT, DELETE, or UPDATE statements 0 0 0 0.0000
Number of SELECT statements 5 5 5 5.0000
Rows returned by SELECT statements 5490 5490 5490 [red]5490.0000[/red]
Number of transactions 0 0 0 0.0000
Network Statistics
Number of server roundtrips 3 3 3 3.0000
TDS packets sent from client 3 3 3 3.0000
TDS packets received from server 202 202 202 202.0000
Bytes sent from client 3972 3972 3972 3972.0000
Bytes received from server 818842 818842 818842 818842.0000
Time Statistics
Client processing time 187 203 188 192.6667
Total execution time 7655 8718 7640 8004.3330
Wait time on server replies 7468 8515 7452 [red]7811.6670[/red]

IF
Client Execution Time 16:40:06 16:39:59 16:36:49
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements 0 0 0 0.0000
Rows affected by INSERT, DELETE, or UPDATE statements 0 0 0 0.0000
Number of SELECT statements 8 8 8 8.0000
Rows returned by SELECT statements 0 0 0 [red]0.0000[/red]
Number of transactions 0 0 0 0.0000
Network Statistics
Number of server roundtrips 6 6 6 6.0000
TDS packets sent from client 7 7 7 7.0000
TDS packets received from server 204 204 204 204.0000
Bytes sent from client 5428 5428 5428 5428.0000
Bytes received from server 813094 813094 813094 813094.0000
Time Statistics
Client processing time 516 46 62 208.0000
Total execution time 3687 1499 1421 2202.3330
Wait time on server replies 3171 1453 1359 [red]1994.3330[/red]

Thanks

John Fuhrman
 
quick note:

Code:
dbo.tblTrackingTable.TrackingDate BETWEEN
    DATEADD(dd, DATEDIFF(dd, 0, @Enter_Starting_Date), 0)
    And DATEADD(ss, - 1, DATEADD(dd, DATEDIFF(dd, 0, @Enter_Ending_Date), 1))
This isn't best. Presuming the column is indeed smalldatetime then it will work, but it isn't best practice and will fail if the column is ever changed to datetime or datetime2. Instead you should get in the habit of doing it this way:

Code:
TrackingDate >= Date1
AND TrackingDate < Date2

Also, even though the two ways of doing these queries should be identical because the date expressions should result in the same values, in my experience having the complex expression in the query can sometimes give table scans when seeks are possible.

Last, your two queries are NOT equivalent because they aren't returning the same number of rows. I don't have time right now to really drill into exactly the differences between the two queries and why that might be, but I can say for certain that something is wrong with one or both of the queries as they are empirically not semantically identical.
 
I agree with you whole heartedly, the 2 queries are not the same. That was very evident when looking at the execution plans.

Query with IF

Clustered Index scan Cost: 81%
Hash Match(Inner Join) Cost: 14%
[red]Sort Cost: 3%[/red]
Parallelism(Gather Streams) Cost: 2%

Query with coalesce

Clustered Index scan Cost: 23%
Hash Match(Inner Join) Cost: 8%
[red]Sort Cost: 63%[/red]
Parallelism(Gather Streams) Cost: 5%

I have to assume the reason for the slow down is because of the difference in the SORT. Just not sure why the 2 queries are so different.

I would also like to eliminate the like in the where clause because that casued a significant slow down but unsure how I can do it another way.

Thanks

John Fuhrman
 
Code:
Declare     @Enter_Starting_Date DateTime,     @Enter_Ending_Date DateTime,    @Enter_Employee_Last_Name VarChar(50)Set @Enter_Starting_Date = NullSet @Enter_Ending_Date = NullSet @Enter_Employee_Last_Name = Null    IF     IsDate(@Enter_Starting_Date) <> 1          Set @Enter_Starting_Date = Convert(varchar,Getdate(),101)    IF     IsDate(@Enter_Ending_Date) <> 1         Set @Enter_Ending_Date = Convert(varchar,Getdate(),101)

You set the varaiable to null then use coalesce to put a value in that. Why? I presume you really want to actually send values for lastname and being and end dates not always use default values.

One problem I see is sargability. Read up on this topic ( and you will understand why designing a where clause with coalesce is a bad idea. You might be better served doing this in dynamic SQl and building the where clause you want. Or use the IF process once you figure out the difference in records returned, it clearly is superior.

"NOTHING is more important in a database than integrity." ESquared
 
Thanks SQLsister!! Excellent article. I will bet you are right on the mark!

The colaesce function in the where clause causing it to NOT use the index.


But the article also says that using "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'" generally prevents (but not always) the query optimizer from using an index to perform a search.

So did I just get lucky and the date range is using an index? Being that {BETWEEN ##/##/#### AND ##/##/####} would be seen by the interpreter as
{TrackingDate >= ##/##/#### AND TrackingDate < ##/##/####}.



Thanks

John Fuhrman
 
Between is sargable (usually). As long as you don't put a column on the right side. You should also realize that a sargable query will only use an index if the appropriate index exists. If you get an execution plan item for sort that is taking 60% of the execution time, and the query is sargable, it usually means an appropriate index does not exist.

I'll post more on this tomorrow when I have a real keyboard and not this dumb iPhone pretend keyboard.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks George, much appreciated.

Thanks

John Fuhrman
 
Here's an example that demonstrates my point.

First, the between operator includes both end points. So... Where Number between 2 and 6 will return rows where number = 2 or number = 6, and any values between them. Ex:

Code:
Declare @Temp Table(Number Int)

Insert Into @Temp Values(1)
Insert Into @Temp Values(2)
Insert Into @Temp Values(3)
Insert Into @Temp Values(4)
Insert Into @Temp Values(5)
Insert Into @Temp Values(6)
Insert Into @Temp Values(7)

Select	* 
From	@Temp
Where	Number Between 2 and 6

The results of the query above are:
[tt][blue]
Number
------
2
3
4
5
6
[/blue][/tt]

Now for the sargable part. The query I just showed IS sargable because an index (if it exists) could be used to speed up the query. In the previous query, notice that I create a table variable and add some rows. The query is sargable, but there is no index to help the query.

To demonstrate this next part, let's create a real table and load some data in to it.

Code:
Create Table SargableDemo(Number Int)

Insert Into SargableDemo Values(1)
Insert Into SargableDemo Values(2)
Insert Into SargableDemo Values(3)
Insert Into SargableDemo Values(4)
Insert Into SargableDemo Values(5)
Insert Into SargableDemo Values(6)
Insert Into SargableDemo Values(7)

Note that we don't have a primary key or any indexes. Now, let's write a query and see how it performs by examining the execution plan.

Code:
Set SHOWPLAN_ALL On
go

Select	* 
From	SargableDemo
Where	Number Between 2 and 6

go
Set SHOWPLAN_ALL Off

When you run this code, you will not see the results of the query, but you will see the steps that the query engine uses to generate the results. Notice that we get a "Table Scan". Table scans are bad for performance. This is less true for a table with just a small handful of rows, but with large tables, it can make a big difference. Basically, SQL needs to examine each row to determine if that row should be included in the results. Examining every row in a large table can be very time consuming.

Now, let's add an index.

Code:
Create Index idx_SargableDemo_Number On SargableDemo(Number)

Now, let's re-run the previous code:

Code:
Set SHOWPLAN_ALL On
go

Select	* 
From	SargableDemo
Where	Number Between 2 and 6
go
Set SHOWPLAN_ALL Off

Notice that this time we get an index seek instead of a table scan. For such a small table, you should ignore the EstimatedIO and EstimatedCPU columns. But, rest assured that index seeks are better for performance on large tables.

Now let's change the query one more time:

Code:
Set SHOWPLAN_ALL On
go

Select	* 
From	SargableDemo
Where	Number [!]* 1[/!] Between 2 and 6
go
Set SHOWPLAN_ALL Off

Notice that we are now multiplying the number column by 1 before doing the comparison. Multiplying by 1 does not affect the value so the results are the same. Unfortunately, SQL can no longer use an index seek and is now using an index scan. The * 1 example may be a bit goofy (because you could argue that SQL Should be smart enough to recognize it), but the concept is valid. Basically, whenever you use a function on a column, SQL must evaluate the function for each row in the table to determine whether the row should be included in the results.


We're done with this table, so let's remove it.

Code:
Drop Table SargableDemo

When you are trying to improve the performance of a query, the first thing you should do is to make sure the query is sargable. If this does not noticeably improve the performance then it must be that an index does not exist, so the next step MAY be to add an index.

Adding indexes to tables is not something you should take lightly. Indexes exist to make queries faster, but there is a cost associated with indexes. Indexes take storage space and use RAM. Indexes also slow down things like insert, update, and delete (because the index data needs to be maintained). Most of these costs are minor because RAM and disk space are cheap. Indexes slow down inserts, updates and deletes, but the amount of "slow down" is usually quite small. The problem occurs when developers continue to add more and more indexes to a table. Eventually all of these costs will begin to add up and cause a real problem for you.

Lastly, beware of "duplicate" indexes. With SQL, it's possible to create multiple indexes on a table. It's also possible to have multiple identical indexes. This is wasteful of disk, memory, and overall resources. It's pretty easy to find identical indexes, but there are other cases when you can be wasteful without realizing it.

For example, suppose you have an Orders table with a ProductId column that is referenced by another table. Putting an index on the ProductID column will speed up the join. But, suppose there is another query where you decide to add a multi-column index on ProductId and OrderDateTime (with ProductId as the first column in the index). At this point, you have 2 indexes that start with ProductId. You no longer need the first index because SQL can use the 2nd one (almost) as efficiently.

I hope all of this makes sense. If not, let me know and I will explain more.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Well said as always George.

"NOTHING is more important in a database than integrity." ESquared
 
Thanks so much George!!! I would not have realized the multiple index problem especially as in your example.

gmmastros said:
Lastly, beware of "duplicate" indexes. With SQL, it's possible to create multiple indexes on a table. It's also possible to have multiple identical indexes. This is wasteful of disk, memory, and overall resources. It's pretty easy to find identical indexes, but there are other cases when you can be wasteful without realizing it.

For example, suppose you have an Orders table with a ProductId column that is referenced by another table. Putting an index on the ProductID column will speed up the join. But, suppose there is another query where you decide to add a multi-column index on ProductId and OrderDateTime (with ProductId as the first column in the index). At this point, you have 2 indexes that start with ProductId. You no longer need the first index because SQL can use the 2nd one (almost) as efficiently.

Great and understandable explanation of Sargable.

Thanks

John Fuhrman
 
I love it when you find something you didn't even know that you needed to know!

Thank you George.
 
You're welcome. I'm glad you found it useful.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
John,

I would like to point out that just because two execution plans are not the same does not mean the two result sets aren't identical or that the two queries don't have identical semantic meaning.

I could use a LEFT JOIN RightTable with a WHERE RightColumn = NULL or a NOT EXISTS (SELECT 1 FROM RightTable) and they might have different execution plans but the same results.

Or, I could leave a JOIN condition out that doesn't affect the result set but converts an index seek to a scan. So don't get too hung up on execution plans. I have many times seen an execution plan gloss over huge performance differences, and not just costs embedded in function calls (rowset returning or not). You MUST use SQL Profiler in conjunction with execution plans to truly tune properly.
 
Thanks Emtucifor, I am aware that there are always more than one way to get teh same results. It was the difference in the plans that made me ask... Ok I get teh same results using either and the performance and plans are drasticaly different, so what is the difference between the 2.

I haven't played with SQL profiler too much as of yet, so that will be one of the next tools to use and exploit as this project moves along.

Thanks!!!!

Thanks

John Fuhrman
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top