Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Union vs. Union All

Status
Not open for further replies.

snitin78

Programmer
Mar 16, 2009
96
US
I have a stored procedure with couple of CTEs and 1 of the CTEs has a section something like below:

Code:
SELECT
     DISTINCT st1.user_id as user_id
		from 
			  someTable st1 WITH(NOLOCK)
		where 
			  st_nid = 5 and st1.some_other_frn_id2 is null
		union
			SELECT user_id from 
               ( SELECT sot1.user_id user_id,
                        sot1.id
                        FROM    some_other_table1 sot1 WITH (NOLOCK)
                        WHERE   user_id BETWEEN     100000000000 AND 200000000000
               ) allUsers
               INNER JOIN
               ( SELECT some_other_id2               
                 FROM    some_other_table2 WITH (INDEX(someIndex), NOLOCK)
                         INNER JOIN someTable st2 on some_other_frn_id2 = some_other_id2
                     ) requiredUsers
                   ON      sot1.id = some_other_id2

Here, the Union part is written in a way so that the indexes on those tables are utilized in an efficient way.

Here is the problem that I am facing. This section of the Stored Procedure has been working perfectly well for the given parameters (5, 10000... etc). However, after some data munging operations on the super set i.e. the users, the UNION just fails to return, even though each of the selects work perfectly when run individually. To exasperate the situation, the whole query starts working if I change UNION to UNION ALL!!

Any suggestions as to what could be going wrong here? The logic works perfectly fine for other set of parameters. It is just this set (for which data munging was performed) that is hanging. This kind of logic is there in quite a few different Stored Procedures and I don't wanna change UNION to UNION all everywhere. What things can I/should check for?

Nitin
 
I would always use UNION ALL unless you specifically need to use UNION. UNION will take out duplicates. If you already know that none will exist, then use UNION ALL as it will be faster.
 
Thanks for the reply, RiverGuy. I'll start replacing UNION with UNION ALL everywhere :)

Nitin
 
I had a question on the interview once to describe a situation when UNION will perform better than UNION ALL.

I'll address this question now to users of tek-tips.
 
Hi Guys,

I am still stuck, perplexed and (really) frustrated :-( So I was hoping if I can get some more pointers.

What is happening is that when I change UNION to UNION ALL (as suggested above) the query above starts working for a certain set of parameters, basically for users of a company, that return around 1600 records. However, the same query with UNION ALL slows down (almost hangs) for another company that has around 79K records. If I change back to UNION the response is totally opposite i.e. it works for the company with 79K records and just HANGS (so much so that even Database Tuning advisor is stuck) for the company with 1600 records!!!!

What is even more funny is that if I try the same query WITH UNION on a different DB on the same server it works perfectly well for both companies.

I should mention that the DB where the queries are not working were recently restored from a PRODUCTION copy while the one where they are working is a stale copy. Given this, I thought may the indexes are screwed up because of the restore...so rebuilt couple of them but in vain.

What could be going wrong here? I know it's really hard to debug this problem without the DB/table/index structure and/or the data, but any help is much appreciated.

Thanks.

Nitin
 
There are several things going on in your query. Firstly, you are using the NOLOCK hint a lot. This tells me that you probably have a pretty busy or poorly designed system with lots of inserts and updates holding locks and you are returning or having to examine many rows. You're also using a specific index hint with someIndex.

You should probably look to rewriting and optimizing your query from scratch. You might need new or altered indexes. The bottom line is that the way SQL Server is designed, you shouldn't have to do these many workarounds in a SELECT statement to make the query responsive.

And finally, without knowing any more, I would speculate that your UNION ALL is taking longer in your one test case simply because it takes time to move a lot of data to the client app. If you do have a bunch of duplicates and are not filtering them out with since you have changed to UNION ALL, then it's going to take longer to move a larger resultset across the network than a smaller one.
 
RiverGuy, I agree with you on that this is indeed a pretty busy and somewhat poorly designed system...but different behavior of the same query on different instances of the same DB is still a bit puzzling.

While using NOLOCK is a directive from our DBA, I provided the index hint(s) with the hope of making the whole stored procedure a bit faster, if at all.

As for the uniqueness of records for a larger dataset, there is indeed a distinct on the select and I don't think that the records are repeated as a result of the UNION ALL, at least not for the company that I am trying with.

Having said that, I think I am gonna spend a little bit more to unravel this mystery and then just get on to re-writing the query in (hopefully) an optimized way.

Oh the pain of coding against a poor DB :-(

Thanks for you suggestions once again...

Nitin
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top