Outer Join vs. Union

DugzDMan · Jun 5, 2003

What is the proper way to pull data from two tables, when I need them joined on some items, but also need the data if it only exists in one table?

Should I use:

Select a.a, b.b
from a full outer join b
on a.blah = b.blah

OR

Select a.a, b.b
from a left outer join b
on a.blah = b.blah
Union
Select a.a, b.b
from a right outer join b
on a.blah = b.blah

OR

Am I completely off base??

Thanks!

Sridharan · Jun 5, 2003

If it exists in one of the tables or from both tables. If it is one of the Table you can use Left or Right Outer Join appropriately. If you want to get it from both the tables then do a full outer join.

Sri

DugzDMan · Jun 5, 2003

Thanks Sri. I want it if it exists in either table. I'm currently using the full outer join, but was curious if that was really they way it should be done. My users are blowing spool and I'm trying to pin down the reason.

Thanks!

BillDHS · Jun 6, 2003

You could be blowing out spool because the first column is being retured as NULL in rows where it does not exist with the second column. All NULLs will go to the same AMP, skewing the spool table and maxing out spool.

Could you add a highly unique id attribute as the first column to each table and bring that along to the union?

DugzDMan · Jun 6, 2003

That is a possibility, the nulls.

What do you mean by adding the unique id as the first column? First item in query, join, ...?

Thanks!!

BillDHS · Jun 6, 2003

Can you add a first column that is a sequential set of numbers to use for distribution?

TableA
TA.Sequence # TA.col1
TableB
TB.Sequence # TB.col1

Select
TA.Sequence# || TB.Sequence#,
TA.Col1,
TB.Col1

From TA cross Join TB
ON .....

DugzDMan · Jun 6, 2003

Are you saying add this (seq #) to the database, then use that in the join? Or select that item first in the query?

I've never done this, so I'm not quite following.

dnoeth · Jun 6, 2003

Sorry, BillDHS, but this is nonsense.
He's not creating a table with the result of the outer join. Of course this would skew perm space but only if the column containing lots of Nulls was the PI. Using a more unique column (or multiple columns) as PI would definitly help in that case.
But it's a spool and if it's redistributed on that skewed column as preparation for a join, it doesn't matter if it's the first column. If the optimizer knows about skewness it may produce a different plan avoiding that.
This is one of the cases where a huge spool may be not enough just because of 1 very common value in a join...

Maybe DugzDMan should post the explain and/or the query (if he's allowed to do that)

Dieter

DugzDMan · Jun 6, 2003

I've gotten around the problem for the moment. I just ran the reports for them, then saved them in pdf format. I'll have to work on the query some more, but I'm going to be on vacation for a week. Ah, a week away from work

Also a week away from the Internet :-(

Thanks for the help guys. I'll look at this more when I get back next week.

BillDHS · Jun 8, 2003

Dnoeth:

How does TD distribute spool tables? Since no PI is defined doesn't it use the first column? Does it analyze the uniqueness? Wouldn't a NULL first column in spool cause the skew I was trying to limit?

dnoeth · Jun 8, 2003

"How does TD distribute spool tables?"
It depends...

'... Spool x, which is duplicated on all AMPs.'
Copy the whole spool to all AMSp, maybe huge amount of rows on each AMP, but no skew

'... Spool x, which is built locally on the AMPs.'
No distribution necessary,
e.g. because it's already distributed by the right column (preparation for a merge join on PI columns, there may be skew, but then the table is skewed also)
or a product/hash join follows
or it's the result spool.

'... Spool x, which is redistributed by hash code to all AMPs.'
This is the important one,
e.g. doing a DISTINCT (PI on all columns)
or preparation for a join (PI of the spool on the columns used in the following join step)
This may result in skew:
If you join on tab1.state = tab2.state and there are lots of 'CA' customers, all rows with 'CA' will be sent to one AMP. The same for customers outside US having state NULL, but in that case the optimizer may add a 'state IS NOT NULL' to the WHERE-clause.

"Wouldn't a NULL first column in spool cause the skew I was trying to limit?"
Only if it was the only column used to distribute the spool and it would be the same, if it was column 28 ;-)

Dieter

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Outer Join vs. Union

DugzDMan

Programmer

Sridharan

Technical User

DugzDMan

Programmer

BillDHS

Programmer

DugzDMan

Programmer

BillDHS

Programmer

DugzDMan

Programmer

dnoeth

Instructor

DugzDMan

Programmer

BillDHS

Programmer

dnoeth

Instructor

Similar threads

Part and Inventory Search

Sponsor