Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Set Operations vs Subqueries

Status
Not open for further replies.

LaurenM

Programmer
Jun 5, 2006
62
CA
I'm running a bunch of queries comparing data in two different tables, the my usual way of doing it is to use subqueries, select waht I need from each table and compare those results, however these tables are very large, 30,000 records each, so they take a long time to return results. I have been playing with Oracle's set operations, and I love the speed however they are returning way to much information, I want the records that are common to both tables, and I haven't been able to do that yet, I either get everyone in both tables or I get nothing.

A basic example of what I am trying to do:
Code:
select *
from
    (select a,b,c,d
    from x
    )a,
    (select a,b,c,d
    from y
    )b
where
    a.a = b.a
Columns a,b,c are identifying information like name and employee number, and column d is the information I am looking at. If there is a way to do this with set operations, I would love the hint.
Lauren

Lauren McDonald, B.Sc.
Programmer Analyst
Saskatchewan Healthcare Employees' Pension Plan

 
Lauren,

the following script
Code:
DROP TABLE LAUREN1;
DROP TABLE LAUREN2;

CREATE TABLE LAUREN1
(
 COLUMN_A VARCHAR2(10),
 COLUMN_B VARCHAR2(10),
 COLUMN_C VARCHAR2(10),
 COLUMN_D VARCHAR2(10)
);

CREATE TABLE LAUREN2
(
 COLUMN_A VARCHAR2(10),
 COLUMN_B VARCHAR2(10),
 COLUMN_C VARCHAR2(10),
 COLUMN_D VARCHAR2(10)
);

INSERT INTO LAUREN1 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L1A','L1B','L1C','L1D');
INSERT INTO LAUREN1 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L2A','L2B','L2C','L2D');
INSERT INTO LAUREN1 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L3A','L3B','L3C','L3D');
INSERT INTO LAUREN1 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L4A','L4B','L4C','L4D');


INSERT INTO LAUREN2 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L1A','L1B','L1C','L1D');
INSERT INTO LAUREN2 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L2A','L2B','L2C','L2D');
INSERT INTO LAUREN2 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L3A','L3B','L3C','L3D');
INSERT INTO LAUREN2 (COLUMN_A,COLUMN_B,COLUMN_C,COLUMN_D) VALUES ('L4A','L4B','L4C','L4D');

SELECT * 
  FROM LAUREN1 L1
INTERSECT
SELECT *
  FROM LAUREN2;

produces the following output:-

Code:
COLUMN_A   COLUMN_B   COLUMN_C   COLUMN_D   
---------- ---------- ---------- ---------- 
L1A        L1B        L1C        L1D        
L2A        L2B        L2C        L2D        
L3A        L3B        L3C        L3D        
L4A        L4B        L4C        L4D        

4 rows selected

So, if you want matching records from the two tables, just use intersect. If this is pulling in more info than you want, just select the columns you want from the intersection query.

For example
Code:
SELECT COLUMN_A, COLUMN_D
  FROM 
(
SELECT * 
  FROM LAUREN1 L1
INTERSECT
SELECT *
  FROM LAUREN2
);

produces

Code:
COLUMN_A   COLUMN_D   
---------- ---------- 
L1A        L1D        
L2A        L2D        
L3A        L3D        
L4A        L4D        

4 rows selected

Let us know how you get on.

Regards

Tharg

Grinding away at things Oracular
 
Lauren,

As a clarification, are the values of column D shared between tables x and y, or do the values of D differ when a, b, and c match? If the latter, then which columns do you want to see from which tables?


[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
I guess I didn't clarify myself per usual, the values in column D are supposed to be the same, but I am looking for instances where they are not the same, so if column D held employee's eye colour let's say, then I am looking for employees where their eye colour is different in the two tables, and when they are I have to find out why they are different. So, I need to show the value in table x and the value in table y, and the difference between the two, which means the intersect set operation would not work since I need both values, and when I try to fool the set operation with dummy values, I get zero records returned, because column d does not have any nulls in it.
Code:
select *
from
    (select a,b,c,d,to_number(null)
    from x
    intersect
    select a,b,c,to_number(null),d
    from y
    )b
So Santa, it doesn't matter which tables a,b,c come from just that I get enough info to id the person.

Thanks Tharg, for the help.


Lauren McDonald, B.Sc.
Programmer Analyst
Saskatchewan Healthcare Employees' Pension Plan

 
I don't know if I mentioned it, but not all employees are in both tables, between the two all employees are covered, but for the differences in the two tables, I just want the ones that are in both. The are legitmate reasons why an employee is only in one and not in the other.

Lauren McDonald, B.Sc.
Programmer Analyst
Saskatchewan Healthcare Employees' Pension Plan

 
Lauren, (with a 'Thank you very much' to Tharg for seeding tables with data for me), here are some table values:
Code:
SQL> select * from lauren1;

COLUMN_A   COLUMN_B   COLUMN_C   COLUMN_D
---------- ---------- ---------- ----------
L1A        L1B        L1C        L1D
L2A        L2B        L2C        L2D
L3A        L3B        L3C        L3D
L4A        L4B        L4C        L4Diff

SQL> select * from lauren2;

COLUMN_A   COLUMN_B   COLUMN_C   COLUMN_D
---------- ---------- ---------- ----------
L1A        L1B        L1C        L1D
L2A        L2B        L2C        L2Diff
L3A        L3B        L3C        L3D
L4A        L4B        L4C        L4D
SQL>
With the above values, our code should give us two rows (L2 and L4) with COLUMN_D values from both tables.

Here is set-operator code that does that:
Code:
select x.COLUMN_A, x.COLUMN_B, x.COLUMN_C, x.COLUMN_D "LAUREN1_D", Y.COLUMN_D "LAUREN2_D"
  from lauren1 x, lauren2 y
 where EXISTS
       (SELECT NULL
          FROM ((SELECT COLUMN_A, COLUMN_B, COLUMN_C
                   FROM LAUREN1 L1
                 INTERSECT
                 SELECT COLUMN_A, COLUMN_B, COLUMN_C
                   FROM LAUREN1 L2)
                MINUS
                (SELECT COLUMN_A, COLUMN_B, COLUMN_C
                   FROM (select * from LAUREN1 L1
                         intersect
                         select * from LAUREN2 L2)
                )
               ) Z
        WHERE x.column_a = z.column_a
          and y.column_a = z.column_a
          and x.column_b = z.column_b
          and y.column_b = z.column_b
          and x.column_c = z.column_c
          and y.column_c = z.column_c
       )
/

COLUMN_A   COLUMN_B   COLUMN_C   LAUREN1_D  LAUREN2_D
---------- ---------- ---------- ---------- ----------
L2A        L2B        L2C        L2D        L2Diff
L4A        L4B        L4C        L4Diff     L4D
Let us know if this is what you wanted.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
BTW, Lauren, your instincts to use SET OPERATORS versus SUBQUERIES is excellent: Oracle SET OPERATORS are amazingly fast and much, much more efficient than any other method of querying/joining contents from multiple tables. Good onya!

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
I found them by accident, and now I am trying structure my queries to use set opertions, just because how fast they are.

Lauren McDonald, B.Sc.
Programmer Analyst
Saskatchewan Healthcare Employees' Pension Plan

 
Lauren,

I can only second Santa's sentiments, to make an allliterative allusion.

However, you said the tables were "very large" and contained 30,000 rows. In Oracle terms, that's usually trivial to the point of inconsequentiality, so can you check that you've got some indexes on these tables. If you're looking for performance gains at such low numbers, you may have serious trouble later on.

Just my three hap'orth.

Regards

Tharg

Grinding away at things Oracular
 
That's 30,0000 rows per year. These are annual tables that are populated on an yearly basis. I am just looking at them one year at a time.

Lauren McDonald, B.Sc.
Programmer Analyst
Saskatchewan Healthcare Employees' Pension Plan

 
I must be missing something. Why do you need to use subqueries or set operations when you can just join the tables:
Code:
SELECT x.COLUMN_A, x.COLUMN_B, x.COLUMN_C, x.COLUMN_D "LAUREN1_D", Y.COLUMN_D "LAUREN2_D"
FROM   lauren1 x, lauren2 y
WHERE x.column_a = y.column_a
AND   x.column_b = y.column_b
AND   x.column_c = y.column_c
AND   NVL(x.column_d,'~') != NVL(y.column_d,'~')
/
That ought to work pretty quick (assuming you have indexes on appropriate columns) and it's a lot easier to see what's going on when you come back to this code 6 months down the line.

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top