Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

New question on SQL help using JOIN

Status
Not open for further replies.

etishler

Programmer
Mar 18, 2004
5
US
I posted the Need SQL help on query yesterday and got it to work, but the performance is slow.

Someone from a different forum suggested the use of a JOIN to improve performance.

What they suggested was:

SELECT Subset.*
FROM Subset
LEFT OUTER JOIN Master ON Subset.ID = Master.ID
WHERE Master.ID IS NULL

Unfortunately my Master.ID is never NULL.

Any ideas on how to redo the query ...

SELECT * FROM Subset
WHERE
Subset.ID STARTING WITH 'V' AND
(
NOT EXISTS
(
SELECT * FROM Master WHERE
Master.ID = Subset.ID
)
)

... using a JOIN (instead of the embedded SELECT) to improve performance?

Thanks,

Eric

Eric Tishler
Software Architect
Resolute Partners, LLC
 
Have you tried this ?
SELECT * FROM Subset
WHERE
Subset.ID STARTING WITH 'V' AND
(
Subset.ID NOT IN
(
SELECT Master.ID FROM Master
)
)


Hope This Help, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884
 
I just tried this and althogh the query ran in less than one second (wow) is returned 0 rows :-(

I modified the query (this time with real table and field names) as ...

SELECT * FROM "CCardTransactions"
WHERE
"CCardTransactions"."CCardTransID" STARTING WITH 'V' AND
(
"CCardTransactions"."CCardTransID" NOT IN
(
SELECT "Transactions"."CrossReference"
FROM "Transactions"
WHERE
"Transactions"."CrossReference" STARTING WITH 'V'
)
)

... and got the one row back I expected, but this took 7 seconds to run. This is just a test database and I could be appying this same query on potentially much larger databases, so I need to improve on this performance ...


Eric Tishler
Software Architect
Resolute Partners, LLC
 
Have you tried the JOIN that you got on the other forum? You state that "... Master.ID is never NULL ..." but that's not really what the query is about. In that query
[tt]
SELECT Subset.*
FROM Subset
LEFT OUTER JOIN Master ON Subset.ID = Master.ID
WHERE Master.ID IS NULL
[/tt]

The LEFT JOIN retrieves ALL records from table "SubSet" and, for any record in SubSet that does not having a matching ID in Master, a NULL Master.ID will be returned. That NULL ID is not in the Master table but is generated by SQL as something to match to the record in SubSet. The query equates in words to "... return all the records in SubSet that do not have a matching record in Master ...". This (assuming that your fields are indexed) should be much faster than any sub-query based solution.
 
I had to add an additional qualifier in the last where clause which probably slowed things down. Still, it worked and took about 5.5 seconds.

Eric Tishler
Software Architect
Resolute Partners, LLC
 
Just another note on the performance implications here. Some of the more potent DBMSs have quite sophisticated query optimizers (e.g. SQL Server, Oracle) and they may be running pretty much the same code "under the covers" regardless of whether you used a join or a sub-query. To really figure out what the system is doing with your query you should look at the system's ability to provide an execution plan for it. Each system has it's own way of doing that and you will need to check out how its done in your DBMS.
 
If you have an additionnal where clause this means that you are restrincting the records tyou retrieve.
It's a lot better to do this before the join than after as this will reduce a lot the size of the work the DBMS has to do:

select
xxx
from
table A
join
table B
on yyy
where test is true

have to be done if possible

select
xxx
from
(table A where test is true) as testedTableA
join
table B
on yyy

Assume you have 1000 records retieved in tableA, 1000 in tableB and the check retrieve 10% of the records previously found in tableA
You have in the first solution 1000x1000 problem
In the second 100x1000
As the work is usually in a N*ln(N) (at least) calculation work size, a division by 10 implies a division by nearly 35 times!

So the best is to use golom suggestion but add the where clause in the derived query!


--------------------------------------------------
[highlight]Django[/highlight] [thumbsup]
bug exterminator
tips'n tricks addict
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top