Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help making SQL run more efficiently 2

Status
Not open for further replies.

VickyC

Technical User
Sep 25, 2010
206
0
0
CA
hi to all

I have SQL code that works, but slowly. Because I have to run this code very many times, I'm hoping to get it to work more efficiently (ie: faster).

Here's a mini-version of the question that captures my problem:

Table TEST has the following records. [tt]

K KVal
1 10
2 25
3 25
4 25
5 37[/tt]

I want to write SQL to find all DISTINCT PERMUTATIONS of the 5 KVal values.

Here's how I do this now...[tt]

SELECT DISTINCT Q.K0, Q.K1, Q.K2, Q.K3, Q.K4
FROM (
SELECT T0.KVal AS K0, T1.KVal AS K1, T2.KVal AS K2, T3.KVal AS K3, T4.KVal AS K4
FROM TEST as T0, TEST AS T1, TEST AS T2, TEST AS T3, TEST AS T4
WHERE (
((T0.K<>T1.K) AND (T0.K<>T2.K) AND (T0.K<>T3.K) AND (T0.K<>T4.K)) AND
((T1.K<>T2.K) AND (T1.K<>T3.K) AND (T1.K<>T4.K)) AND
((T2.K<>T3.K) AND (T2.K<>T4.K)) AND
((T3.K<>T4.K))
)
) AS Q
ORDER BY Q.K0, Q.K1, Q.K2, Q.K3, Q.K4; [/tt]

The inner SELECT gives 5! = 120 records, but there are 6 of each (because the three values of 25 come in 3! = 6 permutations, which are otherwise indistinguishable). The outer SELECT eliminates these duplicates to give the desired 20 records.

So, what's the problem? My actual table TEST has 10 rows, not 5. This means that the inner SELECT must compute 10! = 3628800 records. If, say, I had a 'triple' and two 'doubles' in the values of KVal, each record would be shown 3!*2!*2! = 24 times. After these duplicates are removed by the outer SELECT, there would be 3628800/24 = 151200 records remaining.

My machine runs this query in just over 4 minutes, but I need to run it for thousands of different sets of 10 KVal values, so I'm looking to improve the SQL's efficiency.

I'd really appreciate some assistance with this!

Thanks, Vicky C.

 
What about this ?
Code:
SELECT T0.KVal AS K0, T1.KVal AS K1, T2.KVal AS K2, T3.KVal AS K3, T4.KVal AS K4
  FROM TEST as T0, TEST AS T1, TEST AS T2, TEST AS T3, TEST AS T4
 WHERE T0.K<>T1.K AND T0.K<>T2.K AND T0.K<>T3.K AND T0.K<>T4.K
   AND T1.K<>T2.K AND T1.K<>T3.K AND T1.K<>T4.K
   AND T2.K<>T3.K AND T2.K<>T4.K
   AND T3.K<>T4.K
GROUP BY T0.KVal, T1.KVal, T2.KVal, T3.KVal, T4.KVal
ORDER BY T0.KVal, T1.KVal, T2.KVal, T3.KVal, T4.KVal

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
hi PHV - Thanks for the attempt! When I run this code (using the actual problem with 10 KVal values), it took almost exactly the same time as the code I'd sent, about 4m 25s. It could well be that there is no good way to improve the speed by much.

Thanks
Vicky C.
 
And this ?
Code:
SELECT DISTINCT T0.KVal AS K0, T1.KVal AS K1, T2.KVal AS K2, T3.KVal AS K3, T4.KVal AS K4
  FROM TEST as T0, TEST AS T1, TEST AS T2, TEST AS T3, TEST AS T4
 WHERE T0.K<>T1.K AND T0.K<>T2.K AND T0.K<>T3.K AND T0.K<>T4.K
   AND T1.K<>T2.K AND T1.K<>T3.K AND T1.K<>T4.K
   AND T2.K<>T3.K AND T2.K<>T4.K
   AND T3.K<>T4.K

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
PH - thanks again for the effort. When I run this SQL on my actual problem (that uses 10 values of KVal), it takes 4m 27s instead of 4m25s for the original code. This surprised me because there was no subquery in your code, but I suspect the DISTINCT forces a check of records after each new 'candidate' record is produced. It turns out to be slightly faster to produce all the records in a subquery, then eliminate the duplicates in the 'outer' query. I suspect the original routine is about as fast as possible, unfortunately.

 
If you only have 1 number that is repeated, you can add a column to the test table called DupSeq (duplicate sequence), and set it to indicate the sequence order of the repeated numbers like this:

K KVal DupSeq
1 10
2 25 A
3 25 B
4 25 C
5 37

Then you can add criteria like these to ensure the repeated numbers appear only in the DupSeq order which will prevent the duplicate rows without needing to use Select Distinct:

... And NZ(T0.DupSeq,'') & NZ(T1.DupSeq,'') & NZ(T2.DupSeq,'') & NZ(T3.DupSeq,'') & NZ(T4.DupSeq,'') = 'ABC' ...

If you can have multiple repeated numbers, it would get more complicated. One idea would be to use integers for DupSeq and add criteria like this for each of the "not-equal" comparisons you already have:

... And ( (T0.KVal <> T1.KVal) Or (NZ(T0.DupSeq,0) < NZ(T1.DupSeq,100)) ) And ...

 
hey JonFer - That's a very interesting idea. I'm looking forward to trying this when I'm back home in a few days.

Thanks
Vicky C.
 
hi JonFer - this idea works well, but it is still a few seconds slower than the original SQL (when used in my actual problem that uses 10 values of KVal).

I'm becoming convinced that the fastest way to find large sets of DISTINCT permutations is to... 1) use SQL to find all permutations of the 10 values, then 2) use SQL to remove the doubles. This is still the fastest way of the 6 or so approaches I've used (thanks to much help from this forum).

Still, I wonder if there's a more efficient way. In an extreme case, I might have the 10 values of KVal (56, 56, 56, 56, 56, 56, 56, 56, 56, 7). My approach would find all 10! = 3628800 permuations, then divide by 9! = 362880, because the 9 values of 56 are indistinguishable from each other, leaving an answer of 10. Thats a lot of work given that the answer, in this case, was obvious from the start (the 7 can only be placed in the set of 56s in 10 ways). My real problem is that the sets of 10 values that I have to process can have any number of repeats.

Thanks for the assistance
Vicky C.
 
I think if you're starting with generating all permutations and then filtering out the duplicates, it will be difficult to improve. You want to think of a way to reduce the initial number of records generated. Maybe something like generating combinations of only the unique values first and then making the final combinations in a second step? The first part is simple but I imagine the second step would get complicated trying to handle multiple repeats and varying numbers of repeats.
 
It might be worth looking at temporary work tables. I will use pseudocode rather than pure SQL to illustrate the idea.

INSERT INTO TEMP1 (fields F1,F2) FROM TEST unique combinations where K is 1 or 2
INSERT INTO TEMP2 (fields F1, F2, F3) FROM TEST, TEMP1 unique combinations for F1, F2, and the Kval for K=3
INSERT INTO TEMP3 (fields F1, F2, F3, F4) FROM TEST, TEMP2 unique combinations of F1, F2, F3 and Kval for K=4
etc
Whilst this is not as elegant as a simple SQL solution it is potentially more efficient because duplicates are removed at an earlier stage. The last TEMP table is the answer. Between runs you could simply delete the contents of the temp tables. To avoid bloat you might want to compact before starting a run.

Ken
 
thanks JonFer and Cheerio - I made some good progress with both of these ideas, but was not able to reach the optimal time in the originally posted SQL.

Here's something interesting - I rewrote the original SQL (for my actual problem that uses 10 values of KVal) but used multiple Inner Joins with compound ON statements instead of using a Subquery with compound WHERE statements. To my surprize, the new routine took exactly the same time to run as the original SQL (4m 25s). I don't know much about the internal workings of Jet, but I wonder if it uses the same optimal plan regardless of whether my SQL is based on Inner Joins or a Subquery???

Just wonderin'
Vicky C.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top