Hi,
I have a table: SAMPLEDATES with one column THEDATE which contains every date in my database. I am looking for a way to find the missing ranges in my database.
The table contains 29,000 distinct dates and the table is indexed.
I found this script online which works great when running against 1000 dates or so, but when running against 4000 dates it takes 10 minutes before I just stop it. Running against the 29000 dates ran for 24 hours before I just stopped it.
Here is the script:
Running the first part finished in a second with 7500 results: SELECT t1.col1 AS startOfGap, MIN(t2.col1) AS endOfGap
FROM
(SELECT col1 = theDate + 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl2.theDate = tbl1.theDate + 1)
AND theDate <> (SELECT MAX(theDate) FROM sampleDates)
Running the second part also runs in a second with 7500 results: SELECT col1 = theDate - 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl1.theDate = tbl2.theDate + 1)
AND theDate <> (SELECT MIN(theDate) FROM sampleDates)
So I guess the delay is coming in the inner join?
Can anyone see a way to optimize my script or even come up with a better one?
Thanks!
Brian
I have a table: SAMPLEDATES with one column THEDATE which contains every date in my database. I am looking for a way to find the missing ranges in my database.
The table contains 29,000 distinct dates and the table is indexed.
I found this script online which works great when running against 1000 dates or so, but when running against 4000 dates it takes 10 minutes before I just stop it. Running against the 29000 dates ran for 24 hours before I just stopped it.
Here is the script:
Code:
SELECT t1.col1 AS startOfGap, MIN(t2.col1) AS endOfGap
FROM
(SELECT col1 = theDate + 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl2.theDate = tbl1.theDate + 1)
AND theDate <> (SELECT MAX(theDate) FROM sampleDates)) t1
inner JOIN
(SELECT col1 = theDate - 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl1.theDate = tbl2.theDate + 1)
AND theDate <> (SELECT MIN(theDate) FROM sampleDates)) t2
ON t1.col1 <= t2.col1
GROUP BY t1.col1
Running the first part finished in a second with 7500 results: SELECT t1.col1 AS startOfGap, MIN(t2.col1) AS endOfGap
FROM
(SELECT col1 = theDate + 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl2.theDate = tbl1.theDate + 1)
AND theDate <> (SELECT MAX(theDate) FROM sampleDates)
Running the second part also runs in a second with 7500 results: SELECT col1 = theDate - 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl1.theDate = tbl2.theDate + 1)
AND theDate <> (SELECT MIN(theDate) FROM sampleDates)
So I guess the delay is coming in the inner join?
Can anyone see a way to optimize my script or even come up with a better one?
Thanks!
Brian