Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Find missing date ranges - help with script

Status
Not open for further replies.

bmacbmac

IS-IT--Management
Jan 26, 2006
392
US
Hi,

I have a table: SAMPLEDATES with one column THEDATE which contains every date in my database. I am looking for a way to find the missing ranges in my database.

The table contains 29,000 distinct dates and the table is indexed.

I found this script online which works great when running against 1000 dates or so, but when running against 4000 dates it takes 10 minutes before I just stop it. Running against the 29000 dates ran for 24 hours before I just stopped it.

Here is the script:
Code:
SELECT t1.col1 AS startOfGap, MIN(t2.col1) AS endOfGap  
   FROM  
   (SELECT col1 = theDate + 1  FROM sampleDates tbl1  
      WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2  
                      WHERE tbl2.theDate = tbl1.theDate + 1) 
      AND theDate <> (SELECT MAX(theDate) FROM sampleDates)) t1 
   inner JOIN  
   (SELECT col1 = theDate - 1  FROM sampleDates tbl1  
      WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2  
                      WHERE tbl1.theDate = tbl2.theDate + 1) 
      AND theDate <> (SELECT MIN(theDate) FROM sampleDates)) t2  
   ON t1.col1 <= t2.col1 
   GROUP BY t1.col1

Running the first part finished in a second with 7500 results: SELECT t1.col1 AS startOfGap, MIN(t2.col1) AS endOfGap
FROM
(SELECT col1 = theDate + 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl2.theDate = tbl1.theDate + 1)
AND theDate <> (SELECT MAX(theDate) FROM sampleDates)


Running the second part also runs in a second with 7500 results: SELECT col1 = theDate - 1 FROM sampleDates tbl1
WHERE NOT EXISTS(SELECT * FROM sampleDates tbl2
WHERE tbl1.theDate = tbl2.theDate + 1)
AND theDate <> (SELECT MIN(theDate) FROM sampleDates)

So I guess the delay is coming in the inner join?

Can anyone see a way to optimize my script or even come up with a better one?

Thanks!

Brian
 
Just to be clear, suppose you had the following data:


1/1/2010
1/2/2010
1/3/2010
1/7/2010
1/10/2010
1/11/2010
1/12/2010
1/14/2010
1/15/2010

The output you would like to see is...

1/4/2010 - 1/6/2010
1/8/2010 - 1/9/2010
1/13/2010 - 1/13/2010

Do I understand this correctly?




-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
The table contains 29,000 distinct dates

That spans approximately 80 years. Are you sure this is correct?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Try this query.

Code:
Select  StartDates.StartDate, EndDates.EndDate
From    (
        Select  DateAdd(Day, 1, A.TheDate) As StartDate,
                Row_Number() Over (Order BY A.TheDate) As RowId
        From    SampleDates As A
                Left Join SampleDates As B
                  On A.TheDate = DateAdd(Day, -1, B.TheDate)
        Where   b.TheDate Is NULL
        ) As StartDates
        Inner Join (
            Select DateAdd(Day, -1, A.TheDate) As EndDate, 
                   Row_Number() Over (Order By A.TheDate) - 1 As RowId
            From   SampleDates As A
                   Left Join SampleDates As B
                     On A.TheDate = DateAdd(Day, 1, B.TheDate)
            Where  B.TheDate Is NULL
            ) As EndDates
            On StartDates.RowId = EndDates.RowId
Order By StartDates.StartDate

I tested this with a SampleDates table that had 22,000 distinct dates spanning Jan 1970 to February 2052. The query (with my sample data) returned 5,878 rows in 0.2 seconds.

Your original query is compatible back to SQL2000 (and probably versions before that). The query I show here requires SQL2005 or newer.

I am reasonably certain that this will produce the correct output, but I strongly encourage you to double and triple check that the results are correct.

Please let me know if this query generates the correct output, and also the time it takes to run. I'm super curious. If everything appears to be working well, I will respond with a full explanation.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Oh ya that worked perfectly and ran in a second. I compared to the original individual script results and they were spot on.

Thanks!!!!
 
Well... I guess it's time to explain how this works. Let's use the example data I showed earlier:

[tt]1/1/2010
1/2/2010
1/3/2010
1/7/2010
1/10/2010
1/11/2010
1/12/2010
1/14/2010
1/15/2010[/tt]

Since we want to find a range of dates with a gap, it will help if we separate this in to 3 separate problems.

1. Find the date where a gap starts
2. Find the date where a gap end.
3. Combine this data

1. Let's start by finding where the gap starts. With the data shown above, the first gap would start on 1/4/2010. To accomplish this, we can join this data back to itself based on the date joined to the date -1 day, like this:

Code:
Set NOCOUNT ON

Declare @Temp Table(TheDate DateTime)

Insert Into @Temp Values('1/1/2010')
Insert Into @Temp Values('1/2/2010')
Insert Into @Temp Values('1/3/2010')
Insert Into @Temp Values('1/7/2010')
Insert Into @Temp Values('1/10/2010')
Insert Into @Temp Values('1/11/2010')
Insert Into @Temp Values('1/12/2010')
Insert Into @Temp Values('1/14/2010')
Insert Into @Temp Values('1/15/2010')

Select  A.TheDate
From    @Temp As A
        Left Join @Temp  As B
            On A.TheDate = DateAdd(Day, -1, B.TheDate)
Where   b.TheDate Is NULL

The results are:
[tt]
2010-01-03
2010-01-07
2010-01-12
2010-01-15
[/tt]

Notice that this is actually returning the day before the gap begins, so we should add 1 to the date.

Code:
Select  [!]DateAdd(Day, 1,[/!] A.TheDate[!])[/!]
From    @Temp As A
        Left Join @Temp  As B
            On A.TheDate = DateAdd(Day, -1, B.TheDate)
Where   b.TheDate Is NULL

[tt]
2010-01-04
2010-01-08
2010-01-13
2010-01-16
[/tt]

This is now returning the first date of the gap. Now, let's number the rows, like this:

Code:
Select  DateAdd(Day, 1, A.TheDate),
        [!]Row_Number() Over (Order By A.TheDate)[/!]
From    @Temp As A
        Left Join @Temp  As B
            On A.TheDate = DateAdd(Day, -1, B.TheDate)
Where   b.TheDate Is NULL

[tt]
2010-01-04 1
2010-01-08 2
2010-01-13 3
2010-01-16 4
[/tt]

Notice how this is showing 4 rows, but we only have 3 gaps. The last row exists because there is no more rows after the last row, causing it to be included in the result set. Let's ignore this for now (we will get back to this shortly).

2. Let's find where the gap end. The process is similar to the previous query, but instead of joining the table to itself based on Date -1 day, we'll join on day +1 day, like this:

Code:
Set NOCOUNT ON

Declare @Temp Table(TheDate DateTime)

Insert Into @Temp Values('1/1/2010')
Insert Into @Temp Values('1/2/2010')
Insert Into @Temp Values('1/3/2010')
Insert Into @Temp Values('1/7/2010')
Insert Into @Temp Values('1/10/2010')
Insert Into @Temp Values('1/11/2010')
Insert Into @Temp Values('1/12/2010')
Insert Into @Temp Values('1/14/2010')
Insert Into @Temp Values('1/15/2010')

Select  A.TheDate
From    @Temp As A
        Left Join @Temp  As B
            On A.TheDate = DateAdd(Day, 1, B.TheDate)
Where   b.TheDate Is NULL

[tt]
2010-01-01
2010-01-07
2010-01-10
2010-01-14[/tt]

Notice that now we are returning the row after the day the gap ends, so let's subtract 1 day, like this:

Code:
Select  [!]DateAdd(Day, -1, [/!]A.TheDate[!})[/!]
From    @Temp As A
        Left Join @Temp  As B
            On A.TheDate = DateAdd(Day, 1, B.TheDate)
Where   b.TheDate Is NULL

[tt]
2009-12-31
2010-01-06
2010-01-09
2010-01-13
[/tt]

Now we will number the rows, like we did in the previous section, like this:

Code:
Select  DateAdd(Day, -1, A.TheDate)[!],
        Row_Number() Over (Order BY A.TheDate)[/!]
From    @Temp As A
        Left Join @Temp  As B
            On A.TheDate = DateAdd(Day, 1, B.TheDate)
Where   b.TheDate Is NULL

[tt]
2009-12-31 1
2010-01-06 2
2010-01-09 3
2010-01-13 4
[/tt]

Notice that there are 4 rows in the output where the first row is before the first day we care about. Thinking about this another way, the numbering here is off by 1. The first gap ends on the second row (2010-01-06), so let's subtract one from the numbering, like this:

Code:
Select  DateAdd(Day, -1, A.TheDate),
        Row_Number() Over (Order BY A.TheDate)[!] -1[/!]
From    @Temp As A
        Left Join @Temp  As B
            On A.TheDate = DateAdd(Day, 1, B.TheDate)
Where   b.TheDate Is NULL

[tt]
2009-12-31 0
2010-01-06 1
2010-01-09 2
2010-01-13 3
[/tt]

3. Combine the data. All we really need to do now is to join the 2 result sets based on the row numbering we came up with, like this:

Code:
Set NOCOUNT ON

Declare @Temp Table(TheDate DateTime)

Insert Into @Temp Values('1/1/2010')
Insert Into @Temp Values('1/2/2010')
Insert Into @Temp Values('1/3/2010')
Insert Into @Temp Values('1/7/2010')
Insert Into @Temp Values('1/10/2010')
Insert Into @Temp Values('1/11/2010')
Insert Into @Temp Values('1/12/2010')
Insert Into @Temp Values('1/14/2010')
Insert Into @Temp Values('1/15/2010')

Select	StartDates.StartDate, EndDates.EndDate
From    (
        Select  DateAdd(Day, 1, A.TheDate) As StartDate,
                Row_Number() Over (Order By A.TheDate) As RowId
        From    @Temp As A
                Left Join @Temp  As B
                    On A.TheDate = DateAdd(Day, -1, B.TheDate)
        Where   b.TheDate Is NULL
        ) As StartDates
		Inner Join (
        Select  DateAdd(Day, -1, A.TheDate) As EndDate,
                Row_Number() Over (Order BY A.TheDate) -1 As RowId
        From    @Temp As A
                Left Join @Temp  As B
                    On A.TheDate = DateAdd(Day, 1, B.TheDate)
        Where   b.TheDate Is NULL
        ) As EndDates
		On StartDates.RowId = EndDates.RowId

Note that on list had row numbers 1 - 4 and the other had row numbers 0 - 3. By inner joining the 2 lists, the 0 row and the 4 row were excluded, leaving just 1 - 3.

Hopefully, I have explained this well enough to make sense, but if there's any part that you are confused about, please let me know.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top