Group Consecutive numbers in a range 1

bmacbmac · Dec 13, 2011

Hi,

Have submitted this before, but having and issue and want to re-evaluate my procedure.

I have nearly 200,000 bigint numbers in a table:

2005041330
2005041331
2005041332
2005041333
2005041334
2005041337
2005041338
2005041339
2005041341
2005041342
2005041343
2005041347
2005041350
2005041351
2005041352
2005041353

I am looking for a way I can group this list into ranges... like this:
BegNumber EndNumber
2005041330 2005041334
2005041337 2005041339
2005041341 2005041343
2005041347 2005041347
2005041350 2005041353

In my previous post I gathered the following query:

Code:

insert into #temp2
SELECT MIN(docno) AS range_start, 
       MAX(docno) AS range_end
FROM (
SELECT docno,
      (SELECT MIN(B.docno)
       FROM #temp AS B
       WHERE B.docno >= A.docno
         AND NOT EXISTS
                (SELECT *
                 FROM #temp AS C
                 WHERE C.docno = B.docno + 1)) AS grp
FROM #temp AS A) AS T

GROUP BY grp

It's a very slow process. It seems to work with small lists < 10,000. But if I try running it on my entire 200,000 rows it will never complete even after running overnight.

I had this in SQL 2000 before, but have since upgraded to 2008 R2. Is there a better way to go about this now that I have 2008 R2?

Thanks!

Brian

gmmastros · Dec 13, 2011

This type of "problem" is known as an "island".

I suggest a google search on "SQL Server gaps and islands". You will get a lot of hits with a lot of useful information.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

bmacbmac · Dec 13, 2011

Thanks gmmastros! I ran the search and came up with the following:

Code:

select convert(varchar(10),imin.docno) + ' - ' +  convert(varchar(10),
(select min(docno) from (select docno = docno from #temp where not exists (select * from #temp a2 where #temp.docno-1 = 
a2.docno) and #temp.docno <> (select min(docno) from #temp)) as imax where imax.docno > imin.docno))
from
(select docno = docno from #temp where not exists (select * from #temp a2 where #temp.docno+1 = a2.docno) and #temp.docno <> 
(select max(docno) from #temp)) as imin

This runs MUCH faster. It ran on my test db of 22,000 rows in 13 seconds. My original query took several minutes.

gmmastros · Dec 13, 2011

That's great.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

bmacbmac · Dec 13, 2011

Nevermind.. that didn't work at all.. upon further review of the data it is not giving the correct ranges... back to the drawing board!

gmmastros · Dec 13, 2011

Can you try this?

Code:

Select	MinData.DocNo As BegNumber, MaxData.DocNo As EndNumber
From    (
        Select A.DocNo, Row_Number() Over(Order By A.DocNo) As RowId
        From   #temp As A
               Left Join #temp As B
                 On A.DocNo = B.DocNo - 1
        Where  B.DocNo Is NULL
        ) As MaxData
        Inner Join 
        (
        Select A.DocNo, Row_Number() Over (Order BY A.DocNo) As RowId
        From   #temp As A
               Left Join #temp As B
                 On A.DocNo = B.DocNo + 1
        Where  B.DocNo Is NULL
        ) As MinData
          On MaxData.RowId = MinData.RowId

I'm curious to know if this returns the correct data, and also how the performance compares to your existing method.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

markros · Dec 13, 2011

Try

Code:

DECLARE @t TABLE (num INT)
INSERT INTO @t 
SELECT
2005041330
UNION ALL SELECT
2005041331
UNION ALL SELECT
2005041332
UNION ALL SELECT
2005041333
UNION ALL SELECT
2005041334
UNION ALL SELECT
2005041337
UNION ALL SELECT
2005041338
UNION ALL SELECT
2005041339
UNION ALL SELECT
2005041341
UNION ALL SELECT
2005041342
UNION ALL SELECT
2005041343
UNION ALL SELECT
2005041347
UNION ALL SELECT
2005041350
UNION ALL SELECT
2005041351
UNION ALL SELECT
2005041352
UNION ALL SELECT
2005041353

;WITH cte AS (SELECT Num, ROW_NUMBER() OVER (ORDER BY num) AS Rn
FROM @t),
cte2 AS (SELECT Num, ROW_NUMBER() OVER (ORDER BY num) - DENSE_RANK() OVER (PARTITION BY Num-Rn ORDER BY Num) AS grp
FROM cte)

SELECT MIN(num) AS StartNum, MAX(num) AS EndNum FROM cte2 
GROUP BY grp

PluralSight Learning Library

markros · Dec 13, 2011

Same idea, but simpler solution (I almost went to bed, but had to go back to post it):

Code:

;WITH cte AS (SELECT Num, ROW_NUMBER() OVER (ORDER BY num) AS Rn
FROM @t)
SELECT MIN(num) AS StartNum, MAX(num) AS EndNum FROM cte
GROUP BY Num - Rn

PluralSight Learning Library

bmacbmac · Dec 14, 2011

Thanks for the replies everyone. The two options by markros and the one by gmmastros both worked perfectly.

My original script took about 7 minutes to run against 25,000 rows. Both of your scripts ran in milliseconds.

I ran all three scripts against my 200,000 rows and they all came up with the same results in less than a second!

markros · Dec 15, 2011

It is actually turned out to be a bit of revelation to me. I even decided to write a new blog on this topic - I hope to do this on Sunday.

PluralSight Learning Library

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Group Consecutive numbers in a range 1

bmacbmac

IS-IT--Management

gmmastros

Programmer

bmacbmac

IS-IT--Management

gmmastros

Programmer

bmacbmac

IS-IT--Management

gmmastros

Programmer

markros

Programmer

markros

Programmer

bmacbmac

IS-IT--Management

markros

Programmer

Similar threads

Part and Inventory Search

Sponsor