Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL Puzzle #15: Continuity 2

Status
Not open for further replies.

vongrunt

Programmer
Mar 8, 2004
4,863
HR
Suppose there are chronologically sorted records in a table:
Code:
create table sampleData( dateVal smalldatetime primary key, someColumn varchar(32) )

insert into sampleData (dateVal) 
select '19990218' union all
select '19990301' union all
select '19990302' union all
select '19990307' union all
select '19990315' union all
select '19990323' union all
select '19990330' union all
select '19990401' union all
select '19990408'

update sampleData
set someColumn = 'blah ' + convert(varchar(10), dateVal, 101)
Here dateVal column dictates chronology. Objective: for any specified N (integer) display all from-to groups of records that are more than N days apart. For N=7 expected results are (dates in US format):
Code:
dateFrom   someColumnFrom    dateTo     someColumnTo
----------.----------------.-----------.----------------
02/18/1999 blah 02/18/1999   02/18/1999 blah 02/18/1999
03/01/1999 blah 03/01/1999   03/07/1999 blah 03/01/1999 
03/15/1999 blah 03/15/1999   03/15/1999 blah 03/15/1999 
03/23/1999 blah 03/23/1999   04/08/1999 blah 04/08/1999

I had one very similar problem a year ago (membership continuity analysis)... needless to say dude who made it before was while-not-eof happy and code speed was somewhere between 'continental drift' and '[snail]'.

Once you get it done, run this code (only once) to get... 4600+ sample rows and try again. Best exec time wins.

Code:
declare @dmin smalldatetime, @delta int
select @dmin = min(dateval), @delta = 8 + datediff(dd, min(dateval), max(dateval)) from sampleData
declare @t table( N int )
insert into @t values (1)
while @@rowcount <256 insert into @t select N+(select count(*) from @t) from @t

insert into sampleData (dateVal, someColumn)
select dateVal + @delta*N, 'blah ' + convert(varchar(10), dateVal + @delta*N, 101)
from sampleData cross join @t

Although this problem is relatively simple in procedural languages, non-SQL folks are also welcomed to participate... say, shortest code wins.

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Probably a stupid question, but what did I miss to get these results for N = 7?

[tt]dateFrom someColumnFrom dateTo someColumnTo
----------.--------------- . ----------.---------------
1999-02-18 blah 02/18/1999 1999-03-01 blah 03/01/1999
1999-03-07 blah 03/07/1999 1999-03-15 blah 03/15/1999
1999-03-15 blah 03/15/1999 1999-03-23 blah 03/23/1999
1999-03-23 blah 03/23/1999 1999-03-30 blah 03/30/1999
1999-04-01 blah 04/01/1999 1999-04-08 blah 04/08/1999[/tt]

Roy-Vidar
 
Take a look at last five rows:

select '19990315' union all
select '19990323' union all
select '19990330' union all
select '19990401' union all
select '19990408'

03/15 is lone wolf - more than 7 days away from closest dates (03/07 and 03/23). That makes one group (row in final result)
Remaining four rows also make one group because distance is 7, 1 and 7 respectively - all less or equal N=7.

Basically: break group when distance between two subsequent dates is >7, calculate min/max dates per group (dateFrom/dateTo), join someColumn* stuff.



------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Typo... distance is 7, 2 and 7 respectively... March has 31 days [blush]

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
OK, got

[tt]dateFrom someColumnFrom dateTo someColumnTo
----------.----------------.-----------.---------------
1999-02-18 blah 02/18/1999 1999-02-18 blah 02/18/1999
1999-03-01 blah 03/01/1999 1999-03-07 blah 03/07/1999
1999-03-15 blah 03/15/1999 1999-03-15 blah 03/15/1999
1999-03-23 blah 03/23/1999 1999-04-08 blah 04/08/1999
1999-04-16 blah 04/16/1999 1999-04-16 blah 04/16/1999
1999-04-27 blah 04/27/1999 1999-05-03 blah 05/03/1999
1999-05-11 blah 05/11/1999 1999-05-11 blah 05/11/1999
1999-05-19 blah 05/19/1999 1999-06-04 blah 06/04/1999
1999-06-12 blah 06/12/1999 1999-06-12 blah 06/12/1999
1999-06-23 blah 06/23/1999 1999-06-29 blah 06/29/1999
1999-07-07 blah 07/07/1999 1999-07-07 blah 07/07/1999
1999-07-15 blah 07/15/1999 1999-07-31 blah 07/31/1999
1999-08-08 blah 08/08/1999 1999-08-08 blah 08/08/1999
1999-08-19 blah 08/19/1999 1999-08-25 blah 08/25/1999
1999-09-02 blah 09/02/1999 1999-09-02 blah 09/02/1999
...[/tt]

kludgy code, though, probably breaking all possible rules - four temptables, relying on identity fields ...

Roy-Vidar
 
>... probably breaking all possible rules - four temptables, relying on identity fields

SELECT/INSERT INTO.. table with identity column, yes?

As long as code works I'm not fundamentalist :>. Post it here...

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Well, I guess one way to try to learn a bit of SQL, is to expose ones approaches to public scrutiny (also the usage of dummy row) [blush]

Code:
[COLOR=white]declare @diff int
set @diff = 7

set nocount on

create table #tmp1 (pk int identity, dateVal smalldatetime, someColumn varchar(32))

insert into #tmp1 
	select dateVal, someColumn from sampleData order by dateval

create table #tmp2 (pk int identity, dateVal smalldatetime, someColumn varchar(32))

insert into #tmp2 (dateVal) 
	values (getdate())
insert into #tmp2 
	select dateVal, someColumn from sampleData order by dateVal

create table #tmp3 (pk int identity, dateFrom smalldatetime, someColumnFrom varchar(32),
	dateTo smalldatetime, someColumnTo varchar(32))

insert into #tmp3 
	select Null, Null, t2.dateVal, t2.someColumn
	from #tmp2 t2 inner join #tmp1 t1 on
		t2.pk = t1.pk and datediff(d, t2.dateVal, t1.dateVal) > @diff

create table #tmp4 (pk int identity, dateVal smalldatetime, someColumn varchar(32))

insert into #tmp4
	select t1.dateVal, t1.someColumn
	from #tmp1 t1 inner join #tmp2 t2 on 
		t1.pk = t2.pk and abs(datediff(d, t2.dateVal, t1.dateval)) > @diff

update t3 set t3.dateFrom = t4.dateval, t3.someColumnFrom = t4.someColumn 
	from #tmp4 t4 inner join #tmp3 t3 on
	t4.pk = t3.pk 

select dateFrom, someColumnFrom, dateTo, someColumnTo from #tmp3

drop table #tmp1
drop table #tmp2
drop table #tmp3
drop table #tmp4[/color]

Roy-Vidar
 
With short sample data and this code I got:

Code:
1999-02-18 00:00:00	blah 02/18/1999	1999-02-18 00:00:00	blah 02/18/1999
1999-03-01 00:00:00	blah 03/01/1999	1999-03-07 00:00:00	blah 03/07/1999
1999-03-15 00:00:00	blah 03/15/1999	1999-03-15 00:00:00	blah 03/15/1999
Looks like last group is missing (?)

Btw. general idea is OK - insert all group starts into identity table, ditto with group ends, join both together on identity - but implementation is kind of clunky [smile]. This can be done with 2 temp tables at worst (hint: left outer join).

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Apparently I'd tweaked the (lack of) logic a bit more than I thought after inserting more sample rows [blush], this bit below would go just before creating the fourth temptable
Code:
[COLOR=white]insert into #tmp3
	select top 1 Null, Null, dateVal, someColumn
	from sampleData order by dateVal Desc[/color]
since I've started, here's an "at worst attempt", but without outer join
Code:
[COLOR=white]declare @diff int
set @diff = 7

set nocount on
declare @tmpStart table
	(pk int identity, dateVal smalldatetime, someColumn varchar(32))

insert into @tmpStart
	select top 1 dateVal, someColumn from sampleData order by dateVal	
insert into @tmpStart
	select s1.dateVal, s1.someColumn from sampleData s1 
		where datediff(d, (select max(s2.dateVal) from sampleData s2 
			where s2.dateVal < s1.dateVal), s1.dateVal) > @Diff

declare @tmpEnd table (pk int identity, dateVal smalldatetime, someColumn varchar(32))

insert into @tmpEnd 
	select s1.dateVal, s1.someColumn from sampleData s1 
		where datediff(d, s1.dateVal, (select min(s2.dateVal) from sampleData s2 
			where s2.dateVal > s1.dateVal)) > @Diff
insert into @tmpEnd 
	select top 1 dateVal, someColumn from sampleData order by dateVal Desc	

select 
	s1.dateVal as dateFrom, s1.someColumn as someColumnFrom,
	s2.dateVal as dateTo, s2.someColumn as someColumnTo
from
	@tmpStart s1 inner join @tmpEnd  s2 on s1.pk = s2.pk[/color]
aren't anyone else going to try out this one?

Roy-Vidar
 
Here is variation with 3 temp tables:
Code:
[white]declare @diff int; set @diff = 7

select identity(int, 1, 1) as seq, A.* 
into #tmp1
from sampleData A
order by A.dateval

select identity(int, 1, 1) as seq, t1.dateVal as dateFrom, t1.someColumn as someColumnFrom
into #tmp2
from #tmp1 t1
left outer join #tmp1 t2 on	t1.seq = t2.seq+1 
where datediff(d, t2.dateVal, t1.dateVal) > @diff
	or t2.dateVal is null

select identity(int, 1, 1) as seq, t1.dateVal as dateTo, t1.someColumn as someColumnTo
into #tmp3
from #tmp1 t1
left outer join #tmp1 t2 on	t1.seq = t2.seq-1
where datediff(d, t1.dateVal, t2.dateVal) > @diff
	or t2.dateVal is null

select  t2.dateFrom, t2.someColumnFrom, t3.dateTo, t3.someColumnTo
from #tmp2 t2
inner join #tmp3 t3 on t2.seq=t3.seq
order by t2.seq

drop table #tmp1, #tmp2, #tmp3[/white]

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
An access attempt
Code:
[COLOR=white]
I created a query named qrySampleDataRank:
SELECT A.dateVal, A.someColumn, Count(*) AS Rank
FROM sampleData AS A INNER JOIN sampleData AS B ON A.dateVal>=B.dateVal
GROUP BY A.dateVal, A.someColumn;

And now the SQL to get the from-to groups:
PARAMETERS [Enter N ] Integer;
SELECT F.dateFrom, F.someColumnFrom, T.dateTo, T.someColumnTo
FROM (SELECT A.dateVal AS dateFrom, A.someColumn AS someColumnFrom, A.Rank
FROM qrySampleDataRank AS A LEFT JOIN qrySampleDataRank AS B ON A.Rank=B.Rank+1
WHERE A.dateVal-Nz(B.dateVal,#1900-01-01#)>[Enter N ]
) AS F INNER JOIN (SELECT A.dateVal AS dateTo, A.someColumn AS someColumnTo, A.Rank
FROM qrySampleDataRank AS A LEFT JOIN qrySampleDataRank AS B ON A.Rank=B.Rank-1
WHERE Nz(B.dateVal,#2049-12-31#)-A.dateVal>[Enter N ]
) AS T ON F.Rank <= T.Rank
WHERE T.Rank=(SELECT Min(Rank) FROM (
SELECT A.dateVal AS dateTo, A.someColumn AS someColumnTo, A.Rank
FROM qrySampleDataRank AS A LEFT JOIN qrySampleDataRank AS B ON A.Rank=B.Rank-1
WHERE Nz(B.dateVal,#2049-12-31#)-A.dateVal>[Enter N ]
) AS X WHERE Rank>=F.Rank);
[/color]

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Here is SQL2005 implementation. Thanks to ranking function and the fact vanilla CTEs (Common Table Expressions) can be referenced more than once, no temp tables are needed - and no dummy/sentinel values:
Code:
[white]declare @N int; set @N = 7;

with Dates( seq, dateVal, someColumn )
as
(	select row_number() over (order by dateVal) as seq, dateVal, someColumn
	from sampleData
)
select L.dateFrom, L.someColumnFrom, H.dateTo, H.someColumnTo
from 
(	select row_number() over (order by A.dateVal) as seqFrom, A.dateVal as dateFrom, A.someColumn as someColumnFrom
	from Dates A
	left outer join Dates B on A.seq = B.seq+1
	where ( datediff(dd, B.dateVal, A.dateVal) > @N or B.dateVal is null )
) L
inner join 
(	select row_number() over (order by A.dateval) as seqTo, A.dateVal as dateTo, A.someColumn as someColumnTo
	from Dates A
	left outer join Dates B on A.seq = B.seq-1
	where ( datediff(dd, A.dateVal, B.dateVal) > @N or B.dateVal is null )
) H
on L.seqFrom = H.seqTo
order by L.seqFrom[/white]
This one is interesting: run WITH DATES()... SELECT * FROM DATES to see how rows are prepared with full outer join. Results are OK but kind of "dislocated". Flipped column names in CTE interface and inner join do the trick though:
Code:
[white]declare @N int; set @N = 7;

with Dates (seq, dateTo, someColumnTo, dateFrom, someColumnFrom )
as
(	select row_number() over (order by A.dateVal), A.dateVal, A.someColumn, B.dateVal, B.someColumn
	from 
	(	select row_number() over (order by dateVal) as seq, dateVal, someColumn
		from sampleData
	) A
	full outer join 
	(	select row_number() over (order by dateVal) as seq, dateVal, someColumn
		from sampleData
	) B
	on A.seq = B.seq-1
	where isnull(datediff(dd, A.dateVal, B.dateVal), @N+1) > @N
)
select L.dateFrom, L.someColumnFrom, H.dateTo, H.someColumnTo
from Dates L
inner join Dates H on L.seq=H.seq-1
order by L.seq[/white]

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Stars for both...

Here are results (SQL2000)
Code:
Code              CPU Reads  Duration (s)
-----------------------------------------
RoyVidar #1       172 22271         3.203
RoyVidar #2        94 23375         0.123
Me, 3 temp tables  94  1481         0.113
Both SQL2005 codes were class for themselves - 89ms (0.098s) best time, a bit less logical reads (>30%) than best result so far.

I tried PHV's code in Access (Office 2002) - qrySampleDataRank itself ran 19 seconds, everything together... ack. Same code upsized to M$SQL (query->view, NZ()->ISNULL(), modified date math) didn't scale well - I cancelled query after five minutes... three times. Maybe I did something wrong, so feel free to correct me.

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
My solution is slightly slower than RoyVidar's #2. I chose this recursive approach because there wasn't much left to try, and what do you know, multiple updates are slower (duh). I had a single-query version which worked but took about 100 times as long... laugh.

Code:
[white]SELECT
	FromDt = D.dateVal,
	ToDt = D.dateVal
INTO #t
FROM
	SampleData D
	LEFT JOIN (
		SELECT DISTINCT
			D1.dateVal
		FROM
			SampleData D1
			INNER JOIN SampleData D2 ON D1.dateVal BETWEEN D2.dateVal + 1 AND D2.dateVal + 7
	) X ON D.dateVal = X.dateVal
WHERE
	X.dateVal IS NULL

WHILE @@RowCount > 0
	UPDATE T
	SET ToDt = X.XDt
	FROM
		#t T
		INNER JOIN (
			SELECT
				T.ToDt,
				XDt = Max(D.dateVal)
			FROM
				#t T
				INNER JOIN SampleData D ON D.dateVal BETWEEN T.ToDt + 1 AND T.ToDt + 7
			GROUP BY
				T.ToDt
		) X ON T.ToDt = X.ToDt

SELECT
	X.FromDt,
	SomeColumnFrom = F.someColumn,
	X.ToDt,
	SomeColumnTo = T.someColumn
FROM
	#T X
	INNER JOIN SampleData F ON X.FromDt = F.dateVal
	INNER JOIN SampleData T ON X.ToDt = T.dateVal

DROP TABLE #T[/white]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top