Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Retrieve current row values when next row meets specified criteria 2

Status
Not open for further replies.

waubain

Technical User
Dec 13, 2011
200
US
I am using SQL Server 2012
I have data that looks like this ORDER BY ID, LabDate DESC
Code:
ID	LabDate		LabResult
1	2012-05-18	11.2
1	2011-02-03       8.3
1	2010-10-01	 7.5
1	2009-03-04	 8.6
2	2013-06-12	10.5
2	2011-02-03       7.5
2	2009-03-04	 8.9
3	2014-01-31       9.5
3	2009-03-04	 8.9
4	2008-11-15	 9.8
I need to move back through time looking for the next row where LabResult < 8. If found I need to retrieve the current row values for ID, LabDate and LabResult. If the there is only one value then or there are no values then retrieve the earliest or only value

For the above data the resultant set would be 2 rows
Code:
ID	LabDate		LabResult
1	2011-02-03       8.3
2	2013-06-12	10.5
3	2009-03-04       8.9
4	2008-11-15	 9.8

I read about LAG and LEAD and self-join to try to solve this problem, but I cannot retrieve just one row per ID. I added a ROW_NUMBER column, but not sure what to do with it.
I really don't know where to start so, any help would be appreciated.
Thank you.

You don't know what you don't know...
 
Code:
SELECT Test1.*
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id, LabDate DESC) AS RowN
             FROM YourTable) Test1
LEFT JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id, LabDate DESC) AS RowN
             FROM YourTable) Test2 ON Test1.Id = Test2.Id AND Test1.RowN = Test2.RowN-1
WHERE (Test2.Id IS NULL AND test1.id = 1)
   OR Test2.LabResult < 8

NOT TESTED !!!

Borislav Borissov
VFP9 SP2, SQL Server
 
Here tested version :)
Code:
--- Preparing test data, you don't need this
DECLARE @test TABLE (Id int, LabDate date, LabResult Numeric(5,2))
INSERT INTO @test VALUES (1, '20120518', 11.2)
INSERT INTO @test VALUES (1, '20110203', 8.3)
INSERT INTO @test VALUES (1, '20101001', 7.5)
INSERT INTO @test VALUES (1, '20090304', 8.6)
INSERT INTO @test VALUES (2, '20130612', 10.5)
INSERT INTO @test VALUES (2, '20110203', 7.5)
INSERT INTO @test VALUES (2, '20090304', 8.9)
INSERT INTO @test VALUES (3, '20140131', 9.5)
INSERT INTO @test VALUES (3, '20090304', 8.9)
INSERT INTO @test VALUES (4, '20081115', 9.8)
--- End 

;WITH Cte_Test (Id, LabDate, LabResult)
AS
(
        SELECT test1.Id,
               test1.LabDate,
               test1.LabResult
        FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id, LabDate DESC) AS RowN
                     FROM @test ) Test1

        LEFT JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id, LabDate DESC) AS RowN
                     FROM @test ) Test2 ON Test1.Id = Test2.Id AND Test1.RowN = Test2.RowN-1 
        WHERE (Test2.Id IS NULL AND test1.RowN = 1)
           OR Test2.LabResult < 8
)



SELECT * FROM Cte_test
UNION ALL
SELECT tst.Id,
       tst.LabDate,
       tst.LabResult
FROM (SELECT test.*, ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id, LabDate) AS RowN
             FROM @test  Test
      WHERE NOT EXISTS(SELECT Id FROM Cte_Test WHERE Id = Test.Id)) tst
WHERE RowN = 1
ORDER BY Id, LabDate

Borislav Borissov
VFP9 SP2, SQL Server
 
Borislav,
I had begun to work on your untested version and noticed a few things and had to quit. I see that you fixed them in your tested version. I was reading about self-joins but I am too new to sql to get it correct. Thank you for the solution.

You don't know what you don't know...
 
Borislav,
This worked perfectly with the scenarios I provided, but I scanned the >100,000 rows for different variations and found a few iterations that don't work. I added Id 5 and 6 to your you INSERT statement.

SQL:
-- Enter test data
DECLARE @test TABLE (Id int, LabDate date, LabResult Numeric(5,2))
INSERT INTO @test VALUES (1, '20120518', 11.2)
			,(1, '20110203', 8.3)
			,(1, '20101001', 7.5)
			,(1, '20090304', 8.6)
			,(2, '20130612', 10.5)
			,(2, '20110203', 7.5)
			,(2, '20090304', 8.9)
			,(3, '20140131', 9.5)
			,(3, '20090304', 8.9)
			,(4, '20081115', 9.8)
			,(5, '20110203', 7.8)
			,(6, '20111231', 7.2)
			,(6, '20101001', 8.2)
			,(6, '20090222', 6.9)
--- End

For Id #5 and #6 if the first value is <8 then the LabDate and LabResult need to return NULL. I tried to add another table element and use a CASE WHERE > 8, but was only handled Id #5.

You don't know what you don't know...
 
Don't get it.
What you want as a result from this data?

Borislav Borissov
VFP9 SP2, SQL Server
 
I am sorry, this is confusing. Having a LabResult > 8 is unhealthy. The researcher wants to know how far back in time LabResult was > 8. If it was not > 8 at the first date then that subject would be excluded and anything before that is unimportant. The comparator index date to this LabDate is done somewhere else in another CTE.

Although maybe not the best solution this was my beginner solution. I just filtered out the persons with a LabResult of <8 at RowID = 1 prior to running your code. In your code I changed @test to getgroup and it worked.

SQL:
;WITH getgroup --(ScrSSN, LabDate, LabResult)
AS
(
SELECT *
FROM
(
SELECT	SCrSSN
		,LabResult
		,LabDate
		,ROW_NUMBER() OVER(PARTITION BY ScrSSN ORDER BY ScrSSN, LabDate DESC) AS RowID
		FROM @test
		) AS G1
WHERE RowID = 1 AND LabResult >=8

Since I am just learning I am open to a better method. I am still trying to move through and understand your code to see if I could have filtered out these out without adding another level of CTE. Maybe one more level of nesting. This is just one piece of a very long CTE that pulls in a large dataset for a research project.

Thank you for helping.

You don't know what you don't know...
 
My filter did not work like I thought, so I am back to figuring out how to eliminate anyone with a first value at RowN=1 when LabResult <8.

You don't know what you don't know...
 
RowID is the row number (ROW_NUMBER() .. as RowID), not the ID, so when you filter for RowID=1 you get the rows with the highest Labresult for each ID.

You are seeking the row before the first row with a Labresult <8, that can be any rownumber, not always 1. There is no particular order in the labresults, as you order by date, so you can't tell what rownumber you look for and that way to filter is not leading you to the result.

You correctly pointed to LAG and LEAD, why not learn about them?

Code:
--drop table #temp
create table #temp(id int, labdate date, labresult float)
insert into #temp values (1,'2012-05-18',11.2)
insert into #temp values (1,'2011-02-03', 8.3)
insert into #temp values (1,'2010-10-01', 7.5)
insert into #temp values (1,'2009-03-04', 8.6)
insert into #temp values (2,'2013-06-12',10.5)
insert into #temp values (2,'2011-02-03', 7.5)
insert into #temp values (2,'2009-03-04', 8.9)
insert into #temp values (3,'2014-01-31', 9.5)
insert into #temp values (3,'2009-03-04', 8.9)
insert into #temp values (4,'2008-11-15', 9.8)

;With cte as
(
Select ID, LabDate, LabResult, LEAD(LabResult) OVER (PARTITION BY ID ORDER BY LabDate DESC) as PreviousLabResult From #temp
)
Select ID, LabDate, LabResult from cte Where PreviousLabResult<8

Now the tricky part are the groups (IDs) with no such LEAD value. if there is no next row LEAD will result in NULL, so using ISNULL(PreviousLabResult,0)<8 will include the last rows for ID 3 and 4, but also for IDs 1 and 2. We need to get a bit more complicated here, getting a TOP 1 record per ID:
Code:
drop table #temp
create table #temp(id int, labdate date, labresult float)
insert into #temp values (1,'2012-05-18',11.2)
insert into #temp values (1,'2011-02-03', 8.3)
insert into #temp values (1,'2010-10-01', 7.5)
insert into #temp values (1,'2009-03-04', 8.6)
insert into #temp values (2,'2013-06-12',10.5)
insert into #temp values (2,'2011-02-03', 7.5)
insert into #temp values (2,'2009-03-04', 8.9)
insert into #temp values (3,'2014-01-31', 9.5)
insert into #temp values (3,'2009-03-04', 8.9)
insert into #temp values (4,'2008-11-15', 9.8)

;With previouslabresults as
(
Select ID, LabDate, LabResult, LEAD(LabResult) OVER (PARTITION BY ID ORDER BY LabDate DESC) as PreviousLabResult 
  From #temp
), 
nextresult as
(
Select *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LabDate DESC) as Rownumber From previouslabresults
 Where ISNULL(PreviousLabResult,0)<8
)

Select ID, LabDate, LabResult, PreviousLabResult from nextresult Where Rownumber=1
LEAD looks a bit confusing, as it gives the next record result, but as you order descending by date, LEAD is giving the previous lab result in date order, not the next result. The second cte query "nextresult" on the other hand is that next result after the previous result. Last not least we arrive at retrieving all Rownumber 1 records, but this time this has another reasoning, as the nextresult cte would give too much records, if not limited to the top 1 record for each ID.

By the way: If you partition the data, each partition has a constant value for the fields defining a partition, so the partition doesn't need to include the field defining the partition in its order by clause, that means the ORDER BY LabDate Desc is sufficient.

Bye, Olaf.
 
Olaf,
Thank you for an alternative approach. I did try LAG and LEAD but it was unclear how to compare rows. I think between your example and Borislav's that is a little clearer.

Unfortunately the first mock data did not completely represent the data. The researcher now states that if the most current LabDate (DESC Row 1) has a LabResult is < 8 then drop the case. None of mock data started with a value < 8. In a subsequent post I added these values below where both ID5 and ID6 start with < 8 and thus should not belong in the final resultant dataset. ID#6 is trickly because it starts with <8 then goes to >=8 and then back <8 so both Borislav and your code returns the 8.2 which is not what the researcher wants. This was my fault for not seeing this possible iteration the first time around.

SQL:
--drop table #temp
create table #temp(id int, labdate date, labresult float)
insert into #temp values (1,'2012-05-18',11.2)
                        ,(1, '2011-02-03', 8.3)
			,(1, '2010-10-01', 7.5)
			,(1, '2009-03-04', 8.6)
			,(2, '2013-06-12', 10.5)
			,(2, '2011-02-03', 7.5)
			,(2, '2009-03-04', 8.9)
			,(3, '2014-01-31', 9.5)
			,(3, '2009-03-04', 8.9)
			,(4, '2008-11-15', 9.8)
			,(5, '2011-02-03', 7.8)
			,(6, '2011-12-31', 7.2)
			,(6, '2010-10-01', 8.2)
			,(6, '2009-02-22', 6.9)

I was able to query them out before running your code by creating a new set of ID's with only 1-4 and then JOINing them back to the original lab data. This took three queries.
I would be open to a more efficient method either before or after your code. My limited knowledge would be to do it at the beginning, since at the end, you don't know that ID#6 is incorrect. The lab dataset has 250,000 rows. In my example the final dataset should be:

SQL:
ID	LabDate	        LabResult	
1	2011-02-03	8.3	
2	2013-06-12	10.5	
3	2009-03-04	8.9	
4	2008-11-15	9.8

Thank you again.


You don't know what you don't know...
 
This is my final code

Code:
drop table #temp
create table #temp(id int, labdate date, labresult float)
insert into #temp values (1,'2012-05-18',11.2)
			,(1, '2011-02-03', 8.3)
			,(1, '2010-10-01', 7.5)
			,(1, '2010-09-01', 7.2)
			,(1, '2009-03-04', 8.6)		
			,(2, '2013-06-12', 10.5)
			,(2, '2011-02-03', 7.5)
			,(2, '2009-03-04', 8.9)
			,(3, '2014-01-31', 9.5)
			,(3, '2009-03-04', 8.9)
			,(4, '2008-11-15', 9.8)
			,(5, '2011-02-03', 7.8)
			,(6, '2011-12-31', 7.2)
			,(6, '2010-10-01', 8.2)
			,(6, '2009-02-22', 6.9)

;With newlabid
AS
(
SELECT ID
FROM
(
	Select ID, LabDate, LabResult, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LabDate DESC) as Rownumber 
	From #temp) AS s1
WHERE S1.Rownumber = 1 and LabResult >=8
)
,
newdata AS
(
SELECT n.ID, LabDate, LabResult 
FROM newlabid as n
	INNER JOIN #temp		ON n.ID = #temp.ID
)
,
previouslabresults as
(
Select ID, LabDate, LabResult, LEAD(LabResult) OVER (PARTITION BY ID ORDER BY LabDate DESC) as PreviousLabResult 
  From newdata
), 
nextresult as
(
Select *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LabDate DESC) as Rownumber 
From previouslabresults
Where ISNULL(PreviousLabResult,0)<8
)

Select ID, LabDate, LabResult from nextresult Where Rownumber=1

You don't know what you don't know...
 
Waubains soution works. I don't have the time to find out what's working better, but how about just adding the rownumber into the "previouslabresults", then later check for the LabResult>8 in Row 1 by a self join of that:

Code:
drop table #temp
create table #temp(id int, labdate date, labresult float)
insert into #temp values (1,'2012-05-18',11.2)
			,(1, '2011-02-03', 8.3)
			,(1, '2010-10-01', 7.5)
			,(1, '2010-09-01', 7.2)
			,(1, '2009-03-04', 8.6)		
			,(2, '2013-06-12', 10.5)
			,(2, '2011-02-03', 7.5)
			,(2, '2009-03-04', 8.9)
			,(3, '2014-01-31', 9.5)
			,(3, '2009-03-04', 8.9)
			,(4, '2008-11-15', 9.8)
			,(5, '2011-02-03', 7.8)
			,(6, '2011-12-31', 7.2)
			,(6, '2010-10-01', 8.2)
			,(6, '2009-02-22', 6.9)

;With previouslabresults as
(
 Select *, LEAD(LabResult) OVER (PARTITION BY ID ORDER BY LabDate DESC) as PreviousLabResult,
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LabDate DESC) as Rownumber
    From #temp
), 
nextresult as
(
Select t1.*, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.LabDate DESC) as Rownumber2 From previouslabresults t1
 inner join previouslabresults t2 on t2.Rownumber=1 and t2.LabResult>=8 and t2.ID = t1.ID 
 Where ISNULL(t1.PreviousLabResult,0)<8 
)

Select ID, LabDate, LabResult, PreviousLabResult from nextresult Where Rownumber2=1

Bye, Olaf.
 
Olaf, I appreciate the time you and Borislav have taken already to help. I like your solution although I have never used the ON statement in that fashion. I was reading a post a few days ago where the author was describing the versatility of the ON statement. I have only used it to join the primary and foreign keys (t2.ID = t1.ID). I appreciate everyones help and examples.

You don't know what you don't know...
 
I just realize now, you were asking in the first place and posted your final code as roundup. :)

Indeed, the ON clause can do more than FK/PK matches. In case of an inner join you can also move all conditions into the where clause, it's more important in outer join cases, like a left or right outer join, as a condition on the joined table put in where makes it non optional condition of a joined record, thus a record must exist fullfilling both the join and the where condition in that case.

I think it's a good practice to move any conditions for the join in the ON condition, in this case I do this self join to filter the original "previouslabresults" by joining each record with the first record of the same partition, where that latest lab result is >=8 per your final condition, therefore I put all these conditions as join conditions. I don't include any t2 fields into the resuilt, I merely join to filter the data. Also it's an inner join, where it won't matter, if conditions are put to the join or the where clause, but putting these conditions to the join makes sure I only join one record per partition (per ID value), and only, if it fulfills the condition.

You are doing that as an extra step in your "newlabid" CTE, and you use an inline CTE (s1), so actually your additional CTE is two in 1.

Anyway, I don't have the amount of data to test what really is performing better, if you can test both and report back the timings, that would be nice to know.

Bye, Olaf.
 
Nice explanation Olaf.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top