Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

simplify partition query 2

Status
Not open for further replies.

peac3

Technical User
Jan 17, 2009
226
AU
Hi all,

I have historical data as per below:
Code:
Date	      Name
20160201	A
20160301	A
20160501	A
20160801	B
20161201	B

I would like to pick up the latest name and since when the latest name changed.
so the expected result would be

Date Name
20160801 B

Can I please have the simplest codes to get the expected result?

Thank you all,
 
Since you want to know when the name changes, the only way that I can think of it so match each row with it's subsequent row. To accomplish this, I would assign a row number to each row, ordered by date so that your data would look like this.

Code:
Date	      Name    RowId
20160201	A       1
20160301	A       2
20160501	A       3
20160801	B       4
20161201	B       5

I'm going to show some code. Please bear in mind that I am create a table variable so that I can dummy up your data. Obviously, in your actual solution, you wouldn't want to use the table variable.

Code:
Declare @Temp Table(Date Int, Name VarChar(10));

Insert Into @Temp Values(20160201,'A')
Insert Into @Temp Values(20160301,'A')
Insert Into @Temp Values(20160501,'A')
Insert Into @Temp Values(20160801,'B')
Insert Into @Temp Values(20161201,'B')

Select  *,
        Row_Number() Over (Order By Date) As RowId
From    @Temp

When you run the code above, you will see that it generates the data with an incrementing RowId. This is important because now we can join the table back to itself based on the row id, like this....

Code:
Declare @Temp Table(Date Int, Name VarChar(10));

Insert Into @Temp Values(20160201,'A')
Insert Into @Temp Values(20160301,'A')
Insert Into @Temp Values(20160501,'A')
Insert Into @Temp Values(20160801,'B')
Insert Into @Temp Values(20161201,'B')

; With Data As
(
  Select *,
         Row_Number() Over (Order By Date) As RowId
  From   @Temp 
)
Select *
From   Data As Before
       Inner Join Data As After
         On Before.RowId = After.RowId - 1

When you run the code above, you'll get results like this...

Code:
Date        Name       RowId                Date        Name       RowId
----------- ---------- -------------------- ----------- ---------- --------------------
20160201    A          1                    20160301    A          2
20160301    A          2                    20160501    A          3
20160501    A          3                    20160801    B          4
20160801    B          4                    20161201    B          5

Note that the first 3 columns are from the "before" table and the last 3 rows are from the "After" table.

The data you want would be from the 3rd row where a.name <> b.name. Specifically, you want the date and name from the after table.

Now all we have to do is to add a where clause and restrict the columns returned. Like this:

Code:
Declare @Temp Table(Date Int, Name VarChar(10));

Insert Into @Temp Values(20160201,'A')
Insert Into @Temp Values(20160301,'A')
Insert Into @Temp Values(20160501,'A')
Insert Into @Temp Values(20160801,'B')
Insert Into @Temp Values(20161201,'B')

; With Data As
(
  Select *,
         Row_Number() Over (Order By Date) As RowId
  From   @Temp 
)
Select After.Date,
       After.Name
From   Data As Before
       Inner Join Data As After
         On Before.RowId = After.RowId - 1
Where  Before.Name <> After.Name

When you run the code above, you'll see that you get exactly what you asked for in your original question.

Please note that this code will only work if the name only changes 1 time. If the name changes multiple times, you will get the date and new name for each time the name is changed. If you want to return only the last time the name is changed, then you could do this.

Code:
Declare @Temp Table(Date Int, Name VarChar(10));

Insert Into @Temp Values(20160201,'A')
Insert Into @Temp Values(20160301,'A')
Insert Into @Temp Values(20160501,'A')
Insert Into @Temp Values(20160801,'B')
Insert Into @Temp Values(20161201,'B')
Insert Into @Temp Values(20161202,'C')
Insert Into @Temp Values(20170101,'C')

; With Data As
(
  Select *,
         Row_Number() Over (Order By Date) As RowId
  From   @Temp 
)
Select [!]Top 1 [/!]After.Date,
       After.Name
From   Data As Before
       Inner Join Data As After
         On Before.RowId = After.RowId - 1
Where  Before.Name <> After.Name
[!]Order By Date DESC[/!]


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
As this is some type of history of data table, I wonder if you only present part of the data, because it makes we wonder why the history is so detailed, if the name only changed once. I assume other things changed in the records, othe things you don't show. Therefore the concept is not showing.

If all these dates are seen as valid from dates you want to know the eaerliest validfrom date for the latest valid name B. This is a two step thing, first determining latest date and its name as latest name, then determine earliest date for that latest name. And that can be done much simpler:

Code:
with Latest as
(Select Name From History
 Where Date = (Select MAX(Date) From History)
)

Select Min(Date) as Firstdate, Min(Latest.Name) as Latestname 
From History
inner join Latest on History.Name=Latest.Name

Bye, Olaf.
 
With sample data:
Code:
Declare @History Table(Date Int, Name VarChar(10));

Insert Into @History Values(20160201,'A'), (20160301,'A'),(20160501,'A'),(20160801,'B'),(20161201,'B'),(20161202,'C'),(20170101,'C');

with Latest as
(Select Name From @History
 Where Date = (Select MAX(Date) From @History)
)

Select Min(Date) as Firstdate, Min(Latest.Name) as Latestname 
From @History
inner join Latest on @History.Name=Latest.Name;

The only problem with that is a name changes back to something it was earlier, eg name changes from A to B then A again, you get the earliest first A valid from date. To solve that, data from history would need to be limited to the latest two different name values at max. This plays no major role, it'll only become important, once you have such a situation, and it will become a very likely problem, if you watch over the history of a boolean (bit) column, which switches more than once.

Another totally independent solution would of course be to store histories of single columns, so your name history only had 2 records, the first A and the first B record.

Edit: An idea I just had: One major information about which columns changed in an update is within COLUMNS_UPDATED(). If you store that value in each history record, you can later limit data to be with a certain bit of the column of interest set, which would give you a pure history of these column changes only. That's the best way of determining single column history without needing ot have a history table for any single column. That won't help right now, unless you find that info in your current history table. It has to be changed wherever you now create such history data, eg in triggers.

Bye, Olaf.
 
Thank you all for the helps and supports...

However, one more request if I may, from the raw data as per below:

Code:
Date	      Name
20160201	A
20160301	A
20160501	A
20160801	B
20161201	B

we would like to extend the functional by having the historical data based on the name changed by generating expiry date by the next record minus 1 day then the last record would be 99991231.
so the result as per below:

Code:
Date	   Expiry_Date   Name  
20160201    20160731       A
20160801    99991231	   B

Please let me know how to achieve this.
Thank you in advance,
 
Continuing on from my previous example....

Code:
Declare @Temp Table(Date int, Name VarChar(10));

Insert Into @Temp Values(20160201,'A')
Insert Into @Temp Values(20160301,'A')
Insert Into @Temp Values(20160501,'A')
Insert Into @Temp Values(20160801,'B')
Insert Into @Temp Values(20161201,'B')

; With Data As
(
  Select *,
         Row_Number() Over (Partition By Name Order By Date) As RowId
  From   @Temp
), Rows As
(
  Select Date, Name, Row_Number() Over (Order By Date) As RowId
  From   Data
  Where  RowId = 1
)
Select A.Date,
       Coalesce(DateAdd(Day, -1, Convert(VarChar(10), B.Date)), '99991231') As Expiry_Date, 
       A.Name
From   Rows As A
       Left Join Rows As B
         On A.RowId = B.RowId - 1

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Sorry, I didn't catch you having a follow up question. You already have a good answer, of course. My turn on this would need 2012 or later, well, that's almost 5 years old legacy SQL :)

Code:
Declare @History Table(Date date, Name VarChar(10));

Insert Into @History Values('20160201','A'), ('20160301','A'), ('20160501','A'), ('20160801','B'), ('20161201','B');

With Fromdates as
( Select Min(Date) as ValidFrom, Name 
  From @History 
  Group by Name)

Select *, 
   Coalesce( 
      Lead(DateAdd(Day, -1, Fromdates.ValidFrom)) over (order by ValidFrom),
	  '99991231'
	       ) As ValidTo
from Fromdates
historspansdate_tv9ake.png


First part of it is much easier, you could also compute the end date client side without T-SQL at all, just taking the simple Min(Date), Name query.

In terms of datetime I would actually not subtract a day, the point in time of the expiration is exactly the same datetime as the next validfrom, even if you are pedantic about those being two different terms on the validity, two time spans are divided at one point in time.

So in datetime nomenclature:
Code:
Declare @History Table(Date date[highlight #FCE94F]time[/highlight], Name VarChar(10));

Insert Into @History Values('20160201','A'), ('20160301','A'), ('20160501','A'), ('20160801','B'), ('20161201','B');

With Fromdatetimes as
( Select Min(Date) as ValidFrom, Name 
  From @History 
  Group by Name)

Select *, 
   Coalesce( 
      Lead(Fromdatetimes.ValidFrom) over (order by ValidFrom),
	  '99991231'
	       ) As ValidTo 
from Fromdatetimes
historspansdatetime_tyepqj.png


The difference is in the point of view over the different date types, a date (precision day) stands for the whole day, so validto date is 1 less than the next validfrom, while looking at datetimes the point in time of the change is both the end of the previous time span and the begin of the next span.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top