13 Jun 12 11:14
How do you handle temporal data?
I have requirements that suggest we need to know not just what the value of an attribute is today, but what that value was at any point in time. For example, if you get married this June 10th, queries would show your last name as it was BEFORE you were married if they are run for any time before June 10th, and the queries would show your current last name for queries run any time ON OR AFTER June 10th. Of course, I can compare the dates in a BETWEEN clause, but how should the data be stored?
A coworker has proposed that each row have a BeginDate and an EndDate and if any attribute changes on that row, a new row be created with a new BeginDate and EndDate. They have also proposed the use of a "MagicDate" of 12/31/9999 for any records that are current. This seems woefully problematic to me, and goes against everything I hold true. For example, if you query how long ago was the end date for the records, you would end up with a result for some that says -7987 years ago--THAT clearly doesn't make any sense. I toyed with using a NULL for the end date for current records and using COALESCE in the BETWEEN clauses, but that requires developers to have an in-depth knowledge of the structure of the data to do queries (shouldn't they anyways?).
Another coworker suggested that we have a "CurrentPerson" as well as a "HistoricalPerson" table where CurrentPerson would only have a BeginDate and HistoricalPerson would have both a BeginDate and an EndDate. While I see the drawbacks of this solution, I find that in a way it better models reality because a CurrentPerson record doesn't have an EndDate, do they?
I'd LOVE to see any examples of how you guys handle this sort of thing, and I'd LOVE to hear your thoughts on this.
I've been reading Snodgrass, (http://www.cs.arizona.edu/people/rts/tdbbook.pdf
) but it's slow reading, and I am not sure I agree with it.
Why do today
that which may not need to be done tomorrow