Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Historic Data Model Design question

Status
Not open for further replies.

sonik

Programmer
Aug 3, 2000
11
US
I have an application which has a series of semi static tables that contain reference type data. I need to be able to able to create an historic picture of the tables and relations at any given point in time. One way of doing this is to use effective_from and effective_to columns in each of the tables to capture all historic changes, however this means that everytime a table is joined to another it also needs to restrict on a particular date, making queries sometimes complicated. I was wandering if there was another more elegant solution that people have used in transactional environments.

TIA
Sonik
 
The only solutions I can think of involve a lot of duplication. This might be acceptable if your users didn't need to look at anything in finer detail than values month by month but separate tables for every day of the year would be too much. Or would they? It's only disk space and that's cheap enough.

Can you do any pre-processing? Keep the bulk of the data in its current format but keep (say) 31 sets of daily records so that users could get quick results for the past month?



Geoff Franklin
 
This is a basic design feature of data warehousing slowly changing dimensions. Assign a surrogate (synthetic) key to the data which you use for the join to the fact (transaction data). Also have the natural key and effective and expiration dates in each dimension. This allows you to query by natural key or by the surrogate key, which is the effective key for a particular set of transactions. For more info, take a closer look at Slowly Changing Dimensions.


-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
I considered using data warehousing modelling but this presumes that the facts are changing daily. The changes I have are probably going to be once a month so to hold the snapshot each day would be overkill I think. Here is the data model I am playing with. Tried my best to model!

Grades Default_probabilities
grade_id grade_id
grade_desc default_prob_id
effective_from M------1 default_prob
effective_to effective_from
status effective_to
status

There are other reference tables related to the Default_probabilities table too with the same cardinality but omitted them for clarity. If either the grades or default_probabilities tables are modified there must be a way of knowing what the values where at that point in time. I was concerned that I would need a time dimension table linked to the default_probabilities table and have to insert the default probability every day which is overkill. I think by including the effective_from and effective_to fields in the default_probabilities that should remove my need for the time dimension I think. Sorry for waffling but does this sound reasonable. The contents of the default_probabilities table is going to be very small.

TIA

Sonik
 
DW does not presume that facts change daily or even at all. In this case, the default probabilities would seem to be your fact table and the Grades the Slowly Changing Dimension.

Let's get down to the dirty of dimensional modeling. What are your facts? What are you measuring? Facts are usually numbers. Then, what are your dimensions? Dimensions are qualitative variables across which business measures are compared. If you talk about something by year (time) by region (geography) by product, etc...anything which comes after the 'by' is a dimension.

Then, which dimensions are slowly changing (effective dates)? Are any dimensions static?

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Unless every time you change the data in a row of data you then write a new row and mark the old row inactive or write the old row to a historical table it might be kind of hard. This causes some serious design issues. Some databases use Multivalue fields, but that would make it non-first normal form and create tables inside of tables. All of which would seriously complicate things.

You can see that if historical data is kept it would cause problems in a table with it growing too large. For instanced if you had a giant catalog that you looked up the parts price in, if you quite often changed the price,then you could write a new record with the price or create a separatec price table with a first date and last date field.

For every benefit you can perceive, there is also a drawback. If you keep making new records the table may get too large and take longer to search. On the other hand if you dont save the historical price then you can not track price over time.

If you do not like my post feel free to point out your opinion or my errors.
 
Johnherman,

It would seem that probabillity of default would be the obvious fact. Thing I should mention is that this is a OLTP not an MIS system used for reporting so I don't want to use this design to run reporting type queries. It is just so that I can get a snapshot of what the data looked like at any point in time. I will not be running queries by the various dimensions and aggregating as the fact is not really additive in this case. Most of the dimension will change infrequently only grades is likely to change on a monthly basis but even then not many changes will be made.
Were you suggesting that I use a time dimension to model the effective_to and effection_from fields? I was hoping to get away from maintaining a time dimension and just using dates for the effective_from and effective_to columns.

ceh4702,
The size of the tables are very small and the number of changes is likely to be small, however the need to captute the state at any point of time is important. I think using the dimensional approach seems like the most flexible design. I don't use multi-valued attributes.
Cheers

Ketan
 
It appears that there are now temporal databases and a query language similar to SQL used to model historical data. However we don't use those DB packages.
 
Personally, I store the data inthe table I want to see it in rather than the code from the semi static table. Yes this is denormalized but it makes querying much more efficient and you don;t ever lose historical data. For instance you have a parts table - keep it up tp dat ewith the current parts and prices. when you create an order record, add the part number and price to it not a key to the lookup table. Then if the part goes away, you still know what the customer ordered and how much they paid for it.

Questions about posting. See faq183-874
Click here to help with Hurricane Relief
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top