Hi all,
I'm about to design a Time/Date Dimension (on Day level) for a customer and I'm convinced of doing it the Kimball way is as close as perfect you could get (ie creating a spreadsheet as source). Roughly. I'm not convinced that the use of a surrogate key is the best choice. In Kimball's "The Data Warehouse Toolkit 2ed" he suggests that a surrogate key is preferred.
One reason is size. But a smalldatetime (SQLServer 2k) and an Integer are both 4 bytes. Doesn't that mean that performance is the same?
Another reason is how to handle Unknowns. With Surrogate keys you'd have the value -1 for unknowns but why not have the value '1900-01-01' for unknowns and '1995-01-01' for "Early dates", etc? You'd still have the recommended DATE_TYPE {'Normal','NA',..} field to separate the unknowns from the normal dates.
A reason to use the smalldatetime is that it's much easier as a developer to deal with in the development process (where you don't have to write joins all the time) and, in the end, saves time.
Another benefit of using a smalldatetime instead of a surrogate key is that you don't have to worry about getting the keys in the correct order.
Now my question. Can you come up with any other reason why I should use the surrogate key on my Time Dimension? (I've already searched the forum for answers on this topic.)
Note, I'm all for using Surrogate keys in all the other kinds of dimensions.
Thanks
I'm about to design a Time/Date Dimension (on Day level) for a customer and I'm convinced of doing it the Kimball way is as close as perfect you could get (ie creating a spreadsheet as source). Roughly. I'm not convinced that the use of a surrogate key is the best choice. In Kimball's "The Data Warehouse Toolkit 2ed" he suggests that a surrogate key is preferred.
One reason is size. But a smalldatetime (SQLServer 2k) and an Integer are both 4 bytes. Doesn't that mean that performance is the same?
Another reason is how to handle Unknowns. With Surrogate keys you'd have the value -1 for unknowns but why not have the value '1900-01-01' for unknowns and '1995-01-01' for "Early dates", etc? You'd still have the recommended DATE_TYPE {'Normal','NA',..} field to separate the unknowns from the normal dates.
A reason to use the smalldatetime is that it's much easier as a developer to deal with in the development process (where you don't have to write joins all the time) and, in the end, saves time.
Another benefit of using a smalldatetime instead of a surrogate key is that you don't have to worry about getting the keys in the correct order.
Now my question. Can you come up with any other reason why I should use the surrogate key on my Time Dimension? (I've already searched the forum for answers on this topic.)
Note, I'm all for using Surrogate keys in all the other kinds of dimensions.
Thanks