Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Date dimension with multiple time zones

Status
Not open for further replies.

sgbrown

Programmer
Aug 5, 2011
4
0
0
GB
I am currently implementing a data warehouse and I'm following the Kimble approach of having a separate date dimension which is currently the UTC date. I have hit a slight wall when it comes to time zones as the clients will be wanting to view their data in each respective zone.

This is not a new problem as I have seen from searching the web including this forum:
For myself I am not so interested in the time as I will be storing a UTC timestamp in the fact table, this can be manipulated by the users at runtime. It is how best to represent multiple time zones in the date dimension in a data warehouse. The date dimension holds:

dateKey int,
dayOfCalendarYear int,
calendarQuarter int,
dayOfWeek text,
etc

I am currently thinking that I could combine all time zones into 1 date dimension table. Such that it would look like:

dateKey int,
utcDayOfCalendarYear int,
utcCalendarQuarter int,
utcDayOfWeek text,
utcPlus1DayOfCalendarYear int,
utcPlus1CalendarQuarter int,
utcPlus1DayOfWeek text,
utcPlus2DayOfCalendarYear int,
utcPlus2CalendarQuarter int,
utcPlus2DayOfWeek text,
etc

This will result in a large date table as I would repeat the columns for the 24 hours in the day. I was thinking that vertical partitions could be used here, also views could then be created onto the date dimension per time zone.

Thanks in advance for any feedback on this approach or suggestions of other approaches.
 
After thinking about this approach some more I have had to alter the concept. The only way it can be viable is if there is an hour column that holds the UTC hours for every day, so 0 to 23 per day. In this case then the date dimension would actually be replaced with a DateByHours dimension such as this:

dateByHoursKey int,
utcHour int,
utcDateValue int, (yyyymmdd)
utcDayOfCalendarYear int,
utcCalendarQuarter int,
utcDayOfWeek text,
utcPlus1DateValue int, (yyyymmdd)
utcPlus1DayOfCalendarYear int,
utcPlus1CalendarQuarter int,
utcPlus1DayOfWeek text,
utcPlus2DateValue int, (yyyymmdd)
utcPlus2DayOfCalendarYear int,
utcPlus2CalendarQuarter int,
utcPlus2DayOfWeek text,
etc

Problems with this would be that there would be a lot of repetition of values, 8760 rows to represent one year as apposed to 365 in a standard date dimension.

The advantages:
1) Only 1 foreign key in the fact table for a date.
2) The DateByHours dimension could start with just the UTC date and as time zones are needed they can just be added as extra columns or partitioned tables.
3) Views can be provided for users per time zone, they need not be concerned with the hours column just the date specific values
 
I would transform the local time to UTC during loading. Consider this "transforming the data to a standard". Standardization of codes is an important DW activity.

I question the validity of using local time in a source system as well. Too many things to screw up, like moving to/from Daylight Savings Time. If using local time, you could take an internet order during the switchover and ship it before the credit card authorization, for instance. Similar situations where the shipping location is in a different time zone from the order location.

For further insight (confusion), consider the Solar Eclipse of June 2011, which started in Siberia on June 2 and ended in Norway on June 1.

If using this solution, you might need a table to translate back to local time, depending on user needs.

The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Thanks for your reply, all dates in the operational database are already in UTC, I'm looking into ways of viewing a date dimension in multiple time zones. I suppose I'm looking for validation of the proposal I posted up in comparison to other techniques such as:

1) multiple date foreign keys in the fact table
2) multiple rows in the fact table, 1 per time zone
3) self referencing date dimension, 1 column per time zone - involves a join back onto itself to get the correct date.
 
I think you should be looking at a conversion table, not another date table. Decide what geographical or location variable can be used to provide the conversion factor. Zip code, state, etc. Worst case something like facility ID.

The conversion table would join to the fact table via this location attribute. And if performance seems to be an issue, could be de-normalized into the fact table during ETL.

The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
following approach worked fine in our case.

a. All transactions are recorded in GMT (UTC)
b. Create separate "Time" dimension with a grain oh hour, with a grain based on UTC.
c. Add time zones to "Date" dimension with separate zonal columns for each time zone- One column to store "time adjustment number" with respect UTC and other to denote daylight saving flag.

Anand
 
Thanks for the replies, sorry that it took me a while to reply back. I went for using a conversion table as johnherman had suggested and then created views onto each time zone, seems to be working well.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top