I am currently implementing a data warehouse and I'm following the Kimble approach of having a separate date dimension which is currently the UTC date. I have hit a slight wall when it comes to time zones as the clients will be wanting to view their data in each respective zone.
This is not a new problem as I have seen from searching the web including this forum:
For myself I am not so interested in the time as I will be storing a UTC timestamp in the fact table, this can be manipulated by the users at runtime. It is how best to represent multiple time zones in the date dimension in a data warehouse. The date dimension holds:
dateKey int,
dayOfCalendarYear int,
calendarQuarter int,
dayOfWeek text,
etc
I am currently thinking that I could combine all time zones into 1 date dimension table. Such that it would look like:
dateKey int,
utcDayOfCalendarYear int,
utcCalendarQuarter int,
utcDayOfWeek text,
utcPlus1DayOfCalendarYear int,
utcPlus1CalendarQuarter int,
utcPlus1DayOfWeek text,
utcPlus2DayOfCalendarYear int,
utcPlus2CalendarQuarter int,
utcPlus2DayOfWeek text,
etc
This will result in a large date table as I would repeat the columns for the 24 hours in the day. I was thinking that vertical partitions could be used here, also views could then be created onto the date dimension per time zone.
Thanks in advance for any feedback on this approach or suggestions of other approaches.
This is not a new problem as I have seen from searching the web including this forum:
For myself I am not so interested in the time as I will be storing a UTC timestamp in the fact table, this can be manipulated by the users at runtime. It is how best to represent multiple time zones in the date dimension in a data warehouse. The date dimension holds:
dateKey int,
dayOfCalendarYear int,
calendarQuarter int,
dayOfWeek text,
etc
I am currently thinking that I could combine all time zones into 1 date dimension table. Such that it would look like:
dateKey int,
utcDayOfCalendarYear int,
utcCalendarQuarter int,
utcDayOfWeek text,
utcPlus1DayOfCalendarYear int,
utcPlus1CalendarQuarter int,
utcPlus1DayOfWeek text,
utcPlus2DayOfCalendarYear int,
utcPlus2CalendarQuarter int,
utcPlus2DayOfWeek text,
etc
This will result in a large date table as I would repeat the columns for the 24 hours in the day. I was thinking that vertical partitions could be used here, also views could then be created onto the date dimension per time zone.
Thanks in advance for any feedback on this approach or suggestions of other approaches.