
?? Time Dimension ??

Status
Not open for further replies.

Nuffsaid

Technical User
Aug 3, 2001
374
CA
Our OLTP system records transactions with a UTC/GMT timestamp.
Business requirements state “Customers want reports based on their respective time zones”.
(daylight savings is a non issue at this point)

In the past, what we have done is duplicate the “fact record” 8 times, once for each pertinent time zone (currently we are only concerned with 8 time zones), then truncate the time portion and join the fact to a date dimension which holds all dates in all 8 time zones.
Sample of the Date Dimension;

date_key   date        time_zone
200701011  Jan 1 2007  GMT
200701012  Jan 1 2007  EST
200701013  Jan 1 2007  PST

The front-end tool would then add a constraint based on the user's locale.

This worked pretty well as nobody was concerned about the time portion being truncated.

Things have changed however. We are getting more and more “time based” reporting requests.
We have thought about storing the original UTC timestamp in the fact and doing some fancy
time zone conversions in the reporting tool; however, this doesn't seem ideal.
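To make the trade-off concrete, here is a rough sketch in Python of what that runtime conversion amounts to. The zone names and fixed offsets are purely illustrative (DST is deliberately ignored, per the assumption above):

```python
from datetime import datetime, timedelta

# Hypothetical fixed offsets in hours from UTC; DST deliberately ignored,
# matching the thread's stated assumption.
TZ_OFFSETS = {"GMT": 0, "EST": -5, "PST": -8}

def to_local(utc_ts: datetime, tz: str) -> datetime:
    """Shift a UTC timestamp into the given zone at report time."""
    return utc_ts + timedelta(hours=TZ_OFFSETS[tz])

# 01:00 UTC on Jan 1 2007 is still the evening of Dec 31 2006 in EST.
local = to_local(datetime(2007, 1, 1, 1, 0), "EST")
```

The arithmetic itself is trivial; the awkward part, as noted above, is doing it inside every report query.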

My question is, how would you approach this given the above scenario?

Thanks to all who take the time to respond.



Nuffsaid.
 
We usually attach a time correction factor (+ 4 for instance), and Time zone (EST) in the user record. We retrieve this when the user signs on and carry it around in memory.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Actually, I think that storing the time information in a standard zone (e.g. EST or GMT) and then doing some time-zone correction in the reporting tool is the ideal situation.
You can either get the user's time zone from the client environment, from the database (based on the user) or from a parameter you give the report.

Correcting the output when you have the time offset is very easy. It may be a little more tricky to retrieve the user's time-zone.
 
Thanks for your input guys.

I'm a little surprised though. Both of you seem to think that an adjustment made at runtime is okay.

A little more to digest.

As it stands now, the fact table has 8 times the number of records that truly need to be there. So does the date dimension.

Thinking now is, record the UTC/GMT timestamp in the fact.
Convert this datetime to a 'Localized Datetime' based on the customer's location and store it in the fact. Derive a foreign key from the localized date to join to the date dim. All done during the ETL process.
Benefits would be: reduced number of records in both the fact and dimension tables; no on-the-fly time zone calculations required; and finally, not having to worry that the customer selects the proper time zone in the front-end tool. Reports that require HH:MM calculations can be achieved by the report developer using the localized date in the fact table.
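The ETL step being described could be sketched roughly like this in Python. The offsets and the YYYYMMDD key format are assumptions for illustration; the real ETL would use whatever surrogate key scheme the date dimension already has:

```python
from datetime import datetime, timedelta

# Illustrative fixed offsets; a real ETL would look these up per customer.
TZ_OFFSETS = {"GMT": 0, "EST": -5, "PST": -8}

def localize(utc_ts: datetime, customer_tz: str) -> datetime:
    """Convert the UTC fact timestamp to the customer's local time."""
    return utc_ts + timedelta(hours=TZ_OFFSETS[customer_tz])

def date_key(local_ts: datetime) -> int:
    """Derive a YYYYMMDD surrogate key for the join to the date dimension."""
    return local_ts.year * 10000 + local_ts.month * 100 + local_ts.day

utc = datetime(2007, 1, 1, 1, 0)   # fact recorded at 01:00 UTC
local = localize(utc, "PST")       # 17:00 on Dec 31 2006, local time
key = date_key(local)              # 20061231
```

The key point is that the date key is derived from the already-localized timestamp, so the rollover to the previous day happens once, in the ETL, not in every query.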

Comments appreciated.



Nuffsaid.
 
Well Nuffsaid,

I was surprised that doing some time adjustment at runtime could be a problem. I have been looking at your problem again and appreciate why you have objections to doing on-the-fly timezone conversions. The queries, including the join, look pretty ugly. Performance will probably be an issue unless you have function-based indexing available.
What environment are you working on?
How many records will be involved in a typical query?
How many of these typical time-related queries do you have/expect?
Do you have to correct for daylight saving time as well?

I agree that reducing the fact table by a factor of 8 is a good thing (and as a bonus the date dimension is reduced as well). On the other hand, you are introducing 7 extra foreign keys in your fact table that actually play the same role. Besides, these keys need to be indexed, so the storage gain is not as big as you might hope.

I still think I would try for the timezone conversion at runtime. But would use inline queries (or nested queries).
I would probably store a date adjustment factor for each timezone in every time record and use that in your queries.

sample time dimension (all times are considered to be GMT)

time_key  time   date_adj_GMT  date_adj_EST  date_adj_PST  date_adj_CET  ...
1         13:15   0             0             0             0
2         01:30   0            -1            -1             0
3         23:45   0             0             0            +1
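The adjustment factors in the sample rows above can be computed mechanically. A small Python sketch, with illustrative fixed offsets per zone:

```python
# Derive the per-zone date adjustment factor (-1, 0, or +1) that the
# sample time dimension stores. Offsets in hours are illustrative only.
TZ_OFFSETS = {"GMT": 0, "EST": -5, "PST": -8, "CET": 1}

def date_adj(hour: int, minute: int, tz: str) -> int:
    """How many days the local date differs from the GMT date."""
    total_minutes = hour * 60 + minute + TZ_OFFSETS[tz] * 60
    # Floor division handles the negative case (early-morning GMT times
    # that fall on the previous local day in western zones).
    return total_minutes // (24 * 60)

date_adj(13, 15, "EST")   # 0: midday GMT stays on the same date
date_adj(1, 30, "PST")    # -1: 01:30 GMT is still the previous day in PST
date_adj(23, 45, "CET")   # +1: 23:45 GMT is already the next day in CET
```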

A typical join would look like:

select ...
  from (select fact.time
             , fact.date + time_dim.date_adj_PST as fact_date
             , other_facts
          from fact
             , time_dim
             , other_tables
             , ...
         where fact.time_key = time_dim.time_key
           and ...
         group by ... -- if appropriate
       ) subq
     , date_dim
     , other_tables
     , ...
 where subq.fact_date = date_dim.date
   and ...
 group by ...

If you want to, you may be able to create several views on the time_dimension, showing only one date adjustment factor.

But most of all: Try it out. What will performance be on any given solution? What do query plans look like (what does the optimizer like best).
It still may be best to store 7 extra foreign keys.

Hope this helps a bit.
 
I would have a LocalTime & GMTTime FK in the fact table. This way, you need only one copy of each record in the fact table. You can have a view on the Time Dimension to create a Local Time role-playing dimension. It should be pretty easy to load the fact table in your ETL process to populate the LocalTime field. It should also be pretty easy to go back and populate the pre-existing records with simple math.
 
Well Riverguy,
It may be a little more complicated, as timezone conversion can lead to a different date/week/month/year. So summaries could be off, or you can come up with a negative time.

@Nuffsaid.
Does this effect of timezone conversion really come into play in your situation?
 
Why would the difference in date/week/month or year matter? If the database were in New York with a date of 1/1/2007 1:00:00 AM, and I was in Los Angeles, then I would want a date of 12/31/2006 10:00:00 PM. I want December for the month, and 2006 for the year. There should not be any negative times as most date math functions that I know of will increment/decrement a date/month/year automatically.
 
Hello All,

Hmmmm….. interesting stuff.

Perhaps a little history to make things clearer. I don’t want to bore you so I’ll do my best to keep it short.

When the company started up it was decided that all reporting would be done in UTC/GMT, which was acceptable to the customer base at that time. Over time the customer base grew and the need for specific time zone reporting became quite apparent. At that time there was only a single ETL process in place, as data warehousing was a fairly new concept to the company.

As such, it was decided that in order to satisfy the time zone issue we would replicate every fact in every pertinent time zone, truncate off the HH:MM:SS (as the time portion of the date was not required), then join this fact table to a date dimension which held every date 8 times, once for each time zone. A query of a particular day would return 8 records, one for each time zone. To combat this, a constraint was added when the user selected his/her time zone in the front end tool. Basically this added the WHERE clause to the query, i.e. WHERE Time_Zone = 'EST'.

The thinking at the time was, “Although we’re replicating and storing somewhat duplicate data, we won’t have to adjust or modify our ETL process when a new customer comes along to adapt to whatever time zone they’re in. It will already be there.” Worked okay…….. back in the day!

So, here we are today where questions are being asked like “What are my peak hours?”

We’re doomed...[flush]

It goes without saying that we’ve got to update our process don’t ya think?

“We need to be able to answer these time related questions. What’s the best approach?”
(a quote from management)

Nuffsaid’s “War Cry”; [soapbox] “We don’t need 8 different time zones for every customer!! We need 2. The original UTC and the customer’s. That’s it!” [flame]

So, current thinking is;

1) Store the “COMPLETE” UTC/GMT timestamp in the fact.
2) Convert the entire timestamp to the customer’s time zone in the fact. (Localized Datetime)
3) Derive a DATE key from the Localized Datetime to join to a generic Date Dimension which IS NOT cluttered up with time zone crap. One Foreign Key.
4) For the handful of reports at this time, that require localized time, have the report developer use the localized date timestamp stored in the fact.

Benefits:

1) Reduced number of records in both fact and dimension tables.
2) No need to add a constraint using the front end tool.
3) Don’t have to worry about users selecting the CORRECT time zone. (don’t know how much time I spent doing reconciliations only to find out……[banghead])

Downside:

1) One ETL process won’t cover everyone. The process will have to adapt based on the customer’s time zone.
2) ??

Questions And Answers:


Hans63 said:
What environment are you working on?

Backend is Oracle 10g

Hans63 said:
How many records will be involved in a typical query?

This can vary quite a bit. A lot of our reports are ad-hoc where users can specify date ranges. So the number of records can be as few as a couple of hundred or as many as a couple hundred thousand.

Hans63 said:
How many of these typical time-related queries do you have/expect?

Hard to say. So far it’s only been a handful, but who knows. My opinion, “Better Safe Than Sorry” as we can see from history.

Hans63 said:
Do you have to correct for daylight saving time as well?

This is what scares me the most. I’ve been a little obstinate with management on this point. Don’t know how long I’ll get away with it though.

Hans63 said:
It may be a little more complicated as timezone conversion can lead to a different date/week/month/year. So summaries could be off, or you can come up with a negative time.

Exactly!! If the conversion isn’t done properly it creates havoc during the reconciliation process we go through.

In conclusion I still feel that storing 2 datetime stamps in the fact along with 1 FK derived from the localized datetime in order to join to a “generic” date dimension that IS NOT cluttered by time elements or time zone attributes is the best way to go…. for now?

What do you guys think?

Thanks again for taking the time…..



Nuffsaid.
 
Having read all of this post, one important conclusion is that dealing with timezones can create confusion and brings along several pitfalls.

I do not yet see how to really solve this.

So far I have seen 2 ways to approach this:
- Using multiple date-keys in the fact table.
- Doing some time-zone conversion at query time.

Both ways have their pros and cons, and both ways sound (to me, that is) workable.

In Oracle 10g you can create indexes on a function (e.g. a truncated timestamp). I am not sure about using a function in a foreign key relation, so the solution Nuffsaid proposes might be a good one for this situation.

riverguy said:
Why would the difference in date/week/month or year matter? If the database were in New York with a date of 1/1/2007 1:00:00 AM, and I was in Los Angeles, then I would want a date of 12/31/2006 10:00:00 PM. I want December for the month, and 2006 for the year. There should not be any negative times as most date math functions that I know of will increment/decrement a date/month/year automatically.
This is the case when you are handling timestamps, but when using date-dimensions and time-dimensions it does become an issue. Consider a question like: What are my peak hours?

If you don't compensate for negative time when correcting for time-zones you get a very strange report that may have hours ranging from -6:00 to 32:00. And the users expects a report with hours ranging from 0:00 to 24:00.
If you want this report to show the times per weekday, you are off by one day for several facts.
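The compensation being described can be sketched in a couple of lines of Python: wrap the shifted hour back into the 0-23 range and carry the overflow into a day shift. Offsets are illustrative:

```python
# Normalize a zone-shifted hour: wrap into 0-23 and carry the overflow
# into a day shift, so reports never show hours like -6:00 or 32:00.
def local_hour_and_day_shift(gmt_hour: int, offset_hours: int) -> tuple:
    shifted = gmt_hour + offset_hours
    return shifted % 24, shifted // 24   # (local hour, day shift of -1/0/+1)

local_hour_and_day_shift(1, -5)    # 01:00 GMT in a UTC-5 zone: 20:00, previous day
local_hour_and_day_shift(23, 2)    # 23:00 GMT in a UTC+2 zone: 01:00, next day
```

Without the `// 24` carry, the weekday bucketing would be off by one day for exactly the facts near midnight that the peak-hours question cares about.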
 
I have only quickly scanned this topic, so apologies if this has already been mentioned, but since Oracle has already been mentioned:
In Oracle 10g you can create indexes on a function
It should also be mentioned that Oracle's TIMESTAMP WITH LOCAL TIME ZONE datatype will 'pick up' the current session's timezone and output the data recalculated to the appropriate time based on the current session's TZ. No requirement for offsets (which TIMESTAMP WITH TIME ZONE can handle if needed anyway).
 
Nuffsaid,
I don't think the ETL changes would be too difficult. You could have a big case statement, which would have to be maintained as customers in new areas come on board, or, as states like Indiana seem to change their time zone from time to time. Alternatively, if you had the logic stored in a database table, and performed a lookup in your ETL, then that would work too.

Hans, I think I know what you mean by the date/time issues. However, if the localized time is derived in the ETL stream before the date/time keys are looked-up, then it should work smoothly.
 