Four different data attributes for a dimension

teccum · Feb 23, 2005

Hi,

I am working on a Healthcare datawarehouse and have few modelling questions.

I have a claim dimension which has four different types of date fields. That is there is claim service start date, claim service end date, claim paid date, claim received date. I would like to know how to treat these day attributes. Should I include this in the claim dimension or claim fact table? If I include in claim dimension or claim fact table, then how does my time dimension (which is going to have day, month, quarter and year calender date information) link to all the four date attributes. should I need to link the claim fact table with the time key in the time dimension table four times or Shall I create four different claim time tables having the respective date attribute as description and an id for a primary key in that each claim time tables.

My another concern is that the users want to see rollup information for each dates for some reports. That is when they want to rollup to month and quarter level information, let say the report is - claim amount for Jan 2005 claim paid date and then the other report is claim amount for Jan claim received date etc.

So any ideas would help me understand more better.

thanks
teccum.

johnherman · Feb 24, 2005

The dates are facts associated with the claim. These kind of facts are often called event facts as they are associated with an event rather a traditional measure such as units or money. You may also see references to a "factless fact table".

The dates belong in the claim fact table and each date has a separate key to the time dimension (4 possible joins). After you have completed modeling, you will generally find that the Claim dimension becomes a degenerate dimension because of the need to denormalize much of the info associated with the claim to the claim line or claim item (also called "service") level.

Recommend you take a look at Ralph Kimball's "The Data Warehouse Toolkit", 2nd Edition, where he discusses Health Care claims in greater depth.

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright

teccum · Feb 24, 2005

Johnherman,

Thanks very much for your reply.

If I need to include these dates in the fact table and if each date is linked to the day dimension through the day key four times then I do understand there will be four links. But I still could not clearly visualize how the time dimension should be built for this case.

Let say a normal time dimension (denormalized - that is, there is no lookup table for month or quarter or year and all these information are in the same physical table)has fields

Day_Key (Number - datatype)
Day_description (Date (mm/dd/yyyy) - datatype)
Month_desc (Varchar(20) - datatype)
Quarter_desc (Varchar(20) - datatype)
Year_desc (Number - datatype)

Since there are four links to the time dimension should I have four day_keys in the time dimension with

Day_key1 - linked to one date in the claim fact table.
Day_key2 - linked to second date in the claim fact table.
Day_key3 - linked to third date.
Day_key4 - linked to fourth date in the claim fact table.

What should the claim fact table should have. Should it have all the four day keys.

Now if the dates are linked to the same day_key four times, then how does it roll_up automatically when the users require information by month level, year level for each different date analysis like Claim amount for all claim paid date in Jan 2004 and then for other requirement like Claim amount for all claim service start date for December 2004.

I guess, I have rather not understood or little bit confused. So could you kindly clarify this.

Thanks for all your help.

teccum.

johnherman · Feb 24, 2005

Depending on your database, system, and physical design, you may have 4 separate time dimensions, each with a 1:1 relationship to the key in the fact table.

Or you could have a single time dimension with 4 separate date keys in the fact table.

Consider your users analysis habits. Are they likely to want to analyze data by more than one date at a time? I doubt it. They may be interested in claim lag (time from date of claim to date of payment) or days admitted (discharge date minus admission date). Of they may want to roll up data by claim paid date, but when they do that, do they really care when the service was performed? The key is to use the proper date when querying.

Whatever works for you. If you provide more info, I can give a better opinion. BTW, did you get the Kimball book?

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright

teccum · Feb 24, 2005

Johnherman,

The problem with the users are they comeup with different analysis requirement each time I take my model
(after I decide I finalize on this) to them for their review and their sign-off.

Initially, I had the dates in the claim dimension, when they did not mention that they might be doing all type of date analysis. Then after their feedback and review, I changed it and moved it to a different time table with just the respective claim dates and a claim date key.

One of the reasons I moved all the seperate claim time date fields as four different claim time dimesnions is that I was not sure how the ETL developer will load the data if I keep all the four claim time dates in the claim fact table and have four different date keys for each claim date table in the fact table and linking them to the one date key in the time dimension.

I was also thinking how, suppose if they drill down from month to date for a particular claim date field,(eg claim paid date - which has date_paid_key linked to the date_key in the time dimension) will the SQL Engine automatically recognize the respective date key in the claim fact table
and select only those records. What happens if they want to see for claim service start date? etc..

I believe this falls in the attribute roles recognition area but not very clear.

My database is SQL Server 2000 and data modelling tool is Erwin.

And, I have Kimball book which I bought in 2000. I need to buy the latest version which I would be doing in a couple of days.

Thanks for all your opinion.

teccum.

johnherman · Feb 24, 2005

Example solution with one time dimension

Claim number 123
Svc Start 1/1/05
Svc End 1/1/05
Claim Date 1/3/05
Pay Date 1/31/05

CLAIM FACT TABLE
Claim ID 10101
Claim number 123

SERVICE FACT TABLE
Claim ID 10101
Service ID (or line sequence, etc)
PRocedure 3535
Svc Start 1001
Svc End 1001
Claim Date 1003
Pay Date 1031

TIME DIMENSION

DateID 1001 1002 1003 1031
Date 1-Jan-2005 2-Jan-2005 3-Jan-2005 31-Jan-2005

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright

johnherman · Feb 24, 2005

Example solution with four date dimensions

Claim number 123
Svc Start 1/1/05
Svc End 1/1/05
Claim Date 1/3/05
Pay Date 1/31/05

CLAIM FACT TABLE
Claim ID 10101
Claim number 123

SERVICE FACT TABLE
Claim ID 10101
Service ID (or line sequence, etc)
PRocedure 3535
Svc Start 1001
Svc End 2021
Claim Date 3002
Pay Date 4011292

SERVICE START DATE DIMENSION

DateID 1001 1002 1003 1031
Date 1-Jan-2005 2-Jan-2005 3-Jan-2005 31-Jan-2005

SERVICE END DATE DIMENSION

DateID 2021 2022 2023 2051
Date 1-Jan-2005 2-Jan-2005 3-Jan-2005 31-Jan-2005

CLAIM DATE DIMENSION

DateID 2999 3001 3002 3032
Date 1-Jan-2005 2-Jan-2005 3-Jan-2005 31-Jan-2005

PAY DATE DIMENSION

DateID 3304958 4726549 30325302 4011292
Date 1-Jan-2005 2-Jan-2005 3-Jan-2005 31-Jan-2005

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright

teccum · Feb 24, 2005

Johnherman,

I believe I understand the difference in the two methods and the points which you put forth.

Currently I have modelled with the four date dimensions with four date keys linking to a Month,quarter and year time hierarchy tables for drill up purpose. And these four date keys are present in the fact table.

Claim Dimension table:

claim_key
claim_id
claim_submitter_id
Claim_submitter_name
claim_adjustment_code
claim_status_code
claim_status_description
claim_clean_flag
claim_approved_flag
claim_electronic_flag
claim_encounter_payment_flag

Claim fact table:

claim_key
provider_key
speciality_key
vendor_key
authorization_key
member_key
Day_Key

Time Dimension tables:

Day_Dimension

Day_key (sequence no)
Day_date (date - calender date)
Month_key
Day_flag
Day_short_name

Lookup_Month

Month_key
Month_desc
Quarter_desc
Year_desc
Month_of_year.

(The reason for splitting the day from the month table is that there is another fact table called Capitation_fact which has the granularity of data at the month level, So I have a link(month_key) from the Month table to the Capitation_fact table.

I do see the One date dimensions and four date keys in the fact table for each claim date linking to the same date key in the time dimension.

The point which i notice is I need not have four links from the four different claim date dimensions linked to the Month lookup table hierarchy if I go with the first option.

Then the other point which strikes my mind is, if the query retrieves the data from the fact table and then if it goes to the Time dimension table to retrieve the respective claim dates, I believe the data should be good or valid.

The Only point is, i am not sure if the data will be good or valid or the SQL Link will be right if the SQL first retrieves data from the Time dimension and then goes to the Fact table to retrieve the corresponding claim date
data.

Is this way of modelling o.k?

Thanks for your suggestions.

Teccum.

teccum · Feb 24, 2005

Johnherman,

I am sorry I did not give the few fields in the fact table and the claim time tables. Here it is

Claim Dimension table:

claim_key
claim_id
claim_submitter_id
Claim_submitter_name
claim_adjustment_code
claim_status_code
claim_status_description
claim_clean_flag
claim_approved_flag
claim_electronic_flag
claim_encounter_payment_flag

Claim fact table:

claim_key
provider_key
speciality_key
vendor_key
authorization_key
member_key
Claim_received_date_key.Day_Key.
Claim_paid_date_key.Day_Key.
Claim_service_start_date_key.Day_Key.
Claim_service_end_date_key.Day_Key.
claim_amount
claim_billed_amount
claim_medicare_amount
claim_deductible_amount
claim_net_amount

Claim Date Time dimension tables----

Claim_received_date_dimension:

claim_received_date_key (sequence no)
claim_received_date_description (day date field)
Month_key

Claim_paid_date_dimension:

claim_paid_date_key (sequence no)
claim_paid_date_description (Day date)

Claim_service_start_date_dimension:

claim_service_start_date_key
claim_service_start_date_description

claim_service_end_date_dimension:

claim_service_end_date_key
claim_service_end_date_description

Time Dimension tables are ----

Day_Dimension

Day_key (sequence no)
Day_date (date - calender date)
Month_key
Day_flag
Day_short_name

Lookup_Month

Month_key
Month_desc
Quarter_desc
Year_desc
Month_of_year.

(The reason for splitting the day from the month table is that there is another fact table called Capitation_fact which has the granularity of data at the month level, So I have a link(month_key) from the Month table to the Capitation_fact table.

Is this way of modelling o.k?

Thanks for your suggestions.

Teccum.

johnherman · Feb 25, 2005

I assume your question relates to the circumstances where some of the dates on the claim, perhaps all dates, have the same value and same key. (such as the svc start and svc end dates in my first example). You are correct that the key value from the date table appears twice in the fact table, however, you are overlooking the obvious. You will be telling the database how to do the join via your JOIN or WHERE clause criteria in your SQL statement.

In addition, the database is "smart" enough to realize the relationship between the fact and dimension tables and that there are many facts associated with a single date record, and that could apply to multiple svc start dates as well as services which start and end (and maybe are even billed or paid) on the same day

Failure to provide the linkage in the SQL statement will result in a cartesion or cross-product join in a star schema just as it will in a traditional relational data model.

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright

teccum · Feb 25, 2005

Johnherman,

Thanks onceagain.

I understand that the ETL programmer should be clearly explained on how he should retrieve the data from and load into the fact table by comparing the DAY_DW dimension four times for each load for all the four different claim date fields in the fact table. I think this will handle the solution apart from designing as per the first method suggested by you. You are right and this should work.

I have remodelled my design with all the four claim dates in my claim fact table and I have four key linked from the DW_DAY dimension to the fact table.

The other point which I have difficulty is convincing my manager (who I believe has complete knowledge of DW) of telling him that the flags and the event attributes will generally fall in the fact table. Since currently I have flag fields (which if you see the claim dimension) in claim dimension which should be in the claim fact table. My manager insists and he is of the idea that only the numerical fields will be in the fact table. whereas he wants those fields to be in the claim dimension tables itself.

Thanks

teccum.

johnherman · Feb 25, 2005

There is a concept in DW called a factless fact table. These tables concern events, such as Hospital Admission, Automobile Accident, etc. And it is necessary to collect various info regarding that event. These are "facts" although they may not be "measures". Measures are your traditional facts, they are numeric and can be summed or averaged or counted. Event facts are concerned with the event and may not be able to be summed, averaged, etc.

Again, see the Kimball book.

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright

teccum · Feb 25, 2005

johnherman,

Thanks much.

I will look at the kimball book especially on how to explain my manager.

I tried explaining him him about this factless fact table and that the flags and the dates etc will generally fall in the fact table. But he is of the opinion that keeping the claim flag fields will only add to the size of the fact table and since the claim flag fields are more related to the claim dimension table and the claim flag is with respective each claim and since the claim dimension has all the claim data, there would be less records if we keep the claim flag fields in the claim dimension table than in the claim fact table where it is duplicating many times.

And he also says that if the user require any analysis it is always possible to link to the dimension table and get the claim flag data like how it is done for customer email, customer gender etc in customer dimension where the primary key is the customer key.

teccum.

sh4kil · Feb 28, 2005

Basically, the date key will then be joined to the four keys in the fact table, but in each case you will alias the Time Dimension (day level), and Time Dimension (month level).

You would then have associated columns being returned from the aliased Time dimension tables, and in your SQL you would give them meaningful names. Or if you are using a end user reporting tool then you would do it in that.

teccum · Feb 28, 2005

Thanks Sh4kil and Johnherman.

I have modelled with the day key from the fact tables linked to the day dimension tables for all the various dates which comes into three different fact tables. I will have to take care at the ETL phase for those dates now.

Coming to another clarifiction which I would like to continue with the same thread.

I have a dimension specialty and provider. I have a claim fact table where I have the specialty key and the provider key. The user wants to find out speciality for a group of providers for a particular report. The point to be noted is the there are specialty for the providers but they do not have claim. (i.e) they just need to find out the different specialty for a set of providers. In the current model, I can find out the specialty for the set of providers where there was claim since I am storing the combination of specialty and provider in the claim fact table.

I understand that this falls in the concept of Coverage tables or Bridge table (as per Ralph Kimbal) But I would like to know which is the best way of handling this.

1. create a coverage table containing the provider key, specialty key, vendor key etc and keep it as a seperate table. If this is the case then for each combination of reports between dimensions then there would be a coverage table. Then my schema will look like a hub and spoke model and will not look like (physically atleast looking at the diagram) star schema.

2. Load all the data into the claim fact table for all combinations of speciality, provider with the value of 0 whereever there is no claim.

3. Create a Parent child relationship between the Provider dimension table specialty dimension table and allow the OLAP tool to take care when those typeof report are requested.

which way is the best and Is there any other way?

thanks

teccum.

johnherman · Mar 1, 2005

Having done medical billing in the past, I believe the proper solution is to store the Provider key in the claim fact table. There should be anothter reference (or lookup) table for Specialty and it should have a Specialty key. Next, you need to determine the cardinality of the relationship which brings to the table one very critical question:

Can a provider have more than one specialty? It is my understanding that the answer to this question is "yes". Therefore, you need a bridge table between Provider and Specialty to allow for multiple specialites. However, you could also put multiple columns in the Provider table, for instance Specialty1, Specialty2, etc, knowing that even a super duper doctor won't have more than 4(?) specialities.

Then there is the other aspect. The claim fact is specific to one event, namely the services that were performed on a patient during a visit or admission. Therefore, the servicing Provider who might have multiple specialities could only have been using ONE of his specialities when performing the procedure associated with the claim in question, and if you want, could store the Provider Key and Specialty Key directly in the claim fact.

It's really up to you. It depends on how your providers operate, how many specialties they have, and how the company wants the results reported.

Hope this helps.

-------------------------
The trouble with doing something right the first time is that noboby appreciates how difficult it was.
- Steven Wright

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Four different data attributes for a dimension

teccum

Technical User

johnherman

MIS

teccum

Technical User

johnherman

MIS

teccum

Technical User

johnherman

MIS

johnherman

MIS

teccum

Technical User

teccum

Technical User

johnherman

MIS

teccum

Technical User

johnherman

MIS

teccum

Technical User

sh4kil

MIS

teccum

Technical User

johnherman

MIS

Similar threads

Part and Inventory Search

Sponsor