Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

schema design question

Status
Not open for further replies.

scottcabral1979

Programmer
Mar 16, 2012
30
US
Hi,

I'm designing a monthly periodic snapshot fact table that will have every homeowner's insurance policy that is active at the end of each month.

Each policy has many attributes including policy number, effective date, expiration date, agent, territory, distance from coast, number of family, usage type, New or Renewal Code, and others. The measures in the fact table will include written premium, earned premium, coverage A/B/C/D amounts, deductible amount, and total exposure.

My question is should I have the Policy Number as a Degenerate Dimension in the fact table and create dimensions for all of the policy fields listed above, or should I create a policy dimension that will store certain attributes that may not need their own dimension?

I'm not sure what the best approach would be given the grain of the fact table. Looking for some advice.

thanks

 
It depends.

Topic 1: Do you have a higher dimension than policy? Like "Household", or "Customer"? As the customer/household could have multiple insurance policies.
Topic 2: Do policy numbers change? If I leave your company for a year and come back, is it a new policy number? Does the business even care about returning customers/policies?
Topic 3: You mentioned that this would be a monthly snapshot (summary) table. Is the table recreated every month? Or are more rows added each month? This could affect how to distribute the dimensions or whether to denormalize some of them in the fact records. If the table is recreated anew each month, I would denormalize most of the dimensions.

Hope this helps, or at least gets you thinking.

====================================
Sometimes the grass is greener on the other side because there is more manure there - original.

 
Hi Johnherman,

thanks for your input.

1. We do have a Customer Number for each policy and yes, a customer could have multiple policies with our company. I was thinking of building a Customer dimension to hold each customer number, address, and name information. This could be an SCD type 2 if the address or name changes, but probably very minimally. The business might not even care and just want to see the most recent data. I'll have to find that out on my end.

2. Policy numbers can change if a policy is cancelled and then rewritten. When the policy is rewritten, a new number is given. I'm not sure if the business needs to know if the policy number changed, but it may be beneficial in the future. Also, we are currently running 2 systems right now, with system 1 going away by the end of the year. We are manually migrated all policies from system 1 to system 2 at renewal. At this point, the policy will get a new number issued by system 2. The data warehouse still retains the "Old" number from system 1 in a field called InceptionPolicyNumber. This way we can tell what the policy number was before is was migrated to system 2.

3. I was going to create the snapshot table by appending the policies that were active each month, so we would be adding more rows to the table every month. There will be a Snapshot Date field that will hold the last day of each month so that we can tell for which month the data is for. So in Jan 2013, we may have 100K active policies, so 100K rows will be added with snapshot date of 01-31-2013. In Feb 2013, we may have 110K active policies, so 110K rows will be added to the table with a Snapshot Date of 2-28-2013. At this point the table will contain a total of 210K rows.

Let me know if the above answers help with the decision of creating a separate Policy dimension to hold some of the policy attributes or to just have the Policy Number as a Degenerate Dimension and separate dimension tables for all other fields.

thanks
Scott
 
Scott, your answers were well thought out.....I hope you can get your business folks to be as accurate as you when they tell you their plans for the future.

Based on your description, I would denormalize Customer into the degenerate policy dimension. Without address of course, although zip code - maybe.

If your company isn't real concerned with the cancellation aspect, then the policy number changes are probably not important.

You are wise to use a full date for the snapshot date (rather than month/year). This allows you to use the same structure to do weekly (or even daily) snapshots if your customer moves in that direction.

At some point, you may want to consider an archive process to take rows from the "active" snapshot table to an "archive" snapshot table. Maybe a weekend batch process. This will help preserve the good performance of the summary table.

====================================
Sometimes the grass is greener on the other side because there is more manure there - original.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top