Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Fact Data as Dimension

Status
Not open for further replies.

Gurudattap

Technical User
Jan 25, 2002
8
IR

Hi,

We have a requirement where account balances (numeric data) needs to be stored across different balance slabs. E.g. if the bank has 5 customers with following balances

Customer Balance

1 100
2 200
3 200
4 100
5 250

Then we need to generate info like

Size of Acc Balance No of Customers Total Acc Balance
0 - 99 0 0
100-199 2 200
200-299 3 650

What would be the best way to store this summerized data. I have not given here some other dimensions like customer type and time, as they are not important for my question here. Based on my understanding of Facts and Dimensions, Account Balance is the Fact, but the slabs of Account bAlance (Fact Value) is acting like a dimension, how do we handle this situation - are there any good articles that describe more on handling such cases.

Regards,

Gurudatta




 
Hello Gurudatta,

There is nothing wrong with creating groups out of the balance values, but to get the information about number of customers that belong to a CERTAIN slab can still be done from the original dataset:

Select '0-99',count(Customers),Sum(balance)
from Table
Where Balance >= 0 and Balance < 100

will give you the result for one slab. If you have just a few categories, you could easily perform some union queries to get the whole range in a dataset:

Select '0-99',count(Customers),Sum(balance)
from Table
Where Balance >= 0 and Balance < 100
UNION
Select '100-199',count(Customers),Sum(balance)
from Table
Where Balance >= 100 and Balance < 200
UNION
Select '200-299',count(Customers),Sum(balance)
from Table
Where Balance >= 200 and Balance < 300

But you have to question yourselve if you want to do this. At reporting level you either would have to know your SQL or have a good reporting tool to make such reports from the seperate balances. If you are going to use such categories on a day to day bases and you have no advanced reporting tool, you are perhaps better off with a pre-aggregated table to work on........... T. Blom
Information analyst
tbl@shimano-eu.com
 
Thanks, We will be needing this data on day to day basis. In that case, how do we handle it in dimensional modelling?

Regards,

Gurudatta


 
What are you using to report from the DWH? I am used to building Universes with Busobj, but I have a hard time imagening how to fit such a table with slabs into a model.
Can you relate other data to the slabs....? Since you have just this made-up dimension in the table there is little to relate on..... T. Blom
Information analyst
tbl@shimano-eu.com
 
Hi Gurudatta,

What you are trying is not that uncommon. It's often used in demographic analysis and is also called &quot;banding&quot;. Other examples include age ranges and income ranges. See pp 155-157 in Kimballs, Data Warehouse Toolkit (Second Edition) for more details on demographic (or what he callse &quot;mini&quot;) dimensions.

As for your particular case, you are probably storing your account balances as a series of periodic snapshots (e.g. once per month or once per day). You would reasonably want to have the same frequency for your balance demographic table.

As for creating the table (which I would recommend), create a calculated item as a case (decode, whatever) statement to define the bands. For example, A = 0-99, B= 100-199, etc. and group by that.

If you can't do that in a single SQL statement, stage the data first with just the case calculation then access the staged table and group by the band.

Hope that gives you enough detail. I'm being called to lunch so I can't tell more at the moment.
Good luck,
Matt :)
 
Hello:

This scenario in very common in customer segementation and householding etc. To handle this you need to create a look up or account range dimension table with columns
RANGE_ID (Key)
RANGE_HIGH_VAL
RANGE_LOW_VAL
RANGE_DESC

This table will be populated everytime you do your batch load and a new range_id generated for the a acc. value that does not fall in any of the range. (Same way you would generate a surrogate key)

During the ETL you mark each OLTP record with appropriate range. Then you sum up the acc balance and count the number of customers for each range grouping on the range_id.This will give the target table you want.

The process can be further simplified by having a fixed number of ranges (a static dimension table with say 1000 ranges including a &quot;dummy range_id like 9999&quot; for outlier data points). This will help in speeding up the DW load cycle.

Hope this helps!
DWTECH
 
Thanks folks for all the inputs!!
This has given me a lot of clues!


 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top