Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Sometimes a time for MDX other times for SQL, which one is this??

Status
Not open for further replies.

eo

MIS
Apr 3, 2003
809
SSAS 2005:

I have quite a unique challenge, where I analysts wants to analyse claims in a cube by having the option to slice off values over certain levels. For example if the sum of ClaimsValue is 1,000,000 but there where 4 claims in excess of 50,000, the analysts want to slice the sum of those off the total for further analysis, so this would mean 1,000,000 - (50,000 x 4) = 800,000

To complicate matters, there are valious slicing levels they might want to select from (10,000 or 50,000 or 100,000). Now the easy route would mean to create 3 new measures in the SQL feeding the cube that shows any values over those specified levels, but I would have hoped that this could have been accomplished in MDX, using perhaps a dimension, etc. This could then be used within MDX calculations further on...

...but am I missing a trick here? Any advise greatly appreciated.

EO
Hertfordshire, England
 
You have multiple needs here although they may be closely related.

1) The need to remove some slice of data from your totals.
Q: How is this numbered determined? does it change? if it changes on what and when can it change?
i: business rule
ii: user determines

2) you need to allow the user the ability to slice on a range of values that are related to the claim amount?
Q: Who determines this range? does it change? if it changes on what and when can it change?
i: business rule
ii: user determines

3) the need to alert users when data an amount has exceeded a predetrmined value?
Q: Who determines this value? does it change? if it changes on what and when can it change?
i: business rule
ii: user determines


by clearly identifying your exact needs it is easier to map out the solution.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Paul,

1) The slice values are pre determined thus via a business rule

2) Pre determined

3) Alerts not a requirement at this stage

For example: These are some claim values:

ClaimNumber; ClaimValue;
1;1000
2;20000
3;50000
4;90000
5;1500
6;60000
7;40000
8;250000
9;1000000
10;1600

The total ClaimVal is therefore 1514100

If the user selected the 10K slicer, I want to remove the total of all claims greater than 10K from this total (totals of Claim Numbers 2, 3, 4, 6, 7, 8, 9) thus 1514100 - 1510000 = 4100
Similarly using the 100K slicer would deliver 1514000 - 1250000 = 264100

Currently I have to hardcode in the SQL datasource "slicer" columns, which delivers the result as a feed to the cube:

ClaimNumber;ClaimValue;At10KSlice;At50KSlice;At100KSlice
1;1000;0;0;0
2;20000;20000;0;0
3;50000;50000;50000;0
4;90000;90000;90000;0
5;1500;0;0;0
6;60000;60000;60000;0
7;40000;40000;0;0
8;250000;250000;250000;250000
9;1000000;1000000;1000000;1000000
10;1600;0;0;0
Totals;1514100;1510000;1450000;1250000

But I am not sure that SQL is the most appropriate place to do this, I have never been a fan of hard coding measures into SQL if we cannot arrive at the same result within SSAS (i.e. through the use of MDX)

EO
Hertfordshire, England
 
SQL is not the Best Place to accomplish this. This is possible through MDX Calculations stored in the cube along with a new dimensions.

The tricky part is you want to use the Slicer as the elimination element where it is typically the inclusive element.

Fisrt step is to build out your dimension table that has your ranges. You want to build this in a natural hierarchy design.

10000
1001
1002
....
1999
20000


or whatever the breakout is. Yes this is going to be a huge table in the number of rows but the MB size is going to be small as we should only need these 2 tables. As for the large number of leaf members we won't expose this to the user so don't worry. Build this out to fit your business rules. You should easily be able to build a stored procedure and a couple tables to autobuild this table.



Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Just so as I understand, should the range dimension table contain all possible Claim Values? In real terms they can go up to the billions?And also the ClaimValues are not categorised by rangle (for example the ClaimValue 40000 does not so much fall in the range 10K to 50K, but rather fall in the category above 10K, and so does 60000 fall under that category, and 60000 also falls in the category above 50K, but 40000 does not fall in that category)...or am I being dim here

EO
Hertfordshire, England
 
I would build the process so that it only builds into your dim table values that actually exist in your data and will add new values as they are encountered in the new data.

when you start with n Is included in the group greater than X and also the group less than Y then you are getting into a many to many relationship which is a bit more effort to build but still very possible in the 2005 world. can you build a quick data diagram showing how your various values should map.

lets set these limits

Low Range 40K
High Range 100K

Now I don't want 60K in records but pick 6 values and show me how they need to be grouped in a list format so it is easier to read.

Please,

Thanks


Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
I hope this was what you were after?

The two TopSlice values to remove for further analysis are (for example) Any claims over 40000, and any claims over 100000

ClaimNumber; ClaimValue
1 ; 42000
2 ; 12000
3 ; 116000
4 ; 58000
5 ; 3000
6 ; 102000

Total claim values are therefore (42000 + 12000 + 116000 + 58000 + 3000 + 102000 = 333000)

Claims over 40K to be removed from 333000 in further analysis are (42000 + 116000 + 58000 + 102000 = 318000)

Claims over 100K to be removed from 330000 in further analysis are (116000 + 102000 = 218000)

Are you suggesting I create a table/ view which lists only the ClaimValues (and perhaps a key such as ClaimNumber), and create seperate columns for each TopSlice criteria, specifying whether they fall into that TopSlice or not, example...

ClaimNumber; ClaimValue; >40K; >100K
1 ; 42000; YES; NO
2 ; 12000; NO; NO
3 ; 116000; YES; YES
4 ; 58000; YES; NO
5 ; 3000; NO; NO
6 ; 102000; YES; YES

...and then to use this in conjunction with a dimension containing merely YES and NO members?

EO
Hertfordshire, England
 
I'm thinking more of a table that lists at the top level the upper values of your ranges defined by the business requirements. If you have a secondary level you would include that into the table, I probably would as a just in case and a column with the leaf value. Although not bes practice you could use the leaf value as the key Since 100 will always be 100 and has no SCD change possibilities.

Code:
Value     MajorGrouping
=====     =============
3000      3000
....      ....
3999      3000
4000      4000
4001      4000
....      ....
4999      4000
5000      5000
....      ....
5500      5000
5999      5000
....      ....
9999      9000
10000     10000
.....     .....
10900     10000

If you had a sub grouping
Code:
Value     MajorGrouping  SubGrouping
=====     =============  ===========
3000      3000           3000
....      ....           ....
3999      3000           3900
4000      4000           ....
4001      4000           ....
....      ....           ....
4999      4000           ....
5000      5000           ....
....      ....           ....
5500      5000           5500
5999      5000           ....
....      ....           ....
9999      9000           9900
10000     10000          ....
.....     .....          ....
10900     10000          10900

I under stand the math of what you are trying to do so we don't need to focus on that what the focus needs to be is building the structure to support the business needs. You will find that when building SSAS cubes it is the 80/20 rule.

80% of your time will be spent planing and designing your cube and the support objects.
20% actually building the cube.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Hi Paul,

I am still missing the essense of what you are asking for. My confusion lies with the MajorGrouping and SubGrouping. Do they relate in any way to what I term the TopSlicer (as the Top Slice can not have an upper limit, only a lower limit)?

I proposed to suggest some ClaimValues, would it be possible to define the Major and SubGroupings on this sample set? Note we will assume the two Top Slices are at 40K and 100K

Code:
Claim Value
===========
1000
28000
64000
108000
109000
360000

EO
Hertfordshire, England
 
MajorGrouping is the Same as Top slice I understand that the goal is to have all values >= the top slice removed from the value but as you can see you can't have a dimension level that is 4K and greater that omits whose members are are all values are >= 4K which would be 4K thru infi then the same for 5K. It is much easier orginizationally to have the children on 4K be the values 4K - 4999K. I was tryning to avoid mixing the implementation and design elements hoping to do away with some possible confusion. Since we are getting to a point where the design isn't making sense to your implementation I'll go into the details a bit more.

The actual implementation of this will be a calculation that will be a measure. This will take you claims value and OMIT the values of your slicer. This is possible using built in MDX functionality. At this point the most likely will be [Dimension].[Heirarchy].CurrentMember : [Dimension].[Heirarchy].CurrentMember.LastSibling.

How this gets built in to the cube is going to be the difficult part, because as I said dimensions are typically inclusive and not exclusive by nature. So if implemented in the standard way you would get the correct value you were after but any additional dimensions in the query would produce the correct numbers but possibly the incorrect dimension members of other dimensions being shown. Depending on what tool users will use to access and interact with the cube this can be controlled.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Hi,

So then the sample data would look like:

Code:
Claim Value MajorGrp
=========== ========
1000        0 
28000       0
64000       40000
108000      100000
109000      100000
360000      100000

That implying that the value 1000 and 28000 is less than the smalles TopSlice of 40000, only 64000 falls in that MajorGroup and the rest in the 100000 MajorGroup...?

EO
Hertfordshire, England
 
I have now created a VERY simple cube with such sample data, and a dimension with Member values 0,40000 and 1000000

But you are correct, as dimensions are conventionally inclusive, I yet have NO IDEA how to do this?

EO
Hertfordshire, England
 
This is the solution I came up with:

I created Categories on the SQL end to flag (using 1 or 0) whether a ClaimsValue is considered above, or below the top slice (one category column for each top slice required).

This creates a FactTable with an additional 3 columns (1 per top slice @ >100, >250, and >500) that shows the ClaimValues that must be excluded as 0 and those remaining to be included as 1).

I then created a new dimension with only two rows (Key 0 and 1; and Descriptive fields "Exclude from top slice", and "Include from Top Slice").

This new dimension then links to each Category column in the fact, giving the user the ability to slice the data, and exclude ClaimValues that fall above the selected top slice.

The down side to this is that it adds three columns to the fact table, but they are very wide tables with many columns anyway, so 3 columns contraining only 1 or 0 integers will not have a significant down side.

The increase of intersections by which the cube can be sliced could also have a negative impact, but each new dimension only has two members (1 and 0) so not many new intersections in the grand scheme of things.

The upside is that no additional measures are required as the slicers will work with all existing measures, and it is not affected by slowly changing dimensions and fact history.

I hope this is an appropriate solution, further testing will tell.

Thanks for making me think about this one...

[2thumbsup]



EO
Hertfordshire, England
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top