Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Distinct Count in PowerPlay Transformer

Status
Not open for further replies.

monsky

Programmer
Jul 3, 2002
89
0
0
AP
Several users have attempted to find a solution on this "classic" problem. So here is another attempt hoping someone would be able to address this finally.

I have a transaction file composed of agency no., agent no., and policy.

Agency No. Agent No. Policy No.
001 001 000001
001 002 000001
002 003 000002
002 004 000002

Agency No and Agent No are my dimensions. I use Policy No for my measure in counting the policies for each of the dimension. I need to count the policy only once dynamically for a dimension.

For example:

Agency No. Policy Count
001 1
002 1

Agent No. Policy Count
001 1
002 1
003 1
004 1

Total Policy Count
2


Using Category Count to create the Policy Count may pose problems. I was hoping that Tranformer has a roll-up function for a distinct count but when I checked it I did not find anything.
 
AFAIK Category Count is the only way to get a dynamic distinct count. What problems are you concerned it will pose?

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
Thanks for the reply DoubleD. Let me give more details:

I modified this and created a level hierarcy:

Level
------
Territory
Region
Sector
Agency
Agent Class
Agent No
Policy No


Measure
Policy Count (Category Count)

I've read somewhere that Cognos Transformer has an issue when it comes to one to many relationships when doing distinct count. Our data is a many-to-many relationship that is a policy can be shared among several agents and an agent can have several policies. I tried doign the category count but the calculation has errors in it.

I'm getting my data using an IQD file. Basically it consists of the following columns.

Territory
Region
Sector
Agency
Agent Class
Agent No
Policy No

I'd appreciate any suggestions you may provide.
 
Since the category count needs a unique field to count, and that the inclusion of the agent level prevents the policy number level being unique, you will not be able to do it with the hierarchy you propose and category count.

A concatented level of agent no. and policy no. should, on the data supplied, provide a unique field for counting, but give erroneous counts for higher levels.

I suspect you'll either have to add a database views to get the distinct policy count per level or consider the addition of another dimension.


soi la, soi carre
 
drlex is right.
If a Policy can apply to multiple agents, and an agent can have multiple policies. Then Policy No. does not belong in the level you show above. It needs to be a separate dimension. Then discount counts on Policy No. will give you accurate counts for any level of the dimenson you listed above (minus Policy No.).

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
Therefore, I cannot drill down to an agent level to show how policies are under him?

The solution I'm thinking right now is to create a drill-through to an Impromptu report showing the agent detail and count distinct on the policies.
 
You could drill down, but the distinct count could neither be set, nor be correct in higher levels. Your plan for a drill-through sounds sensible to keep model layout and user requirement.

soi la, soi carre
 
If you make PolicyNo it's own dimension, and keep your Disctinct Category Count measure, you can show accurate distinct counts of Policies for any one of your other levels.

This will increase the size of your MDC though, since your PolicyNo dimension will consist of every single Policy. But it gets you the numbers you're asking for.

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
DoubleD,

Wouldn't making PolicyNo it's own dimension without any previous level exceed the 65,000 row category limit? We have over 1 million policies.
 
OH! That is a problem. Looks like you'll need to come up with something a little more creative.
Well, it would still make your MDC huge, but you could categorize your policies. Although with 1 million policies, I don't know that I would suggest this solution anymore, but...
Let's say your Policy Number always starts with a letter of the alphabet. If you added a level to your Policy dimension that consisted of the first letter of the Policy #, then 1,000,000/26 = 38,462.

Another option would be to calculate your distinct counts in your IQD by each level, and have each distinct count as a separate measure. The problem with this is that you're trusting the user to always look at the appropriate measure, and I hate putting users in that situation.

Sorry, no brilliant ideas at this point. I'll let you know if I come up with something.

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
I plan to get the first five digits to create a policy group then place the policy no under this level.
 
...or maybe I should I used six since our policy numbers consist of 10 digits. This should limit my categories on each group at four digit length which should be at 9999 per policy group. Using only five would still allow up to 99999.

Thanks guys for the inputs. If you still have any suggestion please let me know.
 
You wrote: Agency No and Agent No are my dimensions. If I take that as that they are different dimensions (lets say they have a n..n relation) and not two levels in single hierarcy, it could be thought that here there is 3 different measures: Agency Policy Count, Agent PC and Enterprise PC(each policy counted once). Each which would have their own .iqds (just counts, sql shouldn't take too long, if you have a flat file then this is "not possible"). Measures would be allocated as constanst for the non-relating dimensions. It must be noticed Agent PC doesn't change, when you filter on Agency. I know this might sound a bit naive approach and maybe does not apply in your situation.

Cheers
 
Part two

I am myself still trying find solutions to this classic problem. I just think monsky, you might be running into performance issues if you are going to have 1+ million categories on a level even if you have a "help level" in between ... PowerPlay (or any OLAP) is not designed to that kind of approach. But if you get it running that's great.

Cheers
 
What if we precalculate this distinct count in SQL and add this as externally rolled-up measure.

This should save us space.

On the other hand if you are going to have drill-down to policy no level, the cube is going to be big anyway.

By the way, does anyone know whether we have the possibility to "hide" the dimension used only to store categories used by category counts? I am not sure whether "omit dimansion" in cube properties will work.

Rgds,

seweryng
 
seweryng,
I've never been able to find a way to Omit Dimension on a dimension used for category count. I'm not saying it can't be done, I've just never found a way to do it.

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
I think you cant omit the dimension totally, but you can right-click suppress on the level. This though leaves the dimension folder visible in the cube, but it shows no categories. Users have to make understand there is nothing to filter with that dimension, which could be named something like Number of customers to make it sound like a title or a measure.

Back to the original problem, external rollups is an alternative, but I understand then you have to produce rollups to all levels.

One essential question is that where in your hierarchy in your data the participation to a policy turns "unique". In the following example, one policy can be handled by many agent in a single agency. But is a policy unique concerning agencies, in other words, can one policy be handled only in one agency? If so and taking the assumption that one agent works for only one agency, then an agency's policy count can be rolledup (sum) to upper levels, in this example sector. Then the only problem lies between agent and agency.

Sector
Agency
Agent No
Policy No

One possible solution to that what I have used, is to calculate agency count and agent count numbers separately as I presented in my earlier message. We can let them normally rollup if we add a measure which subtracts agent PCs from corresponding agency PCs, which gives as a result the agency PC for that and upper levels(see above). Sounds a bit complicated but it is not really that difficult.

Here is how to do that: Make a level between agency and agent, with suppressing blank categories. Then bring with a another query the sums of Agent PC but not with agent numbers but the corresponding agency numbers. This query must also have column with blank values (can do in transformer or already in iqd), which is the source for that extra level. Now you have three measures and a new calculated measure = Agent PC + Agency PC - extra level PC. Of course you exclude those three from cube leaving only the calculated one. The intermediate level wont be visible in the cube at all.


Cheers
 
Hi guys,

Sorry, I wasn't able to reply to any of these.

yksvaan,

Yes, performance IS an issue. With just over 60,000 policies, loading of the cube takes time not to mention that size is already at 24Mb. I assumed that loading over 1million will take about about 800Mb! Actually, the "help level" solves the 65,000 limit problem however it contributes to the enormous size of the cube.

What I did was create separate dimension for the policies. So here how it looks like:


Levels Policy Age Group Policy Group

Territory Policy Age Policy Group (help level)
Region Policies
Sector
Agency
Agent Class
Agent

Do you suggest that I put the Policy Group under the Levels dimension directly beneath Agent?

Policies don't cross agency boundaries so a policy should belong to only one agency.
 
Sorry, maybe I wasn't clear enough that my suggestion has nothing to do with other persons suggestions in this thread, by this I do not mean that they are bad or anything.

I thought that you actually don't need single policies in your cube (meaning not having a level Policy nor Policy Group level) but you just want to get the counts. My idea is based on that and that you do the counting with seperate queries, and of course then you don't have category count-based measure at all.

My idea was to put intermediate level above Agent level, but the level Agent Class is not clear to me. If Agent 001 and 002 are of different agent class (lets say A and B), how do you want to handle policy 000001.
A count of policies 1
B count of policies 1
Or ... ?

Cheers
 
Ok, Agent Class is just a classification of an agent.

The basic idea is still there. I drill down to the agent level and show the counts of policies that belong to him/her. A policy can be shared among several agents. So your understanding is correct. If A and B are agents sharing policy 000001 drilling down to either agents will show one count of the policy.

I used the suggestion of separating the policies under a different dimension is so that I get the correct count upto agent level. And it works. What we hope to achieve is to find an alternative solution that will work the same way but keep the cube smaller and load faster.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top