Category Count

Pr07 · Aug 6, 2003

Hello Everyone,

I need a measures to give distinct counts over various dimensions. The column(lets call it col_id) on which the counts are based on is not required as a dimension/level. But I embed it as a level only for using category counts and later suppress that complete category. Also, because there is no direct Activity Measure associated with this, I create a dummy column in IMR/IQD to hold constant value '1'.
Now the problem is that this col_id field has over 200k values and it grows rapidly. So the cube generation is slow and also violates the category ratio rule. Is there any efficient way of creating this distinct count.
(I went through the archives and there were suggestions of doing this is through Impromptu; if thats the answer, please explain how?)

Regards,
Prashar

carlodeleon · Aug 6, 2003

Hi Pr07,

I'm experiencing the same problem and it looks like that using a category count measure to distinctly count the number of categories that exceed 65,536 in a dimension will not work. Even worse for my case, I have to distinctly count over 1M categories (distinct customers) in a customer dimension and even if I suppress the customer_id level which is the basis of my category count measure, transformer says that the children of a category count measure cannot exceed 65,536. I suggest that you create separate cubes based on rolled-up transactions for the distinct column you want to measure. I know this is not the best way to do it but I'm running out of options. Hope someone else can helps us though

nagrajm · Aug 6, 2003

Could you gentlemen give a layout of your dimension map plan and the measures. Basically category counts are easy to establish by simply assigning a count of "1" in the underlying IMR.

Example

Country Region State Customer Customer Count

USA West CA ABC 1
USA East NJ XYZ 1
USA East NC DEF 1

In this instance, Country, Region and State would form the dimensions and "Customer Count" will be the measure. "Customer" per se will NOT be part of your dimension plan or a level in any other dimension.

The measure "Customer Count" will give you the category counts for any dimesion depending upon the level of rollup being viewed.

I am sure you would have known all this. Perhaps I am wrong in assessing you problem. Let me know.

Pr07 · Aug 7, 2003

Thank you guys for your input.
Correct me if I am wrong but I think the solution mentioned above will work( without the need of category count) if the data source is specifically customer file. But things are different if the data source(IQD) is in the form of fact table. The col_count(field containing 1s) will not return the distinct count and thus show higher value than the category count i.e. by putting the col_id field as a level or dimension.
I apologize for missing this point previously. Any help would be greatly appreciated.

regards

nagrajm · Aug 7, 2003

There are many ways to achieve a distinct count.

1) Select the "Eliminate Duplicate Rows" in the filter tab of the report, if your report does not have FACTs but only dimensional data.

2) Use Count(Distinct Cust_ID) in the count column grouped at the lowest level.

3) Use the Running-Count summary function to mark only the first line of the row with "1".

if ( Running-Count ( Cust_ID ) for Country, State, Region ) = 1 ) then ( 1 ) else ( 0 )

Hope that helps.

Nchd15 · Aug 7, 2003

Hi Nagrajm: Can we also use the running count for our situation. Right now, we use category count on Case_id in a similar manner as in this problem.
We are trying to build a cube with data from two tables sharing Master-Detail relationship. The master table stores the static information like Case_type, Received_date, Origin etc. about 'Cases' which go through different phases. The changes in the phase along with Date_Phase_change are stored in the child table.

Dimensions:
*Time: Year- Quarter- Month
* Case Type: Various Types
* Case Phase:Various Phases( Open,... Trial,... Closed etc.) (sequences of these phases is not fixed)
*Origin: State- Dist- City
*Zone: Zone- Sector (another geographical location totally independent from Origin)

The measures requested is:
*Number of Cases ( for any combinations of the dimension selection)

We build a staging table(source for IQD) from the above two tables.The catch is if Case is "Opened" in Mar 2002 and "Trial" in Jun 2002 : it should be counted under "Opened" in Mar,Apr & May 2002 and "Trial" from Jun 2002 till date until the file is closed. We generate extra records in the staging table to show the phase Case is in for intermediate months.

Thanks in advance

nagrajm · Aug 8, 2003

Hi Nchd15

Ordinarily this report is simple and bookish except when you consider the catch

"The catch is if Case is "Opened" in Mar 2002 and "Trial" in Jun 2002 : it should be counted under "Opened" in Mar,Apr & May 2002 and "Trial" from Jun 2002 till date until the file is closed."

But when I read

"We build a staging table(source for IQD) from the above two tables......We generate extra records in the staging table to show the phase Case is in for intermediate months."

I said you had the solution figured. This is the only possible way, in my opinion, to count those cases during the intermediate months. Running-Count will not give you the fuctionality to achieve this.

Does that answer your question?

leathern · Aug 21, 2003

nagrajm,

In one of your earlier posts, you suggest the following:
2) Use Count(Distinct Cust_ID) in the count column grouped at the lowest level.

I have a similiar situation as is being discussed in this thread and I have tried doing a count(distinct ss_num) in Impromptu when I create the IQD. But when I use that column in Transformer as a measure, I do not get a distinct count. It counts all the occurrances. What am I doing wrong?

Neal

benbotektips · Aug 21, 2003

The way I would do this, although indirect:
pick one field that has a decent spread in the group, like city or customer manager, and break the cube on that field, via a cube group . This will create several cubes, identical in every way, except for that one field (and obviously the measure values). You'll be able to keep your category count, and it's a great selling point to the users that a cube is split on some logical pivotal piece of data. Look under the options in Transformer Admin for creating a cube, or search the documentation for 'cube group'.

nagrajm · Aug 22, 2003

Hi Neal

Does the IMR return distinct values correctly?

Else, try the Running-Count function to mark the first occurance of ss_num with 1 and other occurances with 0. This way you will avoid counting multiple occurances.

Nagraj

mzahl · Sep 14, 2003

Hi !
I also have the same problem!
My solution is to use external meatures. I build a special table only for this meature, where I calculate the distinct count of customers directly by the oracle database. There ist a special function, called CUBE(), to calculate all combinations of different columns.

Example:
select count(distinct cust_id),sex,region
from table1 group by cube(sex,region)

It will generate you a list of all values and all combinations of sex and region.
So my solution is not the simplist one, but the cube is very small and fast, also for more then 1 Million customers/lines to count !!

sln007 · Sep 24, 2003

Nagrajm's response from Aug 7, # 3), is the best method I've found so far.
We "count" the number of specific dimensions on several levels. This is kinda complex and
limited to only the way the "count" 1's are placed, especially in a CUBE. Because you can't drill deeper than
what you calculated the "count" 1's or you risk losing the 1's in their correct place.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Category Count

Pr07

MIS

carlodeleon

MIS

nagrajm

MIS

Pr07

MIS

nagrajm

MIS

Nchd15

Programmer

nagrajm

MIS

leathern

Programmer

benbotektips

MIS

nagrajm

MIS

mzahl

Programmer

sln007

MIS

Similar threads

Part and Inventory Search

Sponsor