Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Category Count

Status
Not open for further replies.

Pr07

MIS
Aug 5, 2003
10
CA
Hello Everyone,

I need a measures to give distinct counts over various dimensions. The column(lets call it col_id) on which the counts are based on is not required as a dimension/level. But I embed it as a level only for using category counts and later suppress that complete category. Also, because there is no direct Activity Measure associated with this, I create a dummy column in IMR/IQD to hold constant value '1'.
Now the problem is that this col_id field has over 200k values and it grows rapidly. So the cube generation is slow and also violates the category ratio rule. Is there any efficient way of creating this distinct count.
(I went through the archives and there were suggestions of doing this is through Impromptu; if thats the answer, please explain how?)

Regards,
Prashar
 
Hi Pr07,

I'm experiencing the same problem and it looks like that using a category count measure to distinctly count the number of categories that exceed 65,536 in a dimension will not work. Even worse for my case, I have to distinctly count over 1M categories (distinct customers) in a customer dimension and even if I suppress the customer_id level which is the basis of my category count measure, transformer says that the children of a category count measure cannot exceed 65,536. I suggest that you create separate cubes based on rolled-up transactions for the distinct column you want to measure. I know this is not the best way to do it but I'm running out of options. Hope someone else can helps us though :p
 
Could you gentlemen give a layout of your dimension map plan and the measures. Basically category counts are easy to establish by simply assigning a count of "1" in the underlying IMR.

Example

Country Region State Customer Customer Count

USA West CA ABC 1
USA East NJ XYZ 1
USA East NC DEF 1

In this instance, Country, Region and State would form the dimensions and "Customer Count" will be the measure. "Customer" per se will NOT be part of your dimension plan or a level in any other dimension.

The measure "Customer Count" will give you the category counts for any dimesion depending upon the level of rollup being viewed.

I am sure you would have known all this. Perhaps I am wrong in assessing you problem. Let me know.
 
Thank you guys for your input.
Correct me if I am wrong but I think the solution mentioned above will work( without the need of category count) if the data source is specifically customer file. But things are different if the data source(IQD) is in the form of fact table. The col_count(field containing 1s) will not return the distinct count and thus show higher value than the category count i.e. by putting the col_id field as a level or dimension.
I apologize for missing this point previously. Any help would be greatly appreciated.

regards
 
There are many ways to achieve a distinct count.

1) Select the "Eliminate Duplicate Rows" in the filter tab of the report, if your report does not have FACTs but only dimensional data.

2) Use Count(Distinct Cust_ID) in the count column grouped at the lowest level.

3) Use the Running-Count summary function to mark only the first line of the row with "1".

if ( Running-Count ( Cust_ID ) for Country, State, Region ) = 1 ) then ( 1 ) else ( 0 )

Hope that helps.
 
Hi Nagrajm: Can we also use the running count for our situation. Right now, we use category count on Case_id in a similar manner as in this problem.
We are trying to build a cube with data from two tables sharing Master-Detail relationship. The master table stores the static information like Case_type, Received_date, Origin etc. about 'Cases' which go through different phases. The changes in the phase along with Date_Phase_change are stored in the child table.

Dimensions:
*Time: Year- Quarter- Month
* Case Type: Various Types
* Case Phase:Various Phases( Open,... Trial,... Closed etc.) (sequences of these phases is not fixed)
*Origin: State- Dist- City
*Zone: Zone- Sector (another geographical location totally independent from Origin)

The measures requested is:
*Number of Cases ( for any combinations of the dimension selection)

We build a staging table(source for IQD) from the above two tables.The catch is if Case is "Opened" in Mar 2002 and "Trial" in Jun 2002 : it should be counted under "Opened" in Mar,Apr & May 2002 and "Trial" from Jun 2002 till date until the file is closed. We generate extra records in the staging table to show the phase Case is in for intermediate months.

Thanks in advance

 
Hi Nchd15

Ordinarily this report is simple and bookish except when you consider the catch

"The catch is if Case is "Opened" in Mar 2002 and "Trial" in Jun 2002 : it should be counted under "Opened" in Mar,Apr & May 2002 and "Trial" from Jun 2002 till date until the file is closed."

But when I read

"We build a staging table(source for IQD) from the above two tables......We generate extra records in the staging table to show the phase Case is in for intermediate months."

I said you had the solution figured. This is the only possible way, in my opinion, to count those cases during the intermediate months. Running-Count will not give you the fuctionality to achieve this.

Does that answer your question?
 
nagrajm,

In one of your earlier posts, you suggest the following:
2) Use Count(Distinct Cust_ID) in the count column grouped at the lowest level.

I have a similiar situation as is being discussed in this thread and I have tried doing a count(distinct ss_num) in Impromptu when I create the IQD. But when I use that column in Transformer as a measure, I do not get a distinct count. It counts all the occurrances. What am I doing wrong?

Neal
 
The way I would do this, although indirect:
pick one field that has a decent spread in the group, like city or customer manager, and break the cube on that field, via a cube group . This will create several cubes, identical in every way, except for that one field (and obviously the measure values). You'll be able to keep your category count, and it's a great selling point to the users that a cube is split on some logical pivotal piece of data. Look under the options in Transformer Admin for creating a cube, or search the documentation for 'cube group'.
 
Hi Neal

Does the IMR return distinct values correctly?

Else, try the Running-Count function to mark the first occurance of ss_num with 1 and other occurances with 0. This way you will avoid counting multiple occurances.

Nagraj
 
Hi !
I also have the same problem!
My solution is to use external meatures. I build a special table only for this meature, where I calculate the distinct count of customers directly by the oracle database. There ist a special function, called CUBE(), to calculate all combinations of different columns.

Example:
select count(distinct cust_id),sex,region
from table1 group by cube(sex,region)

It will generate you a list of all values and all combinations of sex and region.
So my solution is not the simplist one, but the cube is very small and fast, also for more then 1 Million customers/lines to count !!
 
Nagrajm's response from Aug 7, # 3), is the best method I've found so far.
We "count" the number of specific dimensions on several levels. This is kinda complex and
limited to only the way the "count" 1's are placed, especially in a CUBE. Because you can't drill deeper than
what you calculated the "count" 1's or you risk losing the 1's in their correct place.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top