Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Automatic aggregation design query based optmization wizard/query log 1

Status
Not open for further replies.

unclejimbob

Technical User
Oct 12, 2005
128
AU
I have a number of clients at different physical locations, all of whom use the same set of cubes but very likely for different reasons. I initially set up each cube's aggregation based upon what I thought were the most appropriate aggregation usages, i.e. each dimension used in the cube would have the aggregation usage property set to Bottom Level Only, Top Level Only etc. This resulted in the Query Optimization Wizard deciding that no aggregations were necessary - oops. Now I think that I would like to set the aggregation usage to Standard and see if it is possible to use the Query Optimization Usage/Query Log information to automatically derive the best aggregation decisions given that each site may have widely differing reporting/querying requirements. Has anyone tried to do something similar to this, and if so, what detailed reference material is there out there on how to go about it ?
 
the main thing when preparing to use usage based optimization is to lower the number in the query log. by deafult I believe it is set to 10 meaning every 10th query gets logged , reducing this number to 1 for a period will ensure that a queries are logged and therefore you will have a larger base for the optimization engine to work with.


As for setting the aggregation properties, this is something I really advise against doing in most cases, there are exceptions the main exception being if you have a level in a heirarchy that has no business meaning. For example say you just has a list of last names and you build a dimension but use the first letter or 3 letters to build a heirarchy where you can alphabetize the names. since a level of A,b,c,d really servers no analytic purpose you would want to exclude this from aggregations.

Good Luck

Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Yes, thanks MDXer I have followed whats in the MSAS Performance Guide as far as agg design goes - and pretty much wound up with what I expected. As a first step as I really have no idea how the information is actually being used I guess I could get some of my sites to turn logging on for a certain period (and in my case I am fortunate that this will not adversely impact performance) and get them to send the logged results to me - so I can then build some aggregations and see how it pans out.

I think the table I need is querylog so I am going to try and see what I can do with it.
 
I good rule of thumb I was taught is to agg by MB and to start at 2-3 times the datasize. So if you look in your directories where the cube files go and your cube is 50MB then set your aggs to 100 to 150 MB.

Another factor in how your aggs perform is if you have any distinct count measures and if so how you implemented them.

The agg engine works kind of like this

3 dimensions 3 levels each

1 1 1 All Level
1 2 1
1 3 1
1 1 2
1 1 3
2 1 1
2 2 1
2 3 1
..
3 3 3 Leaf Level

There are tools which will allow you to go in and define custom aggs by allowing you to create the levels required, one of these is a small app called partition manager. You can also write a small app using vb and dso that will allow you to do this.

The way the usage based optimizer works is it logs not the actual query but the levels used within a query and it's execution time. it then sees if any of these logged executions meet or exceed your threshold.

A thing to remember is that in AS your queries can cover eachother if you do a query for:

[DATE].[2004].[Jan]
[DATE].[2004].[Feb]
[DATE].[2004].[Mar]

then someone else comes along and wants to compare jan 05 with jan 04 it will used the cached jan 04 if it still exists in cache.

Setting a default set of Aggs is a good start but I would advise trying to have all customers with the same agg definitions. Aggs are something that need to be tailored to a specific enviroment and that enviroments data volume and reporting needs.

Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Yup, I have viewed the output of the query usage log and understand how it works. Yes, I currently use partition manager, all sites use time dimension so I have 'eliminated' this dimension by partioning along the time dimension using MSAS and then I've had a look at how the partition manager works and I think I've got a handle on how it can be put to good use.

"Setting a default set of Aggs is a good start but I would advise trying to have all customers with the same agg definitions. "
Yes, very good point, they are all in the same line of business but haev subtle reporting differences.

Have a star for your efforts.
 
Thanks. Glad I could help and good luck!

Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top