Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Can OLAP provide this?

Status
Not open for further replies.

joacker

Programmer
Sep 26, 2008
22
0
0
AT
We are thinking about using OLAP for our project, therefore here is a question we need to answer: Can OLAP handle these requirements?

We need 3 dimensions (axis):
-Organisation (~300)
-Time1 (365 days)
-Time2 (24 Hours)

Questions:
1) Can a cell hold more data than one? For example can i save in orga A, Time1 B and Time2C the values a and b?

2) Is there no problem to have such a high number of categories on an axis (for example 365 days)?

3) If the answer for 2) is no. Is it also realistic to have 8760 (days * hours) on an axis? Hierarchy on this dimension is then: hours - days - weeks - months - year

Thanks for any help.
 

project0708 said:
1) Can a cell hold more data than one? For example can i save in orga A, Time1 B and Time2C the values a and b?

The answer to the first part is NO but reading the second part I believe you may be a bit confused on how an OLAP Cube operates. The data is stored at a specified grain along your various dimensions. If the date grain is the day level and you have a hierarchy that is Year / Month / Day the the hierarchical relationship will handle the rollup.

Where your dimensions intersect is up to you and enforced in the design of your star schema. If you maintain a Dimensional relationship to a dimension in your fact table then you will be able to slice, dice and filter your data by these dimensions. Multi Dimensional means that yuo can infact analyze by more than 1 diemnsion or dimension attribute at a time.

2) Is there no problem to have such a high number of categories on an axis (for example 365 days)?

365 members is a small dimension large dimensions run in to the Millions.

Is this to much to display at any one time? probably yes. This is the reason for constructing your hierarchies. Fortunately date has a couple natuural hierarchies and is a fairly easy one to build.

3) If the answer for 2) is no. Is it also realistic to have 8760 (days * hours) on an axis? Hierarchy on this dimension is then: hours - days - weeks - months - year

This comes down to design. I am a beliver in that Date and Times are 2 seperate areas of analysis. If Hour is a child of Time then it is hard to trend say sales across time if you know mdx and the reporting tool it is possible but for the non olad mdx savvy it is difficult to say show me AVG Sales for 12pm for all days. This is because that you have a 12pm for every day.

Now Seperate out your Date and Time into 2 dimensions and the ability for this type of analysis becomes much easier and your underlying data is reduced in that you only have 24 Time members as opposed 24* Number of days.

Take some time and research OLAP design concepts and methodologies. Remember when you design you build Dimensions, heirarchies, facts and cubes. Axis is an element of the built cube and what is placed on an axis is determined by the user and changes frequently on the fly.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
thx for your answer Paul. I followed your advice and now i have some new questions:

To build the cube in the SQL Business Intelligence Studio i need 1-5 "fact tables" and n "dimension tables". From that tables i choose the measures and the dimensions.

So the question is: Does this mean i have to create fact & dimension tables when i update the data (let's say 1x week). For example when i update i have a huge table with many values in it like time or region. Now i create the dimension tables (time, region) and create a fact table where i put the relations in. Is this the right way to go?


Unfortunatly the performance questions are still left:

2) Is there no problem to have such a high number of members on a dimension (for example 365 days)? (After all i read i would say no here)

3) If the answer for 2) is no. Is it also realistic to have 8760 (days * hours) members on a dimension? Hierarchy on this dimension is then: hours - days - weeks - months - year.

thx for answers
 
Typically you will create seperate Fact and Dimension tables in your Data Warehouse or Datamart. The population of these tables would be part of your ETL process. You ETL process needs to be well thought out and handle Slowly Changing Dimension values, Duplicate data, early arriving facts, orphaned data to name but a few.

365 Dimension members is a srop in the hat. Very Large Dimensions are typically in the 100s of thousands if not million members. Remember members in this context are leaf members for example Day. All other elements are Attributes of your leaf member even if they are members of a different level or hierarchy.

Again in regards to placing time and date within the same dimension. I am not a fan of this and NEVER do it. I will spend hours if not days teaching a client why sperating them can accomplish what they believe they are after and can't get with a seperate dimension for time, alog with improved usability.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Paul, let me say a big thx for your help here. You answered some questions i couldn't find in the books and helped me a LOT with your professional answers.

Also i have to apologize. I realized today that you answered my questions in a quoted text (post 2). I am used to the style that quoted text = previous posted text, therefore i have ingonred it. So let me say sorry that i asked the questions twice.

But another question i am thinking about is: For my dirty design i calculated a cube size of 18 GB if all cells are filled (elements of dimension1 x elements of dimension2 x ..... x sizeof(int)). So it is possible to work with online requests (answer time not more than 5 seconds) on a 18 GB cube? (on an average server with 8 gb ram)
 
What you are asking is POSSIBLE but may not be diable if this makes sense. It all depends on your design for a cube of this size you want your design to be spot on to meet query requirements. With that said let me also say I have worked on cubes that were in the 250GB range and many queries were sub-second with some really complex one taking 3-5 minutes but the underlying data was 9 BILLION transactions and the queries were like give me the top or bottom n based on 3 specific very large dimensions.

When calculating your size remember your cube will typically be a fraction of your DW or DM. The reason for this is that SSAS does not store anything at a NULL datapoint. Adding to this as the closer you get to the leaf level of the data the more sparse the data becomes. Yes you will have data somewhere at the leaf level but TYPICALY not every dimension intersects every fact. So in the example of sales Not every person will buy every product for every day. Now some people will by some product every day.

One key to reducing query time is to use as few calculations as possible in the cube so rather than saying Price * QTY - COST = Profit in the cube you can calculate it prior to loadin in the cube because it will be that value regardless of dimensional intersects.

Another item that will help with not only query times but also in processing your cube is your partitioning strategy. And how you handle the connections between your partitions and fact tables. I prefer to use views that have the criteria to limit the data to just the data that should be in that partition. This would allow you to place the profit calculation in the view and it would not have to calculate when queried from the cube. This however introduces an overhead while processing the cube, so why not push this calc to the etl instead.

As for hardware it also depends. Are you going to run the whole Solution on one server? DW, ETL, OLAP and Reporting? If so then this server would be completely overwhelmed resource wise. Standard practice is to split your SSAS out onto a different server, which does get expensive. That said SSAS like SQL Server is a major resource hog, especially memory. SSAS Has an excellent caching engine and well cache as much as possible, in theroy you could put a 2GB cube on your 8 GB server and the whole cube could exist in memeory. When it comes to SSAS I will always sacrafice CPU for more RAM. I would say 8GB might be a place to start but have enough in your budget to go 16 or even 32 based upon the number of users. For CPU servers are becomming more affordable so 64bit is my only choice.



Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Sorry I also forgot to mention that when calculating storage remember that your leaf level is an aggregate value so if your grain is Date, Customer, Product and I went to the store 5 times and bought the same item say Beer (it was a rough day at work) then I would not be storing 5 records at the leaf but rather a single record.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top