Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

OLTP or OLAP

Status
Not open for further replies.

joacker

Programmer
Sep 26, 2008
22
0
0
AT
As Paul helped me a lot to understand OLAP i think i finally got it now (with much reading too).

But i am at a deciding point where i have to know: is olap actually the right system to use here?

There is the following situation:
I have a table (~400.000 rows) (no data mart or data warehouse) with the following information:

ID HOMEPAGE DATE_ARRIVAL TIME_ARRIVAL DATE_LEAVING TIME_LEAVING

-> This table shows the user-visits on a specific homepage (when they arrive and leave) and is not changeable.

So the mission is to analyse this. For example by drawing a diagram at the front-end that shows the visits during the day xy on homepage z: Then there is an x-axis (Time) and an y-axis (Visitors). As there are many visitors on one day i have to cluster them into groups so that i can draw a nice graph. Let's say i cluster them every half hour. So i sum up all visitors that are on the page from 15:00 to 15:30.

Now i can do many things with this information for example I can draw the same diagram also with the informations of a week/month/year where i have to use an average value for the clusters (sum(visits)/days) . Then there is also the possibility to distinguish the graph between months/years, homepages and so on.

As there are just monthly updates on the original table and i need a good response rate (< 5 sec) i thought OLAP will be the way to go. But the dimensions Homepage, Time, Date lead me to a fact table with 30 millions rows.

Do you agree or disagree on using OLAP here?
 
Sounds like you are trying to do click stream analysis. Am I correct?

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Hello Paul,

yes you are correct. It is a part of a click stream analysis. Fortunatly i just have to focus on the things i wrote in the first post.
 
I think with your data size the real questions to ask regarding what method is going to be best.

1) Growth, will the data volume grow? will the historic period grow? I don't know anything about your system but people some times look at period on period growth so if you have 1 month is there a possibility someone will say what was this value last month? last year? Growth can also be the usability of the system. When you produce a nice well maintained system that works as people need it to typically they start to see the potential and new functionality is requested and such.

2) User interation. Will users be doing a lot of AD-Hoc queries or will it be a lot of prebuilt reports that get delivered. Again I don't know your data but the sizes you have stated doesn't sound like a complex system.

I've never tried to build a cube in EXCEL although I have used excel many times as an interface to a cube.

With a well built star schema you can probably support your needs without a Cube.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Answer to Question 1: The data volume will not grow but it will change. Which means for a new month an old month has to go (For example: Data of Nov 2008 comes - Data of Nov 2005 goes). Otherwise i think the data would become too large for a fast queries.

Answer to Question 2: Just a few users will use this system infrequently during a week. Which means there will be some AD-Hoc queries but the queries will always be the same, just the variables change (date, homepage, time).

Again, thx Paul for your expert-help. I really appreciate your information.
 
I would do this in the relational star schema and allow excel pivot tables to handle the ad-hoc. If you design it correctly you should have very few issues with query performance, the key is going to be correctly indexing your system, Remember OLTP and OLAP indexing strategies are nothing a like as the desired goals are really opposites of one another.

Now if you wanted to store history then I would suggest a MOLAP tool. A MOLAP system such as SSAS can easily handle volumes of data. My frst OLAP project was 9 billion fact records the cube was over 250GB and all but 5 very complex queries had sub-second response times.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
thx Paul :)

so i will try it with a relational star schema first and if that doesn't work properly (response time) i will switch to OLAP.
 
Good luck!

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top