Cube Build time

spicysudhi

Programmer
Nov 10, 2003
Hi

I have to build a cube from a comparatively big table, with around 20 million records to be processed.

Can someone suggest an approximate build time for this? This is just to set a benchmark. I know it depends on the number of dimensions, measures, hardware and so on, but with normal settings and good hardware, what would be an acceptable time range for this amount of data?

thanks in advance

-sudhi
 
sudhi,
There is no good way to answer your question. It depends entirely on the number of dimensions, measures, and hardware, on how much you will be summarizing the data, and on how fast your SQL runs against your database.

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
First, you should have a summary table/view in the database in case your big table is transactional in nature and/or is not already summed up to the lowest level of the dimensions.

I know it might not be "easy" to get the DBA to provide such a summary table, but at least ask for a view that does the summing. At least then you know the database part of the cube build is not wasting time. The view could use a no-locking option so your DBA would be happy too. Even better if the view can be a so-called materialized view.
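To give an idea of what I mean, a rough sketch only, with made-up table and column names (SALES_TRX, ORDER_DATE, PRODUCT_ID, REGION_ID, SALES_AMT), to be adapted to your own schema:

-- Plain view that rolls the transactional table up to the lowest
-- levels the cube dimensions actually need:
CREATE OR REPLACE VIEW V_SALES_SUMMARY AS
SELECT ORDER_DATE, PRODUCT_ID, REGION_ID,
       SUM(SALES_AMT) AS SALES_AMT,
       COUNT(*)       AS TRX_COUNT
FROM   SALES_TRX
GROUP  BY ORDER_DATE, PRODUCT_ID, REGION_ID;

-- Even better, if the DBA agrees: a materialized view, so the summing
-- is done once at refresh time instead of at every cube build.
CREATE MATERIALIZED VIEW MV_SALES_SUMMARY
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT ORDER_DATE, PRODUCT_ID, REGION_ID,
       SUM(SALES_AMT) AS SALES_AMT,
       COUNT(*)       AS TRX_COUNT
FROM   SALES_TRX
GROUP  BY ORDER_DATE, PRODUCT_ID, REGION_ID;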

Second, concerning hardware, the cube build is VERY disk-I/O oriented. I'll bet (unless you have the oldest server in the corner of the tech room) that your processor and RAM are sufficient. Disk systems like RAID5 are horrible for sequential writes, which is mainly what a cube build does. I've seen horrible build times on otherwise up-to-date hardware because of this (write-cache even worsens things!). If your server has a fast-write disk system (maybe just an IDE disk) with free space, you could configure Transformer to use that disk as its temp disk.
But I realize you might not have any control over the hardware.

What I dream of when making 1 gig cubes is a server with 8 GB of RAM, 6 GB of which is designated as a good old RAM disk, with Transformer configured to use it. The build would be fast.

One more thing. If you are planning to have a measure with a distinct count of a transactional id (e.g. order number / line number), then 20 million records is starting to sound like quite a lot.

Cheers


[blue]Backup system is as good as the latest recovery[/blue]
 
yksvaan said:
...when making 1 gig cubes...
Yikes! I know most of us try to fulfil users' requests, but surely access times on such monsters must frustrate them?

As to everything else above, agreed. If you only need summary information on 20 million records, then a summary view or table is best (Having said that, I have a model with a SQL Server source of 10M records and read time is under 10 minutes); if you need transactional details, why not go for drill-through?

soi la, soi carré
 
yksvaan said:
...when making 1 gig cubes...
Ouch! I'm sure you have your reasons for building a cube of this size, but I agree with drlex.

IT needs to better understand what PowerPlay was designed to do: "Multi-dimensional Analysis". This tool was not designed to be a replacement for prompted reports. Cubes should be designed to identify trends; then you drill through to the transactional detail.

Unfortunately it's all too easy to fall into the trap of giving end users what they ask for, instead of what they actually need to be effective. This is the challenge for a good Business Intelligence Developer.

<rant over>

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
many thanks for your inputs.

I just noted one line from yksvaan saying RAID5 is NOT good for Transformer. So the solution is to point the Transformer TEMP folder settings to a non-RAID5 disk...?

As for my records, the 20 million is already a summary.

thanks again.

regards,
sudhi
 
I thought you got that I was exaggerating a little :)
However, 100-300 MB cubes are not that rare.

I understand, and couldn't agree more, that OLAP is not made for several dimensions having tens of thousands of lowest-level categories. But what can you do when customers want to buy such cubes, because they love the UI of PowerPlay and not that of prompted reports like Impromptu? Of course I always try to make the customer realize that a too-complex cube might not be comprehensible to the cube consumers; a human brain can only handle 2-3 dimensions plus measures.

Yes, 10-20 million rows isn't too many in its own right(?); my point in fact was not to focus on the number of rows. The other factors are vastly more important for build duration.

Cheers


[blue]Backup system is as good as the latest recovery[/blue]
 
yksvaan,
Your points regarding the speed of cube build time are excellent.
Glad to hear you were exaggerating, but I have talked to people who had 1GB cubes, so I never assume someone is exaggerating when they make that statement.


I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
Ditto DoubleD's comments from me.
Without speech inflexion or emoticon use, it's tough to tell exaggeration, sarcasm, and irony apart from plain talking.
(Coincidentally, I just built a cube from 15M records and it took under 20 minutes on not-special hardware - the joy of high-level summaries [smile])

soi la, soi carré
 
all,
to give an update on my situation, below are some lines from the log.

Timing, OPEN DATA SOURCE,00:00:01
End processing 8716003 records from data source....
Timing, READ DATA SOURCE,01:13:45


The Oracle query opened in no time, but processing the records took 1 hr 13 minutes.

Is it because of the network, or because of the model structure (dimensions/measures/etc.)?

Would there be any improvement if we upgraded the link between the database server and the cube build server?

regards,
Sudhi
 
I believe the OPEN DATA SOURCE line is just how long it takes to connect to Oracle.
Reading the log lines you posted, it took 1 hr 13 minutes to get the data from Oracle. I'm concerned that even with summarization you are still processing 8M records. It looks like it's going to be a horrendously slow cube build.

Someone please correct me if I'm wrong.

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
yksvaan,

coming back to your point earlier in the thread, you said "RAID5 is horrible" for the cube build process.

If so, can you please suggest what is good: RAID1, striping, mirroring, or what kind of configuration are you suggesting?

thanks.

regards,
Sudhi
 
thanks DoubleD.

It's 7 years of historical data and growing. The data is summarized to the maximum possible level.

Is the 1 hr 13 minutes due to summarization or due to the network?

regards,
Sudhi
 
Take the SQL from your IQD and run it in a tool like SQL*Plus. Check the timing to verify whether it's Oracle or the transfer rate across the network.
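For example, in SQL*Plus (a rough sketch; substitute the actual SELECT from your IQD for the made-up one below):

-- Have SQL*Plus report elapsed time for each statement
SET TIMING ON

-- Paste the exact SELECT copied out of the IQD here (hypothetical example):
SELECT ORDER_DATE, PRODUCT_ID, REGION_ID, SUM(SALES_AMT)
FROM   SALES_TRX
GROUP  BY ORDER_DATE, PRODUCT_ID, REGION_ID;

-- SQL*Plus prints an "Elapsed: ..." line once the last row is fetched;
-- compare that to the 01:13:45 READ DATA SOURCE figure in the Transformer log.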

I am what I am based on the decisions I have made.

DoubleD [bigcheeks]
 
Sudhi,
I agree with DoubleD's first point - that 73 minutes is the time taken to read in the records. Whether this can be improved must depend on the nature of the connection to the database, the query itself and the structure of the database - how many tables are accessed and the indexing and joins between them.
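If the query itself is the suspect, asking Oracle for its execution plan will show whether indexes are being used and how the joins are resolved. A quick sketch (assuming your Oracle release has DBMS_XPLAN and a PLAN_TABLE available; the query shown is just a placeholder):

EXPLAIN PLAN FOR
SELECT ORDER_DATE, PRODUCT_ID, REGION_ID, SUM(SALES_AMT)
FROM   SALES_TRX
GROUP  BY ORDER_DATE, PRODUCT_ID, REGION_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);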

I have a SQL Server database on a desktop PC which is a mirror of the most useful tables on our trading system (Progress 9.1D running on a good-sized clustered HP Unix box). A query and read by the Cognos server takes 10 minutes using the SQL Server data yet takes 5 hours using ODBC to the Progress DB! The build is 30 minutes in both cases.

In my experience a simple model (few dimensions, no 1:1 record:category relationship, no deep or wide level category counts) can be built from 10 million records in a few minutes on reasonable hardware.

soi la, soi carré
 
I think it also depends on the hardware setup (and on the manufacturer); it might be that on the same machine RAID1 wouldn't be significantly better. But your 73 minutes, as other members stated, is the overall time in which Oracle answers, gets the result set, sends it over the network, and Transformer saves the data on local disk, so there might not be a single cause. The result set might also be coming through an Oracle view. Don't trust the fact that you can open the query quickly in a tool like WinSQL. Try SELECT COUNT(*) FROM (your query) DERIVED.
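That is, something along these lines, with your own query text in place of the placeholder:

-- Forces Oracle to produce the whole result set but returns only one row,
-- so the network transfer is taken out of the equation. If this is also
-- slow, the database/view is the bottleneck; if it is fast, suspect the
-- network or the local disk on the Transformer server.
SELECT COUNT(*)
FROM  ( SELECT ORDER_DATE, PRODUCT_ID, REGION_ID, SUM(SALES_AMT) AS SALES_AMT
        FROM   SALES_TRX
        GROUP  BY ORDER_DATE, PRODUCT_ID, REGION_ID ) DERIVED;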

Do you have two processors? Enable multi-processing on Transformer's data source and see if that helps (probably not much). Check in the Transformer preferences what your data temp and model temp directories are, and in Configuration Manager, where your UDA temp is. I don't even have to say that none of these should be on a network share volume.


Try copying a big file on your disk to see how the speed feels.

If everything above is working fine, it's probably the network.

There are also some number_of_records (or something) parameters for Transformer in trnsfrmr.ini, but I'm not really familiar with them.

Also, in Transformer the PowerCube properties have an Optimization setting; what do you have there? I think the default is the best choice anyway.

Cheers


[blue]Backup system is as good as the latest recovery[/blue]
 