Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Further explanation

Status
Not open for further replies.

BillDHS

Programmer
Apr 26, 2001
207
US
In another thread dnoeth stated:

Never use "csum(1,1)" on a large data set, this will result in a skewed spool with all data on a single AMP.

I would like to know why this is true. CSUM(1,1) returns different values on each row; i.e. 1,2,3,4; why would all rows go to the same AMP?

Also, if it is not the PI or the first column, why would it be used for ditribution at all?

thanks
 
Just run a
"select csum(1,1) from sys_calendar.calendar;"
and check dbc.ampusage for used CPUTime.
If you can, submit "exec dbc.clearpeakdisk" before and check dbc.diskspace.peakspool after the query.

I don't know how those OLAP functions are implemented internally, but there's a distribution (looks similar to the sort in FastExport?):
All rows (within the same partition) with the same value are distibuted to the same AMP.

So CSUM(1,1) without GROUP BY results in a totally skewed spool. With GROUP BY it's usually not that bad, but there's no advantage of using it...

Dieter
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top