Further explanation

BillDHS · Jul 15, 2003

In another thread dnoeth stated:

Never use "csum(1,1)" on a large data set, this will result in a skewed spool with all data on a single AMP.

I would like to know why this is true. CSUM(1,1) returns different values on each row; i.e. 1,2,3,4; why would all rows go to the same AMP?

Also, if it is not the PI or the first column, why would it be used for ditribution at all?

thanks

dnoeth · Jul 15, 2003

Just run a
"select csum(1,1) from sys_calendar.calendar;"
and check dbc.ampusage for used CPUTime.
If you can, submit "exec dbc.clearpeakdisk" before and check dbc.diskspace.peakspool after the query.

I don't know how those OLAP functions are implemented internally, but there's a distribution (looks similar to the sort in FastExport?):
All rows (within the same partition) with the same value are distibuted to the same AMP.

So CSUM(1,1) without GROUP BY results in a totally skewed spool. With GROUP BY it's usually not that bad, but there's no advantage of using it...

Dieter

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Further explanation

BillDHS

Programmer

dnoeth

Instructor

Part and Inventory Search

Sponsor