Best place to do an Aggregation? 1

FlaBusInt · May 16, 2007

I am building a SSIS package to pull in and aggregate job cost data from our SQL Server OLTP system to our data warehouse. I have built a view on the OLTP server that will pull JobNumber, MtlQty, and MtlCost for each material item associated with the job. In the data warehouse, I just want to store total material cost for each job. As I see it, I have two options:

1.Tweak the view to provide:

Code:

SELECT
  JobNum,
  IsNull(Sum(MtlQty * MtlCost),0) as TotalCost

-OR-

2. Add a Derived Column transform in my SSIS package to perform the (MtlQty * MtlCost) calculation for each row, then use an Aggregate transform to sum them by job.

Will one method typically be faster than the other, or is it hit and miss depending on the usual variables (server memory, server CPU configuration, etc.)?

MDXer · May 16, 2007

Since this isn't really ann aggregate function but a mathematical function you shouldn't have any issue using the derived column and just multiply the needed column together. If you Data volume isn't to great you may look at using the Aggregate Transform in your SSIS packages, rather than your view. Doing this would mean you hit the source system for a smaller period of time as the SUM() won't pass records to the data flow immediately. THe Aggregate transform is a blocking transform meaning that until all records for the grouping columns are at the aggregate transform the records will not proceed to the next step. With the addition of the need to get the total cost you could do it one of 2 ways

1 Aggregate and the Multiply This may infact take longer as you typically want the aggregate to be one of the last steps in your data flow.

2 Multiply QTY then Cost in a derived column and then apply the aggregate transform and sum these 3 columns.

I run about 38K records through an Aggregate transform and have it group by customer and date avg is about 1800-2500 customers a day and the over head is leass than 2 minutes.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck

FlaBusInt · May 17, 2007

Paul -

The paper you linked to in your other thread inspired me to run a test. I set up three parallel threads in an SSIS data flow. The first used a SQL Statement in the OLE DB Source to pull aggregated data (about 1M rows). The second used a SQL Statement in the OLE DB source to pull unaggregated data (47M rows) and pass it to an Aggregation transform. The third used a view to aggregate the data; the data source simply read from the view. In each case I terminated the data flow with a Trash destination.

The results:

SQL Statement Aggregate: 25 minutes
Aggregate in Data Flow: 37 minutes
Aggregate in View: 50 minutes

Incidentally, the aggregation task itself in the data flow took about 4 seconds; the remainder of the time was pulling the 47M rows.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Best place to do an Aggregation? 1

FlaBusInt

IS-IT--Management

MDXer

Technical User

FlaBusInt

IS-IT--Management

Similar threads

Part and Inventory Search

Sponsor