
Faster .Compute()


link9

Programmer
Nov 28, 2000
3,387
US
I have a class library that does calculations (usually AVERAGE) on (sometimes) large sets of data. The nature of the application is that it normally does several calculations (5-50 or more depending on user requests) on the same set of data, so the basic implementation is that I grab all the data out into a DataTable and then the library takes arguments and uses the .Compute() method of the DataTable to return results. In most cases, I'm doing an average of a column on a subset of rows. Subsequent calls do averages on different subsets of rows.
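
To be concrete, the pattern looks roughly like this (a minimal sketch - the table, column, and filter names are made up for illustration):

Code:
using System;
using System.Data;

class ComputeSketch
{
    static void Main()
    {
        var table = new DataTable("Readings");
        table.Columns.Add("Region", typeof(string));
        table.Columns.Add("Value", typeof(double));
        table.Rows.Add("North", 10.0);
        table.Rows.Add("South", 20.0);
        table.Rows.Add("North", 30.0);

        // One aggregate per call: average a column over a filtered subset of rows.
        // Subsequent calls just change the filter expression.
        object avg = table.Compute("AVG(Value)", "Region = 'North'");
        Console.WriteLine(avg); // 20
    }
}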

This all works fine until the DataTable grows to 10k+ rows. Once I pass that threshold (or thereabouts), the performance goes into the toilet. It can literally take minutes to pull reports that would take < 10 seconds on a dataset half the size.

The problem is that I don't want a single request to result in 50+ queries to my database, yet I must do 50+ calculations. I can add application servers pretty easily, but I only have the 1 database server. That's my scarce resource, so I want to minimize his work (even though he's better at doing aggregation than my app server) and put my app servers to work.

I am well aware of the limitations of the DataTable and I am also well aware that if I run these same queries directly against the database, I get much better performance (orders of magnitude better). I'm just not sure how to overcome this.

So my question is: does anyone have an idea how to pull out possibly large sets of data and do the calculations on the application servers, so as not to overload my database server, while avoiding the huge performance penalty I get when working with large DataTables? I'm willing to give up some performance on smaller sets of data if necessary (the tradeoff would be worth it, I'm sure, if I can figure out how to optimize the top end).

I have a small benchmarking application that I'm working with, and I have yet to find any method that outperforms DataTable.Compute(). I'm working under the assumption that I need to start with a DataTable because I will need to access the data several times (maybe I'm hosed right there - that's why I'm asking this question).

Here is what I've tried thus far:

1) Manually call DataTable.Select() and get a DataRow[] object - brute force iterate and do the average
2) Use LINQ against an EnumerableRowCollection<DataRow> pulled from the DataTable and call .Average()
3) Brute force iterate over the DataTable myself and do the average
4) Call DataTable.CreateDataReader() and brute force iterate on the DataReader to do the average

Of all of these methods, #1 results in the closest performance numbers to .Compute() (and it's pretty darn close) and #2 results in the worst (almost twice as bad) with #3 & #4 falling solidly in the middle of those two.
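
For reference, here are rough sketches of what #1 and #2 look like (same made-up column names as the sketch above):

Code:
using System.Data;
using System.Linq; // plus a reference to System.Data.DataSetExtensions for AsEnumerable()

static class AverageSketches
{
    // #1: DataTable.Select() gives a DataRow[]; iterate it by hand.
    public static double ViaSelect(DataTable table, string filter)
    {
        DataRow[] rows = table.Select(filter);
        double sum = 0;
        foreach (DataRow row in rows)
            sum += (double)row["Value"];
        return sum / rows.Length; // NaN for an empty subset
    }

    // #2: LINQ over the EnumerableRowCollection<DataRow> from AsEnumerable().
    public static double ViaLinq(DataTable table, string region)
    {
        return table.AsEnumerable()
                    .Where(r => r.Field<string>("Region") == region)
                    .Average(r => r.Field<double>("Value")); // throws on an empty subset
    }
}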

Does anyone out there have an idea on how to improve upon .Compute()?

Many thanks -
Paul Prewett



The answer to getting answered -- faq855-2992
 
Do you need the data in the DataTable anywhere else, other than to get subsets? Do you show it to the user?

Because if you don't, then getting the data from the database server and doing the math locally will take more resources from the database server than doing the math there and just returning the result.

Perhaps a little trick: turn the app servers into temp database servers and store your subsets there. I bet that would be faster than the overhead a DataTable creates. Something like MySQL or Postgres would do.

Christiaan Baes
Belgium

My Blog
 
I do not show the user the entire resultset, no.

The temp database idea had crossed my mind. In fact, to keep it all in the same camp (the db server is SQL Server), I could even use SQL Express.

That would take some amount of doing, but perhaps I could use something like Sync Services to replicate the data locally and keep those details out of the app.

Hmmm... Yes, a good thought, Christiaan.


The answer to getting answered -- faq855-2992
 
If you use SQL Server, you could just do a SELECT INTO and make up a table name, perhaps something with a GUID in it to make it session-specific, and then do an average on that or on subsets of that.

Life can be so easy.

For simple calculations like average and sum, I think the database server will beat .Compute().
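
Something like this rough sketch, assuming System.Data.SqlClient (the source table and column names are invented):

Code:
using System;
using System.Data.SqlClient;

static class SessionTableSketch
{
    public static void Run(SqlConnection conn)
    {
        // Session-specific copy: a GUID in the name keeps requests from colliding.
        string table = "Calc_" + Guid.NewGuid().ToString("N");

        using (var copy = new SqlCommand(
            $"SELECT Region, Value INTO {table} FROM Readings", conn))
            copy.ExecuteNonQuery();

        // Run as many aggregates as needed against the copy...
        using (var avg = new SqlCommand(
            $"SELECT AVG(Value) FROM {table} WHERE Region = @r", conn))
        {
            avg.Parameters.AddWithValue("@r", "North");
            Console.WriteLine(avg.ExecuteScalar());
        }

        // ...then clean up when the session is done.
        using (var drop = new SqlCommand($"DROP TABLE {table}", conn))
            drop.ExecuteNonQuery();
    }
}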


Christiaan Baes
Belgium

My Blog
 
Indeed, the db server will most certainly beat .Compute().

Setting this up will require a significant re-tooling of the library (which of course is not out of the question - I think I'm looking at a rewrite either way), so I'm still hoping that someone else out there has an idea on how to overcome this overhead in memory.

The more I think about it, the more I'm thinking that once I put it into the DataTable, the performance game is over right there. But there's always hope.


The answer to getting answered -- faq855-2992
 
I'm hoping with you. But I think the DataTable is much too bloated for that.

Christiaan Baes
Belgium

My Blog
 
What about reading the data into an embedded database via a data reader? A popular one is SQLite, and it just requires that you include the DLL. You can then run your SQL against an embedded database that needs no installation, since the DLL ships with your app. The claim is that it's faster than SQL Server. When you're done, just drop the data.
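
A minimal sketch of the idea, assuming the System.Data.SQLite provider (exact API may differ by version; table and column names are made up):

Code:
using System;
using System.Data.SQLite; // the SQLite ADO.NET provider

class EmbeddedDbSketch
{
    static void Main()
    {
        // In-memory database: no installation, no files, gone when closed.
        using (var conn = new SQLiteConnection("Data Source=:memory:"))
        {
            conn.Open();

            new SQLiteCommand(
                "CREATE TABLE Readings (Region TEXT, Value REAL)", conn)
                .ExecuteNonQuery();

            // Bulk-load rows from the SqlDataReader here: one parameterized
            // INSERT re-executed per row, inside a single transaction, is fast.

            object avg = new SQLiteCommand(
                "SELECT AVG(Value) FROM Readings WHERE Region = 'North'", conn)
                .ExecuteScalar(); // DBNull when no rows match
            Console.WriteLine(avg);
        }
    }
}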

Age is a consequence of experience
 