Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to Optimize Mass Data Queries

Status
Not open for further replies.

SetoKaiba

Programmer
Jun 16, 2004
28
US
I am trying to optimize the speed of a query that calculates bias/mean absolute error of forecasts against observations. The forecasts are issued hourly, and make hourly forecasts to the 48th hour. Furthermore, there are about 20 different forecast methods. On top of that, there are 12 different sites. What this adds up to is a really large database. The forecast table has 400K rows, while the observation table as 25K. In order to calculate the above statistics for a given time period, one must pull a lot of data based on the date range. The worst case scenario is that 60,000 rows must be pulled from the forecast table (all stations, methods, and 30 days of fcsts). The problem doesn't seem to be the speed of the query (<.1 sec) as the query is only on the Primary Key (time,station,method), but rather the throughput of the data. I am using PHP to handle the data and present it to the end user. I would estimate that about 15Mb has to go through the pipe before it's processed (ie. converted to an array, and compared to the observations - note, an observation would correspond to say, f010 of a forecast 10 hours earlier, so a parallel join is impossible). The machine I am using is forced to swap in order to execute the script, but it is an old machine (500Mhz processor, 128Mb Ram) that's used for development.

Are there any ways of possibly compressing the inbound data, especially while it is being processed? Right now, I create the big 4-D array before I do the statistics (which are grouped by method), would it help to split up the data feed by method? PHP seems to be most sluggish during the query mysql_query(...), rather than the mysql_fetch_array() looping.
 
...15Mb has to go through the pipe...
Which pipe? Sending 15MB over the internet is gonna be noticable.

When you display your results on the web page, how many lines are shown? Showing 500 rows in a table is gonna take noticable time for your browser to build.

Are you computing the same results everytime a person requests your web page? If so, could you re-organize the computation so that results are only calculated one-time? Are there intermediate stages of calculation that could be saved for use by the reports?

Is this a query optimization issue, or is it a process organization issue?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top