I am trying to optimize the speed of a query that calculates the bias and mean absolute error of forecasts against observations. Forecasts are issued hourly and extend out to the 48th hour; on top of that, there are about 20 different forecast methods and 12 different sites. This adds up to a really large database: the forecast table has 400K rows, while the observation table has 25K. To calculate the statistics above for a given time period, a lot of data has to be pulled based on the date range - in the worst case about 60,000 rows from the forecast table (all stations, all methods, and 30 days of forecasts).

The problem doesn't seem to be the speed of the query itself (< 0.1 sec), since the query only hits the primary key (time, station, method), but rather the throughput of the data. I am using PHP to handle the data and present it to the end user, and I would estimate that about 15 MB has to go through the pipe before it's processed (i.e. converted to an array and compared to the observations - note that an observation corresponds to, say, f010 of a forecast issued 10 hours earlier, so a parallel join is impossible). The machine I am using is forced to swap to execute the script, but it is an old development box (500 MHz processor, 128 MB RAM).
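For clarity, the statistics themselves are simple: bias is the mean error and MAE the mean absolute error over the matched forecast/observation pairs. In PHP terms it boils down to something like this ($pairs is just a placeholder array of matched values for illustration, not my actual data structure):

```php
<?php
// Bias = mean(forecast - observation), MAE = mean(|forecast - observation|),
// computed over pairs of the form array('f' => forecast, 'o' => observation).
function bias_mae($pairs)
{
    $n = count($pairs);
    if ($n == 0) {
        return null;            // nothing to score
    }
    $sumErr = 0.0;
    $sumAbs = 0.0;
    foreach ($pairs as $p) {
        $err     = $p['f'] - $p['o'];
        $sumErr += $err;
        $sumAbs += abs($err);
    }
    return array('bias' => $sumErr / $n, 'mae' => $sumAbs / $n);
}
?>
```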
Are there any ways of compressing the inbound data, especially while it is being processed? Right now I build the big 4-D array before computing the statistics (which are grouped by method); would it help to split up the data feed by method? PHP seems to be most sluggish during the mysql_query(...) call itself, rather than in the mysql_fetch_array() loop.
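To make the "split up the data feed" idea concrete, here is roughly what I have in mind: folding the rows into running sums per method as they come off the result set, instead of building the whole array first. The table and column names (fcst, method, station, issue, lead, value) are placeholders for illustration, $obs is assumed to be a small lookup array of observations keyed by station and valid time that was loaded beforehand, and mysql_unbuffered_query() is just one way to avoid buffering the entire result set inside PHP:

```php
<?php
// Placeholder query: pull the date range once, then accumulate per method.
$res = mysql_unbuffered_query(
    "SELECT method, station, issue, lead, value
       FROM fcst
      WHERE issue BETWEEN '2009-06-01' AND '2009-06-30'"
);

$sumErr = array();
$sumAbs = array();
$count  = array();

while ($row = mysql_fetch_assoc($res)) {
    // A forecast issued at 'issue' for lead hour 'lead' verifies against
    // the observation 'lead' hours later.
    $valid = strtotime($row['issue']) + 3600 * (int)$row['lead'];
    if (!isset($obs[$row['station']][$valid])) {
        continue;                       // no matching observation
    }
    $err = $row['value'] - $obs[$row['station']][$valid];

    $m = $row['method'];
    if (!isset($count[$m])) {
        $sumErr[$m] = 0.0;
        $sumAbs[$m] = 0.0;
        $count[$m]  = 0;
    }
    $sumErr[$m] += $err;
    $sumAbs[$m] += abs($err);
    $count[$m]  += 1;
}

foreach ($count as $m => $n) {
    printf("%s: bias %.3f, MAE %.3f\n", $m, $sumErr[$m] / $n, $sumAbs[$m] / $n);
}
?>
```

The per-method accumulators stay tiny, so memory (and swapping) should stop being the issue even if the row count grows - but I don't know whether that actually addresses where the time is going.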