Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Is The Query Scalable? 1

Status
Not open for further replies.

3dColor

Programmer
Jan 10, 2006
240
US
Rudy helped me create this query a couple of years ago to find the stats for each day of an rental ad's duration. It worked great before but now it is taking 17 seconds to process the query with over 66k records in the table.

Is the query scalable?

Code:
<!--- retrieve stats past 40 days for an ad --->
	<cffunction access="remote" name="retrieveStats40days" returntype="query">
		<cfargument name="adId"	type="numeric" required="true"/>
		<cfargument name="adduration" type="numeric" required="true"/>
		<CFQUERY name="GetStats40days" datasource="#DSN#">
			SELECT thedate, 
				count(Stats.timeStampStats) as cnt
  				FROM (
       				select date_sub(current_date, interval i day) as thedate
        		 from Integers t
        		where i <= #adduration# 
       		) as i_40 
			LEFT OUTER
     		 join Stats
      		  on date(Stats.timeStampStats)
      		   = i_40.thedate
 	 		 and Stats.listingFk = <cfqueryparam value='#adId#' cfsqltype='cf_sql_integer'>  
			GROUP BY thedate
			ORDER BY thedate DESC
		</CFQUERY>
	<cfreturn GetStats40days>
	</cffunction>
 
Good morning Rudy!

The Stats table just has one Primary index key which is "statsId".

The Integers table just has one column and Primary index key is "i".

Here is the source code for both tables:

Code:
CREATE TABLE `Stats` (
  `statsId` int(11) unsigned NOT NULL auto_increment,
  `listingFk` int(11) unsigned default NULL,
  `timeStampStats` timestamp NULL default CURRENT_TIMESTAMP,
  `ipAddress_stats` varchar(18) default NULL,
  PRIMARY KEY  (`statsId`)
) ENGINE=MyISAM AUTO_INCREMENT=982781 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;

Code:
CREATE TABLE `Integers` (
  `i` int(11) NOT NULL default '0',
  PRIMARY KEY  (`i`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Thanks Rudy,
Dave
 
ALTER TABLE Stats ADD INDEX nummy (timeStampStats)

indexing is the key to performance

then change this --

LEFT OUTER
JOIN Stats
ON date(Stats.timeStampStats) = i_40.thedate

to this --

LEFT OUTER
JOIN Stats
ON Stats.timeStampStats >= i_40.thedate
AND Stats.timeStampStats < DATE_ADD(i_40.thedate, INTERVAL 1 DAY)


r937.com | rudy.ca
 
Thanks Rudy!

I will give that a try and report back...
 
Rudy,

I change the table so now it reads:

Code:
CREATE TABLE `stats` (
  `statsId` int(11) unsigned NOT NULL auto_increment,
  `listingFk` int(11) unsigned default NULL,
  `timeStampStats` timestamp NULL default CURRENT_TIMESTAMP,
  `ipAddress_stats` varchar(18) default NULL,
  PRIMARY KEY  (`statsId`),
  KEY `nummy` (`timeStampStats`)
) ENGINE=MyISAM AUTO_INCREMENT=839561 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC

and now the Query is this:
Code:
<cffunction access="remote" name="retrieveStats40days" returntype="query">
		<cfargument name="adId"	type="numeric" required="true"/>
		<cfargument name="adduration" type="numeric" required="true"/>
		<CFQUERY name="GetStats40days" datasource="#DSN#">
			SELECT thedate, 
				count(Stats.timeStampStats) as cnt
  				FROM (
       				select date_sub(current_date, interval i day) as thedate
        		 from Integers t
        		where i <= #adduration# 
       		) as i_40 
			LEFT OUTER
				JOIN Stats
				ON Stats.timeStampStats >= i_40.thedate
				AND Stats.timeStampStats < DATE_ADD(i_40.thedate, INTERVAL 1 DAY)
 	 			and Stats.listingFk = <cfqueryparam value='#adId#' cfsqltype='cf_sql_integer'>  
			GROUP BY thedate
			ORDER BY thedate DESC
		</CFQUERY>
	<cfreturn GetStats40days>
	</cffunction>

But it still takes 17 seconds. Did I do the changes right?
 
do me a favour, pull the query out of the coldfusion, substitute actual values for #adduration# and #adId#, then run an EXPLAIN on it directly in mysql



r937.com | rudy.ca
 
Ok I did as you asked in MySQL Query Browser and I got the same 17 seconds result.

I am not sure I got all the lines right - this is what I used:

Code:
SELECT thedate,
count(Stats.timeStampStats) as cnt
FROM (
select date_sub(current_date, interval i day) as thedate
from Integers t
where i <= 40
) as i_40
LEFT OUTER
JOIN Stats
ON Stats.timeStampStats >= i_40.thedate
AND Stats.listingFk = '8342'
GROUP BY thedate
ORDER BY thedate DESC
 
please stick the word EXPLAIN in front of SELECT and run it again

i got a feeling we should include listingFk into `nummy`

if you have time, try it with listingFk first, do an EXPLAIN, then try it again with it second, after timeStampStats, and do another EXPLAIN

r937.com | rudy.ca
 
When I place EXPLAIN at the very top this is what returns:

Code:
1 primary <derived2>
1 primary stats
2 derived t

I am not sure how to alter the stats table, is this correct?
Code:
ALTER TABLE Stats ADD INDEX nummy (listingFk)

 
yes, that's what EXPLAIN produces, but you've only shown part of it

ALTER TABLE Stats DROP INDEX nummy ;
ALTER TABLE Stats ADD INDEX nummy (listingFk,timeStampStats)

do you have the mysql documentation handy?

r937.com | rudy.ca
 
It is running really fast now (less than 1 second) but something look very wrong to the graph now.

Take a look at this image of the output:

Appears to be a log curve heading up to 59 and then it levels out, which is not what is coming from the DB.

I do not have the mysql documentation handy.



 
Rudy,

Now it is working! And it is working fast - around 250ms - much better than 17 seconds!

Thanks for your help.

I am not sure what I did but it is working now. I think it was when I added the nummy stuff that made it work.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top