Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Extreamly Slow Search?

Status
Not open for further replies.

michaeljiang

Programmer
Aug 4, 2007
5
US
I have the following 2 tables

table pictureinarticle{
'id' int(10) unsigned NOT NULL auto_increment,
'articleid' int(10) unsigned NOT NULL,
'pictureid' int(10) unsigned NOT NULL,
PRIMARY KEY ('id'),
KEY 'pictureid' ('picutreid', 'articleid')
}

table article{
'id' int(10) unsigned NOT NULL auto_increment,
'timestamp' timestamp(14) NOT NULL,
PRIMARY KEY ('id'),
KEY 'timestamp' ('timestamp')
}

'articleid' in table 'pictureinarticle' is PRIMARY KEY 'id' in 'article'.

Now, I want to search total number of pictures in all articles that are in a time range START and END (both of TIMESTAMP type). It is a simple search as follows:

select count(pictureid) from pictureinarticle where articleid in ( select id from article where timestamp between START and END);

PROBLEM:
- the above search took about 45 seconds to finish.

Data:
- Number of articles in [START, END] is around 260,000, returned from 'select id from article where timestamp between START and END'.
- Number of records in table 'pictureinarticle' is around 1,400,000.
- Number of records returned by above select is around 520,000.

Platform:
Linux w/ 2.8GHz Interl Xeon.

Analysis:
- select articles in [START, END] is fast, no more than 1 second: select id from article where timestamp between START and END.
-it is 'select count(a.pictureid) from pictureinarticle a, article b where a.articleid in (...)' that actually cost 110 seconds.
This seems to be due to a poor indexed table? But I did have 'articleid' indexed in table 'pictureinarticle'.

EXPLAIN above query output:

*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: pictureinarticle
type: index
possible_keys: NULL
key: pictureid
key_len: 7
ref: NULL
rows: 1342998
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: article
type: unique_subquery
possible_keys: PRIMARY,timestamp
key: PRIMARY
key_len: 3
ref: func
rows: 1
Extra: Using where

So, a 45 seconds to get around 520,000 results is rather slow. What may cause this slow search problem?

Any thoughts? Thanks!
 
Hi

That must be slow.
Code:
[b]select[/b]
count(*)

[b]from[/b] pictureinarticle [b]as[/b] pa
[b]inner join[/b] article [b]as[/b] a [b]where[/b] a.id=pa.articleid

[b]where[/b] a.timestamp [b]between[/b] [green][i]START[/i][/green] [b]and[/b] [green][i]END[/i][/green]

Feherke.
 
[q]select
count(*)

from pictureinarticle as pa
inner join article as a on a.id=pa.articleid

where a.timestamp between START and END[/q]

Thanks! In fact, I've also tried this and the following:

select count(*) from pictureinarticle pa, article a
where pa.articleid=a.id
and
a.timestamp between START and END

Both took similar amount of time, around 38 seconds, definitely better than the one posted in question. So, I guess that slow query is due to a all table scan to check a "IN" set operation. But a explicit "JOIN" does not improve much (well, yes, a 15% faster). Can we do better than this?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top