Performance of query

abenstex · Aug 3, 2006

Hi everyone!

i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar), val2(varchar). Example:
pid crit val1 val2
p1 c1 x y
p1 c2 x z
p1 c3 y x
...
What i am doing is to query all val1 and val2 for one pid and all crit values:

Code:

select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable);

where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date.

Code:

                                                                QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------------------
 Hash IN Join  (cost=1033.67..134959.41 rows=37120 width=23) (actual time=357.11
6..356984.535 rows=37539 loops=1)
   Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
   ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 rows=37120 width=23) (
actual time=291.600..356707.737 rows=37539 loops=1)
         Recheck Cond: ((pid)::text = '1'::text)
         ->  Bitmap Index Scan on idx_test2_pid  (cost=0.00..232.92 rows=37120 w
idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
               Index Cond: ((pid)::text = '1'::text)
   ->  Hash  (cost=700.20..700.20 rows=40220 width=13) (actual time=65.055..65.0
55 rows=40220 loops=1)
         ->  Seq Scan on snps_test  (cost=0.00..700.20 rows=40220 width=13) (act
ual time=0.020..30.131 rows=40220 loops=1)
 Total runtime: 357017.259 ms

Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it faster.

feherke · Aug 3, 2006

Hi

The sub-select is executed for each line, so is usefull only for small queries. Try to rewrite your [tt]select[/tt] to use a [tt]join[/tt].

Code:

[b]select distinct[/b]
t.val1, t.val2, t.crit

[b]from[/b] mytable t
[b]inner join[/b] myCritTable [b]using[/b] (crit)

[b]where[/b] t.pid=[i]'somepid'[/i]

If there are only a few value in myCritTable, I would try to compose an [tt]or[/tt]'ed condition list with the application like this :

Code:

[b]select distinct[/b]
val1, val2, crit

[b]from[/b] mytable

[b]where[/b] t.pid=[i]'somepid'[/i]
[b]and[/b] ( crit=[green][i]crit1[/i][/green] [b]or[/b] crit=[green][i]crit2[/i][/green] [b]or[/b] crit=[green][i]critn[/i][/green] )

As I observed, this kind of expressions are abit faster then [tt]in[/tt].

Feherke.

http://rootshell.be/~feherke/

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Performance of query

abenstex

Programmer

feherke

Programmer

Similar threads

Part and Inventory Search

Sponsor