Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Performance of query

Status
Not open for further replies.

abenstex

Programmer
Jan 9, 2005
47
0
0
DE
Hi everyone!

i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar), val2(varchar). Example:
pid crit val1 val2
p1 c1 x y
p1 c2 x z
p1 c3 y x
...
What i am doing is to query all val1 and val2 for one pid and all crit values:
Code:
select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable);
where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date.

Code:
                                                                QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------------------
 Hash IN Join  (cost=1033.67..134959.41 rows=37120 width=23) (actual time=357.11
6..356984.535 rows=37539 loops=1)
   Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
   ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 rows=37120 width=23) (
actual time=291.600..356707.737 rows=37539 loops=1)
         Recheck Cond: ((pid)::text = '1'::text)
         ->  Bitmap Index Scan on idx_test2_pid  (cost=0.00..232.92 rows=37120 w
idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
               Index Cond: ((pid)::text = '1'::text)
   ->  Hash  (cost=700.20..700.20 rows=40220 width=13) (actual time=65.055..65.0
55 rows=40220 loops=1)
         ->  Seq Scan on snps_test  (cost=0.00..700.20 rows=40220 width=13) (act
ual time=0.020..30.131 rows=40220 loops=1)
 Total runtime: 357017.259 ms

Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it faster.
 
Hi

The sub-select is executed for each line, so is usefull only for small queries. Try to rewrite your [tt]select[/tt] to use a [tt]join[/tt].
Code:
[b]select distinct[/b]
t.val1, t.val2, t.crit

[b]from[/b] mytable t
[b]inner join[/b] myCritTable [b]using[/b] (crit)

[b]where[/b] t.pid=[i]'somepid'[/i]
If there are only a few value in myCritTable, I would try to compose an [tt]or[/tt]'ed condition list with the application like this :
Code:
[b]select distinct[/b]
val1, val2, crit

[b]from[/b] mytable

[b]where[/b] t.pid=[i]'somepid'[/i]
[b]and[/b] ( crit=[green][i]crit1[/i][/green] [b]or[/b] crit=[green][i]crit2[/i][/green] [b]or[/b] crit=[green][i]critn[/i][/green] )
As I observed, this kind of expressions are abit faster then [tt]in[/tt].

Feherke.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top