Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Oracle Rank is eating up my temp tablespace

Status
Not open for further replies.

goodmans

MIS
Apr 23, 2008
63
GB
Hi Oracle gurus,
I got the following query which is using the rank function and eating up my temporary tablespace and causing the query run very slow. DBS says its using lots of temp tablespace to sort the output.

This table got 30m rows.

Query:
select col1, col2
from (select a.col1, a.col2, a.col3,
rank() over(partition by a.col1, a.col2 order by a.col2 desc) "rank"
from a)
where rank = 1

Any suggessions? I trust that there will be a solution with you oracle gurus.

Regards
G
 
Are you really sorting by the same column that you are partitioning on ?

rank() over(partition by a.col1, a.col2 order by a.col2 desc) "rank"

That makes no sense since the rank of every row will be 1 and you will just return everything in the table.

How quickly do you expect it to run ? This query will have to sort a 30 million row table. That will never be a simple operation and is bound to take some time. If you have appropriate indexes, you might consider subquery as an alternative:

Code:
select col1, col2          
from a a1
where col2 =
(select max(a2.col2) from a a2
where a2.col1 = a1.col1)

With a nested loops type of plan, this should start to return rows more quickly and won't use any space in temp.
 
sorry i am sorting by col3 (date value) different column.


select col1, col2
from (select a.col1, a.col2, a.col3,
rank() over(partition by a.col1, a.col2 order by a.col3 desc) "rank"
from a)
where rank = 1


Regards
G
 
Another option might be to put a (preferrably unique) index on a.col1 and a.col2. I rigged up a quick example with about 1 million rows and showed that rank is able to make use of the index.

Code:
drop table test_rank_ind;

create table test_rank_ind (x number, y number, z number, constraint pk_tri primary key (x, y, z));

declare
begin
  for i in 1..100 loop
    for j in 1..100 loop
      for k in 1..100 loop
        insert into test_rank_ind values (i, j, k);
      end loop;
    end loop;
  end loop;
end;
/

alter table test_rank_ind enable constraint pk_tri;

select x,y,z
from (select x,y,z,rank() over (partition by x,y order by z) as rank
from test_rank_ind)
where rank=1

Plan
SELECT STATEMENT  ALL_ROWSCost: 2,445  Bytes: 52,071,500  Cardinality: 1,001,375  			
	3 VIEW UN2_APP_OWNER. Cost: 2,445  Bytes: 52,071,500  Cardinality: 1,001,375  		
		2 WINDOW NOSORT  Cost: 2,445  Bytes: 9,012,375  Cardinality: 1,001,375  	
			1 INDEX FULL SCAN INDEX (UNIQUE) UN2_APP_OWNER.PK_TRI Cost: 2,445  Bytes: 9,012,375  Cardinality: 1,001,375
 
Thanks mate. It doent really work in my case as this history table always have duplicate keys with the combination of (x,y,z) so i am using
"order by col3, rowid" to get any one of them so I can not create any unique index on those columns, is it worth having a normal index on (x,y,z)?

Regards,
G
 
I forgot to mention that I have indexes created on all of those 3 columns.

Regards
G
 
A couple of questions for you.

1) Am I right in thinking you basically just want to get a set of unique col1 and col2 from your target table regardless of the value of col3 or is the ordering of col3 significant?

2) What release of Oracle 10 are you on.

IF the answer to Q1 is that the order of col3 is not significant AND the answer to Q2 is that you are on release 2 then it may be worth checking out DML with error logging.
See the link below for a good intro:


Basically you want to create a second table (table2) with two columns with unique index on col1 and col2. Next do an insert /*+Append */ into table2 select col1,col2 from table1 LOG ERRORS etc ...

Pretty sure it will still take some time to run but hopefully will be faster than what you've got.




In order to understand recursion, you must first understand recursion.
 
The best bet would be a concatenated index on the (col1,col2,col3). I've tried to get my query to work with a non-unique index but so far had no success, even with hints.

You may as well use ROW_NUMBER instead of RANK as that will guarantee to return only one row regardless of whether there are duplicates. You can then dispense with ROWID, which really adds nothing because you can't ascribe any meaning to it.

It's probably worth checking the values of database parameters concerned with sorting (PGA_AGGREGATE_TARGET, WORKAREA_SIZE_POLICY, SORT_AREA_SIZE, SORT_AREA_RETAINED_SIZE etc) in case you've got something silly there. Also check that the TEMP tablespace is locally rather than dictionary managed and has sensible extent sizes. Of course, make sure all the tables are analyzed properly.

Beyond that, the only other possibility would be to use a two level sub-select query e.g.

Code:
select a1.col1, a1.col2, a1.col3
from a a1
where a1.col3 =
(select max(a2.col3) from a a2
where a2.col1 = a1.col1
and   a2.col2 = a1.col2)
and a1.rowid=
(select max(a2.rowid) from a a2
where a2.col1 = a1.col1
and   a2.col2 = a1.col2
and   a2.col3 = a1.col3)

This might be able to make more use of the indexes you've got available. A /*+ FIRST_ROWS */ hint might help if the aim is to get something back quickly from the query.
 
Ignore me , I appear to have swallowed some silly pills this morning and have outlined a fancy way of mimicing a select distinct col1,col2 from table1


In order to understand recursion, you must first understand recursion.
 
Hi Dagon, nested queries are using the indexes but the problem is that such query on 21m rows table is taking quite a long time. and the throutput is only around 50 rows per sec.

Regards
G.
 
It depends on whether you want to see all rows or the first few rows. If it's all rows, then a sort of the table may ultimately be quicker than using indexes.

How often are you running the query and how many rows does it return ? Maybe you should consider something like a materialized view if you're running it frequently.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top