Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

outer join performance problem 1

Status
Not open for further replies.

mahgoo

Programmer
Jan 8, 2012
6
Hi guys

I have a performance problem and no idea how to handle it ;(
I will illustrate it by an example:
there are 4 tables :
one contains the actual session data [session_tbl (2 millions ds, may exeeds to 24 millions)], some sessions are linked with some sellers [seller_tbl (1000 ds)] the linked table (a m:n relation [session_seller_tbl 5000 ds]).
at last a priority table [prio_tbl 2 ds]

if I do a query on session_tbl :
select count(*) from session_tbl where duration > 60
it costs 16ms
if I do the same on session_vw :
select count(*) from session_vw where duration > 60
it costs 18 seconds

create view session_vw as
select a.session_id, a.starttime from session_tbl a left join session_seller_tbl b on a.session_id = b.session_id
left join seller_tbl c on b.seller_id = c.seller_id
left join prio_tbl d on b.seller_id = d.seller_id;

indexes that are exists are :
CREATE UNIQUE INDEX "PK_SELLER_SESSION_TBL" ON "session_seller_tbl" ("session_id", "seller_id", "SOURCE")
CREATE INDEX "FKI_SESSID_SELLER" ON "session_seller_tbl" ("session_id")
CREATE INDEX "FKI_SELLERID_SESSION" ON "session_seller_tbl" ("seller_id")

CREATE UNIQUE INDEX "PK_SELLER_TBL" ON "seller_tbl" ("seller_id")

CREATE UNIQUE INDEX "PK_SESSIONTBL" ON "session_tbl" ("session_id")
CREATE INDEX "SESSIONS_DURATION_IDX" ON "session_tbl" ("duration")
CREATE INDEX "EDM"."SESSIONS_STARTTIME_IDX" ON "SESSIONS_TBL" ("starttime" DESC)

CREATE UNIQUE INDEX "PK_PRIO_TBL" ON "prio_tbl" ("seller_id")

any idea are welcome

regards
Mic
 
Are the stats on all your tables and indexes up to date?

 
Hi Dagon,

thx for yr reply. yes you were right the indexes of session_seller_tbl were not actual. Now the response time of sessions_tbl similary to sessions_vw, but I forgot an important think.
The whole sessions has to be ordered by starttime cause I allways need only the latest 3000 because of list pagination.
And the sort needs for 2 million entries 18 seconds on sessions_tbl and also on view

The idea I need is how I get allways the latest sessions on query restriction:
select sessions_id from sessions_vw where duration > 60 order by starttime desc

Do you have any suggestion how I handle it better ?
 
Is the query actually returning 2 million rows if you let it run all the way through? When you say you need the latest 3000, do you mean that is the total number you need or do you mean you need to display 3000, then another 3000 etc i.e. display in blocks of 3000?

 
if I restrict the query by duration > 60 its returning 15.000 lines, without its returning 2.700.000 lines.
Because the results are shown by jsp/html sites I need only the latest 3000 entries.
similar to :
select * from (
select * from sessions_vw order by starttime )
where rownum < 3001;
this is shown partly by 30 entries a time
 
The sort is always going to be a limiting factor. The only chance you have of avoiding it is to try to use the index on starttime to provide the sort. However, having an index on starttime on its own will be useless because Oracle will still need to go to the main table to retrieve the other columns.

Your best chance therefore is to have an index on every column that you are retrieving from session_data. This will give full coverage and, in theory, allow Oracle to get all the data back from the index without having to go to the main table. Here is an example:

Code:
drop table session_data;

create table session_data (session_id number primary key, start_time date);

begin
  for i in 1..1000000 loop
    insert into session_data values (i, (trunc(sysdate)-1000000/(24*60*60))+i/(24*60*60));
  end loop;
end;

create index sesdat_id on session_data (start_time desc, session_Id asc)

begin
  dbms_stats.gather_table_stats(OWNNAME=>'NPP', TABNAME=>'session_data');
end;

select session_id, start_time from session_data sd
 order by start_time desc, session_Id asc

Plan
SELECT STATEMENT  ALL_ROWSCost: 3,611  Bytes: 12,994,124  Cardinality: 999,548  	
	1 INDEX FULL SCAN INDEX NPP.SESDAT_ID Cost: 3,611  Bytes: 12,994,124  Cardinality: 999,548

However, as your joining to lots of other tables, this may not work on your system.


 
I'm already using an index on any column but unfortunaly indexes on sessions_id or starttime are using only one key per block. Means there is no big performance advantage from indexes.

I read a lot of arcticles today and it seems that the biggest advantage lays in partitioning of tables, indexes and materialized views.
I will look for how I can use it for my purposes and how many work I have to investigate

so long and thx for your replies
mic
 
I'm saying you need to use an index on sessions_id and starttime, not individual indexes. You need to aim for index coverage with the columns sorted in the order of the query.

Partitioning would be a big step and might have adverse effects on other queries that you run. It would only really help if you got partition elimination and that is not going to happen unless you have a where condition on the starttime (assuming that's what you've partitioned on). Otherwise, it's probably still going to have to scan all the partitions and sort the result set, so performance would be worse than with a standard table.

A materialized view that pre-joins all the tables may help slightly, but it wouldn't get round the core problem that you want to sort a large data set.

 
there is another index I created :
CREATE INDEX "EDM"."IDX_OBJSAT_COMB_OBJID" ON "EDM"."SESSION_SELLER_TBL" ("SESSIONS_ID", "SELLER_ID")

I had already an concatenated index on these 2 columns of sessions_tbl
CREATE UNIQUE INDEX "EDM"."SESSIONS_STARTT_COMP_IDX" ON "EDM"."SESSIONS_TBL" ("STARTTIME" DESC, "SESSIONS_ID")

but in the execution plan this index is not considered. Do you think that concatenation in another sequence has more success ?
I don't understand why you use the trunc function to insert date data ? It should be the same if I leave my sessions_tbl like it is and
drop and create an index with concatenated columns on it.

than my execution plan :

Plan hash value: 2721653085

----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2698K| 465M| | 143K (1)| 00:28:41 |
| 1 | SORT ORDER BY | | 2698K| 465M| 980M| 143K (1)| 00:28:41 |
|* 2 | HASH JOIN RIGHT OUTER | | 2698K| 465M| | 35643 (2)| 00:07:08 |
| 3 | INDEX FULL SCAN | PK_PRIO_TBL | 5 | 25 | | 1 (0)| 00:00:01 |
|* 4 | HASH JOIN RIGHT OUTER | | 2698K| 452M| | 35594 (2)| 00:07:08 |
| 5 | INDEX FULL SCAN | PK_SELLER_TBL | 5 | 25 | | 1 (0)| 00:00:01 |
|* 6 | HASH JOIN RIGHT OUTER| | 2698K| 440M| | 35544 (2)| 00:07:07 |
| 7 | INDEX FAST FULL SCAN| IDX_OBJSAT_COMB_OBJID | 4285 | 351K| | 14 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | SESSIONS_TBL | 2698K| 223M| | 35482 (1)| 00:07:06 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("D"."SELLER_ID"(+)="C"."SELLER_ID")
4 - access("C"."SELLER_ID"(+)="B"."SELLER_ID")
6 - access("A"."SESSIONS_ID"="B"."SESSIONS_ID"(+))
8 - filter("A"."DURATION">60)

Statistics
-----------------------------------------------------------
50 recursive calls
3 db block gets
160100 consistent gets
136137 physical reads
0 redo size
0 workarea executions - multipass
0 parse time cpu
0 parse time elapsed
0 frame signature mismatch
2 execute count

about table partitioning I read that this makes sense if I have a big amount of data - that is the case.
yes I thought about partition of the starttime, because the sessions_tbl is increased sometimes by 100.000 sessions a day
but only a little amount of it are really used. So I have a lot of historical data that will not changed anymore.

update:
with the materialized view it seems that I came into a new performance leak, because the insertion into sessions_tbl needs now 16 seconds per session.
before I created the materialized view I inserted 3 per second



 
The trunc stuff was just a mechanism I used to get some test data. The idea was to start a million seconds before the current date (truncated to the start of the day) and bring the date forward 1 second at a time.

You will need to aim for full index coverage i.e. include every column from sessions_tbl you are selecting and probably also the columns like seller_id that you are joining on. Another option would be to create a materialized view which pre-joins all the tables and then create the covering index against the materialized view.

 
Dagon, thx for your support.
there is a lot to try and I'm not sure which way I will go. I think the best way is to separate session data and to work with materialized views on historical data. but we will see.
also if someone is interested in tuning measurements - this is a good link (in german) :
 
after you created INDEX "EDM"."SESSIONS_STARTT_COMP_IDX" if Oracle is still not using it then force its use using a hint.

Oracle will many times do what it thinks is better, not what should be done, and you as the one that knows the data better sometimes have to force the correct path.

apart from not using the index above, those hash joins are killing your performance

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top