Tuning: indexes, visual explain, or speed up of queries 1

Lysen · Jan 27, 2004

Hi folks,

We have a table called content in our teradata db. This table is heavily accessed (+300.000 queries per day) because an internet utility uses it to search for holiday content. During the day, when our website isn't used much, the queries run very fast. But during peak hours (8 pm to 10.30 pm) it takes much MUCH longer to finish the queries... (about 30 times longer)

I've entered some of the most common queries in the teradata index wizard to analyze, and it came back with some recommodations. After validating these, Index wizards reports an performance increase of 84%!

Nice, right? Well, after creating the indexes on the content table, and entering a query in visual explain shows that the index is NOT used, thus no improvement is had in those queries!

Why is the index skipped if such a huge improvement lies within reach? Could someone please help me with this? Am I doing something wrong here? Or any other suggestions to speed the table/queries up a bit?

Thanks...

dnoeth · Jan 27, 2004

Hi Lysen,
did you collect statistics on those columns?

Are prepared statements used to access that table?
E.g. if you use "exec mymacro(12, 'abc')" the query is optimized for those values, but if it's "exec mymacro

param2,

aram1)" the optimizer is trying to create a plan which fits all possible values.
This may lead to a plan not using that index, especially if there are values with lots of rows within the indexed column.

It would be helpful, if you could provide more details:
DDL, number of rows, details about column statistics and some of the queries...

Dieter

Lysen · Jan 27, 2004

Hi Dieter,

Yes, statistics are collected on all columns, because all columns can be used as a search criteria... It is a table containing about 100.000 records, and 11 columns. Column 1 = reisnummer, column 2 to 11 = box1, box3, box4 etc. (box2 is not used at this time)

If you point your browser to

http://www.oad.nl/search/search.asp

you can see what kind of queries Teradata has to answer.

Visitors of our website first have to select a country to start searching for a holiday. For example, they select 'turkije' (Turkey). The asp then fires about 10 queries (one for each box) to get the distinct values back to fill the box with the remaining answers. So:

select distinct(box1) from content where box1 = 'turkije'
select distinct(box3) from content where box1 = 'turkije'
select distinct(box4) from content where box1 = 'turkije'
select distinct(box5) from content where box1 = 'turkije'
select distinct(box6) from content where box1 = 'turkije'
select distinct(box7) from content where box1 = 'turkije'
select distinct(box8) from content where box1 = 'turkije'
select distinct(box9) from content where box1 = 'turkije'
select count(distinct(reisnummer) from content where box1 = 'turkije' -- (to count the remaining holidays)

I couldn't think of a way to get the distinct values per box back in 1 query...

When the visitor clicks next on a value in one of the other boxes, above process is repeated with an extra ".. and box X = 'XXX'"

The possibilities are endless, so I tried to speed things up, so Teradata would finish queries quicker and serve more visitors on our website. I rad Index wizard and it came up with a few additional indexes, which I validated and it showed a 84% speed increase.

But teradata doesn't use the index to answer the queries... I hope you have a better understanding of what i'm trying to do now... Hope you can help me out

dnoeth · Jan 27, 2004

Hi Lysen,

the main column used for every access is box1, so it's the best candidate for an index.
BUT the table is quite small (100000 rows) and the number of distinct values is about 50 (countries)
So you can't place the PI on it and as a NUSI the optimizer will probably not use it for access.
And this it's probably the same for all boxes, the selectivity of that index is not good enough. Maybe *without* statistics the optimizer will use it ;-)

As you have lots of possible combinations of columns in the Where-clause you can't use multi-column indexes. The only way i would think of are two-column indexes with box1 and box? for each box...

Before adding indexes i would suggest to replace the DISTINCT as it needs lots of resources for columns with just a few values.
Replace it with:
select box? from content where box1 = 'turkije'
group by 1;

As a rule of thumb it's more efficient to use
- Distinct for lots of distinct values and just a few rows per value
- Group By for a small number of values with lots of rows per value

And to retrieve all values within one query use Union All:

select 'box1', box1 from content where box1 = 'turkije'
group by 2
UNION ALL
select 'box2', box2 from content where box1 = 'turkije'
group by 2
...

And you can replace the first query
select distinct(box1) from content where box1 = 'turkije'
with
select 'turkije' as box1 ;-)

Dieter

Lysen · Jan 28, 2004

Hi Dieter,

I now have an index on box1, and Teradata uses this for the search to fill the remaining boxes! yahoo! :-D This speeds up the search process. I also applied your suggestion to the queries about the use of 'group by' instead of 'distinct', which also speeds up the process considerably...

Now I'll just wait until tonight to see if it stays this fast. :-D Thanks for your help so far!

But I still have another question...;-)
What if all boxes are immediately click-able, so you don't have to begin with the selection of a country, but for example start with 'vervoer' (=transportation). One doens't have to click on a country anymore, but selects 'vliegtuig' (=aircraft) and teradata is supposed to fetch all destinations where one can go to by airplane.

Now, if I start the query with box5= 'vliegtuig', the index on box5 is again not being used, thus getting the desired result slower, while this is one of the heaviest queries possible in this set up. (lots of people don't know where to go, but they know HOW they want to go somewhere... funny...)

Isn't there a way to force teradata to use the index or is the optimizer always right, so there is no possible way to speed it up?

Oh about the single query possibility with the union, I'll have to check on this idea if it's possible to filter it on 'box~' at ASP level to fill the appropriate boxes..

Thanks again for the time you already put into my questions, I really appreciate it!

dnoeth · Jan 28, 2004

Hi Lysen,
glad to hear it's getting faster

You can't force the optimizer to use an index.
Your main problem is the small size of the content table, it's only 100.000 rows, so the optimizer is probably not considering using several indexes or index bit-mapping.
Maybe if your company grows and offers millions of vacanties ;-)

Another way might be using Aggregate Join Indices, but you still have lots of queries accessing different column combinations.

Another remark, is Reisnummer unique?
Then you can get rid of the "count(DISTINCT reisnummer)"

And what's the PI, maybe you should consider using a different one...

Dieter

Lysen · Jan 28, 2004

Hi Dieter,

Nope, reisnummer is not unique. All the boxes together form unique rows.. So reisnummer, box1,box3,box4,box5,box6,box7,box8 and box9 together are unique. That's why I first had a unique primary index of all the columns in the table.

But after trying some the UPI or some other PI, I found that there wasn't a gain/loss in performance, no matter if I use the UPI, of just a PI of reisnummer alone. Or am I missing something here?

Anyway, it quite a bit faster now, I'm very curious how it holds up with 60 people using this search method (so that actually means teradata does about 600 queries at the same time!)

Oh, now I think of it, it's obvious teradata does a lot of the same queries (for example, quite a few people search for 'turkije' on a day.) Isnt there a way to have teradata cache or remember often fired queries, so it doesn't have to do the same job for thousand times a day?

Like in the morning someone clicks on 'turkije', teradata does it's job, and then caches the result, so when another person clicks on 'turkije', it returns the resultset it previously fetched?

Thanks again for your time and effort!

Eli

dnoeth · Jan 28, 2004

Hi Lysen,
if you want cached results sets you should try mysql ;-)

To be honest, Teradata wasn't made for running that kind of "simple" queries. If it only was a complex 10-way join on huge amounts of data...

Regarding the PI: If you have a combination of columns which is used more often than other combinations, this would be a good candidate for a PI.

I just tried join indexes on each column of a table and it was quite promising. For each column do a:

create join index ji_box1 as
select (box1), (rowid)
from content;
collect statistics on content column box1;

Most queries showed a single-AMP retrieve step using the most selective JI, followed by a join back to the base table using residual conditions. But i can't tell if it's really faster, because i tested it on my notebook...
And you have to consider how the table is loaded/maintained, e.g. MultiLoad doesn't support JI, you'll have to drop/load/recreate.

Dieter

Lysen · Jan 29, 2004

Argh! MySql! I've always been pro-teradata when it comes to teradata VS. MySql...! ;-)

But anyway, the whole thing's really really fast now, and everyone is very enthusiastic about the way teradata took the work away from the XML server, and the whole site is much smoother now... So I think we'll leave it way! We're happy, and that's good! :-D

Dieter, thank you for getting this thing back on the track!

dnoeth · Jan 29, 2004

Did you had to argue TD vs. mysql?
Then your problems really might be solved using the limited SQL offered by mysql, e.g.
- still no subqueries in productive versions
- no complex queries
- sometimes strange behaviour (

http://sql-info.de/mysql/gotchas.html)

If you ever get a Teradata training, maybe we'll meet, i'm delivering trainings for NCR in EMEA ;-)

Dieter

Lysen · Feb 8, 2004

Well, it's been stable for a while now...
but yesterday (sunday the 8th of february) our database crashed?? At least it didn't work anymore...

I mean: :-(

This was the error teradata gave back to us..:

OLE DB Provider for Teradata error '80004005'

[NCR][Teradata RDBMS] All virtual circuits are currently in use.

After a reset all was back to normal. Does anyone know what this means? Should I contact NCR, or would that be kinda like calling microsoft when Win95 crashes? ;-)

dnoeth · Feb 9, 2004

Hi Lysen,
you got too many sessions trying to log on.
Did you run some loads at the same time?
Try to reduce the number of sessions by load or the number (the default for MultiLoad/FastLoad is the nmuber of AMPs in your system)

Another solution is adding another Parsing Engine to the system (each PE supports up to 120 sessions and you probably got 1 or 2 per node)

From the Mesages manual:
"08024 All virtual circuits are currently in use

Explanation: This error is sent in response to an
assign request when a either the gateway capacity or
the capacity of all PEs configured with the same logical
host id has been reached.

Generated By: Gateway

For Whom: Field Engineer

Remedy: If the frequency of the message is abnormally
high, contact the Global Support Center. Otherwise,
ignore this message."

Dieter

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Tuning: indexes, visual explain, or speed up of queries 1

Lysen

Programmer

dnoeth

Instructor

Lysen

Programmer

dnoeth

Instructor

Lysen

Programmer

dnoeth

Instructor

Lysen

Programmer

dnoeth

Instructor

Lysen

Programmer

dnoeth

Instructor

Lysen

Programmer

dnoeth

Instructor

Similar threads

Part and Inventory Search

Sponsor