Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Advise on large table indexing 1

Status
Not open for further replies.

pgferro

Programmer
Aug 21, 2001
111
BS
Hi Guys,

I have a large (50 million records) demographic table, normalized, with about 10 searcheable fields (annual income, number of children, etc etc) all searchable fields are id's related to smaller tables.

The query I will run against the db is a classic multiple options choice :

Income : 1.All 2.Between X and Y 3.Between J and K
Children : 1.All 2.None 3.Between X and Y

etc

What is the best approach in your opinion coming to indexing ? Should I index all searcheable fields, combine fields or ... ?

Any advise very much appreciated.

Thanks !

--
PG
 
A multi-column index will only help you if all fields at the front of the index are referenced in the query. E.g. if you have an index on A,B,C it will not help you if you query on B and C. But if you know that some high proportion of your queries reference fields X and Y, then a multi-column index on those would be beneficial.

Also, with 50 million records a field that only has a few possible values, i.e. number of children where a huge number of records are going to have the value of 2 or 3, a b-tree index may actually hurt performance. Oracle provides a bitmap index for fields like that, but last time I checked MySQL didn't provide anything similar.

In a case like that it'd be better to create indexes on more appropriate (high-variance) fields, then the optimizer will use those indexes first to limit the result set, then use a table scan of that smaller set to search the low-variance fields.
 
Thanks Eric !

What would you consider the lowest variance threshold for considering indexing ?
Another question, you say that the optimizer will use the indexes first than the table scan, should the query be constructed accordingly (with a specific order in the WHERE clause) to improve performance or it is ininfluent ?

Thanks again.

--
PG
 
With 50 million records, I'd recommend against indexing anything with fewer then a couple of hundred possible values. Index the obvious high variance fields first, then test the performance while adding in indexes one at a time.

The order of the fields in your query is not necessarily important, but I'm not as familiar with the vaguaries of the mysql optimizer as I am with Oracle's. Have you read the fine manual on the subject? Try reading and
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top