slow sql query on lg indexed table

Status
Not open for further replies.

cc4mail

Technical User
Jan 8, 2002
47
US
I have a DB table with 4.1 million unique rows of dword number blocks, indexed on start.

start       end         location
1259273696  1259273703  11532
1259273704  1259273704  4249
1259273705  1259273705  3487
1259273706  1259273711  4249
1259273712  1259273727  11532
1259273728  1259273791  117384
1259273792  1259273855  4053
1259273856  1259273983  117384
1259273984  1259274239  116001
etc....


$sql = "
SELECT location
FROM blocks
WHERE start<=" . $dword . "
AND end>=" . $dword . "
LIMIT 1";

$loc_code = $db->get_sql_field($sql,'location');

the query works, but with the DB size, the query is very slow. I also tried BETWEEN, also very slow.

any suggestions on how to improve the query?

perhaps using the row prior to the result row?
WHERE start>= $dword

if so, how can I select the row prior to the query result row?


thanks !

 
I would suggest putting an index on "end" as well. That way, MySQL can see which of those two indexes yields the most optimal query.

apart from that, the result may be very large and I don't see any sort order that could give an optimization for both fields. So the optimizer can discard a part of the table based on either start or end, but all the remaining records must be checked for the other column. Alas, that is in the nature of checking for greater or smaller instead of just equal.

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
Thanks. I felt I was overlooking a simple solution.

Would this query...

$sql = "
SELECT (the internal mySql row number)
FROM blocks
WHERE start>=" . $dword . "
LIMIT 1";

avoid a row by row examination, on an indexed table?
Then, I could retrieve the (mySQL row number result -1) for the correct row result?

$sql = "
SELECT location
FROM blocks
WHERE (mySql row number) =
(the above row number result)-1
LIMIT 1";


I recognize I would have to include an error routine for the first row match.

All the start numbers are consecutive from the prior row end block numbers.

However, I haven't found any reference to a mySQL function() for a internal row number?

 
First, there are no row numbers in MySQL, and with reason. An index is a sorted list of values with pointers to records. So if you use an ORDER BY clause with an indexed field, you get the answers as fast as possible. But that tells you nothing about where those records are! That is why results from one index are no longer indexed for use with a second index: the second index is sorted in a totally different order.

Furthermore, what you are trying to do is done by the query optimizer if possible. Don't try to outsmart it. If you are interested, read the optimization section of the MySQL manual.



 
Will do, I appreciate the lesson and why I couldn't find a row number function - Thank you. (and the Sunday response)

Since my db is static (data never changes), it would make sense then just to add a field "row" to the DB

then

$sql = "
SELECT row
FROM blocks
WHERE start>=" . $dword . "
LIMIT 1";

would get the first instance,
then..

$sql = "
SELECT location
FROM blocks
WHERE row = " . ($return['row'] - 1) . "
LIMIT 1";

I realize the query() is missing, but would this be faster?

 
No. If you put an index on "end" as well as on "start", the data in your table is sorted in 3 ways:
- in the default way (by primary key, for instance). But there really is not something like a default order you can trust. Add an index and it could change.
- Sorted according to "start",
- Sorted according to "end".

If you split the table according to a point in the "start" index, you have split a totally random set of rows according to the "end" index. That the data is sorted by "start" does not mean it has a comparable sort in "end". So if you want to split the table in "start" point, you will have to check every row to see if you can use the position according to the "end" field. It is like two coordinates: "start" and "end" are independent.

Now MySQL has a clue to how your indexes are working. So if the "start" criterion would leave 80% of the table to be analysed and the "end" criterion just 60%, MySQL would use the "end" index.

But MySQL should stop when it finds one, as you have a LIMIT 1 clause.

Anyway, what does
Code:
EXPLAIN SELECT location FROM blocks WHERE start<=(somevalue) AND end>=(somevalue) LIMIT 1
tell you?


 
Interesting problem...
First indexing might not always increase speed of access..
You should consider using partitions...
Not sure there is an easy answer other than trying a few different approaches....
You may notice that simply scanning the full table works pretty well.

You say your data never changes... do you have to do this with a database?
For example if you write a simple series of integers to a disk file turning your database table

start end location
1 5 8789
6 7 8865

into a disk file
8789 8789 8789 8789 8789 8865 8865

Then to find the location value of something at 1259273703, your first example, you just seek to the 8*1259273703 position in the disk file and read the integer value.
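A rough sketch of that flat-file idea, in Python rather than the PHP used earlier in the thread. The 8-byte record size follows the "8*" in the post; the function names, the in-memory file, and the choice to measure the offset from the first start value (so the file does not need padding for unused low numbers) are assumptions made for illustration only.

```python
import io
import struct

# One signed 64-bit integer per address (assumed layout, 8 bytes per record).
RECORD = struct.Struct("<q")

def build_flat_file(blocks, out):
    """Write one location per address. Blocks must be sorted and contiguous."""
    for start, end, location in blocks:
        for _ in range(start, end + 1):
            out.write(RECORD.pack(location))

def lookup(f, first_start, dword):
    """Seek straight to the record for dword and read its location."""
    if dword < first_start:
        return None  # below the first block
    f.seek((dword - first_start) * RECORD.size)
    data = f.read(RECORD.size)
    if len(data) < RECORD.size:
        return None  # past the last block
    return RECORD.unpack(data)[0]

# The two-row example table from the post above.
blocks = [(1, 5, 8789), (6, 7, 8865)]
flat = io.BytesIO()  # stands in for a real disk file
build_flat_file(blocks, flat)
```

With this example data, `lookup(flat, 1, 5)` reads 8789 and `lookup(flat, 1, 6)` reads 8865, each with a single seek and no search at all.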

Might be a bit wasteful of space. So you might simply write the file as your table illustrates with start (you don't need the end) and location and then use a binary chop algorithm to find the value.

My guess is even these crude approaches would be significantly faster.
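The binary chop could look roughly like this in Python, using the standard `bisect` module. It assumes the start values are loaded into a sorted list and that the blocks are contiguous, as stated earlier in the thread; `find_location` and the two parallel lists are hypothetical names, and the data is the first five rows of the original post.

```python
from bisect import bisect_right

def find_location(starts, locations, dword):
    """Return the location of the block whose start is the largest one <= dword."""
    i = bisect_right(starts, dword) - 1  # index of the last start <= dword
    if i < 0:
        return None  # dword lies below the first block
    return locations[i]

# First rows of the table from the original post (start and location only).
starts = [1259273696, 1259273704, 1259273705, 1259273706, 1259273712]
locations = [11532, 4249, 3487, 4249, 11532]
```

For 4.1 million rows this needs about 22 comparisons per lookup. Note that without the end column there is no way to detect a value beyond the last block, so the last block's end would have to be kept separately if that matters.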


 
I added the end index - still about a 7-second query on the user end.
-actual query .0002 for start<= only query
-actual query 2.193 for start<= and end>= query.

EXPLAIN - after I added your suggestion, index "end" - sorry, couldn't make the display cleaner.

id  select_type  table      type  possible_keys  key   key_len  ref   rows     Extra
1   SIMPLE       ip_blocks  ALL   start,end      NULL  NULL     NULL  4149287  Using where


Wouldn't it be efficient to make a primary index on a "row number" field and run 2 queries: one for the start <= with LIMIT 1, and then one to get the prior row?

I did read (and re-read a few times) your explanation of random rows, however the data is consecutive and static. - or is there a better way to enter the data - I checked making each number in the start , end block an individual row, that would be 850m rows - ouch
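If the blocks really are consecutive and never overlap, the "row prior" idea may not need a row number at all: ordering by start descending and taking one row gives the block with the largest start at or below the value. Here is a minimal sketch of that, using Python with an in-memory SQLite table as a stand-in for the MySQL table, so actual MySQL timings and optimizer behaviour are not shown. The index name `startndx` and the function `find_location` are illustrative; the rows are from the first post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "end" is quoted because END is a keyword in SQLite.
conn.execute(
    'CREATE TABLE blocks (start INTEGER NOT NULL, "end" INTEGER NOT NULL, '
    "location INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX startndx ON blocks (start)")
conn.executemany(
    "INSERT INTO blocks VALUES (?, ?, ?)",
    [
        (1259273696, 1259273703, 11532),
        (1259273704, 1259273704, 4249),
        (1259273705, 1259273705, 3487),
        (1259273706, 1259273711, 4249),
        (1259273712, 1259273727, 11532),
        (1259273728, 1259273791, 117384),
    ],
)

def find_location(conn, dword):
    """Fetch the block with the largest start <= dword, then check its end."""
    row = conn.execute(
        'SELECT location, "end" FROM blocks '
        "WHERE start <= ? ORDER BY start DESC LIMIT 1",
        (dword,),
    ).fetchone()
    if row is None or row[1] < dword:
        return None  # below the first block, or in a gap / past the last block
    return row[0]
```

The end check is done in application code after the single row comes back, so the query itself only has one range condition, on the indexed start column. Whether MySQL 5.0 resolves this with one index lookup would need to be confirmed with EXPLAIN on the real table.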


 
Now this is funny. MySQL knows that two indexes can be used and uses neither of them! (possible_keys="start, end", while "key"=NULL). MySQL not using an index is normal for very few rows, but you have plenty of rows for an index to be helpful.

If the table had been an archive table (ENGINE=ARCHIVE), there would have been no indexes.

What MySQL version do you use? And does it help to give the dword value the same type as the fields (all numeric or all strings)?

Does it help to provide an index by force?
Code:
SELECT location FROM blocks FORCE INDEX (index_name) WHERE start<=(somevalue) AND end>=(somevalue) LIMIT 1
Where index_name is the name you provided or the one that shows in the result of SHOW CREATE TABLE blocks
Just pick one (start or end)

For the row number idea:
your table contains effectively two coordinates (start and end). MySQL should use the best index, so split your field of solutions as neat as possible. If you draw a line in that field, there's no way to tell what rows are left/right or above/below, because row numbers are not sorted that way. And if they were, that would add nothing because either the start field or the end field is already sorted alike.
Those coordinates may be static, but you are searching in a kind of "point cloud".

 
DonQuichote
>>Now this is funny. MySQL knows that two indexes can be used and uses neither of them!
I think this is to be expected: the indexes may well be larger than the data, so scanning the table is probably easier than using the indexes.
 
mySql 5.0.81

CREATE TABLE `blocks` (
`start` bigint(16) NOT NULL,
`end` bigint(16) NOT NULL,
`location` bigint(16) NOT NULL,
KEY `start` (`start`),
KEY `end` (`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

used the FORCE INDEX with start and start, end

using start only, was approx 2 secs on the user end ( 5 secs faster) using start and end was 3 secs -actually slower - interesting.

mysql ref manual noted the indexes cannot be named the same as column names for FORCE and USE so they are changed to startndx and endndx.


$sql = "
SELECT location
FROM ip_blocks
FORCE INDEX (startndx)
WHERE start<=" . $ip_dword . "
AND end>=" . $ip_dword . "
LIMIT 1";

saving 5 secs on the user end is pretty dramatic.

Thanks!!!! Your help is really appreciated.

any other ideas?

 