data extraction from a table with 20Million records

max1x · Sep 9, 2009

I'm extracting data from a MySQL DB that has greather than 20 Million records in it and 12 fields (or columns) defined. The engine is MyISAM and the field I'm interested in is a defined as the primary key and as a BIGINT.

The query takes 3-7 mins with a where clause. Is there anyway to optimize the query?

select cid_1,cid_2 from clients where cid_1='1400001';

DonQuichote · Sep 9, 2009

What does "explain" say?

Code:

 EXPLAIN select cid_1,cid_2 from clients where cid_1='1400001';

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)

darrellblackhawk · Sep 9, 2009

As DonQuichote says: "What does explain say?"

If as you've posted, cid_1 is the primary key, it shouldn't take anywhere near that long. It sounds like somehow the engine's doing a row scan of the table.

ingresman · Sep 10, 2009

I'd have to look thisd up but I guess mysql is looking at the number of rows in the table and the estimated number of roes it might find (the selectivity) if it exceeds a threshold it will say I'd be better off doing a table scan that doing loads of keyed reads. Where loads is the number it decided to find.
If they key was unique it should know that there is only 1 possible value for the key and should just do a key read.
could you make it unique (which it should be if it really is unique).
you could also try creating an index on cid_1 and cid_2 which should ocupy a lot less pages than the base table so if the optimizer tries a table scan it should actualy do it on the index and should be a but quicker.
I'll have a further dig when I get home about increasing selectivity.

DonQuichote · Sep 10, 2009

No, that should not be the case. If there is a primary key constraint on cid_1, MySQL should perform a binary search on that field. Also, it should return at most 1 record due to this constraint. So this should be very fast. If it is so slow, my guess is that there is no index on cid_1. A compound index will help nothing here if the primary key already exists, but will help a great deal if the primary key is missing. Even then, the compound index should list cid_1 first to be effective here.

MySQL willingly uses no index for tables with fewer than 4 rows or so. But 20 million records should definitely cause an index to be used.

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)

ingresman · Sep 10, 2009

Yes agree if the primay is unique then the index wil be used. From the OP's description it looks like cid_1 is the primary.
Could you ever NOT have an index on cid_1 if it's defined as the primary ?
Maxlx
Could you post your table definition and the explain please

ingresman · Sep 10, 2009

From the manual

A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently). A table can have only one PRIMARY KEY. If you do not have a PRIMARY KEY and an application asks for the PRIMARY KEY in your tables, MySQL returns the first UNIQUE index that has no NULL columns as the PRIMARY KEY.

So it MUST have a unique index created for it. However we need to understand how the primary key is defined e.g. if the primay is defined as cid_2,cid_1, the optimiser won't really see it. Also is the key defined as btree, it seems you can use hash or rtree (not sure if mysql engine supports these) it would also impact the query plan.
There is an interesting article at

http://sheeri.com/archives/77

So let's see the table def !

max1x · Sep 13, 2009

Thank you for everyones suggestions.

If I run the the query with LIMIT 1, explain results are below, which I have checked before and did not see any issues with it.

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE clients const PRIMARY PRIMARY 8 const 1

I also checked for "duplicate primary key values", but there are none.

Now I'm seeing a bizzare pattern, when if I run an inline query the results are rendered very quickly, but if I run it via a PHP script, which does nothing more than create a DB connection and run the same query, the results take a very long time to populate.

Mystery continues...

DonQuichote · Sep 13, 2009

So the problem might be in PHP. Maybe you should ask in the PHP forum and post some code.

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)

max1x · Sep 14, 2009

I agree, post much testing, it was an issue with how PHP / AJAX communicating with the server, the snoops on the communication ports showed dropped packets and also in some instances getting hung.

The ISP has reported some issue with overload and the issue got resolved.

Thank you everyone for your assistance and direction. I knew I wasn't going crazy.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

data extraction from a table with 20Million records

max1x

Programmer

DonQuichote

Programmer

darrellblackhawk

Programmer

ingresman

IS-IT--Management

DonQuichote

Programmer

ingresman

IS-IT--Management

ingresman

IS-IT--Management

max1x

Programmer

DonQuichote

Programmer

max1x

Programmer

Similar threads

Part and Inventory Search

Sponsor