azzazzello
Technical User
Hi guys,
I have a set of 16 tables TABLE_0 through TABLE_F, where a primary key for each entry is a hexadecimal number 32 digits long. Each table contains entries starting with a corresponding HEX digit (so TABLE_0 contains numbers starting with 0, TABLE_D - with D's, etc). There is approximately the same number of records in each table, and that number is currently 1,800,700. Which makes about 30,000,000 records in the set of 16. Every day about 3,000,000 records are processed such that basically if an entry exists, it is updated with new information - if it does not, an insert is made. They also get equally spread up between the 16 tables. 95% of these daily 3M records are UPDATES - only about 150,000 are new inserts. I am doing an multiple INSERT ... ON DUPLICATE KEY UPDATE, a 1000 records at a time per table. There are 3 indexes on each table, one on ID and 2 on DATE fields, one of which only gets set on new INSERTS and never updated after that.
Now my problem. It is dreadfully slow
. 100,000 records are processed in about 10-12 minutes. As I said, I run multiple insert/updates with 1000 a pop, and I optimize the tables every day before the processing. All the tables are MyISAM, MySQL version is 4.1.15, the server is a dual 3.6Ghz Xeon with 4G of RAM. I have verified that it is the updates that are taking this long, because if I delete the entries needing updates beforehand and then run the script, it will take much, much faster (5 - 6 times faster). I realize that if I have a table for each of the first 2 hexes instead of just one, I will have 16 times less entries to look through, but I would rather not have to manage 256 tables. I am using perl to processes and DBI to interface with MySQL if that matters. Any suggestions as to how I can improve the speed of this system? Perhaps INSERT...ON DUPLICATE KEY UPDATE is not the best thing to do if I know that 95% will be updates? Thank you in advance
I have a set of 16 tables TABLE_0 through TABLE_F, where a primary key for each entry is a hexadecimal number 32 digits long. Each table contains entries starting with a corresponding HEX digit (so TABLE_0 contains numbers starting with 0, TABLE_D - with D's, etc). There is approximately the same number of records in each table, and that number is currently 1,800,700. Which makes about 30,000,000 records in the set of 16. Every day about 3,000,000 records are processed such that basically if an entry exists, it is updated with new information - if it does not, an insert is made. They also get equally spread up between the 16 tables. 95% of these daily 3M records are UPDATES - only about 150,000 are new inserts. I am doing an multiple INSERT ... ON DUPLICATE KEY UPDATE, a 1000 records at a time per table. There are 3 indexes on each table, one on ID and 2 on DATE fields, one of which only gets set on new INSERTS and never updated after that.
Now my problem. It is dreadfully slow