Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Storage Engine for large amounts of data

Status
Not open for further replies.

forces1

Programmer
Apr 5, 2007
29
NL
Hi all,
I'm creating a meta-search engine which stores the spidered text in a database. Of course, this takes a lot of space. When I started I had a MyISAM storage engine that became 185 mb as size. Then my database (also all my other tables) started to fail. I guess the database was too big, because when I removed the table, everything worked fine.

Now my question is: Is MyISAM not the right storage engine for large amounts of data? And which one is the right storage engine then?

Thanks a lot..
 
I'm suprised you have problems with a table at 185m, here is my largest.

-rwxrwxrwx 1 mysql mysql 9.0K Jun 8 2006 IMAGES.frm
-rwxrwxrwx 1 mysql mysql 57G Jan 23 10:30 IMAGES.MYD
-rwxrwxrwx 1 mysql mysql 99M Jan 23 10:30 IMAGES.MYI


______________________________________________________________________
There's no present like the time, they say. - Henry's Cat.
 
That's strange then. But what kind of storage engines are those? The largest is MYD?
 
When you say fail? Do you mean lag? Or become corrupt?

If your crawler is multi threaded, as any crawler should be, then you probably want to stay away from MyISAM tables because they perform poorly with unique keys and multiple inserts. This is because they don't provide row level locking, which InnoDB does, so one select/insert will lock the entire table until it's completed. With tables that have unique keys (such as domain, etc) this is a problem as the table gets bigger and things will start to slow down as the crawler multiple threads issue multiple selects and inserts.

Even with a single thread you'll still have the entire table locked up until the query completes with MyISAM.

I have some crawler too, and large databases. InnoDB performs well.

Also, if you're crawling a lot of data, research database sharding. Even with innoDB your database may eventually get too slow to query faster unless the tables are sharded.

I have TB size databases and the only reason they survive is because of sharding.

Good luck.
Luc L
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top