I am about to start helping on a data-mining project where the starting data weighs in at 350 GB. That is the compressed size on a set of backup tapes, so I presume the uncompressed size will be greater.
I do not need the data-mining scripts to be fast, but I do need rapid view access to the recordsets. I am hopefully not going to be deploying expensive servers on this project either: it's hopefully not going to be a big job.
Does anyone have hands-on experience using MySQL with data of this size? Will it cope on low-to-mid-spec machines?
Bear in mind:
The first few tasks will be removing columns that are obviously irrelevant to our purpose.
The next few will involve an analysis of which further columns we can delete.
Then comes a cleansing exercise involving simplification of various columns.
Then a transformation of the remaining recordset.
Lastly, a single query across the entire recordset that is intended to result in a single number. (The data is several years of financial data from which we are trying to derive an index that will be maintained monthly going forward. Depending on where we end up, we might well decide to recalculate the index on the fly per additional base record: it all depends on how long the generation query takes.)
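In MySQL terms, the plan looks roughly like the following sketch. All table and column names here are made up purely for illustration; the real schema is still on the tapes:

```sql
-- Steps 1-2: drop columns judged irrelevant to the index
ALTER TABLE raw_data
  DROP COLUMN legacy_flag,
  DROP COLUMN internal_notes;

-- Step 3: cleansing, e.g. simplifying a text column in place
UPDATE raw_data
SET region = UPPER(TRIM(region));

-- Step 4: transformation of the remaining recordset into a working table
CREATE TABLE working_set AS
SELECT account_id, trade_date, region, amount
FROM raw_data;

-- Step 5: the single-number query, here imagined as a weighted average
SELECT SUM(amount * weight) / SUM(weight) AS index_value
FROM working_set;
```

Whether that last full-table aggregate is quick enough to re-run per additional base record is exactly the open question.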
thanks in advance for any insights you may have
Justin