What factors should I consider when selecting a data mining tool?

by Predictor
Which data mining tool best suits a given need depends on a number of factors.

One important factor is the specific analysis which needs to be performed. Broadly, the most common data mining analyses are: 1. classification, 2. numeric prediction, 3. clustering (segmentation), 4. anomaly detection and 5. association rules analysis ("market basket analysis"). The first two, classification and numeric prediction, will be much easier to perform properly if the tool provides appropriate testing procedures (at least one of holdout testing, cross-validation or bootstrapping).
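To make the testing point concrete, here is a minimal sketch of k-fold cross-validation in plain Python. It is only an illustration of the idea, not how any particular tool implements it: the function names, the toy labels and the stand-in "model" (which simply predicts the most frequent class) are all assumptions made for the example.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def majority_class(labels):
    """Stand-in 'model': always predicts the most frequent training label."""
    return max(set(labels), key=labels.count)

def cross_validated_accuracy(labels, k=5, seed=0):
    """Average accuracy over k folds, each fold held out once for testing."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        # Train only on rows outside the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held_out_set]
        prediction = majority_class(train)
        # Score only on rows the model never saw during training.
        correct = sum(1 for i in held_out if labels[i] == prediction)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

# Hypothetical data: 80 "no" and 20 "yes" outcomes.
labels = ["no"] * 80 + ["yes"] * 20
accuracy = cross_validated_accuracy(labels, k=5)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

The key property is that every row is used for testing exactly once and is never scored by a model that was trained on it. A tool that offers this (or holdout testing, or bootstrapping) built in saves you from writing and checking such code yourself.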

Another major factor in this decision will be the budget for tools, although I'll note that price tag is not a good indicator of data mining tool quality. There are some relatively inexpensive tools (<US$1,000) which are very capable and some very pricey ones (>US$100,000) which are fairly weak.

My personal opinion is that, for most analyses, multi-tiered (client-server) tools and special connections to the database are unnecessary. Commodity desktop computing hardware has enjoyed astronomical growth in power (which continues unabated), and I have built predictive models on 1,000,000+ rows of data using off-the-shelf PCs. Are there some applications which require more than this? Yes, but ordinary, inexpensive PCs serve well for many applications, and more every day as the platform grows in power.
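As a rough back-of-the-envelope check on whether a data set of that size fits on a desktop machine, the arithmetic can be sketched as below. The column count and the 8 bytes per value (a 64-bit float) are assumptions chosen for illustration, not figures from any particular project, and real tools add their own overhead on top.

```python
def estimated_megabytes(rows, columns, bytes_per_value=8):
    """Rough in-memory size of a dense numeric table, in megabytes (10**6 bytes)."""
    return rows * columns * bytes_per_value / 1_000_000

# Hypothetical table: 1,000,000 rows and 50 numeric columns stored as 64-bit floats.
size_mb = estimated_megabytes(1_000_000, 50)
print(f"approximate size: {size_mb:.0f} MB")
```

Under these assumptions the table comes to about 400 MB, which gives a sense of scale when comparing against the memory of the PC you plan to use.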