I am curious whether data miners here have preferences for rule-based systems, neural networks, decision trees, etc. What have your experiences been?
aseth (over on the Yahoo! Data Mining Club) wrote:
"The few factors that bias the choice are:
A. Type of data. If most of the data types are numeric and the non-numeric data types have ordinality associated with them, then the choice usually is NNs. Data sets rich in non-numeric data types usually warrant some kind of a rule induction based system."
Yes, definitely. While different data types can be "bent" into others (dummy variables, binning, etc.), modeling systems make more natural fits with some kinds of data than others.
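As a small illustration of "bending" a non-numeric variable into numeric form, dummy coding could look something like the sketch below. The function name dummy_encode and the sample values are made up for the example; real tools offer their own versions of this.

```python
def dummy_encode(values):
    """Turn a list of category labels into 0/1 dummy (indicator) columns.

    Returns the sorted category names and one 0/1 row per input value.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical categorical variable
colors = ["red", "green", "red", "blue"]
categories, encoded = dummy_encode(colors)
print(categories)
for label, row in zip(colors, encoded):
    print(label, row)
```

Note that this produces one column per category. Some modeling methods prefer dropping one column as a reference level, which is a choice left out of this sketch.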
aseth continues:
"B. Customer preference of model type. Some customers are adamant on seeing rules and have been (mis)educated on the black box nature of NNs. While some others (especially in the financial/banking sectors), love NNs and have large systems deployed around NNs (like fraud detection systems)."
This is what I call the "political" factor. I think many people think they're getting something they're not, and not just with neural networks. In marketing, for example, it is popular to find "segments" (clusters) in populations, and look for "triggers" of important behaviors (purchasing, defection, etc.). I don't think this is as simple as many people seem to believe.
aseth continues:
"Bottom line:
Usually, I let my guys loose on both types of modeling for a given data set (obviously after careful preparation of the data to suit the type of modeling that the data is going to be put through). The results of all of these are compared - a.k.a model efficiencies are discerned using various methodologies (error rates, tendencies for over-fitting, etc.)."
This is what I was really wondering about. I like to throw multiple tools at any problem, but I have far more tools than I generally have time to try. As a consequence, some tools get used more than others, for a variety of reasons (convenience, testing capabilities, etc., in addition to typical accuracy performance). I have noticed that many data miners develop a preference for particular types of tools.
The way I see it, a neural system (NS) would be a combination of other systems, and has more to do with a complex and diverse range of source data and metadata that you would combine to generate business decision-support data that could be applied to several segments of the business.
This approach would imply that you are building one big umbrella DW instead of several unconnected, segmented DWs for each business decision center.
AL Almeida
NT/DB Admin
"May all those that come behind us find us faithful"
It might help to look at the cardinality of your data set. Perhaps there are 10,000 records and 500 unique combinations of input variables. This suggests that there is not a lot of sampling variability; the data is rather "clumped." In this case, a rule-based system such as TREE or even a lookup table (500 rows with predictor patterns and dependent variable) is more reliable and more interpretable.
A neural network will try to interpolate in the sparse space of the sample - and while it could give a plausible solution, you are standing out there on a twig as far as reliability goes.
Conversely, when you have a rich sample, with decent sampling adequacy throughout your multidimensional space, a mathematical/interpolating method such as neural networks performs beautifully and gives a more granular solution.
Therefore, step one for me is always to count the number of unique patterns in the data set, to plot the spread of values for each variable and pairwise variables, and to look at the cardinality of the data set. Sometimes having low cardinality can signal you that your data sample is lame, and that you should seek more cases in the unpopulated regions of the space.
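For anyone who wants to try that first step, here is a rough sketch of counting unique input patterns. It assumes the records are already loaded as a list of dictionaries; the function name pattern_summary, the column names, and the toy records are all invented for illustration.

```python
from collections import Counter


def pattern_summary(rows, input_columns):
    """Count the distinct combinations of input values across the records."""
    patterns = Counter(tuple(row[col] for col in input_columns) for row in rows)
    n_records = sum(patterns.values())
    n_unique = len(patterns)
    return {
        "records": n_records,
        "unique_patterns": n_unique,
        # Average number of records sharing each pattern; a high value
        # means the data is "clumped" into few combinations.
        "records_per_pattern": n_records / n_unique if n_unique else 0.0,
        "most_common": patterns.most_common(3),
    }


# Hypothetical toy data: two input variables and a target
rows = [
    {"region": "N", "plan": "basic", "churned": 0},
    {"region": "N", "plan": "basic", "churned": 1},
    {"region": "S", "plan": "basic", "churned": 0},
    {"region": "S", "plan": "plus", "churned": 0},
    {"region": "N", "plan": "basic", "churned": 0},
]
summary = pattern_summary(rows, ["region", "plan"])
print(summary)
```

Only the input columns go into the pattern, not the target. Continuous variables would need to be binned first, otherwise nearly every record looks unique.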
crazy analyst wrote:
"Can you please give more information about optimal binning for a target in data classification?"
There are many automatic binning schemes. They may consider one variable at a time, or several, and may or may not consider a target (dependent) variable in the process. A Web search will turn up many different methods. The simplest are equal-width (just cut up the range of any given variable into bins of equal range) and equal-frequency (cut up the data by quantile). Neither of those looks at the target. For a method that does, look for ChiMerge, which merges adjacent bins based on a chi-square test against the target classes.
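The two simplest schemes, equal-width and equal-frequency, can be sketched in a few lines. This is only an illustration: the function names and sample values are made up, and the quantile cut points use a simple index rule rather than any particular package's interpolation.

```python
import bisect


def equal_width_edges(values, n_bins):
    """Interior cut points that split the value range into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points at (approximate) quantiles of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_bin(value, edges):
    """Bin index for a value; a value equal to an edge goes to the upper bin."""
    return bisect.bisect_right(edges, value)


# Hypothetical skewed variable with one outlier
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

width_edges = equal_width_edges(values, 2)
freq_edges = equal_frequency_edges(values, 2)
width_bins = [assign_bin(v, width_edges) for v in values]
freq_bins = [assign_bin(v, freq_edges) for v in values]

print("equal-width edges:", width_edges, "bins:", width_bins)
print("equal-frequency edges:", freq_edges, "bins:", freq_bins)
```

With skewed data like this, equal-width puts nearly everything in one bin, while equal-frequency splits the records evenly. Heavily tied values can still leave equal-frequency bins unbalanced.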