Data Mining??

Status
Not open for further replies.

aljubicic

Programmer
Nov 7, 2002
82
AU
Hi all,

Could someone explain to me in layman's terms what is Data Mining and the mechanics of it ??

Regards
Anthony
 
THIS IS THE ART OF MINING DATA ...IN OTHER WORD MANIPULATING A SYSTEM TO GET INFORMATION TO PUT IN REPORTS... ALSO CALLED MIS
 
pwrpuff -

Are you equating data mining with MIS and reporting?
 
pwrpuff (Instructor), on 17 Feb 05, wrote:
"THIS IS THE ART OF MINING DATA ...IN OTHER WORD MANIPULATING A SYSTEM TO GET INFORMATION TO PUT IN REPORTS... ALSO CALLED MIS"


That is not data mining at all. Data mining is typically an empirical modeling process, and has little if anything to do with MIS or simple reporting. See the FAQ here, for "What's the difference between data mining and data warehousing, databases, querying, etc.?"


-Will Dwinnell
 
I concur! Data Mining is NOT just MIS reporting. Far from it.

As we use it, Data Mining is more about finding metrics and important indicators in your data that you weren't previously aware of, and even predicting future events. MIS Reporting is about tracking indicators and metrics you are already aware of (reporting the past). The mining is often done using AI tools such as neural networks and decision trees to build very complex pictures of correlated factors in your data.

As an example, say you are in a service industry and you want to keep customers from leaving and going to a competitor. One way to keep customers is to offer incentives to stay. It's too expensive to offer incentives to every customer, so you would like a way to predict which of your current customers may be about to leave. This is where the mining comes in.

Take as much information as you can possibly find about customers that have already left you: their usage, past bills, reason for leaving, demographics (zip code, income, nationality, age, shoe size, dependents, etc.), everything you can get. Feed these customers who have left, together with comparable customers who stayed so the network has something to contrast them with, to a neural network to train it. At this point you have trained the neural network to identify a customer that's likely to leave.

You then feed your whole current customer base to the neural net to rate your customers on how likely they are to be disgruntled and want to leave. You then proactively offer incentives to the customers with the highest disgruntled rating to make them "happier" before they decide to go to your competitor. It's not perfect: you may give incentives to some happy customers, but you will also give incentives to many unhappy customers. Hopefully, you lose fewer customers and have minimized the expense of doing it.
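
A minimal sketch of the churn-scoring idea above, on made-up data. The feature names (usage_ratio, complaints), the customer IDs, and every number are assumptions for illustration only. A small logistic regression stands in for the neural network to keep the example short, and the training set includes customers who stayed as well as customers who left, since the model needs both groups to learn the difference.

```python
import math

# Hypothetical history: (usage_ratio, complaints), then 1 if the customer left, 0 if they stayed.
history = [
    ((0.90, 0), 0), ((0.80, 1), 0), ((0.70, 0), 0), ((0.85, 1), 0),
    ((0.20, 4), 1), ((0.30, 5), 1), ((0.10, 3), 1), ((0.25, 6), 1),
]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(rows, epochs=500, rate=0.1):
    """Fit logistic-regression weights with stochastic gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, left in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - left
            bias -= rate * error
            weights = [w - rate * error * x for w, x in zip(weights, features)]
    return weights, bias

def churn_score(model, features):
    """Return the estimated probability (0..1) that a customer leaves."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

model = train(history)

# Hypothetical current customers to rate.
current = {"C1": (0.90, 0), "C2": (0.15, 5), "C3": (0.75, 1), "C4": (0.30, 4)}
ranked = sorted(current, key=lambda name: churn_score(model, current[name]), reverse=True)

# Offer incentives only to the customers with the highest "disgruntled" rating.
incentive_list = ranked[:2]
print(incentive_list)
```

In practice the cut-off (here the top two) would be set by the incentive budget, and the model would be checked against customers it was not trained on before anyone acted on its ratings.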

This is just one example of Data Mining; there are many other things that can be done that fall under that category. But in general, it's an attempt to have your data tell you things about the business you weren't aware of, whereas MIS Reporting tells you things about your business you ARE already aware of.

There are a lot of Data Mining software vendors out there. Visiting their sites can give you many more examples and ideas.

Hope this helps.
 
Data mining is a data-driven decision support process. Data mining is used to discover previously unknown relationships among the data. Contrast this with traditional decision support such as reporting, data warehousing and OLAP, which are model-based techniques. The model's structure, either a report format or a data structure design, is based on the designer's belief that certain aspects of the business are important and need to be monitored: metrics and KPIs.

The modeler assumes some knowledge about what is happening in the business and attempts to verify his or her hypotheses via data analysis.

Data mining assumes nothing about the business and searches for trends and relationships across the data.



-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was.
- Steven Wright
 
Meaningless management/PM speak. It's reporting with queries that are a little more complex than normal, with larger numbers of tables, sometimes from multiple DBs. That's all; in the end, it's still just reporting. The term comes from the same people that like to say "on a go-forward basis" and "maximizing your core competencies". ick...
 
hejamana -

I have seen many definitions of "data mining", but I do not consider "complex reporting" to be one of them.

The first definition listed at Google states:

The process of analyzing data to identify patterns or relationships.

This is more in line with my experience. For example, we have reporting that reports on customer churn. In fact, this reporting even covers some of the attributes of churners. However, until we built a churn predictive model, we did not know what the top attributes that differentiate churners from non-churners were. Knowing these relationships allows us to focus our retention efforts on the customers in the greatest need.

I consider this predictive model process as data mining.
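
As a rough illustration of "top attributes that differentiate churners from non-churners", here is a minimal sketch on made-up data. It is not the predictive model described above: it only ranks each attribute by how far apart the two group averages are, measured in units of that attribute's overall spread. The attribute names and all values are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical customer records; all values are invented for illustration.
customers = [
    {"tenure_months": 3,  "monthly_bill": 60, "support_calls": 5, "churned": True},
    {"tenure_months": 5,  "monthly_bill": 40, "support_calls": 4, "churned": True},
    {"tenure_months": 4,  "monthly_bill": 50, "support_calls": 6, "churned": True},
    {"tenure_months": 30, "monthly_bill": 55, "support_calls": 1, "churned": False},
    {"tenure_months": 24, "monthly_bill": 45, "support_calls": 0, "churned": False},
    {"tenure_months": 36, "monthly_bill": 50, "support_calls": 2, "churned": False},
]

def separation(rows, attribute):
    """Gap between churner and non-churner means, in units of overall spread."""
    churners = [r[attribute] for r in rows if r["churned"]]
    stayers = [r[attribute] for r in rows if not r["churned"]]
    spread = pstdev([r[attribute] for r in rows])
    if spread == 0:
        return 0.0  # attribute never varies, so it cannot separate the groups
    return abs(mean(churners) - mean(stayers)) / spread

attributes = ["tenure_months", "monthly_bill", "support_calls"]
ranking = sorted(attributes, key=lambda a: separation(customers, a), reverse=True)

for name in ranking:
    print(f"{name}: {separation(customers, name):.2f}")
```

In this toy data the monthly bill averages the same for both groups, so it lands at the bottom of the ranking even though a churn report might well list it.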
 
Yeah, I can describe a pencil as a "carbon-cored self-diminishing user pressure-stabilized writing utensil requiring periodic reshaping on a go-forward basis" too...
It's reporting dude.
 
While it is true that there are many pretenders out there, peddling consulting double-talk, there really is a data mining discipline, and it has nothing to do with report-writing. Data mining is the application of statistics and machine learning to large data sets, generally to produce predictive models. As an example, I build mathematical models to predict which customers will repay their loans and which won't. I've yet to meet anyone in I.T. who is qualified to perform data mining.


-Will Dwinnell
 
Agree with Predictor. However, the models produced by the results of data mining often become part of the model-driven decision support process.

Data mining, by itself, assumes nothing about the data (no model) and looks for relationships (correlations, etc), trends, etc. in the data.

For instance, in my current application area (auto insurance), we have found that credit score is a good predictor regarding a person's risk as far as having accidents. Although there is no clear reason why this correlation exists, IT DOES, and can be used with some degree of reliability to predict accidents and claims and hence be used for establishing policy rates and risk.
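
To make "correlation" concrete, here is a minimal sketch that computes a Pearson correlation coefficient by hand. The credit scores and claim rates are invented purely for illustration and are not real insurance figures; the variable names are assumptions too.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical credit-score bands and claims per 100 policies in each band.
credit_scores = [520, 580, 640, 700, 760, 820]
claims_per_100 = [14, 12, 11, 8, 7, 5]

r = pearson(credit_scores, claims_per_100)
print(f"correlation: {r:.3f}")
```

A value near -1 means claims fall steadily as credit score rises. The coefficient says nothing about why, which matches the point above: the relationship can be useful for prediction even without a clear explanation.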

The same can be said for zip codes. There are certain demographic and geographic aspects regarding zip codes that can be used to predict comprehensive claims. This, however, can be explained rationally. Certain city zip codes are more prone to having stolen cars or break-ins. At the same time, those city zip codes are unlikely to have claims associated with fallen trees caused by storms, which would be more common in rural areas.

-------------------------
The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
 
Hi
I've added some (actually four) data mining definitions on my web site. This might be helpful for you. As for the mechanics, there are some descriptions on the data mining methods page.
Best Regards:)
 
Another one of those annoying meaningless yuppie boomer neo-terms for something that's been done for decades, even in the file cabinet days before computers. Dig for data relationships that aren't in any existing report yet. In other words, "we don't have a report that does X?? write a report that does X!"
Add it to:
"code reuse"
"on a go-forward basis"
"paradigm shift"
"talking points"
"core competencies"
"mission statement"
"win-win"
and my all-time favorite:
"synergy"

There is nothing new under the sun.
 
No, no, no. Data-mining isn't a neo-yuppie term for report-writing. The word for report-writing is "report-writing". Data-mining is exactly what it says it is, in plain English.

A mine is a place where you dig in the ground to extract something valuable that is buried there. Data mining is where you dig in the data to extract meaning. The difference between data-mining and extracting data to put into a report is similar to the difference between digging in the ground to find raw materials that you didn't put there, and going down into the cellar to take out things you did put there.

(Data-warehousing, which people get confused about, is also plain English. A warehouse is where you store things, so data warehousing is about storing data. Obviously you need to get goods out of a warehouse; getting data out of a data-warehouse can be a matter of compiling reports.)

SamBones' example illustrates the difference beautifully:

Imagine that you work for a bank, and create queries and reports. Your manager asks you to write a system that extracts the names of all clients who borrowed money last week. No problem.

Now imagine he comes to you and asks you to write a query that will extract the names of clients who are going to borrow money next year...
 
In the old days we used to have databases that held a subset of the main database for various reasons, usually departmental. We still do, and we never thought to call it a "data warehouse". We dug in it and in the main database all the time for stuff that wasn't in any existing report. We still do, and we never thought to call it "data mining". I'm old enough to see through all these corp-speak yuppie terms that seek to rename old things so that they appear new.

I am waging a one-man war on corp-speak.

There is nothing new under the sun.
 
Data mining is not "reporting". Whatever the issues surrounding labels, or what is or is not new, data mining is the construction of statistical models, not the querying of databases.

When a SQL query has been executed (or someone fetches a manila folder from the file cabinet), the result is a set of data.

When a data mining program is finished, the result is a statistical model of the data.
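
The contrast can be sketched in a few lines. This is only an illustration under stated assumptions: the loans table, its columns, and its rows are hypothetical, and an ordinary least-squares line stands in for whatever statistical model a real data mining tool would build.

```python
import sqlite3

# Hypothetical loans table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (client TEXT, income REAL, amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [("Ann", 30.0, 65.0), ("Bob", 40.0, 85.0), ("Cy", 50.0, 105.0), ("Dee", 60.0, 125.0)],
)

# Querying: the result is a set of rows that were already stored.
rows = conn.execute(
    "SELECT client, amount FROM loans WHERE amount > 100 ORDER BY client"
).fetchall()

# Mining (in miniature): the result is a fitted model, here a least-squares line.
data = conn.execute("SELECT income, amount FROM loans").fetchall()
mean_x = sum(x for x, _ in data) / len(data)
mean_y = sum(y for _, y in data) / len(data)
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
    (x - mean_x) ** 2 for x, _ in data
)
intercept = mean_y - slope * mean_x

def predict(income):
    """Estimate the loan amount for an income that is not in the table."""
    return intercept + slope * income

print(rows)
print(predict(70.0))
conn.close()
```

The query can only hand back clients who are already in the table. The fitted line can be asked about an income of 70, which appears in no row, and that is the sense in which the output is a model rather than a set of data.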


-Will Dwinnell
 
Yes, a report containing data about data: how many people who bought Ford pickup trucks last year also bought Goodyear tires, and how many were women. We used to dig for stuff like that back in the 80s on our old NCR mainframe. Damn I'm old!
[sad]

If more than 1 goose are geese, why aren't more than 1 moose meese??
 
You are so right eyeswideclosed. Nothing is new, everything has been done and said before. Just love your handle, too. It's so........beautifully descriptive.

 
Why thank you. When you've been around for as long as I have you tend to see patterns, but if you want to believe all this is new then that is your business, but when they rename it yet again in 10 years (data dredging?), you'll see what I mean.

 