Understanding Data Mining

Status
Not open for further replies.

Darrick

Technical User
Aug 28, 2001
77
US
I've been reading and trying to comprehend Data Mining so I can apply it to several lines of business within my company (mainly a call center environment). However, I'm just not getting the hang of it. I am using MS Analysis for Data Mining, and have found that I need some sort of Regression algorithm. Is there one available to "Add on" to MS Analysis?

Also, I do not understand the difference between the data used to "Train" the model and the live data used for the analysis. If I can't use real data to train the model, how will I know it is accurate? And if I can't use the same data for the analysis, how will I get results? Can someone explain this to me in basic terms?

Thanks in advance,

Darrick
dderby (AT NO SPAM) pscufs (dot) com
 
I recommend the book Kimball wrote about data mining.
I don't know the full name anymore, but you'll find it.

It will help you get the hang of it.

BigMag, The Netherlands.
bigmag@chello.nl
 
Great questions, Darrick. See comments below...

I've been reading and trying to comprehend Data Mining so I can apply it to several lines of business within my company (mainly a call center environment). However, I'm just not getting the hang of it. I am using MS Analysis for Data Mining, and have found that I need some sort of Regression algorithm. Is there one available to "Add on" to MS Analysis?

<BKJ>
First... some ideas on how to use DM in a call center environment. Say you work in a customer care center; you could use DM to find...
- which customers have the highest propensity to call the most and longest? ID them and then find a way to prevent calls or to resolve them quicker.
- when a customer calls, is there another product/service that we can offer them? Here you use DM to ID a customer's propensity to take different products/services. When they call, you offer the one with the highest propensity. Sometimes called Next Best Activity.

These are just 2 examples. The theme here is to identify customers with specific behaviors so you can take action.
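
To make the Next Best Activity idea concrete, here is a minimal Python sketch. It assumes a model has already produced a propensity score per product for the caller; the product names, the scores, and the `next_best_offer` helper are all made up for illustration and are not part of MS Analysis.

```python
def next_best_offer(propensities, already_owned=()):
    """Return the offer with the highest propensity that the customer
    does not already have, or None if nothing is left to offer."""
    candidates = {
        offer: score
        for offer, score in propensities.items()
        if offer not in already_owned
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical scores for one caller (illustrative numbers only).
scores = {
    "premium_support": 0.12,
    "extended_warranty": 0.31,
    "auto_pay": 0.27,
}

# The caller already has the warranty, so the next-highest score wins.
offer = next_best_offer(scores, already_owned={"extended_warranty"})
print(offer)
```

The hard part is producing trustworthy scores; once you have them, choosing the offer at call time is just a lookup like this.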

Regarding MS Analysis for Data Mining - is this the Analysis Services package of MS SQL Server? I'm not sure about add-ons. Realize that regression is one “tool” used under the varying definitions of data mining. Other “tools” include decision trees, neural networks, and clustering. The choice of tool is dictated by what you are trying to accomplish, what is available, etc.
</BKJ>

Also, I do not understand the difference between the data used to "Train" the model and the live data used for the analysis. If I can't use real data to train the model, how will I know it is accurate? And if I can't use the same data for the analysis, how will I get results? Can someone explain this to me in basic terms?

<BKJ>
You do use real data to build/train and validate the model. A little background and detail:

There are 2 basic types of data-mining – prediction and description. An example of description is segmentation – dividing your customers into like groups (e.g. high value) so you can take action.

Seems like you are more interested in prediction – when you want to assign a likelihood or value of a future event. Examples include “of my customers, who is likely to call the most next month?” or “which headsets in the call center are most likely to break?”

The premise behind prediction is that future behavior is related to past behavior. This is where training and validation come into play.

Let’s look at the process through an example. Say you are trying to identify who is likely to call next month. First, you would get data of who called and did not call this month as well as some data describing these customers from the past (how many times they called over the last 3 months, where they live, etc.).
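
As a rough sketch of that data-gathering step, the Python below builds one row per customer: a count of calls over the prior 3 months, where they live, and a flag for whether they called this month. The call log, the customer list, the month numbering (1 to 3 are history, 4 is "this month"), and the `build_rows` helper are all invented for illustration.

```python
from collections import Counter

# Hypothetical call log: (customer_id, month). Months 1-3 are history,
# month 4 stands in for "this month".
call_log = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4),
    ("B", 2),
    ("C", 4),
    ("D", 1), ("D", 3),
]

# Hypothetical descriptive data: where each customer lives.
customers = {"A": "NY", "B": "TX", "C": "NY", "D": "CA"}


def build_rows(call_log, customers, history_months, target_month):
    """Build one modeling row per customer: past behavior plus the outcome."""
    past_calls = Counter(c for c, m in call_log if m in history_months)
    called_in_target = {c for c, m in call_log if m == target_month}
    rows = []
    for customer_id, region in sorted(customers.items()):
        rows.append({
            "customer": customer_id,
            "region": region,
            "calls_last_3_months": past_calls[customer_id],
            "called_this_month": int(customer_id in called_in_target),
        })
    return rows


rows = build_rows(call_log, customers, history_months={1, 2, 3}, target_month=4)
for row in rows:
    print(row)
```

The key point is that the descriptive columns come from before the outcome month, so the model only ever sees information that would have been available at prediction time.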

Then, you randomly split the data into at least 2 portions: Training and Validation (AKA test). The training portion is used to build the model and the validation is used to see how the model works on other data. It may sound funny that you would use the same data (originally) to build and validate but we are mostly trying to see if there are any anomalies, bias, etc. that would cause the model to not be effective. You could also test/validate the model on data from a different call center or different time period, etc.
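
Here is a small Python sketch of the split-then-validate idea. The data is synthetic, the 70/30 split is just a common choice, and the "model" is a deliberately simple cutoff on past call counts rather than anything MS Analysis would build; treat all of it as illustration only.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into a training portion and a validation portion."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, cutoff):
    """Share of rows where 'past calls >= cutoff' matches whether they called."""
    hits = sum((past_calls >= cutoff) == bool(called) for past_calls, called in rows)
    return hits / len(rows)


def fit_cutoff(train):
    """'Train' the toy model: pick the cutoff that is most accurate on training data."""
    candidates = sorted({past_calls for past_calls, _ in train})
    return max(candidates, key=lambda cutoff: accuracy(train, cutoff))


# Synthetic customers: (calls in the last 3 months, called this month?).
# More past calls means a higher chance of calling again.
rng = random.Random(0)
data = []
for _ in range(200):
    past_calls = rng.randint(0, 6)
    called = int(rng.random() < past_calls / 6)
    data.append((past_calls, called))

train, valid = split_rows(data)
cutoff = fit_cutoff(train)

print("training rows:", len(train), "validation rows:", len(valid))
print("accuracy on training data:  ", accuracy(train, cutoff))
print("accuracy on validation data:", accuracy(valid, cutoff))
```

If the validation accuracy is much lower than the training accuracy, that is a hint the model has latched onto quirks of the training rows rather than a pattern that holds for other customers.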

We can go on (and on…). See if this answers your questions. – Brian
</BKJ>

 