Data Mining and Information Retrieval

Status
Not open for further replies.
Sep 15, 2003
Dear all,

I am confused about these two terms:
1) Data Mining and 2) Information Retrieval

To me, the two seem the same: both focus on how to retrieve information. But I am not sure whether they refer to separate things. Please confirm.

Another question: is Data Mining just a way to extract information from a database? If so, we can write a SQL query to do that. What makes Data Mining so special?

Please advise. Thanks in advance.
 
Data mining is about data pattern recognition.

Information retrieval is about extracting data from databases - what a user does with the data is up to them.

Normally to perform data mining, you have to start with information retrieval.

More info can be found at and at
 
Data mining is a method of looking at data, and finding correlations that you did not even suspect existed.

For example, in a huge database of customers, you might find out that a certain demographic group LOVES your product, but only in the state of California and only in 2002, and that this same group HATES your product this year. In NY state everyone thinks your product is OK, and that does not change year to year.

Data retrieval allows you to pull up all the people in California who bought your product, but you would have to know the question before you started.

Cheers
BR
 
>Data retrieval allows you to pull up all the people in California that bought your product, but you would have to know the question before you started.

Would this not be OLAP?

M
 
Marlene

Yes!

If you *KNEW* to query the data for that certain group of customers, only in California, from last year, and to compare it with this year and with all of North America, that is just standard OLAP. OLAP is great for this type of querying: splitting up data by criteria that the user selects.

However, before you started, you would NOT know to query this particular subset of the data, because you had no idea that this relationship existed. Data mining techniques would lead you to the criteria to use, and OLAP would then show the results.
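To make the OLAP half of this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `feedback`, its columns, and every row are invented for illustration. The point is only that the analyst picks the slicing criteria (state and year) before running anything.

```python
import sqlite3

# Hypothetical rows: (state, year, demographic group, satisfaction score 1-5).
ROWS = [
    ("CA", 2002, "18-25", 5),
    ("CA", 2002, "18-25", 4),
    ("CA", 2003, "18-25", 1),
    ("CA", 2003, "18-25", 2),
    ("NY", 2002, "18-25", 3),
    ("NY", 2003, "18-25", 3),
]

def average_score_by_state_and_year(rows):
    """Slice the data by criteria the analyst already chose: state and year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feedback (state TEXT, year INTEGER, grp TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO feedback VALUES (?, ?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT state, year, AVG(score) FROM feedback "
        "GROUP BY state, year ORDER BY state, year"
    )
    result = {(state, year): avg for state, year, avg in cursor}
    conn.close()
    return result

summary = average_score_by_state_and_year(ROWS)
for key, avg in summary.items():
    print(key, avg)
```

The query answers the question well, but only because someone already decided that state and year were the dimensions worth looking at.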

I hope this makes sense, because that's as much data mining as I understand. I am decent at OLAP, but data mining is over my head.

Cheers
Bruce
 
Hi All,

Thanks for the input. However, I still don't see the difference:

Data Mining
"Data mining is a method of looking at data, and finding correlations that you did not even suspect existed..."
If we want to look for particular data, don't we specify a "key" in our query and use that "key" to do the matching in the database?

Information Retrieval
In your example, you say that data extraction is the way to retrieve data from the database. Again, to retrieve data from the database, we can write a SQL query. For example, I have created an interface in Visual Basic and connected it to my MS Access database. I can easily write SQL queries to retrieve information from MS Access.

I ask because I have seen a lot of research being carried out in these two fields, and I don't understand why people still do this research. What makes these fields so special? As I said, a SQL query can pull data out of the database easily...

I hope you can help.

Thanks again
 
Batman

As you said, writing a SQL query against a database (or in OLAP) is the easy part. The tough part is deciding WHAT query to ask of a large database. Should you pull all males who bought gifts for their wives? Should you pull all customers who make over $100,000 and live in NY? Are you supposed to look for customers who did not finish high school and started their own business? Should it be a combination of these factors? How do you decide what queries to write?

This is where data mining comes in: sifting through the mass of data and finding relationships (correlations, not necessarily causes) in it. Once the data mining analysis is done, you have a set of relationships to investigate.

I saw one example from England, where a relationship was discovered in supermarket shopping: men who bought nappies (diapers) were more inclined to buy beer at the same time. The reason seemed clear AFTER they found this out: the men buying diapers were usually going to be staying home and not going out pub crawling with the boys.

This example could only be found with data mining techniques. Unless the person writing the queries had guessed that beer buying and diaper buying by men were related, they would never have found it using SQL alone.
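A rough sketch of how such a pairing can surface without anyone guessing it first: score every pair of items by "lift", a standard association measure, and rank the results. The baskets below are invented for illustration and are not the supermarket's data. Real tools use more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
BASKETS = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "beer", "milk"},
    {"milk", "bread"},
    {"bread", "crisps"},
    {"beer", "crisps"},
    {"milk", "bread", "nappies"},
    {"bread"},
]

def pair_lift(baskets):
    """Return {(a, b): lift} for every item pair seen together at least once.

    lift = P(a and b) / (P(a) * P(b)). Values well above 1 mean the pair
    appears together more often than independent purchases would predict.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    return {
        pair: (count / n)
        / ((item_counts[pair[0]] / n) * (item_counts[pair[1]] / n))
        for pair, count in pair_counts.items()
    }

lifts = pair_lift(BASKETS)
# Rank every pair; nobody had to name "beer" or "nappies" in advance.
ranked = sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)
for pair, lift in ranked[:3]:
    print(pair, round(lift, 2))
```

With these made-up baskets the top-ranked pair is beer and nappies, with a lift of 1.5. The input was all the data and the output was the pattern, which is the reverse of writing a query for a pattern you already suspect.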

Data mining is very powerful, but also very difficult to do well, as it does not have a pre-defined goal before you start.

Cheers
BR





 
See also thread354-418882, as JTM says this better than I could.

Bruce
 
Data mining is the discovery of unknown patterns in your data by means of algorithms. A good example is prescription drugs. Pharma companies do not instantly know that 80% of the time Drug C is prescribed in conjunction with Drug A when the patient is a white male between 25 and 40 who lives in the Northeast. However, by applying specific algorithms based on defined criteria, they can process their data and see that this pattern exists. This would then prompt them to analyze other factors to find out why it exists.

While data mining can be viewed as information retrieval, it is better seen as one part of the information retrieval process. Information retrieval then targets the data needed to answer the questions raised by the patterns that data mining exposes.

As for your question about whether you can write a SQL statement to do it or need other tools: you could write a SQL query, but its complexity would probably make this approach less than desirable. There are a number of data mining tools that make the process much simpler. However, before starting a data mining project, make sure that you or someone involved in the project has a solid foundation in data mining concepts and practices.

"Shoot Me! Shoot Me NOW!!!"
- Daffy Duck
 
BS/Daffy Duck,

"...The tough part is deciding WHAT query to ask out a large database..."
To solve this, don't we need to do the analysis first? We ask our users what information they want to see, build the criteria, and then pull the data with a SQL query.

If we don't know the criteria, how can we pull this out of the database? I have no clue how data mining can help in this process. I am also not sure how data mining can know what is related to what if we don't define the criteria up front. For example, if I want to pull all the data from table XXX for those who are older than 50, I will use
Select * from XXX where Person_Age > 50

You see, because I know the criterion (age greater than 50), I can pull these records from the database. With data mining, how does it work?
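For reference, the retrieval side of this question looks like the sketch below when run against an in-memory SQLite database from Python. The `Person_Name` column and the four rows are assumptions added so the example can run. Only the table name `XXX` and `Person_Age` come from the post.

```python
import sqlite3

def people_older_than(conn, min_age):
    """Retrieve rows matching a criterion the caller already knows."""
    # Parameter binding passes the age as a number rather than a string.
    cursor = conn.execute(
        "SELECT Person_Name, Person_Age FROM XXX "
        "WHERE Person_Age > ? ORDER BY Person_Age",
        (min_age,),
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX (Person_Name TEXT, Person_Age INTEGER)")
conn.executemany(
    "INSERT INTO XXX VALUES (?, ?)",
    [("Ann", 34), ("Bob", 51), ("Cy", 50), ("Dee", 67)],
)
older = people_older_than(conn, 50)
print(older)
```

The function only works because `min_age` is supplied by the caller. Nothing in it can tell you that 50 was the right number to ask about.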

Thanks in advance.
 
Batman

How did you know to use "Select * from xxx where person_Age>50" in your query? Who told you that the pattern holds for 50+? This starting point for the SQL can be found by mining. If the end users have a certain report in mind, with all the constraints defined, you do NOT need mining, only SQL/OLAP. Examples are your select statement, Top-10 lists, most-profitable-customer lists, total sales last year, etc.

"If we don't know the criteria, how can we pull out this from database?" You cannot. You first have to use data mining tools to find the patterns, THEN pull out the records that match them.

Data Mining is like real mining, in that you don't know what you will find until you do all the analysis.
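As a toy illustration of where a number like 50 could come from, the sketch below tries every observed age as a cut point and keeps the one that best separates buyers from non-buyers. This is a one-level decision tree, often called a stump. The records are invented, and real mining tools search many columns and combinations of columns, not one.

```python
# Hypothetical (age, bought_product) records, invented for illustration.
CUSTOMERS = [
    (22, False), (31, False), (38, False), (45, False), (50, False),
    (52, True), (55, True), (61, True), (64, False), (70, True),
]

def best_age_threshold(records):
    """Return (threshold, accuracy) for the rule 'age > threshold means buyer'.

    Every observed age is tried as a cut point; the one whose rule agrees
    with the most records wins.
    """
    best = (None, -1.0)
    for threshold in sorted({age for age, _ in records}):
        correct = sum((age > threshold) == bought for age, bought in records)
        accuracy = correct / len(records)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best

threshold, accuracy = best_age_threshold(CUSTOMERS)
# The discovered cut point becomes the criterion for an ordinary query.
print(f"SELECT * FROM xxx WHERE person_Age > {threshold}  -- rule accuracy {accuracy:.0%}")
```

On this made-up data the search lands on 50, with the rule agreeing with 9 of the 10 records. The mining step produced the criterion, and plain SQL can take over from there.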

Hope this helps
Bruce
 
Bruce,

Thanks for the input. From your message, it seems that with Data Mining we don't need to define the criteria in our SQL query, and yet the end user can still pull out the relevant records.

Now, can you tell me how this is possible? For example, if I want to pull all the customer information from table xxx where age > 50, how can data mining solve this problem if I don't define the criteria up front: "Select * from xxx where person_Age>50"?

Normally, my end users tell me up front what kind of report they want, and I define it in the SQL query.

Do help.

Thanks,
 
Something seems to be getting lost in the translation in this thread. Data mining is fundamentally an analytical, statistical process. Querying databases is a completely distinct function.

In data mining, one typically deals with data which is already prepared as a single table (or at least abstractly, as a single relational database query), and the goal is to have the computer discover patterns in the data, as models, segments, etc. Input: "all" of the data (or a statistical sample), output: the discovered patterns.

Querying, on the other hand, involves a deliberate specification of a subset of the data to be retrieved. Input: query specification, output: relevant data set.

While data mining may involve querying (especially to extract the relevant statistical sample), and querying may be driven by things discovered during data mining, these are separate processes.
 
predictor

Thanks for explaining it better than I was trying to do.

Bruce
 