Data mining in text logs: associating sentences with each other

Status
Not open for further replies.

Edward1984 (Programmer)
Jan 30, 2015
Hello everyone.

Could you please advise me on a project I'm working on?
I have about 45 GB of text logs. I have searched them and written regular expressions for the error messages I'm most interested in. Now I'd like to be able to do the following:

1. Predict how likely certain sentences are to occur in relation to other sentences (e.g., error string 1 occurs with probability P within N lines of error string 2).

2. At a minimum, roughly cluster the error strings by how often they occur together within some range of lines.

Could you please advise me which tools and methods would work best? Thank you in advance!
 
For analysis #1, you are looking at time series data: you are interested in events that occur before or after other events. This is a regression problem, but within a time series framework. Analysis #2 is probably best approached as a time series problem as well, although it can probably also be done with other data mining techniques. Note that most data mining packages do not include time series analysis tools. Search for keywords such as "Box-Jenkins techniques".
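
Before fitting a time series model, it may help to get rough empirical numbers for question #1 with a single streaming pass over the logs. The Python sketch below is only an illustration and is not a Box-Jenkins method: it counts, for each ordered pair of error types (a, b), how often an occurrence of a is followed by b within the next N lines, and divides by the number of occurrences of a. The pattern names and regular expressions are hypothetical placeholders for the ones already extracted from the logs.

```python
import re
from collections import Counter, deque

# Hypothetical patterns: substitute the regular expressions
# already extracted from the real logs.
PATTERNS = {
    "disk_error": re.compile(r"disk failure"),
    "timeout": re.compile(r"connection timed out"),
}


def classify(line):
    """Return the name of the first pattern that matches the line, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def cooccurrence(lines, window):
    """Count error occurrences and 'a followed by b within `window` lines'.

    `lines` can be any iterable of strings (for example an open file),
    so the input is read once and memory use is bounded by the window.
    """
    occurrences = Counter()  # name -> number of matching lines
    followed = Counter()     # (a, b) -> occurrences of a followed by b
    pending = deque()        # (line_no, name, names seen afterwards)

    for line_no, line in enumerate(lines):
        # Retire events whose look-ahead window has closed.
        while pending and line_no - pending[0][0] > window:
            _, a, seen = pending.popleft()
            for b in seen:
                followed[(a, b)] += 1

        name = classify(line)
        if name is None:
            continue

        # This error falls inside the window of every pending event.
        for _, _, seen in pending:
            seen.add(name)
        occurrences[name] += 1
        pending.append((line_no, name, set()))

    # Flush events still open at end of input.
    for _, a, seen in pending:
        for b in seen:
            followed[(a, b)] += 1
    return occurrences, followed


def conditional_probabilities(occurrences, followed):
    """Estimate P(b within the window | a) for every observed pair."""
    return {(a, b): n / occurrences[a] for (a, b), n in followed.items()}
```

To use it, open the log file and pass the file object as `lines`, then feed both counters to `conditional_probabilities`. The resulting pair estimates could also serve as a rough similarity measure for the clustering in question #2, although that is an assumption about what "occurring together" should mean for these logs.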

==================================
adaptive uber info galaxies (bigger, better, faster than agile big data clouds)


 