Data mining in text logs: associating sentences with each other

Edward1984 · Jan 30, 2015

Hello everyone.

Could you please advise me on a project I'm working on?
I have about 45 GB of text logs. I've done some searching and written regular expressions that extract the error-related sentences I'm most interested in. Now I'd like to be able to do some of the following:

1. Predict the probability that certain sentences occur relative to other sentences (e.g., error string 1 is likely, with probability P, to appear within N lines of error string 2).

2. At the very least, roughly cluster error strings by how often they occur together within some range of lines.
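Both goals can be prototyped directly from the regex matches. Below is a minimal sketch, assuming each matching log line is one event and that "within N lines" means an absolute line distance of at most N (the `window` parameter). The pattern names and regexes in `PATTERNS`, and the sample log, are hypothetical placeholders. P is estimated empirically as the fraction of occurrences of error A that have at least one error B within the window. The grouping step is a simple single-linkage clustering obtained by thresholding those probabilities; it is one possible technique, not the only one.

```python
import re
from bisect import bisect_left
from collections import defaultdict

# Hypothetical patterns: replace with the regular expressions extracted from your logs.
PATTERNS = {
    "disk_full": re.compile(r"No space left on device"),
    "write_fail": re.compile(r"write failed"),
    "timeout": re.compile(r"connection timed out"),
}


def find_events(lines, patterns):
    """Return (line_number, pattern_name) for every line matching a pattern."""
    events = []
    for lineno, line in enumerate(lines):
        for name, rx in patterns.items():
            if rx.search(line):
                events.append((lineno, name))
    return events


def cooccurrence(events, window):
    """For each ordered pair (a, b), count occurrences of a that have
    at least one b within `window` lines before or after."""
    positions = defaultdict(list)
    for lineno, name in events:  # events arrive in line order, so lists stay sorted
        positions[name].append(lineno)
    totals = {name: len(pos) for name, pos in positions.items()}
    pairs = defaultdict(int)
    for a, pos_a in positions.items():
        for b, pos_b in positions.items():
            if a == b:
                continue
            for p in pos_a:
                # First b at or after p - window; it is a hit if also <= p + window.
                i = bisect_left(pos_b, p - window)
                if i < len(pos_b) and pos_b[i] <= p + window:
                    pairs[(a, b)] += 1
    return totals, pairs


def conditional_prob(totals, pairs, a, b):
    """Estimate P(b within the window | a) as a fraction of a's occurrences."""
    return pairs.get((a, b), 0) / totals[a] if totals.get(a) else 0.0


def cluster(totals, pairs, threshold):
    """Single-linkage grouping: join a and b whenever P(b | a) >= threshold."""
    parent = {name: name for name in totals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        if conditional_prob(totals, pairs, a, b) >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for name in totals:
        groups[find(name)].add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample log standing in for the real files.
log = ["No space left on device", "write failed", "ok", "ok", "ok",
       "connection timed out", "ok", "ok", "ok",
       "No space left on device", "ok", "write failed"]
totals, pairs = cooccurrence(find_events(log, PATTERNS), window=2)
print(conditional_prob(totals, pairs, "disk_full", "write_fail"))
print(cluster(totals, pairs, threshold=0.8))
```

On the real logs you would pass an open file object to `find_events`, which iterates line by line instead of loading 45 GB into memory. Because each position list is sorted, every neighbor check is a binary search rather than a scan.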

Could you please advise me on the best tools and methods to use? Thank you in advance!

johnherman · Jan 30, 2015

For analysis #1, you are looking at time series data: you are interested in events that occur before or after other events. This is regression, but within a time series. Analysis #2 is probably also best approached as a time series problem, although it can probably be done with other data mining techniques. Note that most data mining packages do not include time series analysis tools. Look for keywords such as Box-Jenkins techniques.
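To make the time series framing concrete, here is a minimal sketch, assuming errors have already been extracted as (line_number, error_name) pairs and that line position is an acceptable stand-in for time (with real logs you would more likely bucket by the timestamp on each line). It turns each error type into a count series and computes a lagged cross-correlation; a peak at a positive lag suggests the first error tends to precede the second. This is only an exploratory step, not Box-Jenkins itself, which means fitting ARIMA models and would normally be done with a statistics package. The bucket size of 10 lines and the event data are hypothetical.

```python
from statistics import mean


def count_series(events, name, bucket, n_buckets):
    """Count occurrences of `name` in consecutive buckets of `bucket` lines."""
    counts = [0] * n_buckets
    for lineno, ev in events:
        if ev == name:
            counts[lineno // bucket] += 1
    return counts


def cross_correlation(x, y, lag):
    """Pearson correlation of x[t] with y[t + lag]; a positive lag tests
    whether x tends to lead y."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


# Hypothetical (line_number, error_name) events, e.g. produced by a regex scan.
events = [(5, "disk_full"), (15, "write_fail"), (35, "disk_full"),
          (45, "write_fail"), (65, "disk_full"), (75, "write_fail")]
bucket = 10
n_buckets = max(lineno for lineno, _ in events) // bucket + 1
x = count_series(events, "disk_full", bucket, n_buckets)
y = count_series(events, "write_fail", bucket, n_buckets)
for lag in range(-2, 3):
    print(lag, round(cross_correlation(x, y, lag), 2))
```

In this sample, every disk_full is followed by a write_fail one bucket later, so the correlation peaks at lag 1.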

Edward1984 · Feb 3, 2015

Hi John, thanks a lot for your answer.
