Hi!
I need to create a co-occurrence matrix from a text file. So far I have a term extractor that, given a file (data.txt), returns a file with the relevant terms (term.txt). From these two files I would now like to build a co-occurrence matrix using a window of size w. I am guessing that the algorithm will look something like this:
for every term t in data.txt
    if (t co-occurs with a term s from term.txt within w terms)
        count(t, s)++;
The output should be a text file with one line for each pair of terms i and j, in the form
term i, term j, count(term i, term j).
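In Python, the counting loop and the output format might be sketched roughly as follows. This is only a sketch under assumptions not stated above: the text is already split into a list of tokens, the window is symmetric (w tokens on each side), and the names `cooccurrence_counts` and `write_counts` are made up for illustration.

```python
from collections import Counter


def cooccurrence_counts(tokens, terms, w):
    """Count how often each token t appears within w tokens of a term s.

    Assumption: the window is symmetric, i.e. w tokens before and after t.
    """
    terms = set(terms)
    counts = Counter()
    for i, t in enumerate(tokens):
        start = max(0, i - w)
        end = min(len(tokens), i + w + 1)
        for j in range(start, end):
            # Skip the token itself; count only neighbours that are terms.
            if j != i and tokens[j] in terms:
                counts[(t, tokens[j])] += 1
    return counts


def write_counts(counts, path):
    """Write one 'term i, term j, count' line per pair, sorted by pair."""
    with open(path, "w", encoding="utf-8") as out:
        for (t, s), n in sorted(counts.items()):
            out.write(f"{t}, {s}, {n}\n")


# Small in-memory demo instead of reading data.txt and term.txt.
demo = cooccurrence_counts("a b c a b".split(), {"b"}, 1)
```

For the real files, the tokens could come from splitting the contents of data.txt on whitespace and the terms from the lines of term.txt, although that ignores punctuation and multi-word terms.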
I'm guessing some kind of stemming is necessary?
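To show what I mean: without some normalization, "term" and "terms" would be counted as different entries. The function below is a deliberately crude stand-in that I made up for illustration (the name `crude_stem` and its suffix list are assumptions); a real stemmer such as the Porter algorithm would handle far more cases.

```python
def crude_stem(word):
    """Very rough normalizer: lowercase, then strip one common English suffix.

    Only a placeholder for a real stemmer; it keeps at least 3 characters.
    """
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Both surface forms map to the same key after normalization.
same_key = crude_stem("Terms") == crude_stem("term")
```

Applying something like this to both data.txt and term.txt before counting would merge the variants into one row of the matrix.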
Any ideas?
/lillyth