Hi all,
I am facing a statistical problem that involves fitting an appropriate probability distribution to a data set. I have a set of inverse document frequency (IDF) values (see [1]) for all the words in a document, and I need to cut this array of values at a certain point so that I keep only the most useful values and discard the irrelevant ones. I am looking for tools, algorithms, etc. that can be used to solve this problem. Please contact me at pandey.gaurav@gmail.com to discuss any ideas. I can also send a sample set of these values to anyone interested.
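For concreteness, here is a minimal sketch of the setup. It assumes the classic form idf(t) = log(N / n_t) discussed in [1], where N is the number of documents and n_t is the number containing term t. The cutoff rule `cut_at_largest_gap` is a hypothetical placeholder, not a fitted-distribution method. It sorts the scores in descending order and keeps everything above the single largest drop between neighbouring values. It is only meant to make the question concrete.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Classic IDF, idf(t) = log(N / n_t).

    docs is a list of documents, each a list of tokens (assumed input format).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def cut_at_largest_gap(scores):
    """Hypothetical cutoff heuristic: rank terms by descending IDF and keep
    every term above the largest drop between consecutive scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return ranked
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)  # first largest gap
    return ranked[:cut + 1]


if __name__ == "__main__":
    # Toy scores: the big drop between 4.8 and 1.0 decides the cut.
    print(cut_at_largest_gap({"a": 5.0, "b": 4.8, "c": 1.0, "d": 0.9}))
    # prints [('a', 5.0), ('b', 4.8)]
```

The largest-gap rule is deliberately simple. Fitting a distribution to the scores and cutting where the fit says the "irrelevant" component starts would replace this function, while `idf_scores` stays the same.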
Thanks!
Gaurav Pandey
References:
[1] S. Robertson, "Understanding Inverse Document Frequency: On theoretical arguments for IDF", available at