Hi all,
I am working on a basic indexing engine for a database-driven article/news type site.
I have the search bit sorted, and am working on the indexing. The article always comes from a database in plain text.
What I figured is this:
1. Strip all the punctuation and bad words from the text
2. Get all the words into an array
3. Somehow 'compress' the duplicate words in the array to one entry + the number of times it occurred.
4. Store this data in a table.
So, I'm kinda stuck on part 3.
I know about array_unique(), but this doesn't help with the 'number of occurrences' bit. I could walk through the words and test each one against the database, but I feel there must be a more efficient way.
Could someone either:
a) enlighten me
b) point me in the right direction
c) tell me I'm going about it completely the wrong way.
If so, that would be wonderful.
Cheers
01101000011000010110010001110011
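For what it's worth, here is a rough sketch of steps 1 to 3 in PHP. It leans on the built-in array_count_values(), which returns an array keyed by each distinct value with the number of times it occurred as the value. The function name countWords(), the stop-word list, and the sample sentence are all made up for illustration, and the punctuation stripping assumes plain ASCII text, so treat it as a starting point rather than a finished indexer.

```php
<?php
// Hypothetical stop-word list; the real list of "bad words" is not given in the post.
$stopWords = ['the', 'a', 'and', 'of', 'on'];

// Hypothetical helper: returns [word => number of occurrences] for one article.
function countWords(string $text, array $stopWords): array
{
    // Step 1a: lower-case, then replace anything that is not a-z, 0-9 or whitespace
    // with a space (assumes ASCII text).
    $clean = preg_replace('/[^a-z0-9\s]+/', ' ', strtolower($text));

    // Step 2: split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    // Step 1b: drop the stop words.
    $words = array_diff($words, $stopWords);

    // Step 3: collapse duplicates into word => count.
    return array_count_values($words);
}

$counts = countWords('The cat sat on the mat. The mat was flat!', $stopWords);
print_r($counts);
```

For the sample sentence this gives 'mat' a count of 2 and each of 'cat', 'sat', 'was' and 'flat' a count of 1. The counting happens entirely in memory, so the database is only touched once per article when the resulting array is written out in step 4.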