Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Full-Text Search - 50% rule

Status
Not open for further replies.

MasterKaos

Programmer
Jan 17, 2004
107
GB
The Full-Text search feature in mySQL has this rather annoying "feature" of returning an empty set when the search string matches more than 50% of the rows in a table, as it deems such a search not to be relevant enough.

I was wonderring if anyone knows if there is a way of telling if this has happened? I mean how do I know if the set is empty because there were too many results rather than too few?

I am implementing a search feature for my web page using this, and it would be nice to tell users either "too many matches, refine your search" OR "no matches, please try again". It's not much good telling them "well, gee there might be too many matches, or there might not be any, I don't really know what you should do next..."

If the answer is no, can anyone suggest a fairly basic PHP search function, that searches one or two columns and returns a relevance ranked result set? Boolean would be nice but not a must-have.

Thanks!
 
You could always use 'IN BOOLEAN MODE'. It ignores the 50% threshold. This only works in MySQL 4.0.1+.
 
Yeah I tried IN BOOLEAN MODE before (i got MySQL 4.0.12) and thought it still did the 50% rule, but now looking cloesly i'm really confused.

My web site is for a small business i am starting that does web site design, e-commerce and custom software for small to medium businesses. So far I've writeen the text for the intro page and the page about web design.

The word "web" comes up very often, usually followed by "site". The word "business" comes up very often too. the word "you" is everywhere.

If i enter "business" without boolean mode i get 0 results, and in boolean mode i get about 7 which makes sense.

But if i enter "you" i get 0 results without boolean AND with, which is weird.

But here's the really confusing part. If i enter "web" without boolean i get 0, if i enter "web" in boolean i get 0. But if i enter "web +site" in boolean i get 6 results!
"web -site" i get 0, "web +presence" i get 1.

Where's the logic in this? how can "web -site" be 0 when there's at least one instance of "web presence" ???

Here's the code i'm using for the query:

Code:
select section_id, page_id, name, title, content from section WHERE MATCH (title,content) AGAINST ('$search' IN BOOLEAN MODE);

Has anyone used Full-Text search and found it to be useful? coz it's beginning to look like it just doesn't work!



----------------------------------------
The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.
 
The reason why you get 0 results when you search for 'you' is
because 'you' is a stopword, and MySQL will always ignore it
when doing a full-text search.

If your ft_stopword_file variable says 'built-in' then all the
stopwords are in file ft_static.c, compiled in with the rest of
the source code.

The reason why "web -site" can be 0 and "web +presence" can be
1 is because "web -site" means 'return records that MAY have
the word web in them, but definitly do not have the word site,'
whereas "web +presence" means 'return records that MAY have the
word web in them, but must have the word presence in them.'
Your results indicate that the word 'web' is not being found,
much like a stopword. I don't know why that is.
I checked the stopword list and 'web' is not among them,
at least in my version of the MySQL source code (4.0.15).
What happens when you change 'web' to '+web'?
Try changing "web +presence" to "+web +presence".
 
'+web' seems to do the same as 'web'

Try playing around with it yourself -
my site:
remember only "about us", "web solutions" and "contact us" has text in it so far.

i might be working on the page so you might get a parse error or something every now and again, but I won't be touching the search function for now. At the moment its in boolean mode.



----------------------------------------
The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.
 
I figured out why 'web' was being ignored by full-text search.
It's because MySQL, by default, ignores words that are less than
4 characters long. If you add,

ft_min_word_len=3

to the [mysqld] block of your my.ini/my.cnf file the full-text
indexes will pick up 3-letter words, too. Of course you'll have
to rebuild the index to see results.
 
AH-HA!!!

Mystery solved! YaY! Good work vanekl!

Well seing as its only a small site i won't worry about the fact that boolean mode throws away relevance ranking, but for some future project, I wonder if there is a way of getting around the 50% rule without losing relevance ranking...?

anyway it's all good for now. Thanks!

----------------------------------------
The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top