Verity Stemming

Status
Not open for further replies.

joepeacock

Programmer
Nov 5, 2001
US
I'm having a problem with Verity's stemming feature. It works a lot of the time, but for some words, it does not.

Here's my CFSEARCH code:
<cfsearch name="get1" collection="photos" type="simple" maxrows="5000" criteria="#lcase(sterm)#">

When #sterm# is "filmed", it returns all of the photos with the word "film" like the examples say it should.

But when #sterm# is "dog", photos named "Dogs" are not returned, and vice versa. The same happens with "pet" and "Pets".

Anyone know what's up with that?

-Joe
 
When sterm is dog, are you sure it's dog (or DOG) and not mixed caps (dOG, Dog, doG)? One way to make sure is to use #lcase(sterm)#.

ALFII.com
---------------------
If this post answered or helped to answer your question, please reply with such so that forum members with a similar question will know to use this advice.
 
...duh... sorry... When I posted that I saw something about lcase, and then when I went to look for it, I couldn't find it...

The situations are reversed... You've got "filmed" picking up "film" (a form of the word picking up the root), and then you've got "dog" not picking up "dogs" (the root not picking up a form). If you were going to have a problem, I'd have expected it to be the other way around.

 
Well, "dogs" doesn't pick up "dog" either. I'm wondering if the problem has more to do with the length of the word. I don't know exactly how the Verity engine does its thing, but maybe below a certain term length it won't parse out the root correctly? There is surprisingly little information available on Verity, so I have not been able to verify this hunch.
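
One way to test the word-length hunch is to run the same search over a few short and long terms and compare the hit counts. This is only a rough sketch, assuming the "photos" collection from the original post; the query name "probe" and the term list are just illustrative.

Code:
<!--- Illustrative probe: compare hit counts for short and long forms of the same words --->
<cfloop list="dog,dogs,pet,pets,film,films,filmed" index="t">
 <cfsearch name="probe" collection="photos" type="simple" criteria="#t#">
 <cfoutput>#t#: #probe.recordcount# hits<br></cfoutput>
</cfloop>

If stemming is applied to a pair of forms, both should report roughly the same count; a mismatch between "dog" and "dogs" alongside a match between "film" and "filmed" would point at term length.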
 
I've done a little more research and found that Verity will not, in fact, stem three-letter words. Apparently this also means that it will not stem words with three-letter roots (e.g. "Dogs"). So I added a little bit of code, but I'm not sure I like it. It feels like spackling a stucco wall with Aquafresh.

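Code:
<cfif sterm is "man" OR sterm is "men">
 <!--- Irregular plural: search for both forms explicitly --->
 <cfset theterm = "man<OR>men">
<cfelseif len(sterm) is 4 and right(sterm,1) is "s">
 <!--- Four-letter plural with a three-letter root (e.g. "dogs"): wildcard the root --->
 <cfset theterm = "#left(sterm,3)#*">
<cfelseif len(sterm) LTE 3>
 <!--- Three letters or fewer: Verity won't stem it, so fall back to a wildcard --->
 <cfset theterm = "#sterm#*">
<cfelse>
 <!--- Longer terms: leave the stemming to Verity --->
 <cfset theterm = sterm>
</cfif>
 
<cfsearch name="get1" collection="photos" type="simple" maxrows="5000" criteria="#lcase(theterm)#">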

Anybody got any better ideas?

Joe
 
Maybe only slightly better answers, though someone might come up with a great idea...

You answered your own question correctly, and I learned a bit... thanks
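
One possible refinement, offered as a sketch and not a tested fix: a trailing wildcard such as "dog*" will also match longer words like "dogma", so the short-word branches could search for the singular and plural forms explicitly instead. This assumes <OR> behaves for these terms the same way it does in the "man<OR>men" case above; the variable names "term" and "root" are made up for the example.

Code:
<!--- Sketch only: "term" and "root" are illustrative names --->
<cfset term = lcase(trim(sterm))>
<cfif len(term) is 4 and right(term,1) is "s">
 <!--- e.g. "dogs": search the three-letter root and the plural --->
 <cfset root = left(term,3)>
 <cfset theterm = "#root#<OR>#term#">
<cfelseif len(term) is 3>
 <!--- e.g. "dog": search the word and its simple "s" plural --->
 <cfset theterm = "#term#<OR>#term#s">
<cfelse>
 <!--- Longer terms: leave the stemming to Verity --->
 <cfset theterm = term>
</cfif>

<cfsearch name="get1" collection="photos" type="simple" maxrows="5000" criteria="#theterm#">

Irregular plurals such as "man" and "men" would still need their own special case, and a four-letter word that merely ends in "s" (e.g. "bass") would be treated as a plural, just as in the code above.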

 