PHP function to de-pluralize a noun?

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
Before I spend a lot of time on a problem that seems simple but isn't, does anyone know of a PHP function that will de-pluralize a noun?

e.g.
messes to mess
undresses would remain the same.
fathers to father
fathers's to father's
John T. Hammers would remain the same.


 
I forgot to mention:

Oxen to Ox
matrices to matrix
radii to radius

Again, not a simple problem. I figure Google search has a way of doing this.
 
as far as i know, there is no such built-in function.

may i ask why you need this? there may be another way of approaching the problem.
 
Apparently, there is a PHP library called Chyrp that has pluralize and depluralize functions but Chyrp requires MySQL.

I think the complexity of this problem requires a dictionary with additional data on the pluralization of each word in the English language. The English language is just too illogical for a simple set of pluralization rules to work. (Recall the numerous invasions of England by non-English speakers: the Danes, the Normans, etc.)

Thanks for your response jpadie. I have found another approach.
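
A minimal sketch of the dictionary-plus-rules idea described above, using the example words from this thread. The depluralize() function and both lookup tables are hypothetical (nothing here is a PHP built-in or Chyrp's code), and the suffix rules are a rough assumption that will get many English words wrong. Telling a surname such as Hammers from a plural needs context that a word-level function does not have, so that case is not handled.

```php
<?php
// Illustrative tables only; a real version would load far larger lists.
const IRREGULAR = ['oxen' => 'ox', 'matrices' => 'matrix', 'radii' => 'radius'];
const UNCHANGED = ['undresses'];   // ends in "s" but is not a plural noun

// Hypothetical helper: PHP has no built-in for this.
function depluralize(string $word): string
{
    $lower = strtolower($word);

    // Possessives: singularize the stem, then put the "'s" back.
    if (substr($lower, -2) === "'s") {
        return depluralize(substr($word, 0, -2)) . "'s";
    }
    if (in_array($lower, UNCHANGED, true)) {
        return $word;
    }
    if (array_key_exists($lower, IRREGULAR)) {
        $singular = IRREGULAR[$lower];
        // Keep a leading capital, e.g. "Oxen" becomes "Ox".
        return ctype_upper($word[0]) ? ucfirst($singular) : $singular;
    }
    // "messes" to "mess", "boxes" to "box", "churches" to "church"
    if (preg_match('/(ss|x|z|ch|sh)es$/i', $word)) {
        return substr($word, 0, -2);
    }
    // "parties" to "party"
    if (preg_match('/[^aeiou]ies$/i', $word)) {
        return substr($word, 0, -3) . 'y';
    }
    // Plain "s" plural; words ending in "ss", "us" or "is" are left alone.
    if (preg_match('/[^sui]s$/i', $word)) {
        return substr($word, 0, -1);
    }
    return $word;
}
```

Calling depluralize("fathers's") strips the possessive, singularizes "fathers", and re-attaches the suffix. The lookup tables are checked before the suffix rules, so every exception has to be listed by hand, which is the maintenance cost pointed out above.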

 
please share - that's what this forum is about.

i was going to suggest using phonemes as a comparison method (such as soundex).

i'm pretty sure that one could rewrite Chyrp to use alternative databases and/or a file-based storage mechanism.
 
I'm writing a profanity filter. It shouldn't be too difficult to pluralize an offensive word from a limited list using some simple code but writing a function to pluralize any English word I think could be daunting.
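
For a limited word list, the "simple code" could look something like the sketch below. The pluralize() and expandBlocklist() names are hypothetical, the list entries are mild placeholders, and the three suffix rules only cover regular English plurals.

```php
<?php
// Hypothetical helper: guesses the plural of a regular English noun.
function pluralize(string $word): string
{
    // "mess" to "messes", "box" to "boxes", "church" to "churches"
    if (preg_match('/(s|x|z|ch|sh)$/i', $word)) {
        return $word . 'es';
    }
    // consonant + "y": "ninny" to "ninnies"
    if (preg_match('/[^aeiou]y$/i', $word)) {
        return substr($word, 0, -1) . 'ies';
    }
    return $word . 's';
}

// Returns each listed word followed by its guessed plural.
function expandBlocklist(array $words): array
{
    $expanded = [];
    foreach ($words as $word) {
        $expanded[] = $word;
        $expanded[] = pluralize($word);
    }
    return $expanded;
}

// Placeholder entries standing in for a real list.
$blocklist = expandBlocklist(['darn', 'mess', 'ninny']);
```

Expanding the list once up front keeps the filter itself to a plain lookup, at the cost of hand-adding any irregular plurals.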
 
ok. i think that i'd use a combination of metaphone() and similar_text() to achieve this goal.

steps are:

1. create a table of all dictionary words.
2. create a table of all profanity
3. calculate the metaphone value of all profane words
--setup done --
split input into words.
1. compare each word against dictionary. add words to holding list that are not in the dictionary.
2. compare each word against profanity table. substitute profanities
3. calculate metaphone values of each non-dictionary non-profane word.
4. compare metaphone values against dictionary of profane words.
5. if any match, calculate the similarity of the two words. if it is over 80%, substitute the word and add it to the profanity table; if it is over 65%, allow the word but flag it on a watch list.
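
The steps above can be sketched roughly as follows. This is only an illustration: plain arrays stand in for the database tables, filterText() and its parameters are hypothetical names, and masking with asterisks is an assumed form of "substitute". metaphone() and similar_text() are the PHP built-ins mentioned above.

```php
<?php
// Hypothetical sketch; $dictionary and $profanity would normally be tables.
function filterText(string $text, array $dictionary, array &$profanity, array &$watchList): string
{
    // Setup: metaphone key of every known profane word.
    $keys = [];
    foreach ($profanity as $bad) {
        $keys[metaphone($bad)] = $bad;
    }

    // Split the input into words and decide what to do with each one.
    return preg_replace_callback(
        '/[a-z]+/i',
        function (array $m) use ($dictionary, &$profanity, &$watchList, $keys) {
            $word = strtolower($m[0]);

            // Known profanity is masked outright.
            if (in_array($word, $profanity, true)) {
                return str_repeat('*', strlen($word));
            }
            // Ordinary dictionary words pass through untouched.
            if (in_array($word, $dictionary, true)) {
                return $m[0];
            }
            // Unknown word: does it sound like a profane word?
            $key = metaphone($word);
            if (!isset($keys[$key])) {
                return $m[0];
            }
            // Sound-alike found; measure how similar the spelling is.
            similar_text($word, $keys[$key], $percent);
            if ($percent > 80) {
                $profanity[] = $word;            // the filter learns the variant
                return str_repeat('*', strlen($word));
            }
            if ($percent > 65) {
                $watchList[] = $word;            // allowed, but flagged
            }
            return $m[0];
        },
        $text
    );
}

// Usage with placeholder data:
$profanity = ['heck'];
$watchList = [];
$clean = filterText('What the hek is this', ['what', 'the', 'is', 'this'], $profanity, $watchList);
```

Because $profanity is passed by reference, a variant caught by the similarity test is added to the list and matched directly the next time it appears.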

this is expensive as a process, but it's accurate and will get more accurate as time goes by.

alternatively, there are free web services that might help. CDyne manages one for input strings up to 10k chars.
 
The "sound alikes" are definitely a problem that must be dealt with but it seems disguised profanity and false positives are an even bigger problem.
 
the sound-alike should fix the disguises. you could also try the Levenshtein distance (levenshtein()). and you could also just do simple vowel substitution, or put overly punctuated words into a hold list.
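
A rough sketch of those three ideas together. The function names, the character swap table, the 30% punctuation threshold and the distance limit of 1 are all assumptions for illustration; "vowel substitution" is read here as undoing digit and symbol stand-ins for letters. levenshtein() is the PHP built-in.

```php
<?php
// Undo common character swaps, then drop anything that is not a letter.
function normalizeWord(string $word): string
{
    $swaps = ['0' => 'o', '1' => 'i', '3' => 'e', '4' => 'a',
              '@' => 'a', '$' => 's', '!' => 'i'];
    $word = strtr(strtolower($word), $swaps);
    return preg_replace('/[^a-z]/', '', $word);
}

// True when more than 30% of the characters are neither letters nor digits.
function isOverPunctuated(string $word): bool
{
    $length = strlen($word);
    if ($length === 0) {
        return false;
    }
    $punctuation = preg_match_all('/[^a-z0-9]/i', $word);
    return $punctuation / $length > 0.3;
}

// True when the normalized word is within $maxDistance edits of a listed word.
function looksProfane(string $word, array $blocklist, int $maxDistance = 1): bool
{
    $clean = normalizeWord($word);
    foreach ($blocklist as $bad) {
        if (levenshtein($clean, $bad) <= $maxDistance) {
            return true;
        }
    }
    return false;
}
```

With the placeholder list ['heck'], looksProfane('h3ck', ['heck']) is true, but so is looksProfane('neck', ['heck']), since the two words are one edit apart. That is the false-positive problem raised earlier in the thread, so near matches are probably better sent to a hold list than blocked outright.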
 
