PHP function to de-pluralize a noun?

sen5241b · Jul 15, 2008

Before I spend a lot of time on a problem that seems simple but isn't, does anyone know of a PHP function that will de-pluralize a noun?

e.g.
messes to mess
undresses would remain the same.
fathers to father
fathers's to father's
John T. Hammers would remain the same.

sen5241b · Jul 15, 2008

I forgot to mention:

Oxen to Ox
matrices to matrix
radii to radius

Again, not a simple problem. I figure Google Search has a way of doing this.

jpadie · Jul 15, 2008

i know that there is no such built-in function.

may i ask why you need this? there may be another way of approaching the problem.

sen5241b · Jul 16, 2008

Apparently, there is a PHP library called Chyrp that has pluralize and depluralize functions but Chyrp requires MySQL.

I think the complexity of this problem requires a dictionary with additional data on the pluralization of each word in the English language. The English language is just too illogical to be able to apply a simple set of pluralization rules. (Recall the numerous invasions of England by non-English speakers: the Danes, the Normans, etc.)
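
A minimal sketch of the dictionary-plus-rules idea, using the examples from the earlier posts. The function name `depluralize`, the irregular table, the exception list and the suffix rules are all illustrative assumptions, not a complete model of English and not Chyrp's code.

```php
<?php
// Hypothetical helper: look the word up in small tables first,
// then fall back to a few ordered suffix rules.
function depluralize(string $word): string
{
    // Irregular plurals that no suffix rule can derive.
    $irregular = [
        'oxen'     => 'ox',
        'matrices' => 'matrix',
        'radii'    => 'radius',
    ];

    // Words ending in "s" that must be left alone (verbs, surnames, ...).
    $exceptions = ['undresses', 'hammers'];

    $lower = strtolower($word);
    if (in_array($lower, $exceptions, true)) {
        return $word;
    }
    if (isset($irregular[$lower])) {
        return $irregular[$lower]; // note: original letter case is lost here
    }

    // Ordered suffix rules: the first pattern that matches wins.
    $rules = [
        '/(ss|sh|ch|x|z)es$/i' => '$1',  // messes -> mess
        '/ies$/i'              => 'y',   // parties -> party
        "/s's$/i"              => "'s",  // fathers's -> father's
        '/([^s])s$/i'          => '$1',  // fathers -> father
    ];
    foreach ($rules as $pattern => $replacement) {
        if (preg_match($pattern, $word)) {
            return preg_replace($pattern, $replacement, $word);
        }
    }
    return $word;
}
```

The exception list is the weak point: "hammers" is both a surname and an ordinary plural, and a rule this simple turns "bus" into "bu", which is why a real solution needs per-word data.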

Thanks for your response jpadie. I have found another approach.

jpadie · Jul 16, 2008

please share - that's what this forum is about.

i was going to suggest using phonemes as a comparison method (such as soundex).

i'm pretty sure that one could rewrite Chyrp to use alternative databases and/or a file-based storage mechanism.

sen5241b · Jul 16, 2008

I'm writing a profanity filter. It shouldn't be too difficult to pluralize an offensive word from a limited list using some simple code, but writing a function to pluralize any English word could, I think, be daunting.
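
For a limited list, the pluralizing could be sketched roughly like this. The helper names `expandWithPlurals` and `containsBanned` are hypothetical, the three suffix rules are a deliberate simplification, and the words are mild placeholders for the real list.

```php
<?php
// Hypothetical helper: expand a small banned-word list with naive plural forms.
function expandWithPlurals(array $words): array
{
    $expanded = [];
    foreach ($words as $word) {
        $word = strtolower($word);
        $expanded[] = $word;
        if (preg_match('/(s|sh|ch|x|z)$/', $word)) {
            $expanded[] = $word . 'es';                  // mess -> messes
        } elseif (preg_match('/[^aeiou]y$/', $word)) {
            $expanded[] = substr($word, 0, -1) . 'ies';  // ninny -> ninnies
        } else {
            $expanded[] = $word . 's';                   // darn -> darns
        }
    }
    return array_values(array_unique($expanded));
}

// Whole-word comparison, so a banned word inside a longer word is not flagged.
function containsBanned(string $text, array $banned): bool
{
    $words = preg_split('/[^a-z\']+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return count(array_intersect($words, $banned)) > 0;
}

// Placeholder entries stand in for the real list.
$banned = expandWithPlurals(['darn', 'heck', 'ninny']);
```

Because the list is expanded once up front, the filter never has to de-pluralize arbitrary input; irregular plurals would simply be added to the list by hand.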

jpadie · Jul 17, 2008

ok. i think that i'd use a combination of metaphone() and similar_text() to achieve this goal.

steps are:

setup:
1. create a table of all dictionary words.
2. create a table of all profanity.
3. calculate the metaphone value of all profane words.
-- setup done --

for each input, split it into words, then:
1. compare each word against the dictionary. add words that are not in the dictionary to a holding list.
2. compare each word against the profanity table. substitute profanities.
3. calculate the metaphone value of each non-dictionary, non-profane word.
4. compare those metaphone values against the metaphone values of the profane words.
5. if any match, calculate the similarity of the two words. if over 80%, substitute and add the word to the profane word filter; if over 65%, allow it and flag it on a watch list.

this is expensive as a process, but it's accurate and will get more accurate as time goes by.
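
The steps above might look roughly like this in PHP. It is only a sketch: small in-memory arrays stand in for the dictionary and profanity tables, the function name `classifyWord` and its return labels are hypothetical, and the words are mild placeholders. `metaphone()` and `similar_text()` are the standard PHP functions.

```php
<?php
// Hypothetical in-memory stand-ins for the two tables.
$dictionary = ['the', 'quick', 'brown', 'fox', 'duck', 'deck'];
$profanity  = ['darn', 'heck'];

// Setup step 3: index the profane words by their metaphone key.
$profaneKeys = [];
foreach ($profanity as $bad) {
    $profaneKeys[metaphone($bad)][] = $bad;
}

// Returns 'substitute', 'substitute-and-learn', 'watch' or 'allow'.
function classifyWord(string $word, array $dictionary, array $profanity, array $profaneKeys): string
{
    $word = strtolower($word);
    if (in_array($word, $profanity, true)) {
        return 'substitute';               // exact match in the profanity table
    }
    if (in_array($word, $dictionary, true)) {
        return 'allow';                    // ordinary dictionary word
    }

    // Non-dictionary, non-profane word: compare by sound.
    $key = metaphone($word);
    if (!isset($profaneKeys[$key])) {
        return 'allow';
    }

    // Same sound as a profane word: measure how similar the spelling is.
    $best = 0.0;
    foreach ($profaneKeys[$key] as $bad) {
        similar_text($word, $bad, $percent);
        $best = max($best, $percent);
    }
    if ($best > 80) {
        return 'substitute-and-learn';     // also add the word to the profanity table
    }
    if ($best > 65) {
        return 'watch';                    // allow, but flag on the watch list
    }
    return 'allow';
}
```

With real tables the two `in_array()` calls would become indexed lookups, which is where most of the cost of the process goes.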

in the alternative, there are free web services that might help. CDyne manages one for input strings up to 10k chars.

sen5241b · Jul 17, 2008

The "sound alikes" are definitely a problem that must be dealt with but it seems disguised profanity and false positives are an even bigger problem.

jpadie · Jul 17, 2008

the sound-alike should fix the disguises. you could also try the Levenshtein distance. you could also just do simple vowel substitution, or put overly punctuated words into a hold list.
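
A rough sketch of those three ideas. The helper names, the character-swap table and the default distance of 1 are assumptions made for illustration; `levenshtein()` is the standard PHP function.

```php
<?php
// Hypothetical normaliser: undo common character swaps and strip
// punctuation inserted inside a word (e.g. "h.e.c.k" or "h3ck").
function normalizeWord(string $word): string
{
    $word = str_replace(
        ['0', '1', '3', '4', '@', '$'],
        ['o', 'i', 'e', 'a', 'a', 's'],
        strtolower($word)
    );
    return preg_replace('/[^a-z]/', '', $word);
}

// Share of characters in the raw token that are neither letters nor digits.
// Tokens above some chosen ratio could go to the hold list.
function punctuationRatio(string $word): float
{
    $length = strlen($word);
    if ($length === 0) {
        return 0.0;
    }
    return preg_match_all('/[^a-z0-9]/i', $word) / $length;
}

// True when the normalised word is within $maxDistance edits of $bad.
function looksLike(string $word, string $bad, int $maxDistance = 1): bool
{
    return levenshtein(normalizeWord($word), $bad) <= $maxDistance;
}
```

An edit distance of 1 also matches innocent neighbours ("deck" is one edit from "heck"), so a check like this would need to run only on words that already failed the dictionary lookup to keep false positives down.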
