More English dictionary fun

Status
Not open for further replies.

Chris Miller

Programmer
Oct 28, 2020
4,814
DE
This is based on a file containing 194,000 English words that Mike Lewis mentioned in thread1551-1811770.

There I raised a spin-off from his question about finding all words in this graph:

[pre]L - A - R
| x | x |
C - O - I
| x | x |
T - A - F
[/pre]

Which letters, in which arrangement, allow the most words to be built?

I think the sheer number of 26^9 combinations (letters can occur more than once and their position matters) cannot easily be brute-forced.
So I thought of a milder problem: which 9-letter word in the dictionary yields the most words? There is still the issue of needing to test all paths, in all the different ways the graph nodes can be populated.
So I thought I would at least probe by choosing random paths of length 9 and random 9-letter words from the dictionary.
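The probing idea can be sketched in Python roughly as follows. This is only an illustration, not the program used in this thread: the cell numbering (0 to 8, row by row), the function names and the tiny word set in the usage note are all assumptions, and the restart-based random walk does not sample the full-length paths uniformly.

```python
import random

def neighbours(cell, rows=3, cols=3):
    """Cells adjacent to `cell` (including diagonals) in a row-major grid."""
    r, c = divmod(cell, cols)
    return [nr * cols + nc
            for nr in range(r - 1, r + 2)
            for nc in range(c - 1, c + 2)
            if (nr, nc) != (r, c) and 0 <= nr < rows and 0 <= nc < cols]

def random_full_path(rng, size=9):
    """Random self-avoiding walk, restarted until it covers all 9 cells."""
    while True:
        path = [rng.randrange(size)]
        while len(path) < size:
            options = [n for n in neighbours(path[-1]) if n not in path]
            if not options:
                break  # dead end, start over
            path.append(rng.choice(options))
        if len(path) == size:
            return path

def place_word(word, path):
    """Write the letters of a 9-letter word along the path; return the board."""
    board = [""] * 9
    for letter, cell in zip(word, path):
        board[cell] = letter
    return "".join(board)

def words_on_board(board, words):
    """All dictionary words that can be traced without reusing a cell."""
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    found = set()

    def walk(cell, used, text):
        if text not in prefixes:
            return  # no dictionary word starts like this
        if text in words:
            found.add(text)
        for n in neighbours(cell):
            if n not in used:
                walk(n, used | {n}, text + board[n])

    for start in range(9):
        walk(start, {start}, board[start])
    return found
```

For example, `place_word("factorial", [8, 7, 3, 6, 4, 2, 5, 1, 0])` reproduces the board shown above (in lower case), and `words_on_board` can then be run against a set of dictionary words. A probe would pick a random 9-letter word, call `random_full_path(random.Random())`, and keep the board with the largest result so far.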

There are 784 full-length paths and 28608 9-letter words. That number may be reduced by skipping palindromes and removing mirrored or rotated paths, but even without such optimizations it means analysing on the order of 10 million problems of that type. That could be tested in about 8 hours; let's say within a day, assuming an average of 27 ms for a single problem setting.
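The figure of 784 can be checked with a short depth-first search. The sketch below is in Python and assumes the grid is the 3 x 3 graph shown above, with horizontal, vertical and diagonal connections; it counts directed paths, so a path and its reverse are counted separately. The function name is made up for illustration.

```python
def count_full_paths(rows=3, cols=3):
    """Count directed paths that visit every cell of the grid exactly once."""
    cells = rows * cols

    def neighbours(cell):
        r, c = divmod(cell, cols)
        return [nr * cols + nc
                for nr in range(r - 1, r + 2)
                for nc in range(c - 1, c + 2)
                if (nr, nc) != (r, c) and 0 <= nr < rows and 0 <= nc < cols]

    def extend(cell, visited):
        if len(visited) == cells:
            return 1  # every cell used: one complete path
        return sum(extend(n, visited | {n})
                   for n in neighbours(cell) if n not in visited)

    return sum(extend(start, frozenset([start])) for start in range(cells))

print(count_full_paths())  # 784
```

Counting each path and its reverse only once would halve this to 392, which is one of the reductions hinted at above.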

It could be fun to output the best solution found so far along the way. This will not include situations where no 9-letter word is found, but there might be arrangements with more shorter words that make it worth doing without the 9-letter-word category. Another idea is to combine the letters of all 4- and 5-letter words to get statistics on which letter composition is used most. You can of course get an orientation from letter frequency analysis, but the number of repeated letters and letter combinations can shift this a bit. Many vowels are good; I guess repeated Es would work better, but surely the optimum isn't all vowels.
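One way to gather such statistics, sketched in Python: count single letters and, separately, the sorted letter multiset of each 4- and 5-letter word, so that anagrams fall into the same bucket. The function name and the choice of sorted letters as the key are assumptions made for this illustration, not something settled in the thread.

```python
from collections import Counter

def composition_stats(words, lengths=(4, 5)):
    """Return (letter counts, letter-multiset counts) for words of the given lengths."""
    letters = Counter()
    multisets = Counter()
    for word in words:
        word = word.strip().lower()
        if len(word) in lengths and word.isalpha():
            letters.update(word)                    # plain letter frequency
            multisets["".join(sorted(word))] += 1   # anagrams share one key
    return letters, multisets

letters, multisets = composition_stats(["east", "seat", "teas", "tree", "cat", "eaten"])
print(letters.most_common(3))
print(multisets.most_common(1))
```

With the real dictionary, `words` would simply be the lines of the word file.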

Chriss
 
Karluk said:
I suspect that Mike Lewis and I are using different dictionaries, since I found multiple words of 15 or more letters but not the 15 letter word Mike provided earlier.

When I started all this (with my first efforts at a Trackword), I was using a 194,000-word dictionary, which I expect is the one that you are using, Karl. But then I switched to a smaller one (84,000 words). I can't remember exactly why I switched. It might have been because the bigger one had too many ridiculously obscure words.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
And using that smaller dictionary, here is my list of vowel/consonant alternating words:

Code:
[i]15 letters[/i]
VERISIMILITUDES

[i]14 letters[/i]
RECAPITULATIVE
REHABILITATIVE
VERISIMILITUDE

[i]13 letters[/i]
DEMILITARISED 
DEMILITARISES 
DENATURALISED 
DENATURALISES 
DEPOLITICISED 
DEPOLITICISES 
MINERALOGICAL 
NUMEROLOGICAL 
RECAPITALISED 
RECAPITALISES 
RECAPITULATED 
RECAPITULATES 
REHABILITATED 
REHABILITATES 
REHABILITATOR 
TARAMASALATAS 
TOXICOLOGICAL 
UNIMAGINATIVE

Not all of these are in my "real" dictionary, by the way.

Mike


 
By comparison, I get the following 21 words of 14 letters. It seems clear that your program's failure to find any long 'y' words is due in part to your dictionary not listing words formed by adding 'ly' to a shorter word. Thus, you found 'unimaginative' but not 'unimaginatively'.

aminosalicylic
deliberatively
deliverability
denominatively
hereditability
heterozygosity
manipulatively
pararosaniline
recapitulative
recapitulatory
recoverability
regeneratively
rehabilitative
relocatability
supererogative
supererogatory
tumorigenicity
vaporisability
vaporizability
verisimilitude
vituperatively
 
@Mike Lewis
I'm afraid my results using the shorter engmix.zip dictionary don't match yours. Your program doesn't seem to find a bunch of legitimate alternating consonant/vowel words that contain a 'y'. (You also seem to have manually edited out words that are minor spelling variants of another.)

My list using engmix.zip:
15 letters
unimaginatively
verisimilitudes

14 letters
manipulatively
recapitulative
recoverability
rehabilitative
supererogatory
verisimilitude
vituperatively

13 letters
demilitarised
demilitarises
demilitarized
demilitarizes
denaturalised
denaturalises
denaturalized
denaturalizes
depoliticised
depoliticises
depoliticized
depoliticizes
gynecological
imaginatively
ineligibility
inevitability
inimitability
mineralogical
monocotyledon
numerological
recapitalised
recapitalises
recapitalized
recapitalizes
recapitulated
recapitulates
rehabilitated
rehabilitates
rehabilitator
taramasalatas
toxicological
unimaginative
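For anyone comparing results, the alternation test can be sketched in Python as below. The thread does not say how either program classifies 'y', so the `vowels` parameter and both function names are assumptions for illustration. In the lists above every 'y' sits in a vowel position, so one possible cause of a difference is a program that counts 'y' as a consonant.

```python
def is_alternating(word, vowels="aeiou"):
    """True if no two neighbouring letters are both vowels or both consonants."""
    kinds = [ch in vowels for ch in word.lower()]
    return all(a != b for a, b in zip(kinds, kinds[1:]))

def longest_alternating(words, vowels="aeiou", min_len=13):
    """Alternating words of at least min_len letters, longest first."""
    hits = [w for w in words if len(w) >= min_len and is_alternating(w, vowels)]
    return sorted(hits, key=lambda w: (-len(w), w))

sample = ["unimaginative", "unimaginatively", "gynecological", "verisimilitude"]
print(longest_alternating(sample, vowels="aeiou"))   # 'y' counted as a consonant
print(longest_alternating(sample, vowels="aeiouy"))  # 'y' counted as a vowel
```

With 'y' counted as a consonant, only "verisimilitude" and "unimaginative" pass. With 'y' counted as a vowel, all four sample words pass.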
 
Thanks for letting me know. I'll check my program, and also check which dictionary it was using. I certainly didn't manually edit my list.

I noticed earlier that my list only shows British spellings, not American ones (at a quick glance). Yet your list has both American and British variations, such as "demilitarised" and "demilitarized", which is strange. I'll investigate.

Mike
