Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

More English dictionary fun 1

Status
Not open for further replies.

Chris Miller

Programmer
Oct 28, 2020
4,814
DE
Based on a file that contains 194,000 English words, Mike Lewis mentioned in thread1551-1811770

I mentioned a spin off question to his question about all words in the graph

[pre]L - A - R
| x | x |
C - O - I
| x | x |
T - A - F
[/pre]

What letters in which arrangement allow to build the most words?

I think the sheer number of 26^9 combinations (as letters can occur more than once and their position matters) cannot easily be brute forced.
So I thought of a milder problem of which 9 letter word inn the dictionary yields most words. There still is the issue of needing to test all paths in all the different ways the graph nodes can be populated.
So I thought I at least probe by choosing random paths of length 9 and random 9 letter words from the dictionary.

There are 784 full-length paths and there are 28608 9-letter-words, which may be reduced by palindromes and removing mirrored or rotated paths, but even without such optimizations means analysis of a magnitude of 10 million problems of that type. Could be tested in about 8 hours. Lets say within a day, when assuming 27 ms for a single problem setting is the average.

It could be fun to output the best solution found so far on the way. It will not include situations where no 9-letter word is found, but there might be more shorter words worth doing without the 9-letter-word category. Another idea is combining the letters of all 4 and 5 letter words to have statistics on which letter composition is most used. You can of course get an orientation from letter frequency analysis, but the number of repeated letters and letter combinations can shift this a bit. Many vowels are good, I guess repeated Es would work better but surely the optimum isn't all vowels.

Chriss
 
Oh, that said I don't have done this yet nor will I start right now. But perhaps you bring in your own ideas or puzzles based on the dictionary.

Chriss
 
Chriss, you mentioned 9-letter palindromes. So I knocked up a quick program to find all the palindromes in the dictionary (ignoring words with three or fewer lettes). The longest are each seven letters.

Code:
* Looks for all single-word palindromes with more than three letters.

SELECT UPPER(cWord), nLen FROM Words82K ORDER BY nLen DESC WHERE IsPalindrome(cWord)

FUNCTION IsPalindrome
* Returns .T. if the specified word is a palindrome

LPARAMETERS tcWord

LOCAL lcWord, lcRight, lcReversed, lnI, lnHalfLen

lcWord = ALLTRIM(UPPER(tcWord))

IF LEN(lcWord) <= 3
  * Reject words of three letters or fewer.
  RETURN .F.
ENDIF

* Get the length of half the word, ignoring the middle letter if
* the word has an odd number of letters.
lnHalfLen = INT(LEN(lcWord) / 2)

* Get the right-most half-word and reverse its letters
lcRight = RIGHT(lcWord, lnHalfLen)
lcReversed = ""
FOR lnI = lnHalfLen TO 1 STEP -1
  lcReversed = lcReversed + SUBSTR(lcRight, lnI, 1)
ENDFOR

* Return .T. if the left-hand half matches the reversed
* right-hand half.
RETURN (lcReversed = LEFT(lcWord, lnHalfLen))

ENDFUNC

Running time (excluding printing the results): 0.73 seconds.

Three longest palindromes:

DEIFIED, REVIVER, ROTATOR

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Sorry, I meant anagrams, not just palindromes.

Edit: You see? hen you swap letters and apply all arrangements on the 3x3 nodes graph it's sufficient to test one of the anagram words. palindromes obviously are a special case of that.

Chriss
 
Sorry, I meant anagrams, not just palindromes.

Not to worry. It was an interesting exercise to find the longest single-word palindromes.

What I would really like is some automatic way of generating multi-word palindromes, but that would be asking too much. (I'm thinking of things like being able before seeing Panama, or something vaguely like that <g>.)

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi

Mike Lewis said:
The longest are each seven letters.
Maybe I did not understood the challenge, but the longest palindromes in the english3.txt file are 9 letters long.
[spoiler one-liner][pre]
rev english3.txt | diff --old-line-format= --new-line-format= english3.txt -[/pre]
[/spoiler]
[spoiler longest palindrome]malayalam rotavator
[/spoiler]


Feherke.
feherke.github.io
 
No, my general idea was, what if you don't have given letters. Maybe illustrated this way:

[pre]? - ? - ?
| x | x |
? - ? - ?
| x | x |
? - ? - ?[/pre]

So which letters do you need to put in to get most words out from traversing legal paths in this graph (the rules were visiting neighbor by neighbor, neither skipping over a node (ie the graph does not connect all letters to each other, only the center letter sees all others as its neighbors) nor visiting nodes twice overall and even not to use an o as oo in a word. And more strict rules someone suggested was not allowing paths to cross, eg not going right, down+left ,right, up+left, as the last step crosses the line of the second step. I won't follow that rule, so I will allow crossing, but you can always modify a task as you like to.

Since you can populate the 9 ? nodes with 26 letters you have 26^9 combinations, some that are mirrored or rotated could be removed, but the magnitude is rather that for the arrangement of any letter alone, and then each can be traversed in all the different allowed paths. The arrangement makes a difference, if you have two Os, but they are not neighbor nodes, you can't build words with oo.

So my thought to make that a bit easier is populate the nodes with 9-letter-words and find the optimal one in its optimal path to allow most other words, down to 1-letter-words and find out which 9-letter word in which arrangement yields the most words. And there anagrams make no difference, nor palindromes, sorry. But that's actually the least important about the task, just a possible way to shrink down the number of cases to process.

(Another edit: I think even anagrams make a difference, because you can have new direct neighbors you didn't have before, so that is even a red herring).

Chriss
 
Mike Lewis said:
What I would really like is some automatic way of generating multi-word palindromes, but that would be asking too much. (I'm thinking of things like being able before seeing Panama, or something vaguely like that <g>.)

Not sure if you were being circumspect or not but my favorite Palindrome involves Panama.

A Man, A Plan, A Canal, Panama!
 
Feherke,

Yes, you did understand the challenge. The two words that you identified are indeed palindromes, and are nine letters long. However, they are both proper nouns, Rightly or wrongly, I am now using a slightly shorter word list (downloaded from the same source) that excludes many proper nouns. Maybe I'll go back to the longer one.

kwbMitel,

Yes, of course I was referring to the famous Panama palindrom, and also to the supposed quote re Napoleon's exile: "Able was I ere saw Elba."

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I was researching Shakespeare's play, Love's Labor's Lost, to see if I might be interested in attending a performance next year. I read that it contains the word 'honorificabilitudinitatibus', which happens to be the longest single word in all of Shakespeare's works. According to Wikipedia, it also happens to be the second longest English word that consists entirely of alternating consonants and vowels.

That suggests a programming challenge to find the longest alternating consonant/vowel or vowel/consonant word in the dictionary used in this thread. It looks to me to be an easy programming assignment, except that I can't think of a simple way to handle the letter 'y'. One would need some artificial intelligence built in to distinguish 'y' as a consonant from 'y' as a vowel. It might be easiest to have a manual step at the end to examine the output and eliminate 'y' words that don't meet the requirements.
 
One would need some artificial intelligence built in to distinguish 'y' as a consonant from 'y' as a vowel.

Alternatively, you could simply establish a rule that says that Y is always a consonant.

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
OK. I've just written a quick program to find all words with alternating vowels / consonants (taking Y as a consonant). Ignoring words with fewer than six letters, it find 3,211 words, of which the longest contains 15 letters.

I'll do a bit more testing and tidy up the code before I post it. In the meantime, here is the 15-letter word:

VERISIMILITUDES

Unless anyone can find a longer one?

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
So here is my code for finding the longest word with alternating vowel/consonants. It's in Visual FoxPro, but should be possible to translate to other languages fairly easily:

Code:
* Determines the longest words that contains alternating consontants
* and vowels.

* Create a tempory table to hold the results
CREATE CURSOR csrSolution (cWord c(24), nLen i)
    && assume longest word has 24 letters

* Open the dictionary table
IF NOT USED("Words82K")
    USE Words82K IN 0 ALIAS Words
ENDIF
SELECT Words

* Loop through the table ignoring words with fewer than 6 letters
SCAN FOR LEN(ALLTRIM(cWord)) >= 6

    lcWord = ALLTRIM(UPPER(Words.cWord))

    * replace all vowels with "A" and all consonants with "B"
    lcTest = CHRTRAN(lcWord, "AEIOU", REPLICATE("A", 24))
    lcTest = CHRTRAN(lcTest, "BCDFGHJKLMNPQRSTVWXYZ", REPLICATE("B", 24))

    * Create test pattern: either ABABAB... or BABABA... depending on
    * whether word being tested starts with a vowel or a consonant
    lcPattern = IIF(INLIST(LEFT(lcWord, 1), "A", "E", "I", "O", "U"), ;
        REPLICATE("AB", 12), REPLICATE("BA", 12))

    * Test the word against the test pattern
    IF lcTest == LEFT(lcPattern, LEN(lcTest))
        * We have a hit; write it to the cursoe
        INSERT INTO csrSolution (cWord, nLen) VALUES (lcWord, LEN(ALLTRIM(lcWord)))
    ENDIF

ENDSCAN

* Put into descending order of length
SELECT cWord, nLen FROM csrSolution ORDER BY nLen DESC INTO CURSOR csrSolution

* Display the results
BROWSE NOWAIT

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Incidentally, I just modified the program such that Y is treated as a vowel rather than a consontant. I get excactly the same result. So the issue of Y being a vowel or a consonant doesn't arise (at least, not for words with six or more letters).

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike Lewis said:
Incidentally, I just modified the program such that Y is treated as a vowel rather than a consontant. I get excactly the same result. So the issue of Y being a vowel or a consonant doesn't arise (at least, not for words with six or more letters).

In all probability you are right that none of the longer words contain a 'y'. However, I suspect your program doesn't conclusively demonstrate this. I don't see any code that allows for the possibility of multiple 'y's in the same word with at least one 'y' used as a consonant and another 'y' used as a vowel.
 
I don't see any code that allows for the possibility of multiple 'y's in the same word with at least one 'y' used as a consonant and another 'y' used as a vowel.

Agreed. I didn't allow for that possibility. I would need to think about that.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
For what it's worth, I found 313 words from my dictionary that contain more than one "Y". Of course, not many of those will follow the "alternative vowel/consonant" rule, but it should be easy to work out which ones do. I'll do that in due course. But that won't answer the question about words where the "Y" occurs both as a vowel and a consonant.

And in case anyone is interested, the longest word that contains multiple Ys is: encyclopaedically (which my spell-checker has just rejected).

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I have written my own program to find alternating consonant/vowel words. I suspect that Mike Lewis and I are using different dictionaries, since I found multiple words of 15 or more letters but not the 15 letter word Mike provided earlier. Here is my list:

[hide]
15 letters
cytomegalovirus
monocotyledones
paramyxoviruses
unimaginatively

16 letters
hypocotyledonary
volatilisability
volatilizability

22 letters
honorificabilitudinity
[/hide]
 
By the way, there are some words that meet the alternating consonant/vowel criteria that have one 'y' used as a consonant and another 'y' used as a vowel. Examples are 'yakety' and 'yarely'. I happened to notice them towards the tail end of my output. I have no idea if there are longer examples, since the only way I have to detect them is by visually scanning my output.

My program treats every 'y' as a wildcard that can be either a consonant or vowel. Because of this I can find words like 'yakety', but I also get a lot of spurious hits that don't meet the alternating consonant/vowel criteria at all. An example is 'waylays', The first 'y' passes the a/y test as a consonant and the y/l test as a vowel. Similarly for the second 'y'. A more sophisticated program could doubtlessly eliminate these spurious hits within the progamming logic, but I'm not sure I'm sufficiently interested to make the effort.
 
I guess I was sufficiently interested to make the effort. I found a fairly simple way to make the fix and no longer seem to be getting spurious matches. This also allowed me to produce a list of 107 words of six or more letters that contain more than one 'y'. Visually scanning this list, I found the following words that use 'y' as both a consonant and vowel.

[hide]words of six or more letters with 'y' used as both a consonant and vowel
anyway
everyday
everyway
gayety
yakety
yarely[/hide]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top