Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

English dictionary fun

Status
Not open for further replies.

Mike Lewis

Programmer
Jan 10, 2003
17,509
Scotland
I don't know if anyone is still lurking in this forum. If so, here's a little exercise for you:

What are the longest words in English that can be made up from each row of a standard English typewriter keyboard?

So, each word can only contain letters from a single row, but each letter can be used multiple times.

I recently downloaded a file that contains 194,000 English words, and I hope to use it to generate various word games and puzzles. I started out by writing a program to find the answer to the above question. The results are quite interesting.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
What do you both mean by "not following the rules"? Do you mean that you found a word that contains exactly the specified nine letters, and that word just happened to fit the rules? If so, that's perfectly legitimate.

But now try to find all the words (three letters or more) that follow the rules. A bit harder, I think.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi

[spoiler I mean not following the rules like this][pre]master # ruby -nle 'puts $_ if "aacfilort" == $_.chars.sort * ""' english3.txt
factorial[/pre]
[/spoiler]

Feherke.
feherke.github.io
 
Here is some (simplified) pseudo-code for my first shot at a program to find all the contained words:

Code:
Start with a 3 x 3 array containing the puzzle

Initialise the list of hits

Flag all the cells in the array as not yet visited

Loop through the array, looking at each of the nine cells in turn

  Set a global variable (call it gcWord) to an empty string

  Call a subsidiary function, passing the current cell's coordinates

End of the loop through the array

We now have the complete list of words; it remains only to eliminate
any duplicates

-----

Here is the subsidiary function:

If the cell (passed as the parameter) is out of range OR it has 
already been visited

  Exit the function

Flag the current cell as visited

Search the table for words that start with gcWord plus the letter in 
the current cell

If found

  If this is an exact match AND it has more than three letters

    Add the word to the list of hits

  Add the letter in the current cell to the end of gcWord

Else (not found)

  Clear gcWord

  Exit the function

Recursively call the same function for each of the eight adjacent
cells (never mind if some of those cells lie outside the array;
these will be ignored)

I haven't tried to code this yet. It looks quite complicated, and probably has logical errors. Perhaps someone can suggest a simpler approach?

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
>ou also sorted the letters

Yes, there is some letter sorting involved :)
 
Feherke, I'm impressed with the conciseness of your Ruby code.I can't imagine writing it in VFP in such a small amount of code.

That said, I feel it must be relatively easy to get the nine-letter word, if you "don't follow the rules". You would find all the 9-letter words that have exactly the same letters as the puzzle, and then examine them by hand (by eye?) to see which ones fit the rules. (In practice, I would guess most puzzles would only produce one possible word.)

What is more interesting is finding all the possible words (three letters or more) that follow the rules.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
>What is more interesting is finding all the possible words (three letters or more) that follow the rules.

Working on it ...
 
Hi

Mike, I wrote a proper solver in Ruby for the trackword puzzle. I would suggest to start a thread dedicated to that one. Discussing multiple puzzles in the same thread will become hard to read.


Feherke.
feherke.github.io
 
Another idea for the 3x3 grid: Find which letters (and their positioning) yield the most words.
Presumabley by symmetry there will be 8 solutions you can boil down to 1 by excluding rotated and mirrored solutions.

Chriss
 
Chriss said:
hehe, you failed to see my spoiler

Oh,no. How did I miss that? A neat answer, Chris. You'll be writing cryptic crossword clues next.

Feherke said:
I would suggest to start a thread dedicated to that one [Trackword]. Discussing multiple puzzles in the same thread will become hard to read.

I take your point, Feherke. I think I'll just finish this one by posting the complete solution to the above puzzle (once I've finished my program to solve it - unless anyone else gets there first). I might perhaps start a new thread after that, if anyone is interested.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike,

yes, I'd also agree on more new threads. The puzzles forum isn't very popping so it doesn't hurt to have multiple threads on the same theme of word puzzles, does it?

Chriss
 
I've temporarily given up trying to write a fully-automatic solver (using recursion, based on the pseudo-code I posted above). It's proving more difficult than I thought.

So I've now down a semi-automatic brute force method. This produces all the words all of whose letters appear in the puzzle, but not necessarily following the navigation rules. I then manually inspected them and eliminated those that don't follow the rules.

Here then is the solution. There are 38 words in the solution, compared to 16 that I found by trying to solve the puzzle by myself. I've checked that all of the 38 are present in Chambers.

ACT, ACTOR, AIR, ARIA, CAR, CAROL, CLOT, COAL, COAT, COIR, COR, CORAL, COT, FACT, FACTOR, FACTORIAL, FAIR, FAT, FIAT, FOAL, FOCAL, IOTA, LAC, LAIR, LOAF, LOIR, LOT, LOTA, OAF, OAR, OAT, ORAL, RIOT, ROC, ROT, ROTA, TACO, TAFIA

Thanks for all your feedback on this. I'll try to improve my program when I can grab some more time.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi

My solution found 90 words. ( Their sorted lowercase list, one word per line, has MD5 d15bf996dbea633df799d048355dfa2a. ) Checked only one randomly : you missed "cairo".


Feherke.
feherke.github.io
 
I manually eliminated CAIRO because I decided to disallow proper nouns. This is the advantage of creating a puzzle. It gives you complete power over the rules.

Were your 90 words all ones that followed the navigation rules?

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike, I haven't started to program, yet. But my idea differs and led me to the variation of the question I suggested earlier:
Populate a structure with 1-9, create all valid numeric combinations, replace the digits with letters and find whether these are valid words.

This is just the brute force version of it, though, and likely can be optimized further. You only need to start from 1,2, and 5, because starting from anywhere else is just creating a path that's a rotation or mirror of the paths starting at these 3 digits.

One other idea I had would create all subpaths with just 2 or 3 digits (or letters) and find these combinations in words. You could also start to replace letters with digits and exclude any words which don't fully translate into the 9 digits have any digit twice.

I think a combination of these ideas could take the problem on from both sides of a list of allowed words (the dictionary file minus proper nouns per your rule) and from the geometry (the graph). The general idea is any method roughly creating a set of solutions can be combined to find the intersection or union.

Chriss
 
>This produces all the words all of whose letters appear in the puzzle, but not necessarily following the navigation rules

Yep, that's where I started ...

And I also gave up (perhaps temporaril;y) with the recursion I was then going to use to verify them (it is, as you say, not quite as simple as recursing down a directory tree). Considering alternatives currently. But sounds like feherke is way ahead of the game :)


 
Hi

Chriss said:
Populate a structure with 1-9, create all valid numeric combinations, replace the digits with letters and find whether these are valid words.
That is how I also did it. With great help from Ruby at generating the combinations :
[spoiler trackword.rb][pre]
word_maze = ARGV[0]
word_list = File.readlines 'english3.txt', chomp: true

3.upto 9 do |length|
[*0...9].permutation length do |order|
previous = nil
next unless order.all? {|i| next if previous and ((previous / 3 - i / 3).abs > 1 or (previous % 3 - i % 3).abs > 1); previous = i }

word = order.map {|i| word_maze }.join

puts word if word_list.include? word
end
end[/pre]
[/spoiler]
You pass the letter sequence as parameter, concatenated in a single word :
Code:
ruby trackword.rb larcoitaf
For this output I changed it abit to avoid posting 90 lines :
Code:
3 letters : lar lac lao lor lot act air aia rac rai roc rot ria rio ria car col cor coi cot cat oar ora oca oca oat oaf ira tol tor tac tao tai act aia air for foi fir fat
4 letters : lair loir lota loaf aria acta raca rota rial riot clot caro cola coal cora coir coif coat cato oral octa iota tola tori taco atoc foal fora fiar fiat fact fair
5 letters : ariot actor claro clair calor carol cairo coral cairo taira tafia acari actor focal facto
6 letters : lariat factor
7 letters :
8 letters :
9 letters : factorial
Note that words that can be tracked on 2 different paths are included twice : act, actor, aia, air, cairo, oca, ria.

Mike said:
I manually eliminated CAIRO because I decided to disallow proper nouns.
Well, I guess the "cairo" on line 21504 of english3.txt refers to the cairo graphics library :
Code:
master # grep -xn cairo english3.txt 
21504:cairo


Feherke.
feherke.github.io
 
Strongm said:
sounds like feherke is way ahead of the game

That's right. And so in Chriss. I need to get my head round their approach. It's not there yet.

Feherke said:
I guess the "cairo" on line 21504 of english3.txt refers to the cairo graphics library

Silly of me. I should have thought of that. (I could argue that the name of a software product is a proper noun. But I won't)

Feherke said:
Note that words that can be tracked on 2 different paths are included twice

That would happen with all the methods discussed. You would have to do a de-dupe:

Code:
SELECT DISTINCT cWord FROM Solution

which also puts them into alphabetical order.

Beofre looking again at the above suggestions, I'm going to have one more shot at a brute-force method.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
So here's what I've got in mind now.

I start with my existing semi-automatic brute force method which produces all the words all of whose letters appear in the puzzle, but not necessarily following the navigation rules. In the case of the FACTORIAL puzzle, that produces 165 words.

For each result, I then apply a test for adjacent letters. From the dictionary word, I take each pair of consecutive letters in turn. I search the puzzle word for each occurrence of the first letter of the pair (the puzzle word consists of the nine letters in the grid, in the order in which they appear in the grid; so in this case it would be LARCOITAF).

I then do a CASE structure (or a series of IFs, depending on the language). It would look something like this:

[pre]
Position of first Test the letters at these
letter of the pair positions in the puzzle
within the puzzle word against the second
word (1-based) letter in the pair

1 2, 4, 5
2 1, 3, 4, 5, 6
3 2, 5, 6
4 1, 2, 5

and so on
[/pre]

If the second letter of the pair is NOT at one of the positions indicated, then the test fails. Otherwise, continue with the next pair. If you get to the end and the test has not failed, then you have a hit.

I haven't coded this yet. I might have to spend some token time today on some real work, but will have a shot at this when I can grab some time. In the meantime, please keep up with all your other efforts.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi

Got a bad idea to make the puzzle more difficult : avoid the path crossing itself. Thus making one of the "cairo" invalid solution :

[pre]
[gray]accepted : rejected :[/gray]
L A R L A R
[green]/| / \[/green][red]/[/red][green]|[/green]
[green]/ | /[/green] [red]/[/red][green]\|[/green]
[highlight lime]C[/highlight] O I [highlight lime]C[/highlight] O I
[green]\ /[/green]
[green]\ /[/green]
T A F T A F
[/pre]
No idea for now how could this be handled efficiently.


Feherke.
feherke.github.io
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top