Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

English dictionary fun

Status
Not open for further replies.

Mike Lewis

Programmer
Jan 10, 2003
17,509
Scotland
I don't know if anyone is still lurking in this forum. If so, here's a little exercise for you:

What are the longest words in English that can be made up from each row of a standard English typewriter keyboard?

So, each word can only contain letters from a single row, but each letter can be used multiple times.

I recently downloaded a file that contains 194,000 English words, and I hope to use it to generate various word games and puzzles. I started out by writing a program to find the answer to the above question. The results are quite interesting.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi

This kind of problems I prefer to solve without writing programs.
Code:
[blue]master #[/blue] grep -xE '[qwertyuiop]+' english3.txt | wc -L
11

[blue]master #[/blue] grep -xE '[qwertyuiop]{11}' english3.txt
proprietory
rupturewort

[blue]master #[/blue] grep -xE '[asdfghjkl]+' english3.txt | wc -L
8

[blue]master #[/blue] grep -xE '[asdfghjkl]{8}' english3.txt
alfalfas
falashas
galahads
haggadah

[blue]master #[/blue] grep -xE '[zxcvbnm]+' english3.txt | wc -L
3

[blue]master #[/blue] grep -xE '[zxcvbnm]{3}' english3.txt
bbc
bmx
mcc
xxv
xxx

Feherke.
feherke.github.io
 
Probably the best way to solve NYTimes xword puzzles is to get PHDs in literature and history. Personally, I'm too old and too not interested.

Anyway, FWIW, here's my code in the Valid method of the text box:

Code:
* Search word list to match input template
* Input characters: a-z or A-Z (case insensitive) are hardwired to their places
*   in the input string
* Leading and trailing spaces are trimmed.
* All other characters (including spaces if embedded) are considered blanks
*   in their respective places except trailing asterisk(s)
* Trailing asterisks are used to add additional word lengths to search.

skeleton = LOWER(ALLTRIM(THISFORM.Text1.Value))
skelLen = LEN(m.skeleton)

* Calc length(s) of words to search
IF RIGHT(m.skeleton,1) = '*'
 minLen = m.skelLen - 1
 FOR ii = m.skelLen-1 TO 1 STEP -1
  IF SUBSTR(m.skeleton,m.ii) = '*'
   minLen = m.minLen - 1
  ELSE
   EXIT
  ENDIF
 ENDFOR
 whereCl = 'LEN(TRIM(Word))>=' + LTRIM(STR(m.minLen)) + ' AND ' ;
         + 'LEN(TRIM(Word))<=' + LTRIM(STR(m.skelLen))
 skelLen = m.minLen
ELSE
 whereCl = 'LEN(TRIM(Word))=' + LTRIM(STR(m.skelLen))
ENDIF

* Form where clause
FOR ii = 1 TO m.skelLen
 cChar = SUBSTR(m.skeleton,m.ii,1)
 IF BETWEEN(m.cChar,'a','z')
  whereCl = m.whereCl + ' AND SUBSTR(Word,'+LTRIM(STR(m.ii))+',1)="' + m.cChar + '"'
 ENDIF
ENDFOR

IF USED('Temp')
 USE IN Temp
ENDIF

* Select words into Temp.dbf
sfx = IIF(THISFORM.Op1.Value=1, '', 'Fr')
SELECT Word ;
  FROM Words&sfx ;
  INTO TABLE Temp ;
  WHERE &whereCl

*WordList.Grid1.RecordSource = 'Temp.dbf'
*WordList.Grid1.Column1.ControlSource = 'Temp.Word'
THISFORM.LbResults.Caption = LTRIM(TRANS(RECCOUNT(),'99,999,999'))
WordList.Grid1.Refresh()

*USE IN Temp

*WITH THISFORM
*.CmdSearch.Enabled = .F.
*.Text1.Value = ''   && Reset
*.Text1.SetFocus()
*ENDWITH
 
VB6/VBA version (essentially feherke's solution, but as a program :) )

Code:
[blue]Option Explicit

Public Sub example()
    Dim re As RegExp
    Dim mymatches As MatchCollection
    Dim fso As New FileSystemObject
    Dim lp As Long
    Dim max As Long
    Dim pattern As Variant
    
    Set re = New RegExp
    For Each pattern In Array("qwertyuiop", "asdfghjkl", "zxcvbnm")
        max = 0
        With re
            .pattern = "^([" & pattern & "]+)$"
            .MultiLine = True
            .Global = True
            Set mymatches = .Execute(fso.OpenTextFile("d:\downloads\english3.txt", ForReading).ReadAll)
         End With
         For lp = 0 To mymatches.Count - 1
            If Len(mymatches(lp)) > max Then max = Len(mymatches(lp))
         Next
         For lp = 0 To mymatches.Count - 1
            If Len(mymatches(lp)) = max Then Debug.Print pattern & "(" & max & "): " & mymatches(lp)
         Next
     Next
End Sub[/blue]
 
Feherke,

A good idea to use grep. I didn't think of that. Your results are the same as mine, which is encouraging.

Feherke and Steve,

Here is the highly simplified pseudo-code for my method (I wrote the program in Visual FoxPro, but there is nothing language-specific about it):

Code:
Index the table (the word list) on length of word.

Loop through the table in descending order (so, start with the longest word in the table).

For each row on the keyboard:

  From the current word, remove those letters that are also present in the current keyboard row
  (there is a built-in VFP function which does that)

   If the result of the above operation is an empty string

      Record the current word as a hit

   If we have now found a hit for the current row AND the current word is shorter than
   the hit

      We've finished with this row

   End of loop on each row of the keyboard

End of loop on current word

I don't claim that this is the most efficient method, but it only takes a couple of thousand milliseconds to run (on a local machine).

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I'm disappointed by what appears to be the answer to the top row of keys.

I've always understood this to be a different answer based on a puzzle question from back in my childhood.

The answer I understood to be correct was somewhat ironic which is why it has stuck with me.

How sure are we of the "correct" answer? It seems to simply be a alternate spelling of another word.
 
Considering only words that are in common use in English, here is a summary of the results:

Top row

PROPIETORY

But I am inclined to dispute this. According to Chambers, the preferred spelling is PROPIET[highlight #FCE94F]A[/highlight]RY.

Middle row

ALFALFAS

Since this row only has one vowel, I was surprised to get any interesting result at all (although I am not convinced that ALFALFA has a plural).

Bottom row

Given there are no vowels in this row, it's not surprising not to get any actual words - just abbreviations and Roman numbers.

The longest of the above words has 11 letters. I've now tweaked the program to give all words with ten or more letters. These are as follows (all from the top row):

REPERTOIRE

PERPETUITY

TYPEWRITER

I especially like the last of the above. It's nice that one of the longest words that you can type on one row of a typewriter is TYPEWRITER.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Before I program something I know and like that you can write the word TYPEWRITER just using the upper letter row of a keyboard. I also guess that's not by chance.

For the search at hand you can easily go through all words, to get there most efficiently, I'd look into algorithms for finding anagrams. It's not exactly the task, but I guess you could modify something to not just allow words with all letters but only some of them and allowing repeats.

It doesn't really pay since the word list is quite small and the brute force method already is fast enough.

Chriss
 
@Mike Lewis, you sussed out my ironic answer. I am also noting for future reference that it is not the only word of that length that is possible using the top row. Makes it less special in my mind and I may have to retire this tidbit of worthless trivia or at least not sell it so hard.
 
The question of the longest word using the only the top row of the keyboard came up in an old thread thread1229-1712464 At the time I used an anagram solver to come up with "preprototype" as a possible 12 letter solution. The same anagram solver generated the following ten letter words:

perpetuity
pirrouette
proprietor
repertoire
typewriter
 
Thanks for that, Karluk. The thread that you mentioned looks interesting. I'll read the whole thing when I can grab a moment.

PREPROTOTYPE is not in my list of 194,000 words. Maybe it should be. Then again, what does it mean? A prototype is something that exists before the thing that is being prototyped. So is a preprototype something that exists before the prototype?

Ah, I have just googled the word. There are lots of hits, mainly to do with medical research and usability testing. So,yes, let's accept it.

Good to see this thread has generated so much interest.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Moving on from typewriter keys, my next exercise will be to produce "trackword" puzzles. (The name "trackword" might be copyright, so I might have to call it something else.)

The puzzle is based on a 3 x 3 grid of letters - see example below. The aim is:

(a) Find the hidden nine-letter word. You start from any letter, and move through the grid one cell at a time, horizontally, vertically or diagonally, visiting each cell only once.

(b) Using the above navigation rule, find as many words as possible of three letters or more.

Here is an example:

[tt] L A R
C O I
T A F[/tt]


At a quick glance, I can see 16 contained words. But the aim is to write a program that would find as many as possible. I haven't yet thought how to do that, but I imagine it would be quite a bit harder than finding the longest word for each keyboard row. (Feel free to make suggestions.)

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
In fact - or I also could be wrong - I don't get a 9-letter word from this.

Chriss
 
I said earlier that I had found 16 words. But the more you look at it, the more words jump out at you. My score is now up to 21.

These were all found "manually", as it were. I haven't yet tried to automate the process.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi

After all this discussion decided to solve it programmatically too :
Code:
result [teal]= [][/teal]
ARGF[teal].[/teal]each_line chomp[teal]:[/teal] [b]true do[/b] [teal]|[/teal]word[teal]|[/teal]
    [teal][[/teal][i][green]'qwertyuiop'[/green][/i][teal],[/teal] [i][green]'asdfghjkl'[/green][/i][teal],[/teal] [i][green]'zxcvbnm'[/green][/i][teal]].[/teal]each_with_index [b]do[/b] [teal]|[/teal]row[teal],[/teal] nr[teal]|[/teal]
        [b]next if[/b] [purple]0[/purple] [teal]>[/teal] longer [teal]=[/teal] word[teal].[/teal]length [i][green]<=>[/green][/i] result[teal][[/teal]nr[teal]][[/teal][purple]0[/purple][teal]].[/teal]length [b]rescue[/b] [purple]1[/purple]

        [b]if[/b] word[teal].[/teal]chars[teal].[/teal]all? [teal]{|[/teal]char[teal]|[/teal] row[teal].[/teal][b]include[/b][teal]?[/teal] char [teal]}[/teal] [b]then[/b]
            result[teal][[/teal]nr[teal]] = [][/teal] [b]if[/b] longer [teal]>[/teal] [purple]0[/purple]
            result[teal][[/teal]nr[teal]].[/teal]push word
        [b]end[/b]
    [b]end[/b]
[b]end[/b]

puts result
Code:
[blue]master #[/blue] ruby ruby/longest-word.rb english3.txt 
proprietory
rupturewort
alfalfas
falashas
galahads
haggadah
bbc
bmx
mcc
xxv
xxx


Feherke.
feherke.github.io
 
>Anyone else managed to find it?

yep. With a program, but not one that properly follows the rules of the game. Yet.
 
Hi

Also found it. Also with a program, also not following the rules. Yet.

For my curiosity, strongm, you also sorted the letters or found other simple alternative ?


Feherke.
feherke.github.io
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top