Disentangle jumble of words - No easy solution 1

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
Trying to write a PHP function that disentangles a jumble of words. I already have something written but it is imperfect.

liquidweather: liquid weather OR liquid we at her ?

If your answer is that the latter doesn't make sense, then the solution becomes very difficult. Code to determine the meaning of a phrase might require a team of PhDs. Natural language processing is a deep subject, very deep.

todaysununderstand: todays un understand OR today sun understand ? (Aspell says 'un' is a word).

All ideas welcome.
 
This is really not a simple problem, and your statement of it is too broad and indefinite.
Moreover, your examples have no meaning (what is "liquid weather"?), so there is no way to distinguish between interpretations that are equally meaningless.
So much work has been done on this subject that the only way to start, IMHO, is to read many books and articles.

Franco
 
This is a problem and a half.
If you get a reasonable solution then you would probably have qualified for a job at Bletchley Park!

Computers are like Air conditioners:-
Both stop working when you open Windows
 
I don't think I need to take into account the "meaning" of the phrases, and trying to do so would require delving, unnecessarily, into natural language processing. I would be satisfied to have an algorithm that determines whether any possible disentanglement of the word jumble results in a series of valid words with no extraneous characters.
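
To make that requirement concrete, here is a minimal PHP sketch of such a check. It is only one possible approach, not code from anyone in this thread: the function name `canDisentangle` and the `$dictionary` array are hypothetical, and the word list is assumed to be already loaded as lowercase strings (for example from an Aspell dump). It marks which prefixes of the jumble can be split cleanly into words and reports whether the whole string is one of them.

```php
<?php
// Hypothetical helper: returns true if $jumble can be split entirely into
// dictionary words with no characters left over.
// $dictionary is an assumed array of lowercase words.
function canDisentangle(string $jumble, array $dictionary): bool
{
    $words = array_flip($dictionary);   // word => index, for fast isset() lookups
    $n = strlen($jumble);

    // No candidate word can be longer than the longest dictionary entry.
    $maxLen = 0;
    foreach ($dictionary as $w) {
        $maxLen = max($maxLen, strlen($w));
    }

    // $reachable[$i] is true when the first $i characters split cleanly into words.
    $reachable = array_fill(0, $n + 1, false);
    $reachable[0] = true;               // the empty prefix is trivially valid

    for ($end = 1; $end <= $n; $end++) {
        for ($start = max(0, $end - $maxLen); $start < $end; $start++) {
            if ($reachable[$start]
                && isset($words[substr($jumble, $start, $end - $start)])) {
                $reachable[$end] = true;
                break;                  // one valid split of this prefix is enough
            }
        }
    }
    return $reachable[$n];
}
```

With a small made-up word list such as `['liquid', 'weather', 'we', 'at', 'her']`, `canDisentangle('liquidweather', ...)` returns true, while `canDisentangle('liquidweatherx', ...)` returns false because the trailing `x` is left over.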



 
I would be satisfied to have an algorithm that determines whether any possible disentanglement of the word jumble results in a series of valid words with no extraneous characters

That seems feasible, and at this time in the evening, after half a bottle of wine, quite easy. Or at least logically easy, if computationally intensive.

I wrote a crossword solver/anagrammer four years ago which might do the trick with a bit of tweaking. But now there are web services that are quite probably backed by big servers and make more sense. So why not shove the input into one of those services via curl and see whether the length of the returned data is the same as the length of the input string? No syntactical analysis, of course, but it seems that you don't need that.

It might help if you told us the functional aim of this process?
 
If you don't need any meaning analysis, and consequently are not choosing between alternative solutions, then the algorithm seems, in principle, pretty straightforward.
You need, of course, a list of accepted words that is complete with all the possible variations (plurals, declensions, etc.).
Then start by determining all the words matching the start of your jumble, and for each one continue with the subsequent words. This builds a tree, where branches that fail to give a match at some point are simply discarded.
There shouldn't be too many solutions for each jumble, possibly not more than two or three, and the number of solutions should not increase (if anything it should decrease) as the length of the jumble increases. So the search will complete in a finite and reasonable time.
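
The tree search described above could be sketched in PHP roughly as follows. This is an illustration under assumptions rather than a finished implementation: `disentangleAll` and `$dictionary` are hypothetical names, the word list is assumed to be a plain array of lowercase words, and the per-position cache (`$memo`) is an addition of this sketch so that the same tail of the jumble is not explored twice.

```php
<?php
// Hypothetical helper: lists every way to split $jumble into dictionary words.
// Each solution is returned as a space-separated string.
function disentangleAll(string $jumble, array $dictionary): array
{
    $words = array_flip($dictionary);   // word => index, for fast isset() lookups
    $memo = [];                         // start position => splits of the remaining text

    $solve = function (int $pos) use (&$solve, &$memo, $jumble, $words): array {
        if ($pos === strlen($jumble)) {
            return [[]];                // reached the end: one split with no further words
        }
        if (isset($memo[$pos])) {
            return $memo[$pos];
        }
        $results = [];
        for ($len = 1; $pos + $len <= strlen($jumble); $len++) {
            $candidate = substr($jumble, $pos, $len);
            if (!isset($words[$candidate])) {
                continue;               // not a word: nothing to follow for this length
            }
            // Follow the branch; a dead end contributes no splits and is dropped.
            foreach ($solve($pos + $len) as $rest) {
                $results[] = array_merge([$candidate], $rest);
            }
        }
        return $memo[$pos] = $results;
    };

    return array_map(fn(array $split) => implode(' ', $split), $solve(0));
}

// Usage with a small made-up word list:
print_r(disentangleAll('liquidweather', ['liquid', 'weather', 'we', 'at', 'her']));
// lists "liquid we at her" and "liquid weather"
```

An empty result array means no branch reached the end of the jumble, i.e. there is no disentanglement without leftover characters.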

Franco
 