Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations derfloh on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

VFP like GOOGLE…

Status
Not open for further replies.

Nro

Programmer
May 15, 2001
337
CA
Hello.

The other day, I was using a very popular search engine and like always, I white a couple of words into the text box and instantly received a couple of hundred answers.

Then I continue working on one of my project. It’s for a chemical company, and one of the requirements is to be able to search parts of a chemical name inside a 100,000 bank of name. They never search the entire word exactly, but a word of two. For example, if they white ACID in the textbox, the result is

Chromic Acid Flakes
Chromic Acid Flakes
Fluoboric Acid
Sulfuric Acid 1.800 Tech

The quick and dirty line of code to do that is the same I use for decades:

Code:
SELECT pro_name FROM pro_pro1 WHERE "acid" $ LOWER(pro_name) INTO TABLE ttRes

The result came from an indexed table on a network, and took about 10 to 20 seconds to finish. My question is: is it possible to achieve in VFP something like popular search engines?
My poor result is based on 100,000 records, but these engines search over gazillions records. I can’t use any soundex() or algorithm that transform the whole field. It has to search for one particular word in a string.

Thanks in advance

Nro
 
Nro,

I've been doing this sort of thing for years. In fact, I go further. I give the user a single textbox, and let them enter a string, which they can then search for in any of the main character fields of the table being searched. So, it might query the person name, company name, address, etc., or perhaps the product name, product category, whatever. The point is, the user doesn't have to search a particular field; they just enter their search term.

Obviously this is slow. It might be possible to make some small improvement to the code you showed. Possibly using the LIKE operator rather than $, for example. But it won't make a big difference.

My solution is to give the user an alternative "advanced" search, where they can enter specific values for specific fields -- much more like the traditional way of doing things.

I warn them that the "simple" search is slow, and suggest they use the advanced search if the speed is an issue for them.

Now, here's my point: Most users continue to use the slower option. It seems they would rather wait a few seconds for the results to come through, in return for the simplicity of a search tools that "works just like Google" (of course, it's not really like Google, but I'm talking about what the user perceives).

I don't know if this is relevant to your applications, but it might be worth considering this approach, and see how the users react.

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 
Hi
Well, the main purpose of this discussion is to improve something that’s already working. My users always complaining about the speed of applications in general, but again, like you said Mike, sometimes they prefer to use the slowest option, even if I explain them the impact.

What I’m looking for, is a way of indexing each word in a field, then, base on these indexes, use them to optimize the search. Does it seem too complex, or I need to write line and lines of code?

Thanks again
Nro
 
Hi

It is possible to do this quite a lot faster, but it requires that the individual words (and/or parts of words) be indexed in a table of their own...

So if you had a table with a key field, and (for the sake of argument) a single text field like the one below:

Code:
MyDataTable
Key  TextField
1    RING A ROSES
2    BOTTLE OF BRINE
3    LUCY IN THE SKY WITH DIAMONDS

You could set up an indexing routine that generates a table like this:

Code:
MyIndexTable
Key  TableName   TableKey WordValue 
1    MyDataTable 1        RING
2    MyDataTable 1        ROSES
3    MyDataTable 2        BOTTLE
4    MyDataTable 2        BRINE
5    MyDataTable 3        LUCY
6    MyDataTable 3        SKY
7    MyDataTable 3        DIAMONDS

The example above omits all the 'common' words (A, AT, OF, THE, WITH etc.)

But allows you to develop something you can search.

You could go further and normalise it a stage further down - to prevent repeats of the keywords across tables... depends on how efficient you need to be.

You would need to split each field to be searched into keywords and then filter out the common ones as above, but that wouldn't be difficult, just replace non A-Z characters with spaces and break down the strings into substrings on that.

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.
 
I meant to add...

it may be hard to get hold of but the wiki page links to some other potentially usefull stuff.

n
 
The result came from an indexed table on a network, and took about 10 to 20 seconds to finish

That is too long, are you sure its not the network?

The way you are doing it with the Select, IMO, is the best way...
A few things though: How are you displaying the choices to let the user select the correct "Acid"?

The table is indexed on the field with the Lower(<<fieldname>>) right? try indexing on
alltrim(lower(pro_name))

Try removing the index before the Select i.e. Set order to, then set the index back i.e.
Code:
select pro_pro1 
cOldTag = tag()
set order to
SELECT pro_name FROM pro_pro1 WHERE "acid" $ LOWER(pro_name) INTO cursor ttRes
*** I would not select into a table as now
*** it has to be created
select pro_pro1 
set order to tag (cOldTag)
also check for optomization with sys(3054)
 
Sorry; if indexed on alltrim(lower(pro_name)), this should be:

SELECT pro_name FROM pro_pro1 WHERE "acid" $ pro_name INTO cursor ttRes

Rushmore will use the right tag...
 
Try this also:This may be faster....

SELECT pro_name FROM pro_pro1 WHERE "acid" $ alltrim(lower(pro_name)) INTO cursor ttRes
 
You can make your own word index like this:

Code:
* index words in pro_name FROM pro_pro1 table

Create Table tabResults (WordID I, ProID I)
Index On WordID Tag xWordID
Index On ProID Tag xProID Additive
Index On BinToC(WordID)+BinToC(ProID)Tag xKeypair Collate "MACHINE" Additive

Create Table tabWord (WordID I Autoinc, Primary Key WordID Tag xpWordID, cWord C(40))
Index On Upper(cWord) Tag xWord Additive

Local lnCount, lcWord

Use pro_pro1 In 0
Scan
   For lnCount = 1 To GetWordCount(pro_name)
      lcWord = Lower(Alltrim(GetWordNum(pro_name,lnCount)))
      If Not Seek(lcWord,"tabWord","xWord")
         Insert Into tabWords (cWord) Values (lcWord)
      Endif
      If Not Seek(BinToC(tabWord.WordID)+BinToC(pro_pro1.ProID))
         Insert Into tabResults (tabWord.WordID, pro_pro1.ProID)
      Endif
   Endfor
Endscan

* now search for a word, eg "acid" via

Select pro_pro1.pro_name;
   From tabWords;
   Left Join tabResults On tabResults.WordID = tabWords.WordID;
   Left Join pro_pro1 On pro_pro1.ProID = tabResults.ProID;
   Where tabWords = "acid"

Change ProID to whatever is the primary key in pro_pro1. You may refine to not only extract words but also any word parts from at least three letters up to 40 or whatever maximum. Then you'd loop selstart from 1 to len(pro_name) and sellen from 3 to max and index each substr(pro_name,selstart,sellen).

The disadvantage of it is, you need to keep it in sync with the pro_pro1 table, it tends to be large even for a few records, but if you have a large table of chemicals it will also have many common words and especailly the word table of this "full text index" will not grow that much, which is a key factor besides the index on cWord, to speed up searches.

Bye, Olaf.
 
Why? when a correct index on the field and Select statement, like Nro is doing will work... He is looking for speed not a solution...
Do this all the time over a network and it works in nano seconds. 100,000 records is nothing... Maybe I am missing something here...
 
Imaginecorp,

he wants to find single parts of this field, like in the example showed in his initial post.

A simple index on the pro_name field does not help to find "acid" only. You need a Lower(pro_name) LIKE '%acid%' or 'acid' $ Lower(pro_name), but even an index on Lower(pro_name) will not be used in optimising these where conditions.

If you generate a word index, and my code is just a simple sample how to do so, then the sql can be optimised including the joins and would be reasonably faster.

Bye, Olaf.
 
Again

Imaginecorp said:
SELECT pro_name FROM pro_pro1 WHERE "acid" $ pro_name INTO cursor ttRes

Rushmore will use the right tag...

No Rushmore will not optimise this. Sall SYS(3054) and see yourself before making such a claim. Even the help on the $ operator alone says: "The $ operator is ... not Rushmore optimizable".

Bye, Olaf.
 
Nro,

Before you start writing your own indexing code, I suggest you follow up Nigelgomm's suggestion. If you can get hold of PHDBase (or a similar product), it will solve your problem.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 
Even the help on the $ operator alone says: "The $ operator is ... not Rushmore optimizable".

I stand corrected...

Though I still say check the network, before creating "word" tables etc. We access tables on the network with over 100,000 records (I admit not with the $ operator) in nano seconds

An idea, you may consider doing your select in a SP on the server which returns an array of your results....
 
As fas as I know PHDbase is a dead product, already was years ago. You may find it, but it's incapable to address varchar fields. It's easy enough to create your own word index.

For a better experience you may not use getwordnum but create an index on all wordparts by using substr() with variing begin and length, starting from a minimum length of 3 letters. Then you can select like with the $ operator.

Bye, Olaf.
 
This seems to me to be a possible opportunity for a collaborative project.

B-)

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.
 
More like, "We do the work and publish it as a FAQ"

Although I do see how you could come to your quote!


Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top