Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Can you write a better querie?

Status
Not open for further replies.

Guest_imported

New member
Jan 1, 1970
0
Hi,

I am writing a search engine that searches through 300,000 hierarchy
headings like the ones below. I am trying to make it as fast as possible.

1) Top:Arts:Music:Bands and Artists
2) Top:Arts:Music:Bands and Artists:B
3) Top:Arts:Music:Bands and Artists:B:Bowie, David
4) Top:Arts:Music:Bands and Artists:B:Bowie, David:Fan Pages
5) Top:Arts:Music:Bands and Artists:B:Bowie, David:Articles and Interviews

One Word Searches
---------------

If a user does a one word search such as "bowie" I want the
result to be (3) if they do a one word search for "bands" I want
the result to be (1). I have already achieved this by using
regexp searches. This works if you make sure that your search only
searches the last part of the hierarchy path...

Top:Blah:Blah:%one-word-search%

(but it is VERY slow)

Two Word Searches
----------------

If the user does a two word search such as "david bowie" I want the
answer to be (3) If the user does a two word search such as
"bowie interviews" I want the answer to be (5)

I have also achieved this by using regexp searches.

This works by making mysql search for word1 at the end of the path
only and word2 anywhere in the path.. or ..word2 and the end of the
path only and word one anywhere in the path.

(once again it is way to slow)

Three Word Searches
-------------------

I have also got this working, it is just an extension of the Two
Word Search idea. Once again way to slow.


I am assuming that to make this work faster I will have to
do something like make a meta table that splits all the paths up
into into single words and references them back to the original
path.. i.e. (and also keep track of the depth to use that as a
sort criteria)

1)Arts
1)Music
1)Bands
1)Artists
2)Arts
2)Music
2)Bands
2)Artists
2)B
3)Arts
3)Music
3)Bands (i am top)
3)Artists
3)B
3)Bowie
3)David
4)etc.

I have done this. It works great for single word searches almost instant.
I.E. If you type 'bands' it immediately gets (3) because it finds the word
bands with an (i am top) flag.

I have written an interesting querie that does the job quite well for
multiple word searches too, but once again it is too slow (but still much
quicker than the regexp way).

select *, IF(count(instID)>2,2,count(instID)) as myCount, sum(relevance) as
myRelevance, sum(depth) as myDepth from meta_path where word = '$word1' OR
word = '$word2' GROUP BY instID ORDER BY myCount DESC, myRelevance, myDepth LIMIT 0,20;

It seems to have a response time of about 3 - 6 seconds for the 300,000 records
(bearing in mind that they have now been split up into a new table that has
2.7 million single word records referencing back to the 300,000)

So. I am obviously doing something wrong here, can any one suggest a better
solution to this...

Kindest regards,
Justin Vincent.
 
Hi,

What about using SOUNDEX, a SQL Function that looks for words that 'sound' the same using ecoding. From the MySQL manual:

`SOUNDEX(str)'
Returns a soundex string from `str'. Two strings that sound almost
the same should have identical soundex strings. A standard soundex
string is 4 characters long, but the `SOUNDEX()' function returns
an arbitrarily long string. You can use `SUBSTRING()' on the
result to get a standard soundex string. All non-alphanumeric
characters are ignored in the given string. All international
alpha characters outside the A-Z range are treated as vowels:

mysql> select SOUNDEX('Hello');
-> 'H400'
mysql> select SOUNDEX('Quadratically');
-> 'Q36324'

You'll have to do some research and maybe adjust the DB, but it might be what you are looking for...

hth Bastien

There are many ways to skin this cat,
but it still tastes like chicken
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top