Guest_imported
New member
- Jan 1, 1970
- 0
Hi,
I am writing a search engine that searches through 300,000 hierarchy
headings like the ones below. I am trying to make it as fast as possible.
1) Top:Arts:Music:Bands and Artists
2) Top:Arts:Music:Bands and Artists:B
3) Top:Arts:Music:Bands and Artists:B:Bowie, David
4) Top:Arts:Music:Bands and Artists:B:Bowie, David:Fan Pages
5) Top:Arts:Music:Bands and Artists:B:Bowie, David:Articles and Interviews
One Word Searches
---------------
If a user does a one word search such as "bowie" I want the
result to be (3) if they do a one word search for "bands" I want
the result to be (1). I have already achieved this by using
regexp searches. This works if you make sure that your search only
searches the last part of the hierarchy path...
Top:Blah:Blah:%one-word-search%
(but it is VERY slow)
Two Word Searches
----------------
If the user does a two word search such as "david bowie" I want the
answer to be (3) If the user does a two word search such as
"bowie interviews" I want the answer to be (5)
I have also achieved this by using regexp searches.
This works by making mysql search for word1 at the end of the path
only and word2 anywhere in the path.. or ..word2 and the end of the
path only and word one anywhere in the path.
(once again it is way to slow)
Three Word Searches
-------------------
I have also got this working, it is just an extension of the Two
Word Search idea. Once again way to slow.
I am assuming that to make this work faster I will have to
do something like make a meta table that splits all the paths up
into into single words and references them back to the original
path.. i.e. (and also keep track of the depth to use that as a
sort criteria)
1)Arts
1)Music
1)Bands
1)Artists
2)Arts
2)Music
2)Bands
2)Artists
2)B
3)Arts
3)Music
3)Bands (i am top)
3)Artists
3)B
3)Bowie
3)David
4)etc.
I have done this. It works great for single word searches almost instant.
I.E. If you type 'bands' it immediately gets (3) because it finds the word
bands with an (i am top) flag.
I have written an interesting querie that does the job quite well for
multiple word searches too, but once again it is too slow (but still much
quicker than the regexp way).
select *, IF(count(instID)>2,2,count(instID)) as myCount, sum(relevance) as
myRelevance, sum(depth) as myDepth from meta_path where word = '$word1' OR
word = '$word2' GROUP BY instID ORDER BY myCount DESC, myRelevance, myDepth LIMIT 0,20;
It seems to have a response time of about 3 - 6 seconds for the 300,000 records
(bearing in mind that they have now been split up into a new table that has
2.7 million single word records referencing back to the 300,000)
So. I am obviously doing something wrong here, can any one suggest a better
solution to this...
Kindest regards,
Justin Vincent.
I am writing a search engine that searches through 300,000 hierarchy
headings like the ones below. I am trying to make it as fast as possible.
1) Top:Arts:Music:Bands and Artists
2) Top:Arts:Music:Bands and Artists:B
3) Top:Arts:Music:Bands and Artists:B:Bowie, David
4) Top:Arts:Music:Bands and Artists:B:Bowie, David:Fan Pages
5) Top:Arts:Music:Bands and Artists:B:Bowie, David:Articles and Interviews
One Word Searches
---------------
If a user does a one word search such as "bowie" I want the
result to be (3) if they do a one word search for "bands" I want
the result to be (1). I have already achieved this by using
regexp searches. This works if you make sure that your search only
searches the last part of the hierarchy path...
Top:Blah:Blah:%one-word-search%
(but it is VERY slow)
Two Word Searches
----------------
If the user does a two word search such as "david bowie" I want the
answer to be (3) If the user does a two word search such as
"bowie interviews" I want the answer to be (5)
I have also achieved this by using regexp searches.
This works by making mysql search for word1 at the end of the path
only and word2 anywhere in the path.. or ..word2 and the end of the
path only and word one anywhere in the path.
(once again it is way to slow)
Three Word Searches
-------------------
I have also got this working, it is just an extension of the Two
Word Search idea. Once again way to slow.
I am assuming that to make this work faster I will have to
do something like make a meta table that splits all the paths up
into into single words and references them back to the original
path.. i.e. (and also keep track of the depth to use that as a
sort criteria)
1)Arts
1)Music
1)Bands
1)Artists
2)Arts
2)Music
2)Bands
2)Artists
2)B
3)Arts
3)Music
3)Bands (i am top)
3)Artists
3)B
3)Bowie
3)David
4)etc.
I have done this. It works great for single word searches almost instant.
I.E. If you type 'bands' it immediately gets (3) because it finds the word
bands with an (i am top) flag.
I have written an interesting querie that does the job quite well for
multiple word searches too, but once again it is too slow (but still much
quicker than the regexp way).
select *, IF(count(instID)>2,2,count(instID)) as myCount, sum(relevance) as
myRelevance, sum(depth) as myDepth from meta_path where word = '$word1' OR
word = '$word2' GROUP BY instID ORDER BY myCount DESC, myRelevance, myDepth LIMIT 0,20;
It seems to have a response time of about 3 - 6 seconds for the 300,000 records
(bearing in mind that they have now been split up into a new table that has
2.7 million single word records referencing back to the 300,000)
So. I am obviously doing something wrong here, can any one suggest a better
solution to this...
Kindest regards,
Justin Vincent.