Tips on search engine

Status
Not open for further replies.

zoomby

Programmer
Aug 5, 2002
60
DE
Hi!

I'm looking for some tips on search engine design.
Here is what I have done so far:
A script strips all the HTML tags etc. from my pages and writes the remaining text to a MySQL table with the fields "content" and "URL".
Then I search the table with MySQL's RLIKE operator (select * from keywords where content rlike 'test|test2').

What can I do to improve it? Can you give me tips? Are there any good articles on search engine design in general on the web?

Bye
Chris

 
when you index your site and strip the code out, be sure to also strip out a list of "stopwords" that you define. dropping very common words such as "the", "a", "of", "and", etc. will save lots of room in your database. also, after stripping tags, i like to collapse any superfluous whitespace to compact the page content for storage, using a simple preg_replace function:

function strip_whitespace($source) {
    // any run of whitespace characters (spaces, tabs, newlines)
    // will be replaced with a single space
    return trim(preg_replace('/\s+/', ' ', $source));
}

and then you can explode this string on the space character into an array of words, for efficient inserting into your database.
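
a minimal sketch of the steps above (collapse whitespace, explode on spaces, drop stopwords), in case it helps. the function name extract_keywords, the STOPWORDS list and the lowercasing step are assumptions for illustration, not part of the original script, and punctuation is not handled here:

```php
<?php
// Hypothetical stopword list; a real one would be longer and tuned to the site.
const STOPWORDS = ['the', 'a', 'of', 'and'];

// Turn tag-stripped page text into a list of words worth indexing.
function extract_keywords(string $text, array $stopwords = STOPWORDS): array {
    // Lowercase (an assumption), then collapse every run of whitespace into one space.
    $text = trim(preg_replace('/\s+/', ' ', strtolower($text)));
    if ($text === '') {
        return [];
    }
    // Split on the single spaces that remain.
    $words = explode(' ', $text);
    // Drop stopwords; array_values() reindexes the result from 0.
    return array_values(array_diff($words, $stopwords));
}

print_r(extract_keywords("The  history of\n search   engines"));
```

the returned array can then be looped over to insert one keyword at a time.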

another way to store the keywords (instead of storing a whole line of words on the same row as one of the URLs) is to create a separate table called "words" where each row holds only one word and the page it appears on. with this structure you can tell the database to SELECT all rows WHERE the word column equals your $searchterm. i believe an exact match like this, especially with an index on the word column, will generally be much faster than running LIKE or RLIKE over the full page text. these are just a few ideas for you; i finished my first search engine very recently and used many of these same principles. good luck!
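
a rough sketch of the one-word-per-row idea. the column names, the sizes in the schema comment and both function names are hypothetical, and the lookup is an in-memory stand-in for the SQL query so it runs without a database:

```php
<?php
// Assumed MySQL schema (hypothetical names and sizes):
//   CREATE TABLE words (
//       word VARCHAR(64)  NOT NULL,
//       url  VARCHAR(255) NOT NULL,
//       INDEX (word)
//   );
// Exact-match lookup that can use the index:
//   SELECT DISTINCT url FROM words WHERE word = ?;

// Build one (word, url) row per distinct word found on a page.
function build_word_rows(string $url, array $words): array {
    $rows = [];
    foreach (array_unique($words) as $word) {
        $rows[] = ['word' => $word, 'url' => $url];
    }
    return $rows;
}

// In-memory stand-in for the exact-match SELECT above.
function find_urls(array $rows, string $searchterm): array {
    $urls = [];
    foreach ($rows as $row) {
        if ($row['word'] === $searchterm) {
            $urls[$row['url']] = true; // array keys deduplicate URLs
        }
    }
    return array_keys($urls);
}

$rows = array_merge(
    build_word_rows('/a.html', ['search', 'engine', 'search']),
    build_word_rows('/b.html', ['engine', 'design'])
);
print_r(find_urls($rows, 'engine'));
```

in a real setup each row from build_word_rows would be inserted into the words table with a prepared statement, and find_urls would be replaced by the SELECT shown in the comment.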

-f!
 