Tek-Tips is the largest IT community on the Internet today!


How search engines work 2


y2k1981 (Programmer, IE), Aug 2, 2002
I know that this probably isn't the best place to post this question, but can somebody tell me how exactly search engines work? OK, so we all know they go through sites looking at meta tags (unless they're directory search engines, if I'm calling them the right name?), but when you click Search, what really happens? Is it an executable file that does the searching, or a Perl script? I have no idea, and that's what I'd really like to know.

Also, company sites often offer a facility to search their own site. Do they use the same technologies as search engines like Google, but on a smaller scale, or do they use something else, again a Perl script perhaps? If I wanted to put a search facility on my site, could I create a simple MySQL database with all the meta tags in it and use PHP to return the pages that match the entries in MySQL? Just an idea.

Thanks!!
:D
 
Hi mate,

This is a big question that you are asking here.

Basically, when the user clicks Search, the data is submitted from a form to a script (Perl, PHP or whatever).

The script takes the input and searches its database, which contains all the text from the spidered sites, then uses a ranking technique to decide the order in which the results are shown. This technique varies from engine to engine.

The script then generates the output (the search results) and returns it to the user.
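A minimal sketch of that search step, assuming the spider has already built an inverted index (each word mapped to the pages it appears on and how often it appears there). The names `INDEX` and `handle_search` and the sample pages are hypothetical, and ranking by a simple word count is only a stand-in for the engine-specific ranking techniques mentioned above, not how any particular engine ranks.

```python
from collections import Counter

# Hypothetical pre-built inverted index: word -> {url: occurrences of that word on the page}.
# A real engine builds this ahead of time by spidering, not when the user clicks Search.
INDEX = {
    "search": {"a.html": 3, "b.html": 1},
    "engine": {"a.html": 2, "c.html": 4},
    "spider": {"c.html": 5},
}


def handle_search(query: str, index: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Look up each query word and rank pages by total occurrences (a deliberately simple ranking)."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically by URL so the output is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(handle_search("search engine", INDEX))
# [('a.html', 5), ('c.html', 4), ('b.html', 1)]
```

The query only touches the index entries for the words typed, which is why the results can come back quickly even when the index covers a huge number of pages.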

As for your idea of a database containing all of your meta tags: that is a bad way to do it, unless your metas are totally relevant to the page.

The best way is to spider the page, extract all of the text (minus the most common words, such as "and", "the", "it", etc.) and then store that in a database.

This way the results returned to the user are actually relevant to the page, and it gives you a much wider range of terms that the user can search for. You might have 100 words in your metas, but 10,000 on your page. This lets the user search for the actual text that they can see.
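A rough sketch of that approach, using Python's built-in `sqlite3` as a stand-in for the MySQL database (the PHP side would issue similar SQL). The table name `postings`, the functions `index_page` and `search`, the stop-word list and the sample pages are all illustrative assumptions.

```python
import re
import sqlite3
from collections import Counter

# A small, hypothetical stop-word list; real engines use their own, much longer lists.
STOP_WORDS = {"and", "the", "it", "a", "an", "of", "to", "is", "me"}


def index_page(conn: sqlite3.Connection, url: str, text: str) -> None:
    """Store every non-stop word on the page together with how often it occurs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    conn.executemany(
        "INSERT INTO postings (word, url, occurrences) VALUES (?, ?, ?)",
        [(word, url, n) for word, n in counts.items()],
    )


def search(conn: sqlite3.Connection, word: str) -> list[tuple[str, int]]:
    """Return (url, occurrences) for pages containing the word, most occurrences first."""
    rows = conn.execute(
        "SELECT url, occurrences FROM postings WHERE word = ? ORDER BY occurrences DESC, url",
        (word.lower(),),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (word TEXT, url TEXT, occurrences INTEGER)")
index_page(conn, "widgets.html", "The blue widget and the red widget ship from Dublin.")
index_page(conn, "about.html", "About the company and its staff in Dublin.")

print(search(conn, "widget"))  # [('widgets.html', 2)]
print(search(conn, "dublin"))  # [('about.html', 1), ('widgets.html', 1)]
print(search(conn, "the"))     # [] because 'the' is a stop word and was never stored
```

Because every remaining word on each page is stored, a search for "dublin" finds both pages even though neither page would need that word in its metas.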

Hope this helps Wullie

sales@freshlookdesign.co.uk

The pessimist complains about the wind. The optimist expects it to change. The leader adjusts the sails. - John Maxwell
 
Yeah, I wasn't asking for much, was I! So basically what you're saying is that it goes to a Perl (or other) script. One thing that you said:

"The best way is to spider the page and extract all of the text (Minus the most common words such as "and, the, it" etc.) and then store that in a database.

This way the results returned to the user are actually relevant to the page and it gives you a much wider range of terms that the user can search for. You might have 100 words in your metas, but 10,000 on your page. This way allows the user to search for the actual data that they can see."

I thought that search engines only searched through the meta tags, page title etc. I didn't think they went anywhere near the actual content?
 
Hi mate,

Go to or virtually any search engine and search for a phrase. Now look at the results returned: most of them will not have that phrase in their metas.

Only a few engines actually read the meta keywords tag now. Most still use the description etc., but virtually all the text, minus the most common words, is stored in their databases.

Using only meta tags, you could add words that are totally irrelevant to your site and still rank highly. Using page content, the results are more relevant because they are based on what is actually on the page.
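To see the difference between what the meta tags say and what the page actually contains, the sketch below separates the two using Python's standard `html.parser` module. The `PageParser` class and the sample page are hypothetical, and a real spider has to cope with far messier HTML than this.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect the meta keywords/description values and the page's text content."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.text_parts: list[str] = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs_dict = dict(attrs)
            name = (attrs_dict.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs_dict.get("content") or ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not part of a script or stylesheet.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


html = """<html><head>
<title>Acme Widgets</title>
<meta name="keywords" content="widgets, cheap widgets">
<meta name="description" content="Acme sells widgets.">
</head><body><h1>Blue widgets</h1><p>Shipped from Dublin within two days.</p>
<script>var x = 1;</script></body></html>"""

parser = PageParser()
parser.feed(html)
parser.close()
print(parser.meta["keywords"])      # widgets, cheap widgets
print(" ".join(parser.text_parts))  # Acme Widgets Blue widgets Shipped from Dublin within two days.
```

Here "Dublin" appears only in the page content, so an index built from the metas alone would never return this page for that word.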

Hope this helps Wullie

 
Final question... I hope! So basically, search engines "as a rule of thumb" don't use meta tags any more; they look at the actual content, correct? Now, you said that "virtually all the text minus the most common words are stored in their database". I thought that search engines didn't store sites in their database, and that they went through sites and picked out the ones relevant to the search. Does this mean that you have to register with a search engine for it to find your site? Sorry for all the questions!
 
Hi mate,

Yes, to a point.

A search engine will never stumble across your page unless it already knows it is there.

Spiders are given a URL to begin with, then extract and follow links along the way. Searching on an engine in no way means that you are searching every site on the net; far from it.

Possible ways that an engine could find your page are:

1) You submit it to their database.
2) Someone else submits it to their database.
3) The spider encounters a link to your site on another site.

Basically, a spider is a robot sent out to gather information, in this case the content of web pages. The spider extracts the links and stores them in its database, then processes the rest of the page data by removing the most common words. This is just to keep the database size down, as most sites use words such as "and", "the", "it", "me" etc.

Then the spider stores all of the content it has found. Next, and this may be weeks later, the links that it encountered are followed. Any links on those pages are also extracted and their content stored, and this just keeps going and going. After a certain period of time, the engine updates its database by re-spidering all of the sites it contains, picking up new links and content changes as it does so.
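The crawl cycle described above (start from known URLs, store the content, extract the links, follow them later) can be sketched as a breadth-first traversal. To keep it runnable offline, `FAKE_WEB` is a hypothetical dictionary standing in for real HTTP fetches, `crawl` and its `max_pages` limit are illustrative names, and the regular expression is a crude link extractor rather than a proper HTML parser.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs offline; a real spider would
# download each URL over HTTP (and honour robots.txt) instead of reading a dict.
FAKE_WEB = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="http://other.example/">Partner</a>',
    "http://example.com/about.html": '<a href="/">Home</a> Company history',
    "http://other.example/": '<a href="http://other.example/new.html">New page</a>',
    "http://other.example/new.html": "Nobody submitted this page, but a link led here.",
}

LINK_RE = re.compile(r'href="([^"]+)"')  # crude link extractor, good enough for this sketch


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first spider: start from known URLs, store content, queue every new link found."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    store = {}  # url -> raw page content (a real engine would index the text instead)
    while queue and len(store) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # dead link or failed fetch
            continue
        store[url] = page
        for href in LINK_RE.findall(page):
            link = urljoin(url, href)  # turn relative links like "/about.html" into full URLs
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


pages = crawl(["http://example.com/"], FAKE_WEB.get)
print(sorted(pages))
```

`http://other.example/new.html` was never given to the spider; it was found only because a page the spider already knew about linked to it, which is the third route listed above. Re-spidering amounts to running the same loop again later, starting from the URLs already stored.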

Hope this helps Wullie

 
So search engines do contain databases of sites, then? They don't go looking for the sites when the user does a search? So for a search engine to find your site, you either have to put a link to it on a site whose URL it already has, or submit the URL to them yourself.

What about catalog search engines (I think that's what they're called?), the ones that you have to submit your site to? What's the difference between them and ordinary search engines, seeing that ordinary search engines also keep a database of sites? Sorry for all the questions!!!!

[wavey]
 
Sorry, I didn't explain myself properly in the second paragraph. I meant the ones you have to submit your site to, which then go through it, check that it is about what you said it's about, and approve it. What's the difference? Maybe I've answered my own question there? Is AltaVista one of those?
 
Hi mate,

It would not be possible to search each site when the user clicks the submit button. All it would take is one slow server and the search results would take forever to return. Also, the engine would have to fetch EVERY site it knows of, for EVERY search.

As for your second question, I assume you mean either paid engines or search directories.

Paid engines vary from engine to engine, but most return your site for words that you specify, and some also spider your site.

Directories normally return results based on the data that you entered when you submitted it, or they may also spider your site.

Hope this helps Wullie

 
Thanks for all your replies. I'll leave you in peace now, but I think all that hard work answering questions deserves a star!!!!!

Regards
Martin
 
Hi mate,

Also read the following FAQ, which gives a basic look at optimising your site to improve your search rankings.

faq828-2093

Hope this helps Wullie

 