
Looking for search script

Status
Not open for further replies.

RenoWV (Technical User), Mar 16, 2002
Since many of you are longtime members of the Perl community, I am hoping someone can offer some guidance.

I am looking for a search engine script that runs on a Unix server. We have no illusions about becoming another Yahoo or Google, so we're not wanting a top-of-the-line custom product ... in fact, ideally we'd find one reasonably priced "off-the-shelf".

The important thing is that it can index the HTML pages located at a long list of *external* URLs that I would enter (perhaps a couple thousand), as opposed to only indexing the pages within our own site (none of the external URLs would be hosted in our own server's directory).

Can anyone make a recommendation?

Thanks....

=========================================
Here are some more details, if you think you know of a script...
=========================================

[1] As mentioned, because all the pages it would index are located outside of our directory, the script must be capable of crawling/indexing specifically defined external URLs;

[2] I'd prefer the option to add all new URLs myself, so we could review a website's content before adding it to the index (i.e., inclusion is not automatic);

[3] It would be useful if we could set how deep the crawl would be on these external sites (to keep the size of the database manageable);

[4] Going along with #3, the spider would *only* crawl pages located on the submitted domain. In other words, it would not continue on to spider other sites linked from the submitted URL;

[5] I'd like to be able to set the "look" of the results pages via HTML templates, and it would be best if we could manage other settings via a built-in control panel (or at the very least, clearly explained admin pages);

[6] It should initially be able to handle at least a couple thousand URLs (perhaps 20 to 30 thousand pages?).

[7] It would rank the returned results of a search query by their relevance.
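To make requirements [3] and [4] concrete, here is a rough Perl sketch of a depth-limited, same-domain crawl loop. It is only an illustration: the %links hash stands in for fetched pages, and a real script would fetch each URL with LWP::UserAgent and extract links with HTML::LinkExtor; the function and variable names here are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the host part of an http(s) URL (crude, for illustration only).
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^https?://([^/]+)}i;
    return lc( $host // '' );
}

# Depth-limited, same-domain breadth-first crawl (requirements [3] and [4]).
# $links maps a URL to the list of links found on that page; a real crawler
# would build this on the fly from fetched HTML.
sub crawl {
    my ( $start, $max_depth, $links ) = @_;
    my $home  = host_of($start);
    my @queue = ( [ $start, 0 ] );
    my ( %seen, @indexed );
    while ( my $item = shift @queue ) {
        my ( $url, $depth ) = @$item;
        next if $seen{$url}++;
        next if host_of($url) ne $home;    # [4]: never leave the submitted domain
        push @indexed, $url;
        next if $depth >= $max_depth;      # [3]: honour the depth limit
        push @queue, [ $_, $depth + 1 ] for @{ $links->{$url} || [] };
    }
    return @indexed;
}
```

With a depth limit of 1, the sketch indexes the start page and its direct links on the same host, and silently drops anything pointing off-domain.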

--- These next features would be nice to have, but not essential ---

[1] We could set which parts of a page to index - meta tags, text, alt tags, etc.;

[2] Some type of filtering feature (to block spamming, for example);

[3] The script could import URLs from a delimited database file, and export them in the same manner;

[4] We'd ideally want to set an automatic re-indexing schedule, to update content and remove dead links on a regular basis (though if we had to do this manually, that would be ok).
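Nice-to-have [3] above is simple to prototype in Perl. The pipe-delimited "url|depth" layout below is purely an assumption for illustration; the real column layout would be whatever the chosen script defines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read URL records from a delimited filehandle (assumed layout: url|max_depth,
# one record per line; depth defaults to 2 when the column is missing).
sub import_urls {
    my ($fh) = @_;
    my @urls;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
        my ( $url, $depth ) = split /\|/, $line;
        push @urls, { url => $url, depth => $depth // 2 };
    }
    return @urls;
}

# Write the records back out in the same delimited format.
sub export_urls {
    my (@urls) = @_;
    return join "\n", map { "$_->{url}|$_->{depth}" } @urls;
}
```

Because import and export share one format, a round trip through both functions reproduces the original list, which is handy for reviewing URLs in a spreadsheet before re-importing them.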

--------- Additional notes ---------

* It is not important for this script to categorize its results from these external sites;

* It is not important that this search engine be able to search the web at large when it cannot find results in our own database of indexed pages;

* People from external sites will not be accessing any kind of account, so it is not necessary to have password protected access.

==========================
 
Hi RenoWV,

You could have a look at perlfect.com for the Perlfect search engine. I don't know if it will cater for all of your requirements, but it would at least be a starting point.

Regards
Paul
 
Thanks Paul - I agree, the Perlfect script looks very good. I have downloaded it, and will begin reading the documentation...
 
Thanks Wullie. This was one of the first scripts that I looked at, and it seems really good except for one limitation. As they say:

"FDSE ... can handle about 10,000 documents in all"

This of course is way more than I'd ever need for my own website, but for the use I have planned - indexing a long list of external sites - I can see that it would run out of capacity pretty quickly. If we estimate that an average site has 20 pages, then FDSE tops out after only 500 URLs. For my purposes, I'd need at least 5 times that amount.

But as I said, it is a great script, and their site itself provides a very clear presentation - thanks again...
 
Hi mate,

The script can easily handle a lot more than 10,000 documents, a lot of it depends on the hardware that is available.

Hope this helps Wullie

sales@freshlookdesign.co.uk

The pessimist complains about the wind. The optimist expects it to change. The leader adjusts the sails. - John Maxwell
 
Thanks Wullie - that is certainly welcome news, as I am very impressed with what they offer. I'll move them onto the short list as I consider which direction to go....
 
