Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations John Tel on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

CGI search script:exclusion list. 1

Status
Not open for further replies.

Perlwannabe

Technical User
Mar 7, 2006
57
IT
Hi everyone,

I'm not a Perl programmer at all.
I have an elementary script to create a little search engine,and I would create a filter in order to prevent that certain terms are considered suitable for the search.This in order to avoid to get search results for common short words such as "any" "and" "for" and so on.
The script provides a length filter that cuts off all typed words shorter than a certain length,but it can't work at all as there are,in my case,short terms that,instead are suitable for the search,such as "kit" or "cap" and so on...So I must give up this idea.

Alternatively,I could store these "stop words" in an external list and create a proper expression to handle them,so that they will be simply ignored for the search,but I have no idea about how to do.

I noticed that the script provides a smut filter that works with an exclusion list,censoring all the records containing smut words;I think this could work also for the "stop words",but I should modify the code in order to obtain that my "stop words" are simply ignored for the search,with no censor.
Could anyone please help me to edit properly this code?

Here is the smut filter:

# Get the smut filter information

unless (open (DATA,"$smutfile")) {die (&error);}
if ($uselock eq '1') {
flock DATA, 2;
seek DATA, 0, 0;
}
@smutinfo = <DATA>;
if ($uselock eq '1') {
flock DATA, 8;
}
close (DATA);
foreach $smutline (@smutinfo){
$smutfilter = $smutfilter.$smutline;
@smutwords = split (/::/,$smutfilter);
}



Here is the search routine:

# Routine for 'words' search

if ($FORM{'mode'} eq "words") {
chomp($FORM{'keywords'});
$searchstring=$FORM{'keywords'};
@words = split (/ /,$searchstring);
foreach $word (@words) {
$wordlength = length($word);
# length filter:
if ($wordlength < $minword) {
$word = split (/ /,$searchstring);
}

}
&heading;
$entries = @input;
if ($position == 0) {
$currentline = $entries;
} else {
$currentline = $position;
}
$found="0";
print "<CENTER><FONT $font SIZE=3><B>Search Results : </B>'$FORM{'keywords'}'<P></FONT></CENTER>";
print "<HR WIDTH=400>";
print "<TABLE WIDTH=500><TR><TD ALIGN=LEFT><FONT $font SIZE=2>";
until ($found > 9 ¦¦ $currentline == 0) {
foreach $word (@words) {
if ($input[$currentline] =~ /$word/i) {
@data = split (/::/,$input[$currentline]);
if ($data[4] ne "") {
# begin smut filter function:
if ($safekey eq "on" && $match == 0) {
foreach $smutword (@smutwords) {
if ($input[$currentline] =~ /$smutword/i) {
$smut = 1;
}
}
unless ($smut == 1) {
print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
print "$data[4]<BR>";
print "<I>$data[0]</I><P>";
++$found;
++$match;
}
$smut = 0;
}
if ($safekey eq "off" && $match == 0) {
print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
print "$data[4]<BR>";
print "<I>$data[0]</I><P>";
++$found;
++$match;
# end smut filter function

}
}
}
}
--$currentline;
$match = 0;
}
}

If you need I also could post the whole script

Thanks in advance
Any help will be very appreciated

Sincerely
 
I already asked you to post the whole script on the other forum....
 
Oh! The world is small...

Well,I already posted the whole script on the other forum but nobody replies...

Thanks for interesting in

Sincerely
 
hmmm, I see you posted some more code but it didn't seem to be all the script and it also is missing a right-curly bracket somewhere in the script.
 
Here is the full script (it was too large for the other forum...):



Code:
#!/usr/local/bin/perl



	$sitetitle = 'Your Search Engine Name';

# Change this to the url of your seach engine index page

	$searchurl = '[URL unfurl="true"]http://www.blabla.com/search/index.html';[/URL]

# Change this to your e-mail address
# Depending on your version of PERL you may have to escape
# the @ sign like this \@
# At first try just entering your straight e-mail address

	$searchemail = 'you@youremail.com';

# You may not need to change the $mailprogram variable. 
# Try it as is first. If it doesn't work try putting a # in 
# front of the first line below and remove the # on the 
# second line. If that fails, try the removing the # on
# the third line and put a # in front of the other two. 
# If that fails, ask your administrator where the sendmail
# program is on your system. 

	$mailprogram = '/usr/sbin/sendmail';
	# $mailprogram = '/usr/lib/sendmail';
	# $mailprogram = '/usr/bin/sendmail';

# Change this to the PATH (not the URL) of the base.txt file 
# (include the filename)

	$base = '/home/bla/public_html/search/base.txt';

# Change this to the PATH (not the URL) of the head.txt file 
# (include the filename)

	$headfile = '/home/bla/public_html/search/head.txt';

# Change this to the PATH (not the URL) of the foot.txt file 
# (include the filename)

	$footfile = '/home/bla/public_html/search/foot.txt';

# Change this to the PATH (not the URL) of the respond.txt file 
# (include the filename)

	$respondfile = '/home/bla/public_html/search/respond.txt';


# Change this to the PATH (not the URL) of the smut.txt file
# Any word found in smut.txt is assumed to be adult material
# therefore you can control what is censored and what isn't 
# (include the filename)

	$smutfile = '/home/bla/public_html/search/smut.txt';

# Change this to the URL of this script 
# (include the filename)

	$scripturl = '[URL unfurl="true"]http://www.blabla.com/cgi-bin/easysearch.cgi';[/URL]

# Edit this one to choose the font for the search results
# DO NOT use " or any special characters
# Use below for an example of what is allowed
# Also do not set a font size as the script does this automatically

	$font = 'FACE=arial,helvetica COLOR=000000';

# Change this to the minimum search word length
# This is to exclude searches for "the", "and", "a", etc.

	$minword = '3';

# Enter the maximum number of characters you want to allow for
# the 'title' field for new site submissions

	$maxtitle = '50';

# Enter the maximum number of characters you want to allow for
# the 'description' field for new site submissions

	$maxdescription = '150';

# Enter the maximum number of characters you want to allow for
# the 'keywords' field for new site submissions

	$maxkeywords = '50';

# How many URLs do you want displayed on the New URLs page

	$numnew = '3';

# If you want to use flock to avoid corrupt files by double access 
# leave this line as is...if you don't then change the 1 to a 0

	$uselock = '1';

# If you want to automatically send an autorespond e-mail to visitors
# who submit their URL to the database then leave this line as is
# If you don't, then change the 1 to a 0

	$userespond = '1';


# Get the form variables

	if ($ENV{'REQUEST_METHOD'} eq 'GET') {
        	$buffer = $ENV{'QUERY_STRING'};
	}	
	else {
        	read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
	}

# Break em up into a format the script can read

	@pairs = split(/&/, $buffer);
	foreach $pair (@pairs) {
        	($name, $value) = split(/=/, $pair);
        	$value =~ tr/+/ /;
        	$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        	$FORM{$name} = $value;
	}

# Get the heading information

	unless (open (DATA,"$headfile")) {die (&error);}
		if ($uselock eq '1') {
			flock DATA, 2;
			seek DATA, 0, 0;
		}
		@headinfo = <DATA>;
		if ($uselock eq '1') {
			flock DATA, 8;
		}
	close (DATA);
	foreach $headline (@headinfo){
		$heading = $heading.$headline;
	}

# Get the footer information

	unless (open (DATA,"$footfile")) {die (&error);}
		if ($uselock eq '1') {
			flock DATA, 2;
			seek DATA, 0, 0;
		}
		@footinfo = <DATA>;
		if ($uselock eq '1') {
			flock DATA, 8;
		}
	close (DATA);
	foreach $footline (@footinfo){
		$footer = $footer.$footline;
	}

[b]# Get the smut filter information

	unless (open (DATA,"$smutfile")) {die (&error);}
		if ($uselock eq '1') {
			flock DATA, 2;
			seek DATA, 0, 0;
		}
		@smutinfo = <DATA>;
		if ($uselock eq '1') {
			flock DATA, 8;
		}
	close (DATA);
	foreach $smutline (@smutinfo){
		$smutfilter = $smutfilter.$smutline;
		@smutwords = split (/::/,$smutfilter);
	}[/b]

# Determine what part of the script we need

	if ($FORM{'action'} eq "showadd") {
		&showadd;
	}

	if ($FORM{'action'} eq "addurl") {
		&addurl;
	}

	if ($FORM{'action'} eq "newurls"){
		&newurls;
	}

	if ($FORM{'action'} eq "randomurl"){
		&randomurl;
	}


# Assign shorter variable names 
# (Laziness on my part - but I find the longer 
# a script gets the more work typing long 
# variable names becomes.)

	$position = $FORM{'code'};
	$addshow = 0;
	$noshow = 0;
	$match = 0;
	if ($FORM{'safe'} ne "on") {
		$safekey = "off";
	} else {
		$safekey = "on";
	}

# Begin the search process and output the results

	unless (open (DATA,"$base")) {die (&error);}
	if ($uselock eq '1') {
		flock DATA, 2;
		seek DATA, 0, 0;
	}
	@input = <DATA>;
	if ($uselock eq '1') {
		flock DATA, 8;
	}
	close (DATA);

[b]# Routine for 'words' search

	if ($FORM{'mode'} eq "words") {
		$searchstring=$FORM{'keywords'};
		@words = split (/ /,$searchstring);
		foreach $word (@words) {
			$wordlength = length($word);
			if ($wordlength < $minword) {
				&stringshort;
			}
		}
		&heading;
		$entries = @input;
		if ($position == 0) {
			$currentline = $entries;	
		} else {
			$currentline = $position;
		}
		$found="0";
		print "<CENTER><FONT $font SIZE=3><B>Search Results : </B>'$FORM{'keywords'}'<P></FONT></CENTER>";
		print "<HR WIDTH=400>";
		print "<FONT $font SIZE=2>";
		until ($found > 9 || $currentline == 0) {
			foreach $word (@words) {
				if ($input[$currentline] =~ /$word/i) {
					@data = split (/::/,$input[$currentline]);
					if ($data[4] ne "") {
						if ($safekey eq "on" && $match == 0) {
							foreach $smutword (@smutwords) {
								if ($input[$currentline] =~ /$smutword/i) {
									$smut = 1;
								}
							}	
							unless ($smut == 1) {
							print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
                                		print "$data[4]<BR>";
							print "<I>$data[0]</I><P>";
							++$found;
							++$match;
						}
						$smut = 0;
					}
						if ($safekey eq "off" && $match == 0) {
							print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
                                		print "$data[4]<BR>";
							print "<I>$data[0]</I><P>";
							++$found;
							++$match;
						}
					}
				}
			}
			--$currentline;
			$match = 0;
		}  
	}[/b]

# Routine for 'phrases' search

	if ($FORM{'mode'} eq "phrases") {
		$searchstring=$FORM{'keywords'};
			$wordlength = length($FORM{'keywords'});
			if ($wordlength < $minword) {
				&phrase;
			}
		&heading;
		$entries = @input;
		if ($position == 0) {
			$currentline=$entries;	
		} else {
			$currentline = $position;
		}
		print "<CENTER><FONT $font SIZE=3><B>Search Results : </B>'$FORM{'keywords'}'<P></FONT></CENTER>";
		print "<HR WIDTH=400>";
		print "<FONT $font SIZE=2>";
		until ($found > 9 || $currentline == 0) {
			if ($input[$currentline] =~ /$FORM{'keywords'}/i) {
				@data = split (/::/,$input[$currentline]);
				if ($data[4] ne "") {
					if ($safekey eq "on") {
							foreach $smutword (@smutwords) {
								if ($input[$currentline] =~ /$smutword/i) {
									$smut = 1;
								}
							}					
							unless ($smut == 1) {
							print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
                                		print "$data[4]<BR>";
							print "<I>$data[0]</I><P>";
							++$found;
						}
						$smut = 0;
					}
					if ($safekey eq "off") {
						print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
                                	print "$data[4]<BR>";
						print "<I>$data[0]</I><P>";
						++$found;
					}
				}
			}
			--$currentline;
		}  
	}
	print "</FONT>"; 
	&footer;

#################   SUBROUTINES   ######################

sub heading {
	print "Content-type: text/html\n\n";
	print "$heading";
}

sub footer {
	$keyencode=$FORM{'keywords'};
	$keyencode =~ tr/ /+/;
	if ($found > 9) {
		$position=$currentline;
		print "<CENTER><FONT $font size=3><A HREF=\"$scripturl?keywords=$keyencode&code=$position&mode=$FORM{'mode'}&safe=$safekey \"><B>More Results</B></A><BR><HR WIDTH=400></FONT></CENTER>";
	}
	else {
		unless ($addshow == 1) {
			print "<CENTER><FONT $font SIZE=2><B>End of Results.</B><BR><HR WIDTH=400></FONT></CENTER>\n";
		}
	}
	unless ($noshow == 1) {
		unless ($addshow == 1) {
			print "<CENTER><P><FORM METHOD=post ACTION=$scripturl><TABLE><TR><TD VALIGN=TOP><FONT $font SIZE=3><B>Search For :</B></FONT></TD><TD><INPUT TYPE=TEXT NAME=keywords SIZE=25 VALUE=\"$FORM{'keywords'}\"><BR><FONT $font SIZE=2><B>Mode :</B><INPUT TYPE=\"radio\" NAME=\"mode\" VALUE=\"words\" CHECKED>Words<INPUT TYPE=\"radio\" NAME=\"mode\" VALUE=\"phrases\">Phrase<BR><B>Safe : </B><INPUT TYPE=\"checkbox\" NAME=\"safe\" CHECKED>Omit Offensive Slang</FONT></TD><TD VALIGN=TOP ALIGN=CENTER WIDTH=60><INPUT TYPE=SUBMIT VALUE=\"Search!\"></TD></TR></TABLE></FORM><p></CENTER>\n";
		}
		if ($FORM{'keywords'} ne "") {
			print "<CENTER><FONT $font size=2><B>Search for \"$FORM{'keywords'}\" in these search engines...<br><A HREF=\"[URL unfurl="true"]http://www.altavista.com/cgi-bin/query?pg=q&what=web&q=$keyencode\">AltaVista</A>[/URL] <A HREF=\"[URL unfurl="true"]http://search.dejanews.com/dnquery.xp?query=$keyencode&defaultOp=AND&svcclass=dncurrent&maxhits=20\">DejaNews</A>[/URL] <A HREF=\"[URL unfurl="true"]http://search.excite.com/search.gw?search=$keyencode\">Excite</A>[/URL] <A HREF=\"[URL unfurl="true"]http://guide-p.infoseek.com/Titles/?qt=$keyencode\">GO[/URL] Network</A> <A HREF=\"[URL unfurl="true"]http://www.hotbot.com/?MT=$keyencode&SM=MC&DV=0&LG=any&DC=10&DE=2&_v=2&OPs=MDRTP&Search.x=38&Search.y=15\">HotBot</A>[/URL] <A HREF=\"[URL unfurl="true"]http://www.lycos.com/cgi-bin/pursuit?query=$keyencode&maxhits=20\">Lycos</A>[/URL] <A HREF=\"[URL unfurl="true"]http://www.webcrawler.com/cgi-bin/WebQuery?searchText=$keyencode&maxHits=20\">WebCrawler</A>[/URL] <A HREF=\"[URL unfurl="true"]http://search.yahoo.com/bin/search?p=$keyencode\">Yahoo!</A></B><P></CENTER>\n";[/URL]
		}
	}
	&generate;
	print "$footer";
	exit;
}

sub error {    
	$noshow = 1;
	&heading;
	print "<CENTER><FONT $font><h2>File Access Error</h2><P><B>You have an error in your PATH configuration variables in the $ENV{'SCRIPT_NAME'} file.</B><P>Your server reports that your BASE path is : $ENV{'DOCUMENT_ROOT'}<BR>Note that this is reported as your BASE path, not the FULL path to your files.<P>If you require help installing this script please consider purchasing the professional version of this script. Your purchase includes full tech support and installation.<P><B>Get it at : <A HREF=[URL unfurl="true"]http://www.getperl.com/easysearch/>http://www.getperl.com/easysearch/</A></B></FONT></CENTER><P>\n";[/URL]
	&footer;
}

sub stringshort {
	$noshow = 1;
      print "Content-type: text/html\n\n";
	&heading;
      print "<CENTER><FONT $font><h2>Word Too Short</h2><P><B>Sorry...each word must be at least $minword characters long.</B></FONT></CENTER><P>\n";
	&footer;
}

sub phrase {
	$noshow = 1;
      print "Content-type: text/html\n\n";
	&heading;
      print "<CENTER><FONT $font><h2>Phrase Too Short</h2><P><B>Sorry...your phrase must be at least $minword characters long.</B></FONT></CENTER><P>\n";
	&footer;
}

sub generate {
	print "<HR WIDTH=400>\n";
	print "<FONT $font SIZE=2><CENTER><P>Powered by : <A HREF=\"[URL unfurl="true"]http://www.getperl.com/\"><B>EasySearch</B></A>[/URL] - Copyright 1999 by Thomas J. Delorme</CENTER></FONT><P>\n";
	print "<HR WIDTH=400>\n";
}

sub showadd {
	&heading;
	$addshow = 1;
	print "<CENTER><FONT $font SIZE=3><B>Add URL</B></FONT><HR WIDTH=400>\n";
	print "<FONT $font SIZE=2>Please fill out the following information and press the SUBMIT button.<BR>\n";
	print "<B>Please note that all fields are required</B>.</FONT><P>\n";
	print "<FORM METHOD=post ACTION=$scripturl>\n";
	print "<TABLE BORDER=0 CELLPADDING=0 CELLSPACING=5>\n";
	print "<TR><TD ALIGN=RIGHT><B><FONT $font SIZE=2>E-mail :</FONT></B></TD><TD><INPUT NAME=email TYPE=text SIZE=35><BR></TD></TR>\n";
	print "<TR><TD ALIGN=RIGHT><B><FONT $font SIZE=2>Site Title : </FONT></B></TD><TD><INPUT NAME=title TYPE=text SIZE=35 MAXLENGTH=$maxtitle><BR></TD></TR>\n";
	print "<TR><TD ALIGN=RIGHT><B><FONT $font SIZE=2>Url : </FONT></B></TD><TD><INPUT NAME=url TYPE=text SIZE=35 value=[URL unfurl="true"]http://><BR></TD></TR>\n";[/URL]
	print "<TR><TD ALIGN=RIGHT><B><FONT $font SIZE=2>Description : </FONT></B></TD><TD><INPUT NAME=description TYPE=text SIZE=35 MAXLENGTH=$maxdescription><BR></TD></TR>\n";
	print "<TR><TD VALIGN=TOP ALIGN=RIGHT><FONT $font SIZE=2><B>Keywords : </B><BR><I>(Commas, no spaces)</I></FONT></TD><TD VALIGN=TOP><INPUT NAME=keywords TYPE=text SIZE=35 MAXLENGTH=$maxkeywords><BR></TD></TR>\n";
	print "</TABLE>\n";
	print "<INPUT TYPE=checkbox NAME=send CHECKED><FONT $font SIZE=2> I would like to receive occassional news from $sitetitle.</FONT><P>\n";
	print "<INPUT TYPE=hidden NAME=action VALUE=addurl>\n";
	print "<INPUT TYPE=submit VALUE=Submit> <INPUT TYPE=reset VALUE=Clear></FORM></CENTER>\n";
	&footer;
}

sub addurl {
	&heading;
	$noshow = 1;
	unless ($FORM{'url'} =~ /http:\/\//) {
		&submiterror;
	}

	if ($FORM{'url'} eq "" || $FORM{'title'} eq "" || $FORM{'email'} eq "" || $FORM{'description'} eq "" || $FORM{'keywords'} eq "") {
		&empty;
	}

	unless (open (DATA,"$base")) {die (&error);}
	if ($uselock eq '1') {
		flock DATA, 2;
		seek DATA, 0, 0;
	}
	@input = <DATA>;
	if ($uselock eq '1') {
		flock DATA, 8;
	}
	close (DATA);
	$entries = @input;

	$urlsearch = "$FORM{'url'}"."::";
	$urltemp = $FORM{'url'};
	chomp($urltemp);
	chop($urltemp);
	$urlsearchtwo = "$urltemp"."::";
	$urlsearchthree = "$FORM{'url'}"."/::";
	$currentline = 0;
	until ($currentline == $entries) {
		if ($input[$currentline] =~ /$urlsearch/i) {
			&exists;
		}
		if ($input[$currentline] =~ /$urlsearchtwo/i) {
			&exists;
		}
		if ($input[$currentline] =~ /$urlsearchthree/i) {
			&exists;
		}
		++$currentline;
	}  

	$testline = $input[$currentline-1];
	$testline2 = $input[$currentline-2];
	$testline3 = $input[$currentline-3];
	$testline4 = $input[$currentline-4];
	$testline5 = $input[$currentline-5];
	$testline6 = $input[$currentline-6];
	$testline7 = $input[$currentline-7];
	$testline8 = $input[$currentline-8];
	$testline9 = $input[$currentline-9];
	$testline10 = $input[$currentline-10];

	if ($testline =~ /$FORM{'description'}/) {
		&samestuff;
	}

	if ($testline =~ /$FORM{'title'}/) {
		&samestuff;
	}

	if ($testline =~ /$FORM{'keywords'}/) {
		&samestuff;
	}

	if ($testline =~ /$FORM{'email'}/i || $testline2 =~ /$FORM{'email'}/i || $testline3 =~ /$FORM{'email'}/i || $testline4 =~ /$FORM{'email'}/i || $testline5 =~ /$FORM{'email'}/i || $testline6 =~ /$FORM{'email'}/i || $testline7 =~ /$FORM{'email'}/i || $testline8 =~ /$FORM{'email'}/i || $testline9 =~ /$FORM{'email'}/i || $testline10 =~ /$FORM{'email'}/i) {
		&justsubmitted;
	}

	$newemail = $FORM{'email'};
	if ($FORM{'send'} ne "on") {
		$newemail = "X"."$newemail"."X";	
	}
	$newtitle = substr($FORM{'title'},0,$maxtitle);
	$newdesc = substr($FORM{'description'},0,$maxdescription);
	$newkeywords = substr($FORM{'keywords'},0,$maxkeywords);
	$line = join ("::","$FORM{'url'}","$newtitle","$newkeywords","$newemail","$newdesc");

	unless (open (DATA,">>$base")) {die (&error);}
		if ($uselock eq '1') {
			flock DATA, 2;
			seek DATA, 0, 2;
		}
		print DATA "$line\n";
		if ($uselock eq '1') {
			flock DATA, 8;
		}
	close (DATA);

	print "<center><HR WIDTH=400><FONT $font SIZE=3><b>Submission Received</b></font><p>\n";
	print "<FONT $font SIZE=2><B>The following submission has been received by $sitetitle :</B></font><br>\n";
	print "<table width=400><tr><td align=right><FONT $font SIZE=2><B>URL :</B></font></td>\n";
	print "<td><a href=\"$FORM{'url'}\"><FONT $font SIZE=2><b>$FORM{'url'}</b></font></a></td></tr>\n";
	print "<tr><td align=right><FONT $font SIZE=2><B>Title :</B></font></td><td><FONT $font SIZE=2> $FORM{'title'}</td></tr>\n";
	print "<tr><td align=right><FONT $font SIZE=2><B>Keywords :</B></font></td><td><FONT $font SIZE=2> $FORM{'keywords'}</td></tr>\n";
	print "<tr><td align=right><FONT $font SIZE=2><B>Description :</B></font></td><td><FONT $font SIZE=2> $FORM{'description'}</td></tr>\n";
	print "<tr><td align=right><FONT $font SIZE=2><B>E-mail :</B></font></td><td><FONT $font SIZE=2> $FORM{'email'}</font></td></tr></table></center><p>\n";

	if ($userespond eq '1') {
		unless (open (DATA,"$respondfile")) {die (&error);}
			if ($uselock eq '1') {
				flock DATA, 2;
				seek DATA, 0, 0;
			}
			@respondinfo = <DATA>;
			if ($uselock eq '1') {
				flock DATA, 8;
			}
		close (DATA);
		foreach $respondline (@respondinfo){
			$respondmessage = $respondmessage.$respondline;
		}

		open (MAIL, "|$mailprogram -t");
		print MAIL "To: $FORM{'email'}\n";
		print MAIL "From: $searchemail\n";
		print MAIL "Subject: Got it!\n\n";
		print MAIL "Welcome to $sitetitle!\n";
		print MAIL "We are located at $searchurl\n\n";
		print MAIL "YOUR SUBMISSION:\n";
		print MAIL "------------------------------------------------------------------\n";
		print MAIL "URL : $FORM{'url'}\n"; 
		print MAIL "Title : $FORM{'title'}\n";
		print MAIL "Description : $FORM{'description'}\n";
		print MAIL "Keywords : $FORM{'keywords'}\n";
		print MAIL "E-mail : $FORM{'email'}\n";
		print MAIL "------------------------------------------------------------------\n\n";
		print MAIL "$respondmessage";
		print MAIL "------------------------------------------------------------------\n\n";
		print MAIL "Thanks again,\n";
		print MAIL "---------------------------------------------\n";
		print MAIL "$sitetitle\n";
		print MAIL "$searchemail\n";
		print MAIL "$searchurl\n";
		print MAIL "---------------------------------------------\n";
		print MAIL "Powered by :\n";
		print MAIL "EasySearch - Copyright 1999\n";
		print MAIL "[URL unfurl="true"]http://www.getperl.com\n";[/URL]
		print MAIL "---------------------------------------------\n";
		close (MAIL);
	}
	&footer;
}

sub exists {
	$noshow = 1;
      print "<CENTER><FONT $font SIZE=2><h2>URL Already Exists</h2><BR>$FORM{'url'}<HR WIDTH=400><B>Sorry...Each URL is only allowed one entry.</B><P>\n";
	&footer;
}

sub samestuff {
	$noshow = 1;  
      print "<CENTER><FONT $font SIZE=2><h2>Recent URL Submission</h2><HR WIDTH=400><TABLE WIDTH=400><TR><TD><B>You have recently submitted an URL with either the exact same title, description, or keywords. Since this is a different URL, please change your title, description and keywords to match this new page.</B></TD></TR></TABLE><P>\n";
	&footer;
}

sub justsubmitted {
	$noshow = 1;
      print "<CENTER><FONT $font SIZE=2><h2>Recent URL Submission</h2><HR WIDTH=400><B>In order to avoid domain name overflow on the Newest URLs page,<BR> we ask that you try your submission again later.</B><P>\n";
	&footer;
}


sub empty {
	$noshow = 1;
      print "<CENTER><FONT $font SIZE=2><h2>Field Empty</h2><HR WIDTH=400><B>Please make sure that you have filled in all fields on the form.</B><P>\n";
	&footer;
}

sub submiterror {
	$noshow = 1;
      print "<CENTER><FONT $font SIZE=2><h2>Invalid URL</h2><HR WIDTH=400><B>Please make sure that your URL contains <B>[URL unfurl="true"]http://</B>[/URL] and is correct.</B><P>\n";
	&footer;
}

sub newurls {
	&heading;
	unless (open (DATA,"$base")) {die (&error);}
	if ($uselock eq '1') {
		flock DATA, 2;
		seek DATA, 0, 0;
	}
	@input = <DATA>;
	if ($uselock eq '1') {
		flock DATA, 8;
	}
	close (DATA);
	$entries = @input;
	print "<FONT $font SIZE=3><CENTER><B>Newest URLs : </B><HR WIDTH=400></CENTER></FONT>";
	$currentline = $entries;
	print "<FONT $font SIZE=2>";
	$count = 0;
	until ($count == $numnew) {
		@data = split (/::/,$input[$currentline]);
        	if ($data[4] ne "") {
			foreach $smutword (@smutwords) {
				if ($input[$currentline] =~ /$smutword/i) {
					$smut = 1;
				}
			}
			unless ($smut == 1) {
				print ("<a href=\"$data[0]\"><B>$data[1]</B></a><br>");
                		print ("$data[4]<br>");
		    		print ("<I>$data[0]</I><P>");
				++$count;
			}
			$smut = 0;
		}
		--$currentline;
	}
	print "</FONT>";
	&footer;
}

sub randomurl {
	&heading;
	unless (open (DATA,"$base")) {die (&error);}
	if ($uselock eq '1') {
		flock DATA, 2;
		seek DATA, 0, 0;
	}
	@input = <DATA>;
	if ($uselock eq '1') {
		flock DATA, 8;
	}
	close (DATA);
	$entries = @input;
	print "<FONT $font SIZE=3><CENTER><B>Random URL : </B><HR WIDTH=400></CENTER></FONT>";
	$count=0;
	while ($count != 1)  {
		srand (time + $$);
		$currentline = int( rand ($entries));
		print "<FONT $font SIZE=2>";
		@data = split (/::/,$input[$currentline]);
		foreach $smutword (@smutwords) {
			if ($input[$currentline] =~ /$smutword/i) {
				$smut = 1;
			}
		}
		unless ($smut == 1) {
			if ($data[4] ne "") {
                			print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
                			print "$data[4]<BR>";
		    		print "<I>$data[0]</I></FONT><P>";
				$count=1;
			}
		}
		$smut = 0;
	}
	&footer;
}

Thanks for interesting in

Sincerely
 
Wow, that is one tortured script. But anyway, I am not even sure why you are bothering to want to filter out words like "and" when it seems you are only searching the one file: base.txt. The benefit of not searching for those common words would be negligible at best. If you were searching a very large database and had to reduce the number of irrelevant hits I could see filtering out common words, but how much data is there in the searh file (base.txt) anyway?
 
If you were searching a very large database and had to reduce the number of irrelevant hits I could see filtering out common words, but how much data is there in the searh file (base.txt) anyway?

It's just the crucial question:

Don't let you deceive from the fact that script search in a .txt file.The file base.txt act as a database and actually contains hundreds of records to search and it could be furtherly extended (or,in the future, replaced with a real database).
So I really have the need to filter out the search excluding irrelevant terms,you see?

Could you help me to achieve this result?

Again thanks for your interesting

Sincerely
 
first back-up your existing script. Now make these changes and see how it works.

change this:

$minword = '3';

to:

$minword = 2;

change this:

Code:
        foreach $word (@words) {
            $wordlength = length($word);
            if ($wordlength < $minword) {
                &stringshort;
            }

to:

Code:
        foreach $word (@words) {
            next if (length($word) < $minword);                
        }

this will skip words less than three bytes in length instead of displaying the error messages about short words.

change this:

Code:
    if ($FORM{'mode'} eq "words") {
        $searchstring=$FORM{'keywords'};
        @words = split (/ /,$searchstring);

to:

Code:
    if ($FORM{'mode'} eq "words") {
        $searchstring=$FORM{'keywords'};
        @words = split (/[b]\s+[/b]/,$searchstring);

this will make sure words don't have spaces before or after them like your current script will.

Just below this line:

$userespond = '1';

add:

Code:
# common words I want to skip in searches
%skip = (
   and => 'and',
   the => 'the',
   for => 'for',
   too => 'too',
);

add more words you want to skip in the same format as you see above (keep the words all lower case):

word => 'word',


now the part above you changed to:

Code:
        foreach $word (@words) {
            next if (length($word) < $minword);                
        }

change it to:

Code:
        foreach $word (@words) {
            next if (length($word) < $minword);                
            next if (exists $skip{lc($word)}); 
        }

The script you are using is terrible, not just in my opinion. It was obviously written by a person with little understanding of security or even good perl programming experience. I would not use that script for anything. Use it at your own risk as-is or with the edits I have suggested.
 
oops, I see a mistake on my part above, this part:

Code:
        foreach $word (@words) {
            $wordlength = length($word);
            if ($wordlength < $minword) {
                &stringshort;
            }

just comment it out:

Code:
#        foreach $word (@words) {
#            $wordlength = length($word);
#            if ($wordlength < $minword) {
#                &stringshort;
#            }

find these two lines:

Code:
        until ($found > 9 || $currentline == 0) {
            foreach $word (@words) {

and replace with:

Code:
        until ($found > 9 || $currentline == 0) {
            foreach $word (@words) {
               next if (length($word) < $minword);                
               next if (exists $skip{lc($word)});
 
Thanks!

I'll try these modifies and I'll let you know.

Stay around...

Sincerely
 
Well,

Unfortunately it doesn't work...
I post the modifies,just to check if I made some synthax error:

Code:
...

	$uselock = '1';

# If you want to automatically send an autorespond e-mail to visitors
# who submit their URL to the database then leave this line as is
# If you don't, then change the 1 to a 0

	$userespond = '1';

[b]# common words I want to skip in searches
%skip = (
   and => 'and',
   the => 'the',
   for => 'for',
   too => 'too',
);[/b]


# Get the form variables

	if ($ENV{'REQUEST_METHOD'} eq 'GET') {
        	$buffer = $ENV{'QUERY_STRING'};
	}	
	else {
        	read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
	}

# Break em up into a format the script can read

	@pairs = split(/&/, $buffer);
	foreach $pair (@pairs) {
        	($name, $value) = split(/=/, $pair);
        	$value =~ tr/+/ /;
        	$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        	$FORM{$name} = $value;
	}

# Get the heading information

	unless (open (DATA,"$headfile")) {die (&error);}
		if ($uselock eq '1') {
			flock DATA, 2;
			seek DATA, 0, 0;
		}
		@headinfo = <DATA>;
		if ($uselock eq '1') {
			flock DATA, 8;
		}
	close (DATA);
	foreach $headline (@headinfo){
		$heading = $heading.$headline;
	}

# Get the footer information

	unless (open (DATA,"$footfile")) {die (&error);}
		if ($uselock eq '1') {
			flock DATA, 2;
			seek DATA, 0, 0;
		}
		@footinfo = <DATA>;
		if ($uselock eq '1') {
			flock DATA, 8;
		}
	close (DATA);
	foreach $footline (@footinfo){
		$footer = $footer.$footline;
	}

# Get the smut filter information

open(DATA, "<$smutfile") or &error("bla-bla about the reason : $!"); 
if($uselock) {flock(DATA,1)} # note - no need for "2"; no need for seek() too 
@smutinfo = <DATA>; 
close(DATA);
	foreach $smutline (@smutinfo){
		$smutfilter = $smutfilter.$smutline;
		@smutwords = split (/\n/,$smutfilter);
	}
	

# Determine what part of the script we need

	if ($FORM{'action'} eq "showadd") {
		&showadd;
	}

	if ($FORM{'action'} eq "addurl") {
		&addurl;
	}

	if ($FORM{'action'} eq "newurls"){
		&newurls;
	}

	if ($FORM{'action'} eq "randomurl"){
		&randomurl;
	}


# Assign shorter variable names 
# (Laziness on my part - but I find the longer 
# a script gets the more work typing long 
# variable names becomes.)

	$position = $FORM{'code'};
	$addshow = 0;
	$noshow = 0;
	$match = 0;
	if ($FORM{'safe'} ne "on") {
		$safekey = "off";
	} else {
		$safekey = "on";
	}

# Begin the search process and output the results

	unless (open (DATA,"$base")) {die (&error);}
	if ($uselock eq '1') {
		flock DATA, 2;
		seek DATA, 0, 0;
	}
	@input = <DATA>;
	if ($uselock eq '1') {
		flock DATA, 8;
	}
	close (DATA);

# Routine for 'words' search

	[b]if ($FORM{'mode'} eq "words") {
        $searchstring=$FORM{'keywords'};
        @words = split (/\s+/,$searchstring);
#            foreach $word (@words) {
#            $wordlength = length($word);
#            if ($wordlength < $minword) {
#                &stringshort;
#           }[/b]
		}
            &heading;
		$entries = @input;
		if ($position == 0) {
			$currentline = $entries;	
		} else {
			$currentline = $position;
		}
		$found="0";
		print "<CENTER><FONT $font SIZE=3><B>Search Results : </B>'$FORM{'keywords'}'<P></FONT></CENTER>";
		print "<HR WIDTH=400>";
            print "<TABLE WIDTH=500><TR><TD ALIGN=LEFT><FONT $font SIZE=2>";
		[b]until ($found > 9 || $currentline == 0) {
            foreach $word (@words) {
               next if (length($word) < $minword);                
               next if (exists $skip{lc($word)}); 
				if ($input[$currentline] =~ /$word/i) {[/b]
					@data = split (/::/,$input[$currentline]);
					if ($data[4] ne "") {
						if ($safekey eq "on" && $match == 0) {
							foreach $smutword (@smutwords) {
								if ($input[$currentline] =~ /$smutword/i) {
									$smut = 1;
								}
							}	
							unless ($smut == 1) {
                                                print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
                                				print "$data[4]<BR>";
								print "<I>$data[0]</I><P>";
								++$found;
								++$match;
							}
							$smut = 0;
						}
						if ($safekey eq "off" && $match == 0) {
							print "<A HREF=\"$data[0]\"><B>$data[1]</B></A><BR>";
                                			print "$data[4]<BR>";
							print "<I>$data[0]</I><P>";
							++$found;
							++$match;
						}
					}
				}
			}
			--$currentline;
			$match = 0;
		}  
	}
...
 
right here:

Code:
#            foreach $word (@words) {
#            $wordlength = length($word);
#            if ($wordlength < $minword) {
#                &stringshort;
#           }
        }

looks like you need to comment that last }:

Code:
#            foreach $word (@words) {
#            $wordlength = length($word);
#            if ($wordlength < $minword) {
#                &stringshort;
#           }
#        }

give that a try
 
Hello,

I apologize for the lateness in my reply...
I got a try with your suggestions and I would say you a thing:

IT WORKS FINE !!!!!!!!

Finally we get away!

Well,

I thank you very very much,you've been very precious.
At your disposal if you need.

I see you,mate.

Sincerely
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top