Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Regular Expression Implementation

Status
Not open for further replies.

ArcAnjel

Programmer
Feb 9, 2005
13
US
Hello folks,

I'm sort of new at this, so bare with me. I have a search function on my website for a part number cross reference and I'd like to add the capability to use regular expressions in the search. Right now the user must type the search string almost exactly as it is in the database (it does ignore a few special characters), but I would like to add the option to include some of PERL's regular expression characters like *, [], and ?. Anyone know of an easy way to do this? I could spend a few days trying to figure it out how to program my own code, but I'd rather not.

-Carl
 
If it is something that must be looked up in a mySQL database anyway, then why dont you let mySQL do the job?

e.g. write on your website that people can make an * if they dont know the last/some part of the partnumber. And then let mysql do the work.
 
Most DBMSs don't have regex support at the SQL level, but they do have limited search capabilities using LIKE '%string%'. The main issue with this is that if the column you want isn't in an index, it will have to scan the table to do the match. For small tables it may be quick enough, but if you have millions of rows it's a non-starter. For indexed columns, LIKE 'string%' is more efficient than LIKE '%string%' because it can use the index to find things beginning with string. But of course, that's not usuallly what you want. SQL Server 2000 has a full-text indexing capability, and you could probably simulate this on other systems with a bit of work (extract the words to another table and use it as an index back to the original table).

To get the perl regex capability, you'd have to retrieve all the rows, and then do the matching yourself, which would be hugely inefficient.
 
Actually, it isn't running on a SQL database. It is running out of a simple text file with a PERL script doing the searching. The text file is setup like:

CompetitorsPN,OurPN
CompetitorsPN,OurPN
CompetitorsPN,OurPN
etc...

The script just reads a line, splits at the comma, compares the CompetitorsPN to the user's input, and then prints it if it matches. The entire thing is like 20 lines of code.

I know this isn't the most efficient way, but the file isn't very long (a few hundred lines) and I have never used SQL or any other Database. I told you I was new at this...

-Carl
 
Hi Carl

Can you give an example of just how 'vague' an input pattern might be? i.e. What might you try to look for should the regex feature be implemented within your script.


Kind Regards
Duncan
 
Sure thing!

Currently the user can input the part number with almost any punctuation they want. The search copies the number and then removes the punctuation so only the alphanumerics are left. Then compares to the list. I will still ignore punctuation (because there is no telling how a user will input this), but I would also like the ability to allow the user some vagueness in their search. There are two ways that I have considered to implement this.

1.Either an "Exactly.../Begins With.../Contains..." choice where the user inputs into the field a string and then chooses one of the above.
2. A regex choice that would allow the user to put in * (for anything or nothing else) or ? (for exactly one character). I think these two functions are all I will really need.

I would prefer case 2 as it can be re-presented as case 1 if the need arises (i.e. Contains = *STRING*, Begins With = STRING*).
 
For those interested here is the code. The HTML for presentation is replaced with <HTML>.

#!/usr/local/bin/perl
require "cgi-lib.pl";
&ReadParse(*input);
$in_co = $input{Company};
$in_pn = $input{PN};
$in_pn2 = $in_pn;
while($in_pn2 =~ m/(\.|\,|\+|\-|[a-z])/) {
$in_pn2 =~ s/(\.|\,|\+|\-)//;
$in_pn2 =~ tr/[a-z]/[A-Z]/;
}
print "Content-type: text/html\n\n";
$XREF = '/data/xref-'.$in_co.'.data';
open(INPUT, "$XREF");
while(<INPUT>) {
push (@list, $_);
}
close(INPUT);
print qq!<HTML>!;
for($i=0;$i<@list;$i++) {
($xref_pn, $lamb_pn) = split(/,/, $list[$i]);
$xref_pn2 = $xref_pn;
while($xref_pn2 =~ m/(\.|\,|\+|\-)/) {
$xref_pn2 =~ s/(\.|\,|\+|\-)//;
}
if($in_pn2 eq $xref_pn2) {
print qq!<HTML>$in_co<HTML>$xref_pn<HTML>$lamb_pn!;
}
}
 
I would suggest - depending on how large the database is - simply inserting there pattern into a regex with .* either side - this would return any matching records with the pattern anywhere in the string

or you could split up the string - e.g. DUNCAN - and convert to .*D.*U.*N.*C.*A.*N.* this would be cute as if the user could remember sections of the part number (i guess!?) then it would still find. I use this method for something i do and it is very successful. Also ignore case on any searches. Your regex capability certainly seems up to scratch so i am sure you will understand what i am trying to get at.

a strange example of the capability of this:-
if someone knew of MRSA they could type this in the pattern
this would be converted (using split and join) to .*M.*R.*S.*A.*
this would find methicillin resistant staphylococcus aureus

... an odd example i am aware - but it explains my thought process!


Kind Regards
Duncan
 
What happens if you replace the line:
Code:
if($in_pn2 eq $xref_pn2) {
with
Code:
if($xref_pn2 =~ /$in_pn2/) {

Does that let you enter regular expressions? (I'm not somewhere that I can test this right now, but I think it should work.)
 
As long as the user understands regular expressions and enters them correctly that will work fine Aardvark.

Mike

You cannot really appreciate Dilbert unless you've read it in the
original Klingon.

Want great answers to your Tek-Tips questions? Have a look at faq219-2884

 
Actually, you're almost right:

if($xref_pn2 =~ /($in_pn2)/) {

Thanks ALL!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top