Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Regex "AND" Operator 2

Status
Not open for further replies.

fischadler

Programmer
May 31, 2002
258
MT
I am very new to regular expressions. I have the code below which I use to find news records which match a particular pattern:

Code:
strPattern = "(cat)|(dog)"

StrippedString = stripHTMLtags(rstNews("NewsDetails"))
Set RegularExpressionObject = New RegExp
With RegularExpressionObject
  .Pattern    = strPattern
  .IgnoreCase = True
  .Global     = True
End With
Set ExpressionMatch = RegularExpressionObject.Execute(StrippedString)
If ExpressionMatch.Count > 0 Then
  'Do something
END IF

This code is inside a loop which goes through a series of records, and stripHTMLtags is a function to remove HTML tags.

This code works fine when looking for all articles which have either the word "cat" OR the word "dog". But I want to set it to find only the articles which have both "cat" AND "dog".

Tried replacing the "Pipe" character with other characters to no avail.

-Fischadler
 
[tt]strPattern = "cat.*?dog|dog.*?cat"[/tt]
 
OK, Thanks tsuji, it works. If I understand correctly, you've put two combinations so that if cat comes before dog or vice versa, it triggers a match. However, the match count only shows one, not two (one for each word). Beside, if I have more than two words, I would need to create all the combinations.

I think I am using the wrong approach here, perhaps you can give me a suggestion.

I am trying to make a search feature for my web site which daily news item stored in an Access DB. The search needs to allow searchers to search for "all words", "any word" and "exact phrase". I managed to arrive to "any word".

The mechanism I am using is probably primitive. First I get a recordset of just the records that meet the criteria using a LIKE operator. Then I go through each resulting record one by one and count the occurances and store the id and occurance count in an array and finally sort the results in order to bring those with the highest occurance at the top.

Any suggestions welcome. :)

-Fischadler
 
Disclaimer: this may not the best, easiest, or fastest method to use

You could score the records in your SQL statement. basically create your SQL statement to check if each word is in a record in the SELECT clause, and then order your records by the total number of words that are in the record. Your SQL would end up looking something like (assuming our search phrase was "search words":
SELECT title, article, author, etc, (IIF(InStr(article,'search') > 0,'search ','') & IIF(InStr(article,'words') > 0,'words ','')) AS words
FROM YourTable
WHERE article LIKE '%search%' OR article LIKE '%words%'
ORDER BY (InStr(article,'search') > 0) + (InStr(article,'words') > 0)

Since boolean True will be converted to -1, this will order your records by the ones having all of the search words. Additionally you will have a list of matching search words in the SELECT statement so you can output next to the results which words occur in the article.

The way you would implement something like this would be something like:
Code:
'assuming our search terms are in Request.Form("search")

Dim conn, rs, arrSearch, sql

'-- Create array of search terms
'quick check that there are actually search words
If Len(Trim(Request.Form("search"))) = 0 Then
   Response.Write "Can't search if you don't enter words"
   Response.End
End If

'might want to add a regular expression here to get rid of duplicate spaces

arrSearch = Split(Request.Form("search")," ")

'-- set up connection
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "connection string"

'-- create SQL statement
Dim i
sql = "SELECT title, article, author, etc, ("
'loop because we need the word in two positions
For i = 0 to UBound(arrSearch)
   If i > 0 Then sql = sql & " & "
   sql = sql & "IIF(InStr(article,'" & arrSearch(i) & "') > 0,'" & arrSearch(i) & " ','')
Next

sql = sql & ") AS words"
sql = sql & "FROM MyTable WHERE "

'build the OR's using a Join
sql = sql & "article LIKE '" & Join(arrSearch,"' OR article LIKE '") & "'"

'build the Order By with another Join
sql = sql & " ORDER BY (InStr(article,'" & Join(arrSearch,"') > 0) + (InStr(article,'") & "') > 0)"

'-- Output results
'execute the query
Set rs = conn.Execute(sql)
If rs.EOF Then
   Response.Write "no results"
Else
   rs.MoveFirst
End If

Response.Write "<table><tr><th>Title</th><th>Author</th><th>Search terms</th></tr>"
Do Until rs.EOF
   Response.Write "<tr><td>" & rs("title") & "</td><td>" & rs("author") & "</td><td>" & rs("words") & "</td></tr>"
   rs.MoveNext
Loop
Response.Write "</table>"

'-- cleanup
Set rs = Nothing
conn.Close
Set conn = Nothing

Thats basically an example, untried and possibly with a few errors. As I said earlier, this may not be the best method and I don't know how fast or slow it will run, but it does remove the need for your ASP code to handle the matching and re-ordering of the results.

-T

 
OK. My original question was about regular expressions, but it turns out that thanks to Tarwn's suggestion, I won't be needing to use them!

Thank you both!

-Fischadler
 
Just be careful to test a little, I wrote most of that Access SQL from a very rusty and dim memory, so there may be some issues hidden in it :)

 
Don't worry, it was clear enough for me to understand the logic behind it so that I could adapt it to my needs. I just had to add the "%" for the LIKE patterns. I knew about the SPLIT command but didn't know about the JOIN...so I also got to learn a new command. :)

One thing I noticed. Your sorting gives priority to those records in which all words occur over those in which less of the keywords match. Is it possible, in case there is just one search word, to give priority to those articles in which the keyword appears most? Kinda like counting the number of times the keywords occur. I think that when there are multiple keywords things will get complicated.

-Fischadler
 
Finding the number of occurrences of a word in a particular text field is going to be much, much more difficult. You would probably slow down your app quite a bit,a s you don't have the option of creating a stored procedure that is compiled and running independently like in SQL Server.
Most of the examples of this I have seen use loops and InStr to find the number of occurrences. I have seen another method that might work for you, as it should be faster than a set of loops.

By taking the length of the original field minus the length of the field after replacing your search word with an empty string, divided by the length of your search word, you get the number of occurrences for that word in the string. Similar to the previous method I outlined, you would probably want to do this for each individual word or as trying to do this for multiple words and using the average length would skew yor ranking towards shorter words.

To do this you would want your SQL statement to come out something like:
Code:
SELECT title, article, author, etc, (IIF(InStr(article,'search') > 0,'search ','') & IIF(InStr(article,'words') > 0,'words ','')) AS words, [highlight]((LEN(article) - LEN(REPLACE(article,'search')))/LEN('search')) + ((LEN(article) - LEN(REPLACE(article,'search')))/LEN('words')) AS Ranking[/highlight]
FROM YourTable
WHERE article LIKE '%search%' OR article LIKE '%words%'
ORDER BY [highlight]((LEN(article) - LEN(REPLACE(article,'search')))/LEN('search')) + ((LEN(article) - LEN(REPLACE(article,'search')))/LEN('words'))[/highlight] DESC

By using this length and replace method, we're still adding that much more computation to each record, but using the InStr in a loop method would be much, much more intense.

You could additionally create a search result field in the select for each word and replace only the number of results for that word, instead of a string of words and number.

-T

 
In the arena of string, from the point of view of general alphabetic building particles, it is an, not often consciously realized, incredibly large and complicate (free) group. No wonder it is always not a trial task to compare text...

From the approach using regexp, identifying n targets (cat, dog...), there are n! (behaving constant*n^(n+1/2)*exp(-n) for large n) combinations. Large though, the complete pattern can be built from the script. For realistic n, hopefully, it won't blow the system. Just build the pattern by the script. It does not need to hard script it. If the target strings are definitely identify, the regexp is still manageable.
 
I cannot get the REPLACE function to work in ASP. I get the error:

Microsoft OLE DB Provider for ODBC Drivers error '80040e14'

[Microsoft][ODBC Microsoft Access Driver] Undefined function 'REPLACE' in expression.

The exact same SQL query works perfectly when pasted in Access SQL view and run (after replacing the "%" with "*").

I looks like Access 2003 and ASP do not use the same engine.

-Fischadler
 
ADO doesn't offer all of the same capabilities as a query placed directly in Access, but I can't believe something as basic as Replace doesn't work. I know there is some oddness with built-in queries, but maybe try to build the query as a saved query in access then run it from your ASP code?

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top