
Highlighting Keywords Regex Issue

Status
Not open for further replies.

DonP

IS-IT--Management
Jul 20, 2000
684
US
I have a simple function that is supposed to highlight keywords and it seems to do fairly well. However, I would like to make it highlight the complete word and not just any portion that was searched but have no idea how to do it as I am weak in regular expressions.

Code:
function Highlight($keywords, $text) {
    $StripCommon = array("the", "of", "to", "for");
    $keywords = str_replace($StripCommon, "", $keywords);
    $keywords = trim($keywords);

    if (strlen($keywords)) {
        $keywords = explode(" ", $keywords);

        if (count($keywords)) {
            foreach ($keywords as $keyword) {
                $text = preg_replace('/(' . $keyword . ')/i', '<span class="HighlightKeywords">\\1</span>', $text);
            }
        }
    }

    return $text;
}

$keywords is space-delimited and $text contains HTML, both of which might be factors. Can someone help? Thanks.

Don
Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases
 
>$text = preg_replace('/(' . $keyword . ')/i', '<span class="HighlightKeywords">\\1</span>', $text);

How about this.
[tt]
$text = preg_replace('/\b\w*?' . $keyword . '\b\w*?'/i', '<span class="HighlightKeywords">\\0</span>', $text);
[/tt]
Such as this
[tt][green]Highlight("important", "<div>It is important to note that ...</div>");[/green][/tt]
returns
[tt][green]"<div>It is <span class="HighlightKeywords">important</span> to note that ...</div>"[/green][/tt]
whereas
[tt][green]Highlight("port", "<div>It is important to note that ...</div>");[/green][/tt]
returns the same.

But I have not looked into more complicated keyword patterns, such as ones that contain the "common" words or spaces. You may have to tweak it further.
 
Thanks! It looks like it might do the trick and I'll try it tonight.

One other thing I would like to do, if it is possible, is to highlight a whole phrase when only a portion of it is entered as a keyword, as long as the keyword is not a common word entered by itself (such as "the" or "it"), although some of the common words might appear in the titles. I am thinking primarily of recording and song titles, which could be matched against the site's database. I already use some code that automatically creates links from these items when I edit an entry. Can this be done, and are there any tips to get me started? It's just a thought; simply highlighting the keywords is already great. Thanks again.

Don
Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases
 
Amendment
I would like to correct a typo that came from writing the post in real time. The line should read as follows, with the second \b and \w*? swapped and the surplus apostrophe removed. Sorry for the confusion.

[tt]$text = preg_replace('/\b\w*?' . $keyword . '\w*?\b/i', '<span class="HighlightKeywords">\\0</span>', $text);[/tt]
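
For reference, here is a minimal runnable sketch of the corrected line. The function name highlightWholeWord is made up for illustration, and the call to preg_quote() is an addition that is not in the posts above; it keeps regex metacharacters (and the "/" delimiter) in the keyword from breaking the pattern.

```php
<?php
// Hypothetical helper name; not from the thread.
function highlightWholeWord(string $keyword, string $text): string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter (an addition).
    $pattern = '/\b\w*?' . preg_quote($keyword, '/') . '\w*?\b/i';
    // $0 is the whole match, i.e. the complete word that contains the keyword.
    return preg_replace($pattern, '<span class="HighlightKeywords">$0</span>', $text);
}

echo highlightWholeWord('port', '<div>It is important to note that ...</div>');
// prints: <div>It is <span class="HighlightKeywords">important</span> to note that ...</div>
```

Note that this still looks inside tags and attributes, so a keyword such as "div" would also match the markup itself.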
 
>if it's possible and that's to make it highlight phrases if a portion is entered as a keyword

Again leaving common words aside, this would already be pretty tough on its own, I would say, for general $text. Normally, preprocessing $text into some normal form reduces the task drastically; without that step it does not get much easier.

In the context of a structurally marked-up document, where the goal is to add semantic markup such as a highlight, I would propose something like this to start with. It assumes that:
[1] ". ? !" are the characters that end a phrase;
[2] semantic markup is applied to text nodes only;
[3] any decimal separator (.) has been preprocessed, as it would be a trouble spot;
[4] angle brackets and mathematical operators have been preprocessed, except those of the markup itself, i.e. treated like a CDATA section;
[5] there may be other unmentioned or uninvestigated holes.
[tt]
$text = preg_replace('/\b[^.?!><]*?' . $keyword . '[^.?!><]*?(\.+|\?+|!+)/i', '<span class="HighlightKeywords">$0</span>', $text);
[/tt]
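
To make the effect of that pattern concrete, here is a small sketch wrapped in a function. The name highlightPhrase and the sample text are invented for illustration, and preg_quote() is an addition; the pattern itself is the one posted above.

```php
<?php
// Hypothetical helper; the pattern is the one from the post, with preg_quote() added.
function highlightPhrase(string $keyword, string $text): string
{
    // From a word boundary, take everything up to the keyword and on to the next
    // run of ".", "?" or "!", without crossing an angle bracket.
    $pattern = '/\b[^.?!><]*?' . preg_quote($keyword, '/') . '[^.?!><]*?(\.+|\?+|!+)/i';
    return preg_replace($pattern, '<span class="HighlightKeywords">$0</span>', $text);
}

echo highlightPhrase('moon', '<p>I like Blue Moon. Other song here.</p>');
// prints: <p><span class="HighlightKeywords">I like Blue Moon.</span> Other song here.</p>
```

A phrase that is not closed by one of the three end characters is left alone, which is one of the holes mentioned in [5].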
 
Thanks, I'll check it out. Thanks also for the corrections, which I had noticed too. It seems to work just fine now, although a couple of my original lines of code for common words seem to cause the script to time out, but only if those words are entered in the keyword search. If they are not, it works as it should.

Code:
$StripCommon = array("the", "of", "to", "for");
$keywords = str_replace($StripCommon, "", $keywords);




Don
Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases
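
On the time-out with common words: one plausible cause (an assumption, it is not confirmed anywhere in this thread) is that str_replace() removes substrings rather than whole words. Removing "of" from "song of love" leaves two spaces in a row, so explode(" ", ...) produces an empty keyword, and a pattern built from an empty keyword matches at every word boundary. It also turns "tower" into "wer". A sketch that filters whole words instead, with a made-up function name:

```php
<?php
// Hypothetical helper name; the stop-word list is the one from the thread.
function cleanKeywords(string $keywords): array
{
    $stripCommon = array('the', 'of', 'to', 'for');
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($keywords), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array();
    foreach ($words as $word) {
        // Compare whole words only, case-insensitively.
        if (!in_array(strtolower($word), $stripCommon, true)) {
            $kept[] = $word;
        }
    }
    return $kept;
}

print_r(cleanKeywords('the tower  of song')); // keeps "tower" and "song"
```

The returned array can be fed straight into the foreach loop, so no empty keyword ever reaches preg_replace().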
 
How can this be made to NOT highlight anything within URLs, only the linked text? It seems to be messing them up.

Also, when more than one keyword is entered, it does not highlight anything.

How can these things be fixed? It is currently using:

Code:
$text = preg_replace('/\b\w*?'. $keyword . '\w*?\b/i', '<span class="HighlightKeywords">$0</span>', $text);

Thanks!

Don
Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases
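
One way to keep the highlighter out of URLs is to split the HTML into tags and text and only touch the text pieces. The sketch below is a suggestion, not something from the posts above; the function name is made up, and it assumes no attribute value contains a literal ">".

```php
<?php
// Hypothetical helper: applies the whole-word pattern to the text between tags only.
function highlightOutsideTags(string $keyword, string $html): string
{
    $pattern = '/\b\w*?' . preg_quote($keyword, '/') . '\w*?\b/i';
    // With PREG_SPLIT_DELIM_CAPTURE the captured tags land at odd indexes
    // and the text between them at even indexes.
    $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) {
            $pieces[$i] = preg_replace($pattern, '<span class="HighlightKeywords">$0</span>', $piece);
        }
    }
    return implode('', $pieces);
}

echo highlightOutsideTags('song', '<a href="/songs/list.php">Song list</a>');
// prints: <a href="/songs/list.php"><span class="HighlightKeywords">Song</span> list</a>
```

The href is left untouched while the linked text is still highlighted.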
 
What I was trying for was probably overkill so now I am just trying again to make it highlight the keywords (the whole words if only a portion was searched). It works but it's highlighting only when a single keyword is entered or when a phrase is entered. If the keywords found in the document are separate, it highlights nothing. It is also finding and highlighting keywords within links, which breaks the link.

I have it broken down into a simple function. Can someone help me get the regex in order so as to not have the above problems? I am weak on regex myself and the tek-tips.com search engine has been down for nearly a week:

Code:
function Highlight($keywords, $text) {
    $keywords = trim($keywords);
    if (strlen($keywords)) {
        $keywords = explode(", ", $keywords);
        if (count($keywords)) {
            foreach ($keywords as $keyword) {
                $pattern = '/\b\w*?'. quotemeta($keyword) .'\w*?\b/i';
                $replacement = '\\1<span class="HighlightKeywords">$0</span>';
                $text = preg_replace($pattern, $replacement, $text);
            }
        }
    }

    return $text;
}

Thanks!


Don
Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases
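
A likely reason separate keywords highlight nothing: explode(", ", ...) only splits on a comma followed by a space, so "blue moon" stays a single keyword and only matches where both words appear together. Two smaller points: quotemeta() does not escape the "/" delimiter (preg_quote() with a second argument does), and \\1 in the replacement refers to a group the pattern does not have. A sketch that accepts spaces and/or commas and highlights all keywords in one pass; the function name is made up:

```php
<?php
// Hypothetical rewrite of Highlight(); keywords may be separated by spaces and/or commas.
function highlightAll(string $keywords, string $text): string
{
    $words = preg_split('/[\s,]+/', $keywords, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return $text;
    }
    $quoted = array();
    foreach ($words as $word) {
        $quoted[] = preg_quote($word, '/');
    }
    // One alternation, one pass: markup added for one keyword is never
    // rescanned for the next keyword.
    $pattern = '/\b\w*?(?:' . implode('|', $quoted) . ')\w*?\b/i';
    return preg_replace($pattern, '<span class="HighlightKeywords">$0</span>', $text);
}

echo highlightAll('blue, moon', 'Blue skies and a full moonrise');
// prints: <span class="HighlightKeywords">Blue</span> skies and a full <span class="HighlightKeywords">moonrise</span>
```

Doing it in a single pass also avoids a keyword such as "span" or "class" matching inside the highlight markup inserted for an earlier keyword.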
 
Anybody? Now I have this which does not break links that happen to have the keywords inside but it no longer highlights complete words and it still highlights only one keyword if several are searched that are not together. I know nothing about regular expressions so I hope someone can help!

Code:
function Highlight($keywords, $text) {
    $keywords = array($keywords);
    if (count($keywords)) {
        foreach ($keywords as $keyword) {
            $ind = eregi("([^>]*<)", $text, $ind);
            $replacement = "<b>$1</b>";
            if ($ind) {
                $pattern = "/($keyword)(?=[^>]*<)/i";
                $text = preg_replace($pattern, $replacement, $text);
            } else {
                $pattern = "/($keyword)/i";
                $text = preg_replace($pattern, $replacement, $text);
            }
        }
    }
    return $text;
}

To give credit where it's due, portions of this were gleaned from this site

Don
Experienced in HTML, Perl, PHP, VBScript, PWS, IIS and Apache and MS-Access, MS-SQL, MySQL databases
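
Two observations on this version. array($keywords) wraps the whole search string in a single array element, so the loop runs once with the entire string as one keyword; splitting the string first (for example with preg_split()) would let the loop see each keyword. Also, eregi() was deprecated in PHP 5.3 and removed in PHP 7. The lookahead idea itself can be combined with the whole-word pattern. A minimal sketch with a made-up function name, assuming no attribute value contains a literal "<" or ">":

```php
<?php
// Hypothetical variant: whole-word match plus a lookahead that skips tag contents.
function highlightWithLookahead(string $keyword, string $html): string
{
    // (?![^<>]*>) rejects a match when the next angle bracket ahead is ">",
    // which is the case inside a tag such as <a href="...">.
    $pattern = '/\b\w*?' . preg_quote($keyword, '/') . '\w*?\b(?![^<>]*>)/i';
    return preg_replace($pattern, '<b>$0</b>', $html);
}

echo highlightWithLookahead('song', '<a href="songs.php">My songs</a> and more songs');
// prints: <a href="songs.php">My <b>songs</b></a> and more <b>songs</b>
```

Because the lookahead is negative, it also works on text with no tag after the keyword, so the separate eregi() check and the second branch are not needed.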
 