Dictionary preg replace 1

ThomasJSmart

Programmer
Sep 16, 2002
634
Hi,

I've been struggling with this script for about a week -_- and it's giving me a headache...

what i want to make:

a php function that searches $string for $word and replaces $word with $replace. sounds simple enough, right..

thing is... it can't replace $word where putting html code in its place would break things. For example if $word is part of an html link, or if $word is the name of an html tag, or if $word is inside some javascript, etc.

so lets say these are the vars:


Code:
$string = 'foo the bar went over the <a href="foo.html">FOO</a> <b>foo!!</b> <script language="foo">var foo = 123;</script>';

$word = 'foo';

$replace = '<div name="bar1"><a href="foo.html">FOO!</a></div>';


now 'foo' should only be replaced in 2 of the above instances: the first one and the foo between the <b> tags.


i more or less have this working with a regexp: /(?<!\<)$word(?!\>)/i

but to make things more complicated: notice the <div> in $replace. this div needs a unique name in each replacement (bar1, bar2, bar3, etc.) and now i'm seriously stuck.


the final script should be a function to which $string is passed. $word is fetched from a database within the function, so several words could be searched for in $string. i'm going to put the word ID in the div name, but a per-replacement counter (replace ID?) will also need to be added to make sure every div has a unique name.

please help :) tips, links, or suggestions are ALL VERY welcome and much appreciated!!

Thank you





I learned a bit yesterday, today i learned a lot, imagine what i'll learn tomorrow!
 
i'm not sure that i understand the input rules properly. can you give the explanation another go?
 
the input is a $string. This is a long chunk of html code and text.

$word is a word loaded from a database. just a simple word, no html tags or anything.

any $word in the $string should be replaced with $replace:


assuming that $word is "foo"

foo should be replaced with $replace when:

it's just "foo" in the text, with no tags around it.
it's inside tags that are ok with having an <a href> inside them, like <b>foo</b> or <td>foo</td>

foo should not be replaced if it's inside a <script> tag, if foo is an attribute value, like <font face="foo"></font>, or if foo is part of a link, like <a href="foo.html">foo</a>.

and a unique id needs to be assigned to each replacement.


as a theory: is it not possible to build an array of all instances of $word with a preg match, using a regexp that enforces the rules above?

then replace each match with $replace, giving the div in $replace the match's array key so each one is unique, and put each updated match back into the string at the right location?
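
The theory above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sample $string and the lookahead pattern are not from the thread, and the pattern only skips matches inside a tag (attribute names and values). It does not handle <script> contents or link text.

```php
<?php
// Hypothetical sample input; not from the thread.
$string = 'foo and <span id="foo">bar</span> and <b>foo</b>';
$word   = 'foo';

// Assumed pattern: match $word only when it is not inside a tag. The
// lookahead rejects a match if a ">" follows before any "<" does.
$pattern = '/' . preg_quote($word, '/') . '(?![^<>]*>)/i';

// Collect every allowed match together with its byte offset.
preg_match_all($pattern, $string, $found, PREG_OFFSET_CAPTURE);

// Walk the matches from last to first so the earlier offsets stay valid,
// keeping the original array keys to build the unique names.
foreach (array_reverse($found[0], true) as $key => $match) {
    list($text, $offset) = $match;
    $html   = '<div name="bar' . $key . '">' . $text . '</div>';
    $string = substr_replace($string, $html, $offset, strlen($text));
}

echo $string, "\n";
```

Working backwards through the matches avoids recalculating offsets after each insertion.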





 
ok

i'm not great at regex - i've often thought there should be a separate forum for this.

as a result i think this solution is a bit clunky, but i think it works

Code:
<?php
$html = <<<HTML
this is a line with foo in it
this is a line with an attribute <span id="foo"> test </span>
this is a link with <a href="foo.html">foo</a>
HTML;
$i =0 ; //create counter

// set up the patterns
$pattern[0] = '/(=["\'].*)(foo)(.*["\'])/';
$uid = uniqid("tmp_");
$replace[0] = "$1".$uid."$3";
$pattern[1] = '/foo/';

//do the replacements
$tmp = preg_replace($pattern[0], $replace[0], $html);
$tmp = preg_replace_callback($pattern[1], "replace", $tmp); //use callback because we need an incrementing id
$output = str_replace($uid, 'foo', $tmp); // using str_replace as it is quicker.

//debug output to the screen
echo "<pre>".htmlentities($output);

function replace($matches){
	global $i;
	$tmp= '<div id="bar'.$i.'"><a href="foo.html">FOO!</a></div>';
	$i++;
	return $tmp;
}
?>
 
Hi jpadie.

Very nice solution, swapping the non-changeable matches for a tmp placeholder :) and of course the callback... i've never used one but it's very useful in this case :D

not quite sure what the $1 and $3 are for here tho: $replace[0] = "$1".$uid."$3"; ?

and i had some trouble with the globals, maybe because i put the whole replacement part in a function too? i swapped them for sessions, which works great but might be overkill. if there's a way to get the globals to work, or to pass the vars to the callback function, it would be much nicer.


Thank you very much!


here is my full function as it stands:

Code:
function dictionary($string){

	global $db;
	$result = mysql_query("SELECT * FROM cms_dictionary ORDER BY dictionary_word",$db);
	while ($myrow = mysql_fetch_array($result)){
			
		$_SESSION['BC_DDI'] = 0;	
		$_SESSION['BC_DID'] = $myrow['dictionary_id'];
		$_SESSION['BC_DIN'] = $myrow['dictionary_info'];
		
		$_SESSION['BC_DWO'] = $dictionary_word	 	= $myrow['dictionary_word'];
		
		// set up the patterns (preg_quote keeps any regex metacharacters in the word literal)
		$quoted = preg_quote($dictionary_word, '/');
		$pattern[0] = '/(=["\'].*)('.$quoted.')(.*["\'])/';
		$uid = uniqid("tmp_");
		$replace[0] = "$1".$uid."$3";
		$pattern[1] = '/'.$quoted.'/';
			
		//do the replacements
		$tmp = preg_replace($pattern[0], $replace[0], $string);	// masks every D-word that must not be changed with a tmp placeholder.
		$tmp = preg_replace_callback($pattern[1], "Dreplace", $tmp); // uses a callback because we need an incrementing id; swaps the allowable D-words for the replacement.
		$string = str_replace($uid, $dictionary_word, $tmp); // str_replace is quicker here; changes the tmp placeholders back to the D-word.
	}

	return $string; 
}



	
function Dreplace($matches){
	//global $di; global $dictionary_id; global $dictionary_word; global $dictionary_info;
	$di = $_SESSION['BC_DDI']; $dictionary_id = $_SESSION['BC_DID']; $dictionary_word = $_SESSION['BC_DWO']; $dictionary_info = $_SESSION['BC_DIN'];
	$dictionary_window		= '<div class="dictionary" id="dDiv'.$dictionary_id.'_'.$di.'" name="dDiv'.$dictionary_id.'_'.$di.'" style="display:none" onMouseOver="showDictionary(\'dDiv'.$dictionary_id.'_'.$di.'\',1)" onMouseOut="showDictionary(\'dDiv'.$dictionary_id.'_'.$di.'\',0)">'.$dictionary_info.'</div>';
	$dictionary_link 		= '<a href="javascript:void(0)" onMouseOver="showDictionary(\'dDiv'.$dictionary_id.'_'.$di.'\',1)">'.$dictionary_word.'</a>';
	$result = $dictionary_window.$dictionary_link;
	$_SESSION['BC_DDI']++;
	return $result;
}



 
also 1 more thing, this one did not seem to be replaced:


<font face="Verdana" size="3"><b>foo</b></font>


not sure why not?



 
the global keyword only brings global variables into the scope of the relevant function. it does not make a variable that is first declared in a function into a global variable. if you declare the counter in the global scope, that is the simplest way of handling things.
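
A minimal sketch of that scoping rule, using hypothetical function names outer() and inner() that are not from the thread:

```php
<?php
function outer()
{
    $i = 5;          // local to outer(); invisible to other functions
    return inner();
}

function inner()
{
    global $i;       // binds to the $i in the global scope, not outer()'s local
    return $i;
}

$i = 0;              // the counter declared in the global scope
var_dump(outer());   // int(0)
```

This would explain why a counter set inside dictionary() is not the one the callback sees through the global keyword.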

you cannot pass further information to a named callback function like this one.

one way around this is to use a static variable just in the function
Code:
function replace($matches){
    static $i = 0; // static: the value is kept between calls
    $tmp= '<div id="bar'.$i.'"><a href="foo.html">FOO!</a></div>';
    $i++;
    return $tmp;
}

for the other variables, consider using the $GLOBALS superglobal in place of sessions.
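
As an alternative to static variables and $GLOBALS, PHP 5.3 and later allow an anonymous function to capture variables with `use`. The sketch below assumes such a PHP version; wrapMatches() and its parameters are hypothetical names, and the div markup is simplified from the posts above.

```php
<?php
// Hypothetical helper, not from the thread: wraps each match of $word
// in a div whose id combines the word id and a per-call counter.
function wrapMatches($html, $word, $wordId)
{
    $count   = 0; // captured by reference so the closure can increment it
    $pattern = '/' . preg_quote($word, '/') . '/i';

    return preg_replace_callback(
        $pattern,
        function ($matches) use (&$count, $wordId) {
            $div = '<div id="dDiv' . $wordId . '_' . $count . '">'
                 . $matches[0] . '</div>';
            $count++;
            return $div;
        },
        $html
    );
}

echo wrapMatches('CMS and cms', 'CMS', 1), "\n";
// <div id="dDiv1_0">CMS</div> and <div id="dDiv1_1">cms</div>
```

Because the counter lives inside the function call, it restarts at 0 for each word without touching sessions or global state.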

the $1 and $3 are backreferences to the text captured by the first and third sets of round brackets. in this case we need to keep the '="' and the rest of the attribute on either side of the word, as otherwise the link will get broken.
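
To see the backreferences at work, here is a small sketch of the masking step on a single link. The fixed placeholder 'tmp_123' stands in for the uniqid() value and is an assumption for illustration; `${1}` is the braced form of `$1`, which stays unambiguous even if the placeholder began with a digit.

```php
<?php
$html = '<a href="foo.html">foo</a>';
$uid  = 'tmp_123'; // hypothetical stand-in for uniqid("tmp_")

// Group 1 captures '="', group 2 the word, group 3 the rest of the attribute.
$masked = preg_replace('/(=["\'].*)(foo)(.*["\'])/', '${1}' . $uid . '${3}', $html);
echo $masked, "\n";   // <a href="tmp_123.html">foo</a>

// Swapping the placeholder back restores the original attribute.
$restored = str_replace($uid, 'foo', $masked);
echo $restored, "\n"; // <a href="foo.html">foo</a>
```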

on my screen the foo in <b>foo</b> gets changed as you would expect.
 
thanks for the $GLOBALS and static tip, that's working great :)

still having problems with that 1 word not being replaced tho..

this is the output after the function. as you can see, the first one is missing a DIV.


Code:
<table width="682" height="100%" border="0" cellpadding="0" cellspacing="0">
  <tr>
    <td width="682" height="25">&nbsp;</td>
  </tr>
  <tr>
    <td width="682" align="left" valign="top"><font face="Verdana" size="3"><b>CMS</b></font><font class="titel"><br>
      </font><br>
      <font face="Verdana" size="2">Vroeger 
        zaten er aan een website drie prijskaartjes: een eerste voor de aanschaf, 
        een tweede voor de hosting en de registratie en een derde voor het updaten 
        c.q. het periodieke onderhoud. Veelal had daarbij het derde prijskaartje 
        ... onderhoud en updates ... verreweg het grootste aandeel in de totale 
        kosten.<br>
        <br>
        Tegenwoordig echter zijn er <b>
          <div class="dictionary" id="dDiv1_0" name="dDiv1_0" style="display:none" onMouseOver="showDictionary('dDiv1_0',1)" onMouseOut="showDictionary('dDiv1_0',0)">
            <p style="margin-top: 0"> C.M.S. staat voor <b>C</b>ontent <b>M</b>anagement <b>S</b>ystem </p>
          </div>
          <a href="javascript:void(0)" onMouseOver="showDictionary('dDiv1_0',1)" onMouseOut="showDictionary('dDiv1_0',0)">CMS</a></b> websites oftewel <b>C</b>ontent <b>M</b>anagement <b>S</b>ystem websites; websites die door iedereen die 
        met MS Word<sup>&#174;</sup> kan omgaan - en dat kan vrijwel elk van uw 
        medewerkers - kunnen worden updated en onderhouden. Een goede zaak, want 
        bedrijven veranderen ... groeien ... en <b>
          <div class="dictionary" id="dDiv1_1" name="dDiv1_1" style="display:none" onMouseOver="showDictionary('dDiv1_1',1)" onMouseOut="showDictionary('dDiv1_1',0)">
            <p style="margin-top: 0"> C.M.S. staat voor <b>C</b>ontent <b>M</b>anagement <b>S</b>ystem </p>
          </div>
          <a href="javascript:void(0)" onMouseOver="showDictionary('dDiv1_1',1)" onMouseOut="showDictionary('dDiv1_1',0)">CMS</a></b> websites veranderen 
        moeiteloos mee. </font><br></td>
  </tr>
</table>

 
have you remembered to add the case insensitivity modifier to the pattern?

Code:
$pattern[0] = '/(=["\'].*)(foo)(.*["\'])/i';
$pattern[1] = '/foo/i';
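
Another possible cause, not raised in the thread, is the greedy `.*` in the first pattern. On a line where the word sits between two attributes, the pattern can match from an earlier `="` to a later quote and mask the word as if it were inside an attribute. The sketch below uses a line reduced from the output posted above; the $strict pattern is a hypothetical variant that only handles double-quoted values and one occurrence of the word per value.

```php
<?php
// Reduced from the posted output: the word sits between two sets of attributes.
$line = '<font face="Verdana" size="3"><b>CMS</b></font><font class="titel">';

// Pattern from the thread: ".*" may run across several attributes.
$greedy = '/(=["\'].*)(CMS)(.*["\'])/';

// Hypothetical stricter variant: stay inside one double-quoted value.
$strict = '/(="[^"]*)(CMS)([^"]*")/';

var_dump(preg_match($greedy, $line)); // int(1): the visible CMS gets masked
var_dump(preg_match($strict, $line)); // int(0): the visible CMS is left alone
```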
 