Matching unicode with \x{02E2} --NOT!

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
PHP 5.2.5
PCRE 7.3

I'm trying to match Unicode characters for a search algorithm.
I thought regexes handled Unicode just fine, and that you leave the lowercase u modifier off so the string isn't treated as UTF-8. No?

THE CODE:

<?php
echo mb_detect_encoding($str, $ary);
echo '<br>';
echo '<br>';
echo '<br>';
$TempArray = array();
$iOffset = 0;
echo '<br> try matching 2 char unicode like 007C';
$str1 = 'hey you PQRà hey the end ||| oh';
if (preg_match_all('/\x{007C}\x{007C}\x{007C}/i', $str1, $TempArray, PREG_OFFSET_CAPTURE, $iOffset) > 0)
{ echo '<BR> THIS ALL MATCHES $TempArray[0]=';
var_dump($TempArray[0]); }
else { echo '<br> no match'; }
echo '<br><BR> now try matching 3 char unicode like 02E2';
$str1 = 'you ??? oh';
if (preg_match_all('/\x{02E2}/i', $str1, $TempArray, PREG_OFFSET_CAPTURE, $iOffset) > 0)
{ echo '<BR> THIS ALL MATCHES $TempArray[0]=';
var_dump($TempArray[0]); }
else { echo '<br> no match'; }
?>


THE OUTPUT:
ASCII

try matching 2 char unicode like 007C
THIS ALL MATCHES $TempArray[0]=array(1) { [0]=> array(2) { [0]=> string(3) "|||" [1]=> int(25) } }

now try matching 3 char unicode like 02E2
Warning: preg_match_all() [function.preg-match-all]: Compilation failed: character value in \x{...} sequence is too large at offset 7 in /home/content/s/c/o/scottlmoore111/html/ObsceneClean/words2.php on line 16

no match
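
For comparison, a minimal sketch of the same kind of match with the u modifier added. Without u, PCRE only accepts \x{...} values up to 0xFF, which is what the "too large" warning above is complaining about; with u, the pattern and subject are treated as UTF-8 and larger code points are allowed. The names $subject, $matches and $count are illustrative and not from the script above, and the subject is assumed to contain real U+02E2 characters, written here as UTF-8 byte escapes so the example does not depend on the file's encoding.

```php
<?php
// Assumption: the subject holds three U+02E2 characters.
// "\xCB\xA2" is the UTF-8 byte sequence for U+02E2.
$subject = "you \xCB\xA2\xCB\xA2\xCB\xA2 oh";
$matches = array();

// The u modifier puts PCRE in UTF-8 mode, so \x{02E2} compiles
// instead of failing with "character value ... is too large".
$count = preg_match_all('/\x{02E2}/u', $subject, $matches, PREG_OFFSET_CAPTURE);

echo $count, "\n";      // number of matches found
var_dump($matches[0]);  // each entry is array(matched text, byte offset)
?>
```

Note that the offsets reported by PREG_OFFSET_CAPTURE are byte offsets, not character offsets, even with the u modifier.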
 