PHP 5.2.5
PCRE 7.3
I'm trying to match Unicode characters for a search algorithm.
I thought regexes handled Unicode just fine, and that you leave the lowercase u modifier off so the string isn't treated as UTF-8. Is that not the case?
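For reference, the u modifier decides whether PCRE reads the subject as single bytes or as UTF-8 code points. A minimal sketch (variable names are illustrative only) that counts the "characters" in a string holding just U+02E2, written out as its two UTF-8 bytes 0xCB 0xA2:

```php
<?php
// Sketch: how the /u modifier changes what PCRE counts as one character.
// The subject is U+02E2 spelled out as its UTF-8 bytes (0xCB 0xA2), so the
// example does not depend on how this file is saved.
$subject = "\xCB\xA2";

// Without /u the subject is a sequence of bytes: '.' matches each byte.
$bytes = preg_match_all('/./', $subject, $m1);

// With /u the subject is decoded as UTF-8: the two bytes are one code point.
$chars = preg_match_all('/./u', $subject, $m2);

echo $bytes, ' ', $chars, "\n";   // prints "2 1"
```

So leaving u off does not make PCRE "Unicode-neutral"; it makes it count bytes.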
THE CODE:
<?php
echo mb_detect_encoding($str, $ary); // note: $str and $ary are not defined in this snippet
echo '<br>';
echo '<br>';
echo '<br>';
$TempArray = array();
$iOffset = 0;
echo '<br> try matching 2 char unicode like 007C';
$str1 = 'hey you PQRà hey the end ||| oh';
if (preg_match_all('/\x{007C}\x{007C}\x{007C}/i', $str1, $TempArray, PREG_OFFSET_CAPTURE, $iOffset) > 0)
{ echo '<BR> THIS ALL MATCHES $TempArray[0]=';
var_dump($TempArray[0]); }
else { echo '<br> no match'; }
echo '<br><BR> now try matching 3 char unicode like 02E2';
$str1 = 'you ??? oh';
if (preg_match_all('/\x{02E2}/i', $str1, $TempArray, PREG_OFFSET_CAPTURE, $iOffset) > 0)
{ echo '<BR> THIS ALL MATCHES $TempArray[0]=';
var_dump($TempArray[0]); }
else { echo '<br> no match'; }
?>
THE OUTPUT:
ASCII
try matching 2 char unicode like 007C
THIS ALL MATCHES $TempArray[0]=array(1) { [0]=> array(2) { [0]=> string(3) "|||" [1]=> int(25) } }
now try matching 3 char unicode like 02E2
Warning: preg_match_all() [function.preg-match-all]: Compilation failed: character value in \x{...} sequence is too large at offset 7 in /home/content/s/c/o/scottlmoore111/html/ObsceneClean/words2.php on line 16
no match
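A likely explanation for the warning: without the u modifier PCRE works on single bytes, so a \x{...} escape can only name values up to 0xFF. 0x007C fits in one byte, which is why the first test matches, while 0x02E2 is rejected as "too large". The sketch below adds u. It assumes the text being searched is UTF-8 and guesses that the ??? in the original string stood for U+02E2 characters lost in encoding; the bytes are written out explicitly for that reason.

```php
<?php
// Sketch of the fix: with /u, \x{02E2} names the code point U+02E2
// instead of a single byte value that is too large for non-UTF-8 mode.
// Assumption: the subject is UTF-8.
$s    = "\xCB\xA2";                 // U+02E2 encoded as UTF-8
$str1 = 'you ' . $s . $s . $s . ' oh';

// Without /u the pattern fails to compile ("character value in \x{...}
// sequence is too large") and preg_match_all() returns false.
// The @ only hides that warning for this demo.
$failed = @preg_match_all('/\x{02E2}/', $str1, $TempArray);

// With /u it compiles and finds all three characters.
$count = preg_match_all('/\x{02E2}/u', $str1, $TempArray, PREG_OFFSET_CAPTURE);

// PREG_OFFSET_CAPTURE still reports byte offsets, not character offsets.
foreach ($TempArray[0] as $match) {
    echo $match[1], "\n";           // prints 4, 6, 8 (one per line)
}
```

Because the offsets are byte positions, a character position would need an extra step, for example mb_strlen(substr($str1, 0, $offset), 'UTF-8').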