Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Querying mySQL data containing apostrophes 1

Status
Not open for further replies.

cmayo

MIS
Apr 23, 2001
159
US
Apologies for the cross-post, but it seems this thread is better asked here than in the MySQL forum where it began.

My client's database has been escaping character fields using PHP's addslashes(), resulting in records with fields containing values like "O\'Henry"

Querying for that data is problematic as those fields aren't returned by LIKE "O'Henry%" or escaped, "O\\\'Henry%"

Hand-munging the escaping, I find that escaping the embedded backslash instead of the apostrophe, i.e. LIKE "O\\\\'Henry%" will fetch the desired records, as will escaping both the backslash and the apostrophe with LIKE "O\\\\\\\'Henry%"

So my problem, I guess, is how to have PHP do that munging on user-entered search strings, i.e. when the user enters "O'Henry" into a form field. If I use addslashes() or mysql_real_escape_string() on the user text, i.e.

Code:
name like '".trim(addslashes($_POST['name']))."%'

the resulting query will read "LIKE O\\\'Henry%" and fail to find the records. If I double-escape the search string, i.e.

Code:
name like '".trim(addslashes(addslashes($_POST['name'])))."%'

I'll get a query containing

Code:
name like 'Bob\\\\\\\'s Burgers%'

which does find the records, but that's really ugly, and there's no telling how many other queries such awkward double-escaping would break.

I do realize that the data has been improperly written to the database and that the apostrophes shouldn't have been escaped with backslashes, but again, I'm working with an existing database behind an existing CRM front end, and fixing the data and all the app's queries would be so involved that some kind of PHP workaround would be much preferable.

Any ideas?
 
Reference thread436-1546156 for what already has been suggested regarding this issue.




----------------------------------
Phil AKA Vacunita
----------------------------------
Ignorance is not necessarily Bliss, case in point:
Unknown has caused an Unknown Error on Unknown and must be shutdown to prevent damage to Unknown.
 
the thread seems stalled or paused in the mysql forum and the answer is in php, i'd bet.

@OP you most probably have magic_quotes_gpc switched on, or magic_quotes_runtime(). burn in hell for such travesty. turn them off at once!

to test the hypothesis that doing so will solve the problem, add the following to the beginning of your script

Code:
function undoTheDamage(){
  magic_quotes_runtim(FALSE);
  if (get_magic_quotes_gpc()){
   array_walk_recursive($_GET, 'stripslashes');
   array_walk_recursive($_POST,'stripslashes');
   array_walk_recursive($_FILES, 'stripslashes');
   array_walk_recursive($_COOKIES, 'stripslashes);
  }
}
undoTheDamage();

i cannot remember whether you also have to do $_REQUEST or whether this is an amalgam of POST, GET, COOKIES. doubtless you can test.

but basic rule - never never use magic_quotes of any variety.
 
jpadie, great to see you! Hope you're doing well.

magic_quotes_gpc is indeed on, magic_quotes_runtime is off, but I think the real problem is that none of the system's queries are unescaping the $_REQUEST vars before escaping them for PHP, resulting in the insert queries escaping the HTTP escape character when inserting:

Code:
REQUEST
Array
(
    [search] => O\'Henry
    [PHPSESSID] => 233d3f20ff1ec4587785c040bb0cb7c9
)

Code:
$query = "INSERT INTO job_sites (name) VALUES ('".AddSlashes($_REQUEST['search'])."')";

Which inserts:

Code:
mysql> select job_site_id, name from job_sites where job_site_id = 1446;
+-------------+----------+
| job_site_id | name     |
+-------------+----------+
|        1446 | O\'Henry |
+-------------+----------+
1 row in set (0.00 sec)

In which case, it doesn't look like cleaning up the $_REQUEST vars to be used as search strings is still going to return any hits on "O\'Henry"
 
Hi

so ... turn magic_quotes OFF!!

the code i posted above should undo the magic_quotes damage too

Code:
function undoTheDamage(){
  magic_quotes_runtim(FALSE);
  if (get_magic_quotes_gpc()){
   array_walk_recursive($_REQUEST, 'stripslashes');
   array_walk_recursive($_GET, 'stripslashes');
   array_walk_recursive($_POST,'stripslashes');
   array_walk_recursive($_FILES, 'stripslashes');
   array_walk_recursive($_COOKIES, 'stripslashes);
  }
}
undoTheDamage();
 
I did try the code you provided (it took a little adjustment to get it to run to this point)

Code:
function undoTheDamage(){
  magic_quotes_runtime(FALSE);
  if (get_magic_quotes_gpc()){
   if(is_array($_REQUEST)) array_walk_recursive($_REQUEST, 'stripslashes');
   if(is_array($_GET)) array_walk_recursive($_GET, 'stripslashes');
   if(is_array($_POST)) array_walk_recursive($_POST,'stripslashes');
   if(is_array($_FILES)) array_walk_recursive($_FILES, 'stripslashes');
   if(is_array($_COOKIES)) array_walk_recursive($_COOKIES, 'stripslashes');
  }
}
undoTheDamage();

but even then, I get an error on the call to stripslashes()

Code:
Warning: Wrong parameter count for stripslashes() in /home/redfe0/public_html/tandem/tdom/job_site_search_service.php on line 11

Since I didn't see how cleaning up the query string would return hits on the improperly escaped data in the database, I didn't pursue the function farther. I guess I'll work on getting the function to run and see what happens.
 
the easiest solution is to turn off magic_quotes. are you saying that you are not able to do this?
 
No, it's just that my primary concern was how to deal with an existing database full of improperly escaped data, not so much how to fix things going forward, and I didn't know how turning off magic quotes would affect existing queries.

After a just a bit of testing, it doesn't seem as though turning magic quotes off will mess up the existing queries... no double-escaping, no unescaped data being written that should be escaped, no MySQL errors, and strings containing apostrophes are being correctly written to the database with no modification to existing queries, making them quite searchable.

So it looks like you're absolutely correct that turning off magic quotes is the answer, but if I do that, I'm still stuck with an database full of improprly escaped apostrophes.

If I'm going to do this, I'll have to code something up to iterate through all the tables and text/character/varchar fields and strip the existing backslashes before I throw the switch and turned magic quotes off.
 
i believe that the following iterator should work PROVIDING THAT YOU HAVE MAGIC_QUOTES TURNED OFF

Health warning: i have not tested this at all. not even run it.
you need to have create databases privileges to use this script. if you do not then i recommend adapting it to create new tables as opposed to identical tables in a new database. a new database just seemed 'safer' and easier to switch to.

Code:
<?php
$host = '';
$uname = '';
$pwd = '';
$database = '';

$newDatabase = 'new_'.$database;

//connect to first database
$c1 = mysql_connect($host, $uname, $pwd or die (mysql_error()));
mysql_select_db($database, $c1);

//create second connection
$c2 = mysql_connect($host, $uname, $pwd,$pwd or die (mysql_error()));

//create new database
$sql = 'CREATE DATABASE IF NOT EXISTS ' . $newDatabase . "CHARACTER SET=UTF-8";
mysql_query($sql) or die(mysql_error());

//select new database with new connection
mysql_select_db($newDatabase, $c2);

//get tables from database
$tables = array();


$sql = 'SHOW TABLES';
$result = mysql_query($sql, $c1) or die(mysql_error($c1));
while ($row = mysql_fetch_array($result, MYSQL_NUM)){
	$tables[] = $row[0];
}

$alterTables = array(); //holding variable for the necessary db alterations.
//get create statements for each table
foreach ($tables as $table){
	$creates = array();
	$sql = "SHOW CREATE TABLE $table";
	$result = mysql_query($sql, $c1) or die (mysql_error($c1));
	while ($row = mysql_fetch_array($result, MYSQL_NUM)){
		//create new table
		//$row[1] has the create statement.  we need to undo the autoincrement and capture the column
		$pattern = '/[,\(][\s\\n]*(([a-z`]+).?*auto_increment\s*)[,\)]/imx';
		$curPattern = '/auto_increment\s*=\s*([\d]+)/imx';
		if (preg_match($pattern, $row[1], $match)){
			$alterTables[$table]['name'] = $match[2];
			$alterTables[$table]['definition'] = $match[1];
			if (preg_match ($curPattern, $row[1], $curMatch)){
				$alterTables[$table]['current'] = $curMatch[1];
			}
			$createSQL = preg_replace ($curMatch, '', $row[1]); //get rid of the table def for current sequence
			$createSQL = str_ireplace('AUTO_INCREMENT', ' ', $createSQL); //get rid of inline auto_increment
		} else {
			$createSQL = $row[1];
		}
		mysql_query($createSQL, $c2) or die (mysql_error($c2));
	}
}

//reiterate the tables to transfer the data
foreach ($tables as $table){
	$sql = "Select *  from $table";
	$result = mysql_query($sql) or die(mysql_error());
	$insertSQL = "Insert into $table set (";
	while ($row = mysql_fetch_assoc($result)){
		$flds = array();
		$query  = '';
		foreach ($row as $field=>$val){
			$flds[] = "$field='" . mysql_real_escape_string(stripslashes($val), $c2) . "'";
		}
		$query = $insertSQL . implode (',', $fields) . ')';
		mysql_query($query, $c2) or die (mysql_error());
	}
}

//run alter tables to reset autonumbering
foreach ($alterTables as $table=>$info){
	$sql = "alter table $table change $info[field] $info[definition]";
	mysql_query($sql, $c2) or die (mysql_error($c2));
	if (isset($alterTables[$table]['current'])){
		$sql = "ALTER TABLE  $table AUTO_INCREMENT = $alterTables[current]";
		mysql_query($sql, $c2) or die (mysql_error($c2));
	}
}

?>
 
That's awesome, thank you!

Quick question, though. The new database $c2 is selected before pulling the list of tables, so the table list is empty when SHOW TABLES is run.

Code:
//create second connection
$c2 = mysql_connect($host,$uname,$pwd);

//select new database with new connection
mysql_select_db($newDatabase, $c2);

//get tables from database
$tables = array();

$sql = 'SHOW TABLES';
$result = mysql_query($sql, $c1) or die(mysql_error($c1));

print "ROWS:" . mysql_num_rows($result) . "<br />\n";

while ($row = mysql_fetch_array($result, MYSQL_NUM)){
    $tables[] = $row[0];
}

If I comment out the mysql_select_db($newDatabase, $c2) statement, the $tables[] array does populate from $c1.

I'm unaccustomed to handling multiple connections so I'm really not sure what the correction should be. Can multiple databases be selected simultaneously (I assume they can or you wouldn't have written it that way), or should each database be selected prior to each query?
 
$c1 is the connection to the existing database and show tables is run against the $c1 connection. the show tables query should come back populated.

the connection function for the second connection should have been
Code:
$c2 = mysql_connect($host, $uname, $pwd, true) or die (mysql_error()));

the appended TRUE at the end is important. i think that this will fix it.
 
Thanks for the correction, The show tables query is fine now, but I'm getting a regex error

Code:
nothing to repeat at offset 24 in /home/public_html/tdom/fix_quotes.php

in the last line of

Code:
foreach ($tables as $table){
    $creates = array();
    $sql = "SHOW CREATE TABLE $table";
    $result = mysql_query($sql, $c1) or die (mysql_error($c1));
    while ($row = mysql_fetch_array($result, MYSQL_NUM)){
        //create new table
        //$row[1] has the create statement.  we need to undo the autoincrement and capture the column
        $pattern = '/[,\(][\s\\n]*(([a-z`]+).?*auto_increment\s*)[,\)]/imx';
        $curPattern = '/auto_increment\s*=\s*([\d]+)/imx';
        if (preg_match($pattern, $row[1], $match)){

To say that regexes aren't my strong suit is the understatement of the century... any idea why the error?
 
if that's the last line could you change this line:

Code:
$pattern = '/[,\(][\s\\n]*(([a-z`]+).?*auto_increment\s*)[,\)]/imx';

to

Code:
$pattern = '/[,\(][\s\\n]*(([a-z`]+).*?auto_increment\s*)[,\)]/imx';
 
Thanks again, but there's apparently another problem with the regex and I'm tempted to bail on this fix since I just don't know enough about regular expressions to get this code to run. I'm game if you want to hang in and help me debug the regex block, but I'm also okay blowing this off if you're getting tired of messing with it.

Code:
Warning: preg_replace() [function.preg-replace]: Delimiter must not be alphanumeric or backslash in /home/redfe0/public_html/tandem/tdom/fix_quotes2.php on line 64
Query was empty

where line 64 is the preg_replace statement below ($createSQL = preg_replace ($curMatch, '', $row[1]); //get rid of the table def for current sequence).

Code:
$alterTables = array(); //holding variable for the necessary db alterations.
//get create statements for each table
foreach ($tables as $table){
    $creates = array();
    $sql = "SHOW CREATE TABLE $table";
    $result = mysql_query($sql, $c1) or die (mysql_error($c1));
    while ($row = mysql_fetch_array($result, MYSQL_NUM)){
        //create new table
        //$row[1] has the create statement.  we need to undo the autoincrement and capture the column
        //$pattern = '/[,\(][\s\\n]*(([a-z`]+).?*auto_increment\s*)[,\)]/imx';
        $pattern = '/[,\(][\s\\n]*(([a-z`]+).*?auto_increment\s*)[,\)]/imx';
        $curPattern = '/auto_increment\s*=\s*([\d]+)/imx';
        if (preg_match($pattern, $row[1], $match)){
            $alterTables[$table]['name'] = $match[2];
            $alterTables[$table]['definition'] = $match[1];
            if (preg_match ($curPattern, $row[1], $curMatch)){
                $alterTables[$table]['current'] = $curMatch[1];
            }
            $createSQL = preg_replace ($curMatch, '', $row[1]); //get rid of the table def for current sequence
            $createSQL = str_ireplace('AUTO_INCREMENT', ' ', $createSQL); //get rid of inline auto_increment
        } else {
            $createSQL = $row[1];
        }
        mysql_query($createSQL, $c2) or die (mysql_error($c2));
    }
}
 
that's my own fault. not thinking when i wrote the code, i think! the reference to curMatch should have been curPattern.

Code:
<?php
$alterTables = array(); //holding variable for the necessary db alterations.
//get create statements for each table
foreach ($tables as $table){
    $creates = array();
    $sql = "SHOW CREATE TABLE $table";
    $result = mysql_query($sql, $c1) or die (mysql_error($c1));
    while ($row = mysql_fetch_array($result, MYSQL_NUM)){
        //create new table
        //$row[1] has the create statement.  we need to undo the autoincrement and capture the column
        //$pattern = '/[,\(][\s\\n]*(([a-z`]+).?*auto_increment\s*)[,\)]/imx';
        $pattern = '/[,\(][\s\\n]*(([a-z`]+).*?auto_increment\s*)[,\)]/imx';
        $curPattern = '/auto_increment\s*=\s*([\d]+)/imx';
        if (preg_match($pattern, $row[1], $match)){
            $alterTables[$table]['name'] = $match[2];
            $alterTables[$table]['definition'] = $match[1];
            if (preg_match ($curPattern, $row[1], $curMatch)){
                $alterTables[$table]['current'] = $curMatch[1];
            }
            $createSQL = preg_replace ($curPattern, '', $row[1]); //get rid of the table def for current sequence
            $createSQL = str_ireplace('AUTO_INCREMENT', ' ', $createSQL); //get rid of inline auto_increment
        } else {
            $createSQL = $row[1];
        }
        mysql_query($createSQL, $c2) or die (mysql_error($c2));
    }
}
?>
 
Jpadie, thanks for the extra push. After handling another couple minor bugs, the code ran well and created a new set of tables without the improperly escaped apostrophes.

The corrected, running code is below, with a couple changes:

1) I wanted to run the script from the shell but my host has the PHP5 binary jailed so I had to add a substitute 'str_ireplace' function.

2) My host denies database creation to shell accounts, so I had to create the new database within phpMyAdmin and explicitly declare the name of the new database in the code.

Thanks again, looking forward to cleaning up my client's database and continuing on with magic quotes finally turned off.

Code:
#!/usr/local/bin/php -q
<?php

$host="host";
$uname="username";
$pwd="password";
$database="database";

$newDatabase = 'new_database';

if(!function_exists('str_ireplace')){
  function str_ireplace($search,$replace,$subject){
    $token = chr(1);
    $haystack = strtolower($subject);
    $needle = strtolower($search);
    while (($pos=strpos($haystack,$needle))!==FALSE){
      $subject = substr_replace($subject,$token,$pos,strlen($search));
      $haystack = substr_replace($haystack,$token,$pos,strlen($search));
    }
    $subject = str_replace($token,$replace,$subject);
    return $subject;
  }
}

//connect to first database
$c1 = mysql_connect($host, $uname, $pwd) or die(mysql_error());
mysql_select_db($database, $c1);

//create second connection
$c2 = mysql_connect($host, $uname, $pwd, true) or die(mysql_error());

//create new database
$sql = 'CREATE DATABASE IF NOT EXISTS ' . $newDatabase . " CHARACTER SET=UTF8";
mysql_query($sql) or die(mysql_error());

//select new database with new connection
mysql_select_db($newDatabase, $c2);

//get tables from database
$tables = array();

$sql = 'SHOW TABLES';
$result = mysql_query($sql, $c1) or die(mysql_error($c1));
while ($row = mysql_fetch_array($result, MYSQL_NUM)){
    $tables[] = $row[0];
}

$alterTables = array(); //holding variable for the necessary db alterations.
//get create statements for each table
foreach ($tables as $table){
    $creates = array();
    $sql = "SHOW CREATE TABLE $table";
    $result = mysql_query($sql, $c1) or die (mysql_error($c1));
    while ($row = mysql_fetch_array($result, MYSQL_NUM)){
        //create new table
        //$row[1] has the create statement.  we need to undo the autoincrement and capture the column
        $pattern = '/[,\(][\s\\n]*(([a-z`]+).*?auto_increment\s*)[,\)]/imx';
        $curPattern = '/auto_increment\s*=\s*([\d]+)/imx';
        if (preg_match($pattern, $row[1], $match)){
            $alterTables[$table]['name'] = $match[2];
            $alterTables[$table]['definition'] = $match[1];
            if (preg_match ($curPattern, $row[1], $curMatch)){
                $alterTables[$table]['current'] = $curMatch[1];
            }

            $createSQL = preg_replace ($curPattern, '', $row[1]); //get rid of the table def for current sequence
            $createSQL = str_ireplace('AUTO_INCREMENT', ' ', $createSQL); //get rid of inline auto_increment

             mysql_query($createSQL, $c2) or die (mysql_error($c2));

        } else {

            $createSQL = $row[1];
            mysql_query($createSQL, $c2) or die (mysql_error($c2));

        }
    }
}

//reiterate the tables to transfer the data
foreach ($tables as $table){
    $sql = "Select *  from $table";
    $result = mysql_query($sql, $c1) or die(mysql_error());

    $insertSQL = "Insert into $table set ";

    while ($row = mysql_fetch_assoc($result)){
        $flds = array();
        $query  = '';
        foreach ($row as $field=>$val){
            $flds[] = "$field='" . mysql_real_escape_string(stripslashes($val), $c2) . "'";
        }
        $query = $insertSQL . implode (',', $flds);
        mysql_query($query, $c2) or die (mysql_error($c2));
    }
}

//run alter tables to reset autonumbering
foreach ($alterTables as $table=>$info){
    $sql = "alter table $table change $info[field] $info[definition]";
    mysql_query($sql, $c2) or die (mysql_error($c2));
    if (isset($alterTables[$table]['current'])){
        $sql = "ALTER TABLE $table AUTO_INCREMENT = $alterTables[current]";
        mysql_query($sql, $c2) or die (mysql_error($c2));
    }
}

?>
 
Looks like I spoke too soon. On closer examination, the table copy seems to have blown out some of the blob data, but there are only a few tables involved there and I can move those by hand - just wanted to post that caveat for anyone else who may wind up using the iteration code.

Thanks again!
 
Can you clarify? I'd like to get the code working smoothly for other people to use. Was it a memory issue?
 
One thing is that the auto_increments don't seem to be getting added back into the new tables:


Original table 1:

Code:
--
-- Table structure for table `auth_control`
--

DROP TABLE IF EXISTS `auth_control`;
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `auth_control` (
  `auth_id` int(11) NOT NULL auto_increment,
  `type` varchar(32) default NULL,
  `level` varchar(32) default NULL,
  `name` varchar(32) default NULL,
  `page` varchar(255) default NULL,
  PRIMARY KEY  (`auth_id`)
) ENGINE=MyISAM AUTO_INCREMENT=179 DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;

New table 1:

Code:
--
-- Table structure for table `auth_control`
--

DROP TABLE IF EXISTS `auth_control`;
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `auth_control` (
  `auth_id` int(11) NOT NULL,
  `type` varchar(32) default NULL,
  `level` varchar(32) default NULL,
  `name` varchar(32) default NULL,
  `page` varchar(255) default NULL,
  PRIMARY KEY  (`auth_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;

Original table 2:

Code:
--
-- Table structure for table `billable_rates`
--

DROP TABLE IF EXISTS `billable_rates`;
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `billable_rates` (
  `rate_id` int(11) NOT NULL auto_increment,
  `customer_id` int(11) NOT NULL default '0',
  `location_id` int(11) default NULL,
  `location` varchar(20) NOT NULL default '',
  `cabling` varchar(11) default NULL,
  `telephony` varchar(11) default NULL,
  `it_computing` varchar(11) default NULL,
  `nationals` varchar(11) default NULL,
  PRIMARY KEY  (`rate_id`)
) ENGINE=MyISAM AUTO_INCREMENT=318 DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;

New Table 2:

Code:
--
-- Table structure for table `billable_rates`
--

DROP TABLE IF EXISTS `billable_rates`;
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `billable_rates` (
  `rate_id` int(11) NOT NULL,
  `customer_id` int(11) NOT NULL default '0',
  `location_id` int(11) default NULL,
  `location` varchar(20) NOT NULL default '',
  `cabling` varchar(11) default NULL,
  `telephony` varchar(11) default NULL,
  `it_computing` varchar(11) default NULL,
  `nationals` varchar(11) default NULL,
  PRIMARY KEY  (`rate_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;

As far as the blob data goes, the only thing I see is that the script seems to be stripping double backslashes from binary data. Here area couple snippets from GIF files in fields defined as mediumblob:

Code:
OLD: ×æ8o \\A?B<µQE€
NEW: ×æ8o A?B<µQE€

OLD: [?\\??:??@v^+\\??K
NEW: [???:??@v^+??K
 
Following up:

Looks like the reason the auto_increment values aren't being applied to the new table in the first bit of SQL of the final block:

Code:
 $sql = "alter table $table change $info[field] $info[definition]";

At this point, $info doesn't hold a 'field' value. It does hold a 'name' value (below)

Code:
Array
(
    [name] => `auth
    [definition] => `auth_id` int(11) NOT NULL auto_increment
    [current] => 179
)

but the 'name' value isn't being properly isolated and set. It should be set to 'auth_id', not 'auth in the preg_match statement:


Code:
 $pattern = '/[,\(][\s\\n]*(([a-z`]+).*?auto_increment\s*)[,\)]/imx';
        $curPattern = '/auto_increment\s*=\s*([\d]+)/imx';
        if (preg_match($pattern, $row[1], $match)){



Array
(
    [0] => (
  `auth_id` int(11) NOT NULL auto_increment,
    [1] => `auth_id` int(11) NOT NULL auto_increment
    [2] => `auth
)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top