Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help with parsing a mysql insert query in php 1

Status
Not open for further replies.

jpadie

Technical User
Nov 24, 2003
10,094
FR
i am under the weather at the moment and unable to think my way out of a paper bag. so apologies for burdening people with the following question. All help gratefully received, however!

imagine a typical multi value mysql insert similar to the following:

Code:
Insert into table (field1, field2, field3) values ('value1', 'value2', 'value3'), ('a note about a php function, called is_function(), is sometimes useful', 'value2', 'value3'), (.......)

assumptions
1. there is no limit to the number of inserts in the sql
2. there is no restriction on the text that is contained in each value within each insert, although you can assume that the values are escaped where necessary.

I need to be able to separate out each of the sets of values for each insert. so that I want, for example, an array of inserts

Code:
0=> "('value1', 'value2', 'value3')",
1=> "('a note about a php function, called is_function(), is sometimes useful', 'value2', 'value3')", 
2=> (.......) etc

the various regex that i have written to solve this problem have worked for a while but recently barfed when dealing with code inside one of the value pairs. I was splitting on '),' as a pattern (actually a bit more complex than that but the point is clear) and that was obviously matching bits of the code too.

so I need a solution that will split the elements for me. this can either be a regex or a parser, i don't mind. I have a feeling that a regex is going to be very difficult indeed and that a parser might be the best bet. But, as I said, I don't trust my judgment at the moment and so am looking to you for ideas and assistance.

For those interested in the why of this request: I am the author and maintainer of a plugin for wordpress that allows users to switch from mysql to other database backends. The main alternative is sqlite. sqlite does not support compound inserts and so I need to split the query into multiples or convert it into a single union of selects. In either case I need to be able to dissect the values clause.

Justin
 
Hi

Assuming that :
[ul]
[li]the text is valid MySQL statement[/li]
[li]all values are enclosed in single quotes ( no 3.14 )[/li]
[li]no value contains escaped singe quotes ( no 'it''s bad, it\' really bad' )[/li]
[li]no value results from evaluation ( no concat('My','SQL') )[/li]
[/ul]
this would be enough :
PHP:
[COLOR=darkgoldenrod]preg_match_all[/color][teal]([/teal][green][i]"/\((?:\s*'.*?'\s*,?\s*)+\)/"[/i][/green][teal],[/teal][navy]$text[/navy][teal],[/teal][navy]$match[/navy][teal]);[/teal]
By the way, how are those [tt]insert[/tt]s generated ? Would not be easier to interfere with their creator instead ?

Feherke.
 
Regrettably it is not practical to interfere with the SQL creation. I need to be able to intercept any query and rewrite it on the fly. Anythig else is too burdensome to support when you're talking about an app the size of wordpress, the release cycle and the number of pkuins that would also need to be catered for.

Re your assumptions: you cannot assume that single quotes will not be escaped. The opposite will be true
you cannot assume that all values will be strings or equivalents. Numerical data is also submitted in the values as are null values. Bullets one and four hold valid though although non combination functions like now() are used in the values.

Thanks for taking a look.
 
Hi

Ah. Tricky. Well, then I would write a parser like this :
PHP:
[navy]$part[/navy][teal]=[/teal][b]array[/b][teal]();[/teal] [navy]$nrpart[/navy][teal]=-[/teal][purple]1[/purple][teal];[/teal]
[navy]$inquote[/navy][teal]=[/teal]false[teal];[/teal] [navy]$inparan[/navy][teal]=[/teal]false[teal];[/teal]
[b]for[/b] [teal]([/teal][navy]$i[/navy][teal]=[/teal][purple]0[/purple][teal];[/teal][navy]$i[/navy][teal]<[/teal][COLOR=darkgoldenrod]strlen[/color][teal]([/teal][navy]$text[/navy][teal]);[/teal][navy]$i[/navy][teal]++)[/teal] [teal]{[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]'\\'[/i][/green][teal])[/teal] [teal]{[/teal] [navy]$i[/navy][teal]++;[/teal] [b]continue[/b][teal];[/teal] [teal]}[/teal]
  [b]elseif[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]"'"[/i][/green][teal])[/teal] [navy]$inquote[/navy][teal]=![/teal][navy]$inquote[/navy][teal];[/teal]
  [b]elseif[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]'('[/i][/green] [teal]&&[/teal] [teal]![/teal][navy]$inquote[/navy][teal])[/teal] [teal]{[/teal] [navy]$inparan[/navy][teal]=[/teal]true[teal];[/teal] [navy]$part[/navy][teal][++[/teal][navy]$nrpart[/navy][teal]]=[/teal][green][i]''[/i][/green][teal];[/teal] [teal]}[/teal]
  [b]if[/b] [teal]([/teal][navy]$inparan[/navy][teal])[/teal] [navy]$part[/navy][teal][[/teal][navy]$nrpart[/navy][teal]].=[/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]];[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]')'[/i][/green] [teal]&&[/teal] [teal]![/teal][navy]$inquote[/navy][teal])[/teal] [navy]$inparan[/navy][teal]=[/teal]false[teal];[/teal]
[teal]}[/teal]
Notes :
[ul]
[li]this will place the field list in element 0, values starts from element 1[/li]
[li]I can not imagine less optimal implementation than this[/li]
[/ul]


Feherke.
 
it is reassuring that you have come to the same conclusion as me that a regex was not going to work with the degree of variability allowed by mysql. i had also concluded that i had to build a brace counter and matcher. i doubt whether mine would have been as compact as yours though. i will test this against the class of likely data and report back. probably later today, if not tomorrow.

thanks feherke.
 
hi

i used this implementation from feherke
Code:
<?php
$query = <<<HTML
('field1', 'field2', 'field3'), ('field4','field5','field6'), (now(), 33, 'information about a function(), is often useful')
HTML;
getValuePairs($query);

function getValuePairs($text){
	$part=array(); $nrpart=-1;
	$inquote=false; $inparan=false;
	for ($i=0;$i<strlen($text);$i++) {
		if ($text[$i]=='\\') { $i++; continue; }
		elseif ($text[$i]=="'") $inquote=!$inquote;
		elseif ($text[$i]=='(' && !$inquote) { $inparan=true; $part[++$nrpart]=''; }
		if ($inparan) $part[$nrpart].=$text[$i];
		if ($text[$i]==')' && !$inquote) $inparan=false;
	}
var_dump($part);
}
?>
and received this result
Code:
array
  0 => string '('field1', 'field2', 'field3')' (length=30)
  1 => string '('field4','field5','field6')' (length=28)
  2 => string '(now' (length=4)
  3 => string '()' (length=2)

this is not what i was looking for as it misses off a chunk of data.

as i am only bothered about the outer pairs of matching brackets i am wondering whether a more elegant solution would be to count braces. something like this as a start

Code:
function getValuePairs($string){
	$a = str_split($string, 1);
	$starts = array_keys($a, '(');
	$ends = array_keys($a, ')');
	var_dump($starts, $ends);
}
p.code]
this would give us the positions of every starting brace and closing brace and should allow us to match the braces.  i have not got so far as to do the matching code as i am torn between this method and the lexer method proposed by feherke.
i also have horrible feelings about the possibility of a value like this
[code]
('field1', 'field2', 'field 3 contains only a closing bracket )')
really messing things up. which is why i started thinking about matching brackets.

i need to give this some more thought. clearly!
 
Hi

Sorry, I misunderstood you regarding the presence of functions. Anyway, counting parenthesis is not much to change. Plus a correction, because I forgot to keep the backslash escaped characters.
PHP:
[navy]$text[/navy][teal]=<<<[/teal]ENDOFTEXT
insert into table (field1, field2, field3) values('value1', 'v)a)l)u)e)2', 'value3'),
('a no\\te ab\\\\out a php functio''n, calle\'d is_function(), is sometime''s usefu\'l', 'value2', 'value3'), 
(a,b,c),
('d''e','f\'g','h''i\'j'),
(now(), 33, 'information about a function(), is often useful'),
('f(i(e(l(d(1', concat('y',upper('m'),upper('sql')), 'field 3 contains only a closing bracket )')
;
ENDOFTEXT[teal];[/teal]
[navy]$part[/navy][teal]=[/teal][b]array[/b][teal]();[/teal] [navy]$nrpart[/navy][teal]=-[/teal][purple]1[/purple][teal];[/teal]
[navy]$inquote[/navy][teal]=[/teal]false[teal];[/teal] [navy]$inparan[/navy][teal]=[/teal][purple][highlight]0[/highlight][/purple][teal];[/teal]
[b]for[/b] [teal]([/teal][navy]$i[/navy][teal]=[/teal][purple]0[/purple][teal];[/teal][navy]$i[/navy][teal]<[/teal][COLOR=darkgoldenrod]strlen[/color][teal]([/teal][navy]$text[/navy][teal]);[/teal][navy]$i[/navy][teal]++)[/teal] [teal]{[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]'\\'[/i][/green][teal])[/teal] [teal]{[/teal] [highlight pink][navy]$part[/navy][teal][[/teal][navy]$nrpart[/navy][teal]].=[/teal][COLOR=darkgoldenrod]substr[/color][teal]([/teal][navy]$text[/navy][teal],[/teal][/highlight][navy]$i[/navy][teal]++[/teal][highlight pink][teal],[/teal][purple]2[/purple][teal])[/teal][/highlight][teal];[/teal] [b]continue[/b][teal];[/teal] [teal]}[/teal]
  [b]elseif[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]"'"[/i][/green][teal])[/teal] [navy]$inquote[/navy][teal]=![/teal][navy]$inquote[/navy][teal];[/teal]
  [b]elseif[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]'('[/i][/green] [teal]&&[/teal] [teal]![/teal][navy]$inquote[/navy][teal])[/teal] [teal]{[/teal] [highlight][b]if[/b] [teal](![/teal][navy]$inparan[/navy][teal])[/teal][/highlight] [navy]$part[/navy][teal][++[/teal][navy]$nrpart[/navy][teal]]=[/teal][green][i]''[/i][/green][teal];[/teal] [navy]$inparan[/navy][teal][highlight]++[/highlight];[/teal] [teal]}[/teal]
  [b]if[/b] [teal]([/teal][navy]$inparan[/navy][teal])[/teal] [navy]$part[/navy][teal][[/teal][navy]$nrpart[/navy][teal]].=[/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]];[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]')'[/i][/green] [teal]&&[/teal] [teal]![/teal][navy]$inquote[/navy][teal])[/teal] [navy]$inparan[/navy][teal][highlight]--[/highlight];[/teal]
[teal]}[/teal]

[COLOR=darkgoldenrod]print_r[/color][teal]([/teal][navy]$part[/navy][teal]);[/teal]
Code:
Array
(
    [0] => (field1, field2, field3)
    [1] => ('value1', 'v)a)l)u)e)2', 'value3')
    [2] => ('a no\te ab\\out a php functio''n, calle\'d is_function(), is sometime''s usefu\'l', 'value2', 'value3')
    [3] => (a,b,c)
    [4] => ('d''e','f\'g','h''i\'j')
    [5] => (now(), 33, 'information about a function(), is often useful')
    [6] => ('f(i(e(l(d(1', concat('y',upper('m'),upper('sql')), 'field 3 contains only a closing bracket )')
)

Feherke.
 
thanks feherke. once again!
that works a treat.

for my next trick, i will try to adapt your code to extract the actual sql variables!
 
Hi

jpadie said:
extract the actual sql variables
What ? So the [tt]insert[/tt] could also have thing like this ?
Code:
[b]insert[/b] [b]into[/b] [b]table[/b] [teal]([/teal]field1[teal],[/teal]field2[teal],[/teal]field3[teal])[/teal] [b]values[/b] [teal]([/teal]@i[teal]:=[/teal][purple]1[/purple][teal],[/teal]@i[teal]:=[/teal]@i[teal]*[/teal][purple]2[/purple][teal],[/teal]@i[teal]:=[/teal]@i[teal]*[/teal][purple]2[/purple][teal]),([/teal]@i[teal]:=[/teal]@i[teal]*[/teal][purple]2[/purple][teal],[/teal]@i[teal]:=[/teal]@i[teal]*[/teal][purple]2[/purple][teal],[/teal]@i[teal]:=[/teal]@i[teal]*[/teal][purple]2[/purple][teal]);[/teal]
[hairpull3]

Feherke.
 
Hi

no i didn't mean sql variables in quite that sense. I should have said values instead.

so from a query like this

Code:
Insert into table (field1, field2, field3) values ('value1', 'value2', 'value3')
i will be trying to extract the values into an array
Code:
$a = array('value1', 'value2', 'value3');

what I have been doing to date is just to extract the variables that are in quotes. i.e. the string literals. since those are the only variables that I need to worry about the different escaping mechanisms between mysql and sqlite.

but i don't think that is a viable long term solution given the rate of change in wordpress core. i would ultimately like to capture all variables and test them for compatibility with sqlite. If non-compat then i perform rewrites or create UDF's.

truly ultimately i am looking for (or to write) a mysql query parser. I have looked at one or two that are available for php but it seems that the authors have all given up when things get tricky! so they all kind of half work...
 
Hi

I would just continue with the previous code.
PHP:
[navy]$text[/navy][teal]=<<<[/teal]ENDOFTEXT
[small]insert into table (field1, field2, field3) values
('value1', 'v)a)l)u)e)2', 'v,a,l,u,e,3'),
('a no\\te ab\\\\out a php functio''n, calle\'d is_function(), is sometime''s usefu\'l', 'value2', 'v(,)a,),(,l('')u(,'',)e('',\')3'), 
(a,b,c),
('d''e','f\'g','h''i\'j'),
(now(), 33, 'information about a function(), is often useful'),
('f(i(e(l(d(1', concat_ws('y',upper('m'),upper('sql')), 'field 3 contains only a closing bracket )')
;[/small]
ENDOFTEXT[teal];[/teal]

[navy]$part[/navy][teal]=[/teal][b]array[/b][teal]();[/teal] [navy]$nrpart[/navy][teal]=-[/teal][purple]1[/purple][teal];[/teal]
[navy]$inquote[/navy][teal]=[/teal]false[teal];[/teal] [navy]$inparan[/navy][teal]=[/teal][purple]0[/purple][teal];[/teal]
[b]for[/b] [teal]([/teal][navy]$i[/navy][teal]=[/teal][purple]0[/purple][teal];[/teal][navy]$i[/navy][teal]<[/teal][COLOR=darkgoldenrod]strlen[/color][teal]([/teal][navy]$text[/navy][teal]);[/teal][navy]$i[/navy][teal]++)[/teal] [teal]{[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]','[/i][/green] [teal]&&[/teal] [teal]![/teal][navy]$inquote[/navy] [teal]&&[/teal] [navy]$inparan[/navy][teal]==[/teal][purple]1[/purple][teal])[/teal] [teal]{[/teal] [navy]$nrfield[/navy][teal]++;[/teal] [b]continue[/b][teal];[/teal] [teal]}[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]')'[/i][/green] [teal]&&[/teal] [teal]![/teal][navy]$inquote[/navy][teal])[/teal] [navy]$inparan[/navy][teal]--;[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]'\\'[/i][/green][teal])[/teal] [teal]{[/teal] [navy]$part[/navy][teal][[/teal][navy]$nrpart[/navy][teal]][[/teal][navy]$nrfield[/navy][teal]].=[/teal][COLOR=darkgoldenrod]substr[/color][teal]([/teal][navy]$text[/navy][teal],[/teal][navy]$i[/navy][teal]++,[/teal][purple]2[/purple][teal]);[/teal] [b]continue[/b][teal];[/teal] [teal]}[/teal]
  [b]if[/b] [teal]([/teal][navy]$inparan[/navy][teal])[/teal] [navy]$part[/navy][teal][[/teal][navy]$nrpart[/navy][teal]][[/teal][navy]$nrfield[/navy][teal]].=[/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]];[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]"'"[/i][/green][teal])[/teal] [navy]$inquote[/navy][teal]=![/teal][navy]$inquote[/navy][teal];[/teal]
  [b]if[/b] [teal]([/teal][navy]$text[/navy][teal][[/teal][navy]$i[/navy][teal]]==[/teal][green][i]'('[/i][/green] [teal]&&[/teal] [teal]![/teal][navy]$inquote[/navy][teal])[/teal] [b]if[/b] [teal](![/teal][navy]$inparan[/navy][teal]++)[/teal] [navy]$part[/navy][teal][++[/teal][navy]$nrpart[/navy][teal]]=[/teal][b]array[/b][teal]([/teal][navy]$nrfield[/navy][teal]=[/teal][purple]0[/purple][teal]=>[/teal][green][i]''[/i][/green][teal]);[/teal]
[teal]}[/teal]

[COLOR=darkgoldenrod]print_r[/color][teal]([/teal][navy]$part[/navy][teal]);[/teal]
Code:
[small]
Array
(
    [0] => Array
        (
            [0] => field1
            [1] =>  field2
            [2] =>  field3
        )

    [1] => Array
        (
            [0] => 'value1'
            [1] =>  'v)a)l)u)e)2'
            [2] =>  'v,a,l,u,e,3'
        )

    [2] => Array
        (
            [0] => 'a no\te ab\\out a php functio''n, calle\'d is_function(), is sometime''s usefu\'l'
            [1] =>  'value2'
            [2] =>  'v(,)a,),(,l('')u(,'',)e('',\')3'
        )

    [3] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [4] => Array
        (
            [0] => 'd''e'
            [1] => 'f\'g'
            [2] => 'h''i\'j'
        )

    [5] => Array
        (
            [0] => now()
            [1] =>  33
            [2] =>  'information about a function(), is often useful'
        )

    [6] => Array
        (
            [0] => 'f(i(e(l(d(1'
            [1] =>  concat_ws('y',upper('m'),upper('sql'))
            [2] =>  'field 3 contains only a closing bracket )'
        )

)
[/small]

Feherke.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top