I was wondering how to write a regexp that can handle "matching pairs" of the delimiters it is looking for, such as { and } braces.
For example, say you were trying to parse script code (we'll say JavaScript, to avoid the "only Perl can parse Perl" argument and any discussion of how complicated Perl is) and you wanted to get all the code of a function: everything after the opening { brace that follows the function name and arguments, up to the matching closing brace at the very end of the function, ignoring any closing braces inside the code itself as long as they are paired with opening ones.
Code:
use strict;
use warnings;

my $javascript = qq~
var url = "/ajax.cgi";
function request(q) {
    var ajax = new XMLHTTPRequest();
    if (defined ajax) {
        ajax.onreadystatechange = ajax_handler;
        ajax.get(url + "?" + query);
        ajax.send(null);
    }
    else {
        window.alert ("ajax object not defined!");
    }
}
function ajax_handler() {
    window.alert("testing");
}~;

# collect the functions
my $func = {};

# /g resumes each match where the previous one ended, so the loop moves
# on to the next function instead of matching the first one forever.
# /s lets . match newlines, so the body can span several lines.
# Problem: the lazy (.+?) stops at the FIRST closing brace, which for
# request() is the one that closes the if block.
while ($javascript =~ /function\s+(\w+)\s*\((.*?)\)\s*\{(.+?)\}/gis) {
    my $name = $1;    # function name
    my $args = $2;    # argument list
    my $code = $3;    # body (truncated at the first })
    $func->{$name} = { args => $args, code => $code };
}
And so the regexp would look for "function * (*) {*}", but any closing } brace inside the function terminates the match. It would need to keep track of matching braces, so that every opening brace inside $3 is paired with its closing brace, instead of treating the first closing brace it finds as the end of the function.
In other words:
Code:
# this is what would match here:
var url = "/ajax.cgi";
[COLOR=red]function request(q) {
var ajax = new XMLHTTPRequest();
if (defined ajax) {
ajax.onreadystatechange = ajax_handler;
ajax.get(url + "?" + query);
ajax.send(null);
}[/color]
else {
window.alert ("ajax object not defined!");
}
}
[COLOR=blue]function ajax_handler() {
window.alert("testing");
}[/color]
# this is what should match
var url = "/ajax.cgi";
[COLOR=red]function request(q) {
var ajax = new XMLHTTPRequest();
if (defined ajax) {
ajax.onreadystatechange = ajax_handler;
ajax.get(url + "?" + query);
ajax.send(null);
}
else {
window.alert ("ajax object not defined!");
}
}[/color]
[COLOR=blue]function ajax_handler() {
window.alert("testing");
}[/color]
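The bookkeeping described above (pair each { with its } and stop when the count drops back to zero) can be written without any regexp at all. The following is a minimal sketch with a hypothetical helper, `matching_brace`, that walks the text with a depth counter. Like any plain brace counter, it is fooled by braces inside strings or comments.

```perl
use strict;
use warnings;

# Hypothetical helper: given a string and the index of an opening '{',
# return the index of the '}' that closes it, or undef if none does.
# Simplifying assumption: it only counts braces, so braces inside
# JavaScript strings, regex literals or comments are NOT skipped.
sub matching_brace {
    my ($text, $open) = @_;
    my $depth = 0;
    for my $i ($open .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        if ($ch eq '{') {
            $depth++;                    # one level deeper
        }
        elsif ($ch eq '}') {
            $depth--;                    # one level back out
            return $i if $depth == 0;    # back where we started
        }
    }
    return;    # ran off the end: the brace is never closed
}

my $js    = 'function f() { if (x) { y(); } else { z(); } } rest';
my $open  = index($js, '{');
my $close = matching_brace($js, $open);
my $body  = substr($js, $open + 1, $close - $open - 1);
print "[$body]\n";    # prints "[ if (x) { y(); } else { z(); } ]"
```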
How does one create a regexp that handles this?
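One way to get the "should match" result from a single regexp, assuming Perl 5.10 or later (which introduced recursive subpatterns such as `(?3)` and possessive quantifiers such as `++`), is to let the body group call itself for every nested block. The sketch below reuses the sample JavaScript from the question; `$function_re` is an illustrative name, and the pattern only counts braces, so it is not a JavaScript parser.

```perl
use strict;
use warnings;

my $javascript = q~
var url = "/ajax.cgi";
function request(q) {
    var ajax = new XMLHTTPRequest();
    if (defined ajax) {
        ajax.onreadystatechange = ajax_handler;
        ajax.get(url + "?" + query);
        ajax.send(null);
    }
    else {
        window.alert ("ajax object not defined!");
    }
}
function ajax_handler() {
    window.alert("testing");
}~;

# Group 1: the function name.  Group 2: the argument list.
# Group 3: a balanced { ... } block. Its contents are any mix of runs of
# non-brace characters ([^{}]++, possessive so it never backtracks) and
# nested blocks, matched by recursing into group 3 itself with (?3).
my $function_re = qr/
    function \s+ (\w+) \s*
    \( ([^)]*) \) \s*
    ( \{ (?: [^{}]++ | (?3) )* \} )
/x;

my $func = {};
while ($javascript =~ /$function_re/g) {
    my ($name, $args, $block) = ($1, $2, $3);
    my $code = substr($block, 1, -1);    # drop the outer { and }
    $func->{$name} = { args => $args, code => $code };
}

for my $name (sort keys %$func) {
    printf "%s(%s) {%s}\n", $name, $func->{$name}{args}, $func->{$name}{code};
}
```

This is the usual recursive-group idiom for balanced delimiters. On Perls older than 5.10 the common substitute is a postponed subexpression, `(??{ $re })`, inside a pattern that refers to itself. Either way, a brace inside a JavaScript string or comment (for example `alert("}")`) will still throw the count off; for input like that, the core Text::Balanced module's `extract_bracketed` (which can also be told to skip quoted substrings; see its documentation) or a real JavaScript parser is a safer choice.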
-------------
Cuvou.com | My personal homepage
Code:
perl -e '$|=$i=1;print" oo\n<|>\n_|_";x:sleep$|;print"\b",$i++%2?"/":"_";goto x;'