
Embeddable/Recursive Regexp?

Status
Not open for further replies.

Kirsle

Programmer
Jan 21, 2006
US
I was wondering how to write a regexp that can handle "matching pairs" of the delimiters used in the regexp.

For example, suppose you were parsing script code (we'll say JavaScript, to avoid the "only Perl can parse Perl" argument and discussions of how complicated Perl is) and you wanted to get all the code of a function: everything after the opening { brace that follows the function name and arguments, all the way to the matching closing brace at the very end of the function, ignoring any closing braces within the code itself as long as they are matched by opening ones.

Code:
my $javascript = qq~
var url = "/ajax.cgi";

function request(q) {
   var ajax = new XMLHTTPRequest();
   if (defined ajax) {
      ajax.onreadystatechange = ajax_handler;
      ajax.get(url + "?" + query);
      ajax.send(null);
   }
   else {
      window.alert ("ajax object not defined!");
   }
}

function ajax_handler() {
   window.alert("testing");
}~;

# collect the functions (naive attempt: the lazy (.+?) stops at the
# FIRST closing brace, which is the problem described below)
my $func = {};
while ($javascript =~ /function\s+(\w+)\s*\((.*?)\)\s*\{(.+?)\}/gis) {
   my $name = $1;
   my $args = $2;
   my $code = $3;
   $func->{$name} = { args => $args, code => $code };
   # /g makes each pass resume where the previous match ended,
   # so the loop moves on to the next function instead of getting stuck
}

So the regexp looks for "function * (*) {*}", but any closing } brace inside the function terminates the match. The regexp would need to keep track of matching braces, so that every opening brace inside $3 is paired with its matching closing brace, instead of assuming that the first closing brace ends the function.

i.e.

Code:
# this is what the naive regexp matches:
var url = "/ajax.cgi";

# --- match 1 starts ---
function request(q) {
   var ajax = new XMLHTTPRequest();
   if (defined ajax) {
      ajax.onreadystatechange = ajax_handler;
      ajax.get(url + "?" + query);
      ajax.send(null);
   }
# --- match 1 ends ---
   else {
      window.alert ("ajax object not defined!");
   }
}

# --- match 2 starts ---
function ajax_handler() {
   window.alert("testing");
}
# --- match 2 ends ---

# this is what should match:
var url = "/ajax.cgi";

# --- match 1 starts ---
function request(q) {
   var ajax = new XMLHTTPRequest();
   if (defined ajax) {
      ajax.onreadystatechange = ajax_handler;
      ajax.get(url + "?" + query);
      ajax.send(null);
   }
   else {
      window.alert ("ajax object not defined!");
   }
}
# --- match 1 ends ---

# --- match 2 starts ---
function ajax_handler() {
   window.alert("testing");
}
# --- match 2 ends ---

How does one create a regexp that handles this?

-------------
Cuvou.com | My personal homepage
Code:
perl -e '$|=$i=1;print" oo\n<|>\n_|_";x:sleep$|;print"\b",$i++%2?"/":"_";goto x;'
 
I really think you would need a parser for something like that. Your example JavaScript is too contrived. For example, how would you handle a lone curly bracket inside a JavaScript string that only gets printed and is not part of the code's actual structure?
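To make the objection concrete, here is a minimal sketch (not a real JavaScript parser) of the kind of scanner that would be needed: it walks the text one character at a time, counts brace depth, and ignores braces inside "..." or '...' string literals. The helper name `matching_brace` is hypothetical, and it assumes the input contains no comments, regex literals or template strings, any of which could still hide a stray brace.

```perl
use strict;
use warnings;

# Hypothetical helper: given source text and the offset of an opening '{',
# return the offset of its matching '}', skipping braces that appear inside
# "..." or '...' string literals. Returns undef if no match is found.
# Assumption: the input has no comments, regex literals or template strings.
sub matching_brace {
    my ($src, $open) = @_;
    my $depth = 0;
    my $quote;    # delimiter of the string literal we are inside, if any
    for (my $i = $open; $i < length($src); $i++) {
        my $c = substr($src, $i, 1);
        if (defined $quote) {
            if ($c eq '\\') { $i++; next }    # skip the escaped character
            undef $quote if $c eq $quote;     # end of the string literal
            next;
        }
        if ($c eq '"' or $c eq "'") { $quote = $c; next }
        if ($c eq '{') {
            $depth++;
        }
        elsif ($c eq '}') {
            $depth--;
            return $i if $depth == 0;
        }
    }
    return;    # unbalanced input
}

# A lone } inside a string no longer ends the function early.
my $js    = q~function f() { if (x) { alert("lone } brace"); } }~;
my $open  = index($js, '{');
my $close = matching_brace($js, $open);
print substr($js, $open, $close - $open + 1), "\n";
```

The character-by-character loop is deliberate: once string literals have to be recognised, tracking that state explicitly is easier to extend (comments, template strings) than growing a single regexp.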

Syntax::Highlight::Engine::Kate has a JavaScript plugin you might want to look at.

If the point is to figure out a way with regular expressions, you might want to look into @+ and @- (the arrays holding the end and start offsets of the last successful match and its capture groups). You can use recursive patterns in regular expressions with Perl 5.10, but I have not delved into that myself yet and don't know how to write one that does this for anything complicated.
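For reference, here is a minimal sketch of the recursive-pattern approach, assuming Perl 5.10 or later and the sample JavaScript from the question. Group 3 matches an opening brace, then any mix of non-brace runs and recursive calls back into group 3 itself via `(?3)`, then the closing brace, so nested blocks stay balanced. `$-[0]` and `$+[0]` report where each whole match starts and ends. Like the original attempt, it still treats every brace as structural, so a lone brace inside a string literal will throw it off.

```perl
use strict;
use warnings;
use 5.010;    # recursive subpatterns such as (?3) need Perl 5.10 or later

my $javascript = q~
var url = "/ajax.cgi";

function request(q) {
   var ajax = new XMLHTTPRequest();
   if (defined ajax) {
      ajax.onreadystatechange = ajax_handler;
      ajax.get(url + "?" + query);
      ajax.send(null);
   }
   else {
      window.alert ("ajax object not defined!");
   }
}

function ajax_handler() {
   window.alert("testing");
}~;

my %func;
while ($javascript =~ /
        function \s+ (\w+) \s*             # group 1: function name
        \( ([^)]*) \) \s*                  # group 2: argument list
        ( \{ (?: [^{}]++ | (?3) )* \} )    # group 3: balanced brace block
    /gx) {
    my ($name, $args, $block) = ($1, $2, $3);
    # @- and @+ hold the start and end offsets of the whole match
    printf "%s(%s) spans offsets %d..%d\n", $name, $args, $-[0], $+[0];
    # strip the outer braces, keeping only the function's code
    my $code = substr($block, 1, -1);
    $func{$name} = { args => $args, code => $code };
}
```

The possessive `[^{}]++` stops the engine from backtracking into runs of plain text, which keeps failures on unbalanced input from becoming very slow.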


------------------------------------------
- Kevin, perl coder unexceptional!
 
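This is the only way I can see at the moment:
Code:
my @funcs = split /\bfunction\b/i, $javascript;
shift @funcs;    # discard whatever precedes the first "function"
for (@funcs) {
  # the greedy (.+) runs to the LAST closing brace in this chunk
  next unless /\s*(\w+?)\s*\((.*?)\)\s*\{(.+)\}/s;
  my $name = $1;
  my $args = $2;
  my $code = $3;
  #...
}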
This also copes with the situation Kevin outlined, and it discards any code outside the functions, provided that code contains no braces.
Notice that the third group must be greedy ((.+) rather than (.+?)), so that it runs to the last closing brace of each chunk.
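To see both what this approach handles and the caveat about braces outside the functions, here is a small sketch on a made-up input (the functions `a` and `b` and the trailing `var cfg` line are hypothetical). It splits on `\bfunction\b` so that identifiers that merely contain "function" do not cause a split.

```perl
use strict;
use warnings;

# Hypothetical input: function b is followed by top-level code with braces.
my $javascript = q~
function a(x) { if (x) { return 1; } return 0; }
function b() { return 2; }
var cfg = { debug: true };
~;

my %func;
my @funcs = split /\bfunction\b/i, $javascript;
shift @funcs;    # drop the text before the first "function"
for (@funcs) {
    next unless /\s*(\w+)\s*\((.*?)\)\s*\{(.+)\}/s;
    $func{$1} = { args => $2, code => $3 };
}

# a's body is complete despite its nested braces ...
print "a: $func{a}{code}\n";
# ... but b's body swallows the trailing "var cfg = { ..." line, because
# the greedy (.+) runs to the last closing brace in b's chunk.
print "b: $func{b}{code}\n";
```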

Franco
 