
Help with parsing a Perl file in Perl

Status
Not open for further replies.

ericse

Programmer
Apr 19, 2007
32
US
Hi guys-

I have a bit of a problem. I'm trying to loop through our application's source code, find every call to the function gettext(), extract the first parameter of each call, and write it to a file for later processing. That part works. The problem occurs when a gettext() call spans more than one line, which happens often because of our coding style and readability conventions.

Another complication is that a gettext() call can contain nested function calls (e.g. gettext(dumb_function(whatever));). My approach was to grab everything between the gettext() parentheses, so in this case "dumb_function(whatever)", and return that. The function will not always have only one parameter. Sometimes there are two or three, but the first one is the only one I'm interested in.

While looping through the file, I currently use a goto statement and index jumping to handle this. That approach is sloppy, error-prone, nearly impossible to read, and just plain ugly. I'd really like to use a recursive function to step through the file and pull out each gettext() call completely (including multi-line calls where necessary). Once that's done, I can parse each call individually to pull out the first parameter. But I'm having trouble working through the logic. I've never really used recursive functions, so any input on this problem, design examples, or anything else would be greatly appreciated. I hope someone can help me. Thanks

~Eric
 
Here is the portion of the code where I'm pulling out the gettext() calls:

Code:
    for (my $index = 0; $index <= $line_cnt; $index++) {
        my $line = $file[$index];
        my $line_pos = index($line, 'gettext(');
        $gt_function = undef;
        if ($line_pos > -1) {
            #we found a gettext() line, now we need to find the ending.
            $line_pos += length('gettext(');
            #lets check to see if we open any other parens
            my $open = 1;
            my $pos = $line_pos;
NEXTLINE:
            $line = $file[$index];
            @line_by_char = split('', $line);

            do {
                if ($line_by_char[$pos] eq '(') { $open++; }
                elsif ($line_by_char[$pos] eq ')') { $open--; }
                if ($line_by_char[$pos] eq "\n" || $pos > @line_by_char) {
                    $index++;
                    $pos = 0;
                    $gt_function .= $line;
                    goto NEXTLINE;
                }
                $pos++;
            } until ($open == 0);

            $gt_function .= $line;
            $gt_function =~ s/[\n]//g;

            my $add_str = cleanup_param(grab_first_param(grab_gettext_data($gt_function)));
            if ($add_str eq 'err_no_param' || $add_str eq 'err_index') {
                push @errors, "($index)$path: $add_str";
            }
            elsif ($add_str =~ /^(\$)/ || $add_str =~ /^([\w]{2}[:]{2})/) {
                push @vars, "($index)$path: $add_str";
            }
            else {
                push @collection, $add_str;
            }
        }
    }
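For comparison, the same depth-counting idea can be written without `goto` if the whole file is slurped into one string first, so a call that spans several lines is no different from one that doesn't. This is a minimal sketch, assuming the file fits in memory; `extract_call` and `all_gettext_calls` are hypothetical helper names, and, like the loop above, it only looks for the literal text `gettext(` and knows nothing about parentheses inside quoted strings.

```perl
use strict;
use warnings;

# Hypothetical helper: given the offset of "gettext(" in the slurped text,
# walk forward counting parenthesis depth and return the complete call,
# or undef if the parentheses never balance. Quotes are not handled.
sub extract_call {
    my ($text, $start) = @_;
    my $depth = 0;
    for my $i ($start .. length($text) - 1) {
        my $c = substr($text, $i, 1);
        if ($c eq '(') {
            $depth++;
        }
        elsif ($c eq ')') {
            $depth--;
            return substr($text, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return;    # ran off the end of the text
}

# Find every "gettext(" and extract the full call. Newlines are ordinary
# characters here, so multi-line calls need no special case.
sub all_gettext_calls {
    my ($text) = @_;
    my @calls;
    my $pos = 0;
    while ((my $at = index($text, 'gettext(', $pos)) >= 0) {
        my $call = extract_call($text, $at);
        last unless defined $call;
        push @calls, $call;
        $pos = $at + length('gettext(');    # resume inside the call to catch nested ones
    }
    return @calls;
}

my $src = <<'END';
x(); gettext(
  foo(1),
  "a"
);
gettext(bar(gettext("inner")));
END

print "$_\n---\n" for all_gettext_calls($src);
```

Because the search resumes just inside each match rather than after it, a nested call such as `gettext(bar(gettext("inner")))` yields both the outer call and `gettext("inner")`.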
 
Hello ericse,

The code below uses a recursive regular expression to get the information you want. This is a gift, so I'm not going to explain it to you. If you want to learn how it does what it does, you'll have to read the perldocs on regular expressions.

The main thing you need to know is that the regex works on the entire file at once, instead of line by line as you're currently doing. Therefore you'll have to modify your code so that it slurps the whole file and then loops over the matches with a while loop, just like I've done.
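One thing lost when matching against the whole slurped file is the per-line `$index` that the original loop uses in its error messages. A sketch of getting it back, assuming Perl 5.10 or later: the named recursive group `(?&parens)` stands in for a `(??{ ... })` code block, and each call's starting line is recovered by counting newlines before the match offset `$-[0]`. `gettext_with_lines` and `$path` are hypothetical names.

```perl
use strict;
use warnings;

# gettext( ... ) with balanced parentheses, using Perl 5.10+ recursion:
# (?&parens) re-enters the named group, so any nesting depth is matched.
my $gettext = qr{ gettext \s* (?<parens> \( (?: [^()]++ | (?&parens) )* \) ) }x;

# Hypothetical helper: slurp a filehandle and return [line, call] pairs.
sub gettext_with_lines {
    my ($fh) = @_;
    my $data = do { local $/; <$fh> };    # undef $/ reads the whole file at once
    my @found;
    while ($data =~ /$gettext/g) {
        my $before = substr($data, 0, $-[0]);     # everything before this call
        my $line   = 1 + ($before =~ tr/\n//);    # tr/// returns the newline count
        push @found, [ $line, substr($data, $-[0], $+[0] - $-[0]) ];
    }
    return @found;
}

# Demo on an in-memory "file"; a real file works the same way:
#   open my $fh, '<', $path or die "$path: $!";
my $demo = "line one\ngettext(\"a\");\n\nx = gettext(\n  foo(1)\n);\n";
open my $fh, '<', \$demo or die "cannot open in-memory file: $!";
for my $hit (gettext_with_lines($fh)) {
    my ($line, $call) = @$hit;
    print "($line) $call\n";
}
```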

Also, my logic currently does not cover one situation: a parameter that is a string containing a parenthesis. I consider this rather minor, so I didn't bother writing the code for it. However, the skills involved in doing so are demonstrated in the code below, so if you learn them, you'll be able to adapt the code yourself.

Anyway, enough gibber gabber. Here's the code:

Code:
use strict;

our $parens = qr{
	\(
	(?:
		(?> [^\(\)]+ )
	|
		(??{ $parens })
	)*
	\)
}sx;

#gettext(params)
our $gettext = qr{
	gettext\s*\(
		(?:
			(?> [^\(\)]+ )
		|
			(??{ $parens })
		)*
	\)
}sx;

my $data = do {local $/; <DATA>};

while ($data =~ /($gettext)/sg) {
	print "$1\n";
}

1;

__END__

# These are just simple parameters.  They should be easy.

gettext("single parameters");

# This is double.  again easy.

gammergammer(); gettext("double parameters", 10); ooobaby();

# Single but multi-line.

gettext(
	"ha! we can hit return!"
);

# Ha, we can include another function!  Woot Woot!

gettext(
	foobar("boo shocka locka"),
	"ha! we can hit return!"
);

# Most complicated

gettext(
	foo("boo shocka locka"),
	bar(10,baz(),biz()),
	"ha! we can hit return!"
);

And the output from running it is:

Code:
>perl gettext.pl
gettext("single parameters")
gettext("double parameters", 10)
gettext(
        "ha! we can hit return!"
)
gettext(
        foobar("boo shocka locka"),
        "ha! we can hit return!"
)
gettext(
        foo("boo shocka locka"),
        bar(10,baz(),biz()),
        "ha! we can hit return!"
)

- Miller
 
Wow, I certainly wasn't expecting such a response. Thank you very much. Now I can spend the rest of the day trying to figure out how that's supposed to work :) This sure proves I was doing things the wrong way. Thanks again; I truly appreciate your help.

~Eric
 
hehehe... I've been writing Perl code for a long time and I don't understand it either. But there seems to be a bit of a problem:

Eval-group not allowed at runtime, use re 'eval' in regex m/
0-14='0-15'; while (&PB2DB::pbstep88(0)) {if ( length()) {print PBEVAL88 eval();}} gettext\s*\(
(?:
.../ at script line 14.

------------------------------------------
- Kevin, perl coder unexceptional!
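The `PB2DB::pbstep88` text inside the reported pattern suggests the script was instrumented by a debugger (Perl Builder, judging by the name), which changes how the pattern gets compiled. The message itself names the likely remedy: `use re 'eval'`, which permits `(??{ ... })` blocks in patterns built at runtime. A minimal sketch of the first regex with that pragma added, assuming nothing else in the script changes; `$parens` is declared before it is assigned so the code block can refer to it under `use strict`.

```perl
use strict;
use warnings;
use re 'eval';   # allow (??{ ... }) in patterns that are built at runtime

our $parens;     # declared first so the code block below can refer to it
$parens = qr{
    \(
    (?: (?> [^()]+ )         # run of non-parentheses, no backtracking
      | (??{ $parens })      # or a nested, balanced group
    )*
    \)
}x;

my $src = "a(); gettext(foo(1, bar()), \"msg\");\n";
my @calls;
while ($src =~ /(gettext\s*$parens)/g) {   # pattern assembled from the qr object
    push @calls, $1;
}
print "$_\n" for @calls;
```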
 
Yeah, I'm getting an error as well. I'm trying to work my way through it :)

 
Never mind. It works beautifully. The only thing it doesn't do is grab nested gettext() calls from inside a gettext() call. This is a huge improvement though. Now I'm going to play with the regular expression until I can figure it out. Again, thanks :)
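Nested calls are skipped because a `while (... =~ /.../g)` loop resumes scanning after the end of each match. One way to pick them up, and a natural place for the recursive function from the original question, is to re-run the search on the argument list of every call that is found. A minimal sketch, assuming Perl 5.10 or later for the named recursive group `(?&args)`; `find_gettext` is a hypothetical name.

```perl
use strict;
use warnings;

# (?<args> ...) matches a balanced (...) group; (?&args) recurses into it.
my $gettext = qr{ gettext \s* (?<args> \( (?: [^()]++ | (?&args) )* \) ) }x;

# Record every gettext() call in the text, then search each call's own
# argument list for further, nested gettext() calls.
sub find_gettext {
    my ($text, $found) = @_;
    while ($text =~ /$gettext/g) {
        push @$found, substr($text, $-[0], $+[0] - $-[0]);   # the whole call
        my $inner = substr($+{args}, 1, -1);                 # drop outer ( and )
        find_gettext($inner, $found);                        # recurse into the arguments
    }
    return $found;
}

my $calls = find_gettext('say(gettext(sprintf(gettext("x %s"), gettext("y"))));', []);
print "$_\n" for @$calls;
```

Each recursive call works on its own lexical `$text`, so the outer loop's `pos()` is untouched, and the match offsets and `%+` are copied out before recursing.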
 
Hi Miller-

I've worked my way through most of the code, and I understand how it's finding the parentheses and everything. I've been working on my own regex to pull out the first parameter. My main question is: is it possible to do this in a similar way to how you pulled out all the gettext() calls? I've tried, but because the opening and closing quotes are the same character, I'm having a hard time when the quoted string contains a comma (","). I can do it by looping through the string one character at a time, but I really like your use of regular expressions, and I'm wondering whether it's even possible given all the possible variations of the first parameter. Again, I'm not trying to poach you for code; I'd rather have your advice on how to tackle the problem so I can work my way through it. Even with the perldocs, it was hard to figure out what you were doing. Thanks again,

~Eric
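For reference, the character-at-a-time approach mentioned above stays fairly small if it tracks just two pieces of state: the parenthesis depth and whether it is inside a quoted string. A sketch under the assumption that only `'...'` and `"..."` literals with backslash escapes occur (`q()`, `qq{}`, regex literals and comments are not handled); `first_param` is a hypothetical helper that receives the text between the outer parentheses of one call.

```perl
use strict;
use warnings;

# Return the first argument: everything up to the first comma that is at
# nesting depth 0 and not inside a '...' or "..." string literal.
sub first_param {
    my ($args) = @_;
    my ($depth, $quote) = (0, '');
    my @chars = split //, $args;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($quote) {                            # inside a string literal
            if    ($c eq '\\')   { $i++ }        # skip the escaped character
            elsif ($c eq $quote) { $quote = '' } # string closed
        }
        elsif ($c eq '"' || $c eq "'") { $quote = $c }
        elsif ($c eq '(')              { $depth++ }
        elsif ($c eq ')')              { $depth-- }
        elsif ($c eq ',' && $depth == 0) {
            return join '', @chars[0 .. $i - 1];
        }
    }
    return $args;                                # no top-level comma: one argument only
}

print first_param(q{foo("comma , and paren )",'single \' quotes'), "ha!"}), "\n";
```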
 
Hi Eric,

I understand that you are not trying to poach for code. But I'm sure you can understand that it was a lot easier for me to simply give you the above code than to describe how to do it so that you could write it yourself. Some things are just too advanced to describe easily to a coder whose background you're unfamiliar with.

Nevertheless, to answer your question: yes, it is possible to parse the parameters using the same techniques as in the previous code. However, I'm not sure I would advise you to try.

Essentially, what you're trying to do is create your own CSV-like parser, except more complicated, since you need to account for more than just double quotes. There are also single quotes, quoted regular expressions, and quoting operators. Hell, if you want to go crazy you can try to cover comments too. Eek! And then there are embedded function calls that might have multiple parameters. Oooo!

After a while, you'd almost be writing your own Perl syntax parser, which is something regular expressions were not really meant to do. At least not with a single regex :)

However, once again, enough gibber gabber. Yes, it is possible. And here is a version that supports some of the functionality that I described:

Code:
# Single Quotes Regular Expression
our $singleQuotes = qr{
	\' (?:
		(?> [^\'\\]+ )
	|
		\\ .
	)* \'
}sx;

# Double Quotes Regular Expression
our $doubleQuotes = qr{
	\" (?:
		(?> [^\"\\]+ )
	|
		\\ .
	)* \"
}sx;

# Parenthesis Regular Expression (Perl related)
our $parens = qr{
	\(
	(?:
		(?> [^\)\(\'\"]+ )
	|
		(??{ $parens })
	|
		(??{ $singleQuotes })
	|
		(??{ $doubleQuotes })
	)*
	\)
}sx;


#gettext(params)
our $gettext = qr{
	gettext\s*\(
	( # Capture first parameter
		(?:
			(?> [^\)\,\(\'\"]+ )
		|
			(??{ $parens })
		|
			(??{ $singleQuotes })
		|
			(??{ $doubleQuotes })
		)*
	) # End Capture
	[\)\,]
}sx;

my $data = do {local $/; <DATA>};

while ($data =~ /$gettext/sg) {
	print "Parameter is <$1>\n";
}

1;

__END__

# These are just simple parameters.  They should be easy.

gettext("single parameters");

# This is double.  again easy.

gammergammer(); gettext("double parameters", 10); ooobaby();

# Single but multi-line.

gettext(
	"ha! we can hit return!"
);

# Ha, we can include another function!  Woot Woot!

gettext(
	foobar("boo shocka locka"),
	"ha! we can hit return!"
);

# Most complicated

gettext(
	foo("boo shocka locka"),
	bar(10,baz(),biz()),
	"ha! we can hit return!"
);

# More Most Complicated

gettext(
	foo("comma , and paren )",'single \' quotes'),
	bar(10,baz(),biz()),
	"ha! we can hit return!"
);

And the output is:

Code:
>perl gettext2.pl
Parameter is <"single parameters">
Parameter is <"double parameters">
Parameter is <
        "ha! we can hit return!"
>
Parameter is <
        foobar("boo shocka locka")>
Parameter is <
        foo("boo shocka locka")>
Parameter is <
        foo("comma , and paren )",'single \' quotes')>

- Miller
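On Perl 5.10 and later, the same first-parameter pattern can be written with named subpatterns in a `(?(DEFINE) ...)` block instead of `(??{ ... })` code blocks, which also sidesteps the `use re 'eval'` error reported earlier. A sketch under that assumption; the group names `SQ`, `DQ` and `PARENS` are illustrative, and it still ignores `q()`-style operators, regex literals and comments.

```perl
use strict;
use warnings;

my $first_param = qr{
    gettext \s* \(
    ( (?: [^(),'"]++ | (?&PARENS) | (?&SQ) | (?&DQ) )* )   # capture: the first parameter
    [,)]                                                     # ends at a comma or the closing paren
    (?(DEFINE)
        (?<SQ>     ' (?: [^'\\]++ | \\. )* ' )                # single-quoted string
        (?<DQ>     " (?: [^"\\]++ | \\. )* " )                # double-quoted string
        (?<PARENS> \( (?: [^()'"]++ | (?&PARENS) | (?&SQ) | (?&DQ) )* \) )
    )
}xs;

my $src = <<'END';
gettext("single parameters");
gammergammer(); gettext("double parameters", 10); ooobaby();
gettext(
    foo("comma , and paren )",'single \' quotes'),
    "ha! we can hit return!"
);
END

my @params;
while ($src =~ /$first_param/g) {
    (my $p = $1) =~ s/^\s+|\s+$//g;   # trim surrounding whitespace and newlines
    push @params, $p;
}
print "Parameter is <$_>\n" for @params;
```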

 