regex: replace all special characters 1

rao1soft · Dec 9, 2004

Hi,

I'm trying to replace all special characters (space, comma, parens) within double quotes in a string with an underscore using:

Code:

while(<>) {
 s/"(.*)[ ,:\]\[\(\)](.*)"/"$1_$2"/g;
print "$_\n";
}

The following text

Code:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<cleanup"a string,here with(junk">
<cleanup"a string,here[with junk">
<cleanup"a string)here:with,junk">
<cleanup"a string(here:with[junk">
<cleanup"a string,here:with)junk">
<cleanup"a string:here]with]junk">
</root>

gets me

Code:

<?xml version="1.0"_encoding="UTF-8"?>
<root>
<cleanup"a string,here with_junk">
<cleanup"a string,here[with_junk">
<cleanup"a string)here:with_junk">
<cleanup"a string(here:with_junk">
<cleanup"a string,here:with_junk">
<cleanup"a string:here]with_junk">
</root

But only the last one is being substituted. How do I get to sub all the occurrences? Also, if there are two sets of double quotes (as in line 1) how do I process each set separately?

Thanks.

uida1154 · Dec 9, 2004

It's got to do with the greediness of RE.

Try using .*?

The ? means "use the shortest match".

rao1soft · Dec 9, 2004

Then only the first match is replaced!

mikevh · Dec 9, 2004

I had this sub already written, which seems to do the trick:

Code:

#!perl
use strict;
use warnings;

while (<DATA>) {
    print rpldelimxy($_, "\"", qr([ ,:\]\[\(\)]), "_");
}

[b]sub rpldelimxy {
    # Replace $x within $delim's with $y.
    # $x and $delim may be regexes. $y must be a string.
    my ($str, $delim, $x, $y) = @_;
    my $strlen = length($str);
    my $sawdelim = 0;
    for (my $i=0; $i<$strlen; $i++) {
        my $s = substr($str, $i, 1);
        if ($s =~ $delim) {
            $sawdelim = $sawdelim? 0: 1;
        }
        if ($s =~ /$x/ && $sawdelim) {
            substr($str, $i, 1) = $y;
        }
    }
    return $str;
}[/b]

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<root>
<cleanup"a string,here with(junk">
<cleanup"a string,here[with junk">
<cleanup"a string)here:with,junk">
<cleanup"a string(here:with[junk">
<cleanup"a string,here:with)junk">
<cleanup"a string:here]with]junk">
</root>

Output

Code:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
</root>

mikevh · Dec 9, 2004

Hmm, in rpldelimxy, that really should be
if ($s =~ /$delim/)
(Note the slashes around $delim.)
Seems to work without 'em, but why ask for trouble. (It always finds you anyway.)

mikevh · Dec 9, 2004

Here's an improved version of the routine. This allows $delim, $x, and $y to contain multiple characters, which the earlier version did not. There they all had to be single chars for the routine to work correctly. If you pass an empty string in $y, $x will be deleted within delims.

This version worked well with light testing.

Note the CAVEATS. There's currently no check to make sure an opening delimiter has a corresponding ending delimiter. (Also true of the earlier version.) I think I want to think about this a bit more before I do anything about it.

Code:

sub rpldelimxy {
    # Replace $x within $delim's with string $y.
    # $delim and $x may be strings or regexes.  $y must be a string.
    # Passing an empty string in $y will delete $x within $delims.
    # CAVEATS: Currently no check to see if delimiters are paired.
    #    Once we see an opening $delim, we keep replacing $x with $y 
    # until we see a closing $delim ... 
    my ($str, $delim, $x, $y) = @_;
    my $sawdelim = 0;
    my $leny = length($y);
    my $i = 0;
    while ($i < length($str)) {
        my $s = substr($str, $i);
        if ($s =~ /^($delim)/) {
            $sawdelim = $sawdelim? 0: 1;
            $i += length($1);
        } elsif ($s =~ /^($x)/ && $sawdelim) {
            my $lenx = length($1);
            my $temp = substr($str, 0, $i) . $y;
            if ($i < length($str) - $lenx) {
                $temp .= substr($str, $i + $lenx);
            }
            $str = $temp;
            $i += $leny;
        } else {
            $i++;
        }
    }
    return $str;
}

chessbot · Dec 10, 2004

Is there a stack module or representation in perl at all? If so, it would be pretty easy to test if the values are paired.

--Chessbot

There is a level of Hell reserved for probability theorists in which every monkey that types on a typewriter produces a Shakespearean sonnet.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

regex: replace all special characters 1

rao1soft

Programmer

uida1154

Programmer

rao1soft

Programmer

mikevh

Programmer

mikevh

Programmer

mikevh

Programmer

chessbot

Programmer

Similar threads

Part and Inventory Search

Sponsor