Simple sed Parsing

tCarls · Dec 28, 2006

I'd like to print only the part of a line captured by the
parentheses in a regex. For example:

input: junk_VAR1=12_junk_VAR2=231_junk

output: 12

In perl (this is too slow) it would be something like:
print ($line =~ /VAR1=(\d+)/);
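For comparison with the Perl one-liner: bash 3.0 and later can do the same capture with its built-in `=~` operator, without starting any external process. This is a swapped-in technique that nobody in this thread used, and it would not run on the bash-2.05b shown later. The regex variable name `re` is just for illustration.

```shell
#!/bin/bash
# Sketch, assuming bash 3.0 or later (not available in bash-2.05b):
# bash's own regex match, analogous to Perl's $line =~ /VAR1=(\d+)/.
# No external process (perl, sed) is started.
line="junk_VAR1=12_junk_VAR2=231_junk"
re='VAR1=([0-9]+)'    # POSIX ERE has no \d, so use [0-9]
if [[ $line =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match, [1] the first parenthesized group
    echo "${BASH_REMATCH[1]}"    # prints 12
fi
```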

Right now I'm substituting the entire line with \1, which
is faster than perl, but that can't be the most efficient
way to do it, can it?

echo "$line" | sed 's/^.*VAR1=\([0-9]*\).*$/\1/'
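A minimal sketch of the same sed substitution wrapped in a helper, assuming a POSIX sed. In a basic regular expression `\(` and `\)` mark the capture group, and `\1` in the replacement refers back to it. One difference from the command above: with `-n` and the `p` flag, sed prints only when the substitution matched, whereas a plain `s///` echoes a non-matching line back unchanged. `extract_var` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the digits following NAME= in LINE.
# \( \) marks the capture group; \1 in the replacement refers back to it.
# -n turns off sed's automatic printing, and the trailing p flag prints
# only if the substitution matched, so a line without NAME= prints nothing.
extract_var() {
    local name=$1 line=$2
    printf '%s\n' "$line" | sed -n "s/^.*${name}=\([0-9]*\).*\$/\1/p"
}

line="junk_VAR1=12_junk_VAR2=231_junk"
extract_var VAR1 "$line"    # prints 12
extract_var VAR2 "$line"    # prints 231
extract_var VAR9 "$line"    # prints nothing
```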

Annihilannic · Dec 28, 2006

I can't think how you'd make it much more efficient. Why do you think perl is too slow? When I tested it, perl took 0.2s versus 0.1s for sed, hardly a difference you'd notice!

Annihilannic.

tCarls · Dec 29, 2006

I'm running about 100 of these scripts in the background.
I have a GUI written in Tcl/Tk built around a bash script,
and it's these few lines of parsing that are taking the
most time and slowing everything else down.

I looked into it more, and it turns out it's not so
much sed itself that takes too long as piping input into sed.
You're right, it's hardly something you would ever care
about, but it's good to know in case you ever need it, and
I think it's interesting.
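If the cost is mostly in setting up the pipe, one variation is to hand the string to sed with a bash here-string (`<<<`) instead of `echo ... |`, which avoids forking a separate subshell just to run echo. This sketch assumes a bash with here-string support; no timing for it appears in this thread, so any speedup is an untested assumption.

```shell
#!/bin/bash
# Sketch: feed $line to sed through a here-string (<<<) instead of
# "echo $line | sed ...", so no extra subshell is forked for echo.
# sed itself is still started once per variable.
line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"
var1=$(sed -n 's/^.*VAR1=\([0-9]*\).*$/\1/p' <<< "$line")
var2=$(sed -n 's/^.*VAR2=\([0-9]*\).*$/\1/p' <<< "$line")
var3=$(sed -n 's/^.*VAR3=\([0-9]*\).*$/\1/p' <<< "$line")
echo "$var1 $var2 $var3"    # prints 7 9 85
```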

bash-2.05b$ time ./test.sh perlPipe

real 0m1.269s
user 0m0.340s
sys 0m0.876s
bash-2.05b$ time ./test.sh perlFile

real 0m0.948s
user 0m0.364s
sys 0m0.544s
bash-2.05b$ time ./test.sh sedPipe

real 0m0.846s
user 0m0.244s
sys 0m0.564s
bash-2.05b$ time ./test.sh sedFile

real 0m0.522s
user 0m0.176s
sys 0m0.312s

#!/bin/bash
# test.sh: time different ways of extracting VAR1..VAR3 from a line.
# Needs bash rather than plain sh because of the (( )) for-loop.
case $1 in
perlPipe)
    # Pipe a one-line Perl program into a new perl process per variable
    for (( i=1; i<100; i++ )); do
        line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"
        var1=$(echo "print ('$line' =~ /VAR1=(\d+)/)" | perl)
        var2=$(echo "print ('$line' =~ /VAR2=(\d+)/)" | perl)
        var3=$(echo "print ('$line' =~ /VAR3=(\d+)/)" | perl)
    done
    ;;

perlFile)
    # Write each Perl program to a file and run perl on it without a pipe
    for (( i=1; i<100; i++ )); do
        echo "print ('junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk' =~ /VAR1=(\d+)/)" > file
        var1=$(perl file)
        echo "print ('junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk' =~ /VAR2=(\d+)/)" > file
        var2=$(perl file)
        echo "print ('junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk' =~ /VAR3=(\d+)/)" > file
        var3=$(perl file)
    done
    ;;

sedPipe)
    # Pipe the line into sed; \( \) captures the digits and \1 keeps only them
    for (( i=1; i<100; i++ )); do
        line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"
        var1=$(echo "$line" | sed 's/^.*VAR1=\([0-9]*\).*$/\1/')
        var2=$(echo "$line" | sed 's/^.*VAR2=\([0-9]*\).*$/\1/')
        var3=$(echo "$line" | sed 's/^.*VAR3=\([0-9]*\).*$/\1/')
    done
    ;;

sedFile)
    # Write the line to a file and let sed read the file directly
    for (( i=1; i<100; i++ )); do
        echo "junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk" > line
        var1=$(sed 's/^.*VAR1=\([0-9]*\).*$/\1/' line)
        var2=$(sed 's/^.*VAR2=\([0-9]*\).*$/\1/' line)
        var3=$(sed 's/^.*VAR3=\([0-9]*\).*$/\1/' line)
    done
    ;;

*)
    echo "usage: $0 perlPipe|perlFile|sedPipe|sedFile" >&2
    exit 1
    ;;
esac
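The sed cases in test.sh start three sed processes per iteration, one per variable. A sketch of doing the same work with a single sed process per line, assuming VAR1, VAR2 and VAR3 always appear in that order. `extract_all` is a hypothetical name, and this variation is not part of the benchmark whose timings are shown above.

```shell
#!/bin/bash
# Sketch: extract VAR1, VAR2 and VAR3 with one sed process per line
# instead of one echo|sed pipeline per variable.
# Assumes the fields always appear in the order VAR1, VAR2, VAR3.
extract_all() {
    printf '%s\n' "$1" |
        sed -n 's/.*VAR1=\([0-9]*\).*VAR2=\([0-9]*\).*VAR3=\([0-9]*\).*/\1 \2 \3/p'
}

line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"
# read is a shell builtin, so splitting sed's output spawns no extra process
read -r var1 var2 var3 <<EOF
$(extract_all "$line")
EOF
echo "$var1 $var2 $var3"    # prints 7 9 85
```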

Salem · Dec 29, 2006

If the line always contains the same number of _ and = characters, as your example suggests, then setting the IFS variable, as in this extra useIFS case for test.sh, might work.

Code:

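   useIFS)
        oldIFS=$IFS
        IFS="_="    # split fields on both '_' and '='
        for (( i=1; i<100; i++ )); do
                echo "junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk" |
                    (
                        # read runs in a subshell because of the pipe, so
                        # var1..var3 exist only inside these parentheses
                        read -r j1 v1 var1 j2 v2 var2 j3 v3 var3 j4
                        #echo $var1 $var2 $var3
                    )
        done
        IFS=$oldIFS
   ;;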

Since it spawns far fewer processes to split the values out of the string, it should be quicker.
Cygwin bash replicates the functionality, but the performance of some things leaves a lot to be desired. Even so, my limited test was about 5x quicker than the perlPipe test.
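In the useIFS case the values are lost when the parenthesized subshell exits, because each part of a bash pipeline runs in a subshell. A sketch of the same IFS split using a bash here-string instead of a pipe, assuming the line keeps exactly the field layout of the example. The variables stay set afterwards, and IFS is changed only for the one read command, so there is nothing to save and restore.

```shell
#!/bin/bash
# Sketch: split the line with read alone, no pipe and no subshell.
# The IFS='_=' prefix applies only to this read command, so the
# global IFS does not need to be saved and restored.
# Assumes the fixed layout junk_VAR1=n_junk_VAR2=n_junk_VAR3=n_junk.
line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"
IFS='_=' read -r j1 v1 var1 j2 v2 var2 j3 v3 var3 j4 <<< "$line"
# Unlike the piped version, var1..var3 are still set here
echo "$var1 $var2 $var3"    # prints 7 9 85
```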


tCarls · Dec 30, 2006

Wow, that makes a world of difference. I actually ended up
rewriting the entire bash script in C, which no script
stands a chance against in terms of speed, but it's still
good to know. Thanks a lot.
