
Simple sed Parsing


tCarls

Programmer
Dec 26, 2006
13
US
I'd like to print only the part of a line matched by the
parenthesized group in a regular expression. For example:

input: junk_VAR1=12_junk_VAR2=231_junk

output: 12

In perl (this is too slow) it would be something like:
print ($line =~ /VAR1=(\d+)/);

Right now I'm substituting the entire line with \1, which
is faster than perl, but that can't be the most efficient
way to do it, can it?

echo "$line" | sed 's/^.*VAR1=\([0-9]*\).*$/\1/'
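A minimal sketch of doing the same extraction entirely inside the shell, assuming the value is the run of digits right after the first VAR1= on the line. It uses only POSIX parameter expansion, so no echo | sed pipeline and no sed process are needed; the variable name rest is just for illustration.

```shell
#!/bin/sh
# Extract the digits after VAR1= with POSIX parameter expansion only:
# no pipe and no sed process are started for the extraction.
line="junk_VAR1=12_junk_VAR2=231_junk"

case $line in
    *VAR1=*)
        rest=${line#*VAR1=}      # remove everything through the first "VAR1="
        var1=${rest%%[!0-9]*}    # remove the first non-digit and everything after it
        ;;
    *)
        var1=                    # key not present: leave the value empty
        ;;
esac

echo "$var1"    # prints 12
```

Unlike the greedy ^.* in the sed command, which picks the last VAR1= on the line, ${line#*VAR1=} stops at the first one; that only matters if the key appears more than once.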
 
I can't think how you'd make it much more efficient. Why do you think perl is too slow? When I tested it, perl took 0.2s versus 0.1s for sed: hardly a difference you'd notice!

Annihilannic.
 
I'm running about 100 of these scripts in the background.
I have a GUI written in Tcl/Tk built around a bash script,
and it's these few lines of parsing that take the most
time and slow everything else down.

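I looked into it more, and it turns out it's not so much
sed itself that takes too long but piping input into it:
in an echo | sed pipeline, the echo side runs in its own
subshell, so every call costs an extra process on top of sed.
You're right, it's hardly something you would ever care
about, but it's good to know if you ever need it ... and
I think it's interesting.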

bash-2.05b$ time ./test.sh perlPipe

real 0m1.269s
user 0m0.340s
sys 0m0.876s
bash-2.05b$ time ./test.sh perlFile

real 0m0.948s
user 0m0.364s
sys 0m0.544s
bash-2.05b$ time ./test.sh sedPipe

real 0m0.846s
user 0m0.244s
sys 0m0.564s
bash-2.05b$ time ./test.sh sedFile

real 0m0.522s
user 0m0.176s
sys 0m0.312s



#!/bin/bash
# test.sh: time four ways of extracting VAR1..VAR3 from a line.
# The (( )) loops are bash syntax, so run it with bash, not a plain sh.
# Each case runs 99 iterations (i = 1..99) of three extractions.
case $1 in
perlPipe)
    # Pipe a one-line Perl program into a fresh perl for each value.
    for (( i = 1; i < 100; i++ )); do
        line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"
        var1=$(echo "print ('$line' =~ /VAR1=(\d+)/)" | perl)
        var2=$(echo "print ('$line' =~ /VAR2=(\d+)/)" | perl)
        var3=$(echo "print ('$line' =~ /VAR3=(\d+)/)" | perl)
    done
    ;;

perlFile)
    # Write the Perl program to a file and run it, so no pipe is needed.
    for (( i = 1; i < 100; i++ )); do
        echo "print ('junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk' =~ /VAR1=(\d+)/)" > file
        var1=$(perl file)
        echo "print ('junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk' =~ /VAR2=(\d+)/)" > file
        var2=$(perl file)
        echo "print ('junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk' =~ /VAR3=(\d+)/)" > file
        var3=$(perl file)
    done
    ;;

sedPipe)
    # Pipe the line into a separate sed for each value.
    for (( i = 1; i < 100; i++ )); do
        line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"
        var1=$(echo "$line" | sed 's/^.*VAR1=\([0-9]*\).*$/\1/')
        var2=$(echo "$line" | sed 's/^.*VAR2=\([0-9]*\).*$/\1/')
        var3=$(echo "$line" | sed 's/^.*VAR3=\([0-9]*\).*$/\1/')
    done
    ;;

sedFile)
    # Write the line to a file once per iteration; sed reads the file directly.
    for (( i = 1; i < 100; i++ )); do
        echo "junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk" > line
        var1=$(sed 's/^.*VAR1=\([0-9]*\).*$/\1/' line)
        var2=$(sed 's/^.*VAR2=\([0-9]*\).*$/\1/' line)
        var3=$(sed 's/^.*VAR3=\([0-9]*\).*$/\1/' line)
    done
    ;;
esac
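The sedPipe case starts one pipeline and one sed per value. A sketch of cutting that to one sed per line, assuming the keys always appear in the order VAR1, VAR2, VAR3 as in the sample line: a single substitution captures all three groups, and read assigns them in the current shell. The variable name vals is just for illustration.

```shell
#!/bin/sh
# One sed process captures all three values; with -n and the p flag,
# sed prints nothing if the pattern does not match (e.g. a key is missing).
line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"

vals=$(printf '%s\n' "$line" |
    sed -n 's/^.*VAR1=\([0-9]*\).*VAR2=\([0-9]*\).*VAR3=\([0-9]*\).*$/\1 \2 \3/p')

# Split "7 9 85" on whitespace into the three variables.
read -r var1 var2 var3 <<EOF
$vals
EOF

echo "$var1 $var2 $var3"    # prints 7 9 85
```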
 
If there is a consistent number of _ and = in the line, as your example suggests, then perhaps using the IFS variable, as in this extra case for test.sh, would work:
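useIFS)
    # Split on "_" and "=" instead of running sed or perl.
    oldIFS=$IFS
    IFS="_="
    for (( i = 1; i < 100; i++ )); do
        # read runs in the pipeline's subshell, so var1..var3 exist only
        # inside the parentheses and must be used there.
        echo "junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk" |
            (
                read j1 v1 var1 j2 v2 var2 j3 v3 var3 j4
                #echo $var1 $var2 $var3
            )
    done
    IFS=$oldIFS
    ;;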
Since it starts no sed or perl processes to split the values out of the string (only the subshells for the pipe), it creates far fewer processes and should be quicker.
Whilst Cygwin bash replicates the functionality, the performance of some things leaves a lot to be desired. Even so, in my limited test this was about 5x quicker than the perlPipe test.
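A sketch of the same IFS idea without the pipe, assuming the same fixed layout of _ and = as the sample line. Redirecting a here-document into read keeps read in the current shell, so var1..var3 are still set afterwards, and putting IFS in front of read changes it for that one command only, so there is nothing to save and restore. The j*/v* names simply absorb the fields that are not needed, as in the useIFS case.

```shell
#!/bin/sh
# Split the sample line on "_" and "=" with the read builtin.
line="junk_VAR1=7_junk_VAR2=9_junk_VAR3=85_junk"

# IFS applies only to this read; the here-document avoids a pipeline subshell.
IFS='_=' read -r j1 v1 var1 j2 v2 var2 j3 v3 var3 j4 <<EOF
$line
EOF

echo "$var1 $var2 $var3"    # prints 7 9 85
```

In bash, read ... <<< "$line" (a here-string) is a shorter way to write the same redirection.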

 
Wow, that makes a world of difference. I actually ended up
rewriting the entire bash script in C, which no script
stands a chance against in terms of speed, but it's still
good to know. Thanks a lot.
 