Multiply "regexp" conditions

Status
Not open for further replies.

cptk

Technical User
Mar 18, 2003
305
US
say I have a list called mylist with the following elements:
dog.5.5.5
dog.6.6.6.xyz
cat.7.7.7
dog.7.7.7


and I want to only retrieve the elements that start with dog but end only with a number (i.e. - dog.5.5.5 & dog.7.7.7).

I've tried several variations on the regexp cmd, for example:

foreach elem $mylist {
    regexp {^(dog)[^a-z]$} $elem   ;# or
    regexp {^(dog)[0-9]$} $elem    ;# etc.
}

Basically I'm thinking I can do it all in one step with the "regexp" cmd. I know I can do it if, for example, I first test for "dog" (via regexp) and then test again for whether or not it ends in a number, but I'm hoping to consolidate this "testing" into one "regexp" cmd. Is this possible? Can someone shed some light on the various regexp syntaxes available (i.e. '*' '?' '|' etc.)?

...thanks!
 
I think this works:
regexp {^dog(.*)[0-9]$} $<your string>

Bob Rashkin
rrashkin@csc.com
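
For reference, here is a minimal sketch of that pattern applied to the whole list from the question. The variable name hits is only for illustration, and the loop sticks to commands that older Tcl releases also provide.

```tcl
# The list from the question.
set mylist {dog.5.5.5 dog.6.6.6.xyz cat.7.7.7 dog.7.7.7}

# Collect the elements that start with "dog" and end in a digit.
set hits {}
foreach elem $mylist {
    # ^dog    the string starts with dog
    # .*      then any run of characters
    # [0-9]$  and the last character is a digit
    if {[regexp {^dog.*[0-9]$} $elem]} {
        lappend hits $elem
    }
}
puts $hits   ;# dog.5.5.5 dog.7.7.7
```

The parentheses from the reply are dropped here because no submatch variables are used; the result is the same with them.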
 
Thanks Bob (aka Krisha Bob (per Hank)) ... the missing .* was what was tripping me up!

Anyway, for my own edification, I summarized the following notes regarding regexp cmds:

1.) You don't necessarily need the parentheses just to test for a match - they only provide feedback if you use the optional submatch variables (i.e., w x y z in my examples below; q receives the whole match)

2.) When using submatch variable(s), the first variable always contains the entire value of a successful match returned from the regexp cmd.

3.) The remaining submatch variables, if used, will contain the values of each parenthesized expression within the regexp cmd.

4.) The key expression (for me at least) was the .* expression, which matches any run of characters (zero or more).

example #1:
regexp {(^dog)(.*)([0-9]$)} dog.6.7.8 q w x y z
results:
q="dog.6.7.8" <== successful match, entire value
w="dog" <== 1st parentheses
x=".6.7." <== 2nd parentheses
y="8" <== 3rd parentheses
z="" <== empty string (no 4th parenthesized expression)

example #2:
regexp {^dog(.*)([0-9]$)} dog.6.7.8 q w x y z
results:
q="dog.6.7.8" <== successful match, entire value
w=".6.7." <== 1st parentheses
x="8" <== 2nd parentheses
y="" <== empty string (no 3rd parenthesized expression)
z="" <== empty string (no 4th parenthesized expression)

example #3:
regexp {^dog.*[0-9]$} dog.6.7.8 q w x y z
results:
q="dog.6.7.8" <== successful match, entire value
w="" <== empty string (no 1st parenthesized expression)
x="" <== empty string (no 2nd parenthesized expression)
y="" <== empty string (no 3rd parenthesized expression)
z="" <== empty string (no 4th parenthesized expression)

example #4:
regexp {^dog.*([^a-z][0-9]$)} dog.6.7.8 q w x y z
results:
q="dog.6.7.8"
w=".8" <== 1st parentheses
x="" <== empty string (no 2nd parenthesized expression)
y="" <== empty string (no 3rd parenthesized expression)
z="" <== empty string (no 4th parenthesized expression)


An expression followed by a ? will match zero or one occurrence of that expression.
example #5:
regexp {(z?)[0-9]$} zzzz5 q w x y z
results:
q="z5"
w="z" <== the 4th "z"
...x,y,z are empty strings!

example #6:
regexp {(z?)[0-9]$} 555 q w x y z
results:
q="5"
w="" <== zero matches from 1st parentheses
...x,y,z are empty strings!

Expressions followed by a + will match one or more occurrences of the expression.
example #7:
regexp {(z+)[0-9]$} zzz5 q w x y z
results:
q="zzz5"
w="zzz" <== mult. matches
...x,y,z are empty strings!

example #8:
regexp {(z+)[0-9]$} zzz.5 q w x y z
results:
regexp fails due to the period between a group of z's and the number 5
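
The examples above can be replayed with a small helper. This is only a sketch: show is a made-up name, and it presets the variables to the word unchanged so that a failed match, which leaves them untouched, is easy to tell apart from an empty submatch. The third call uses | (alternation), which the original question also asked about.

```tcl
# Hypothetical helper: run a pattern and return {result match sub1 sub2}.
proc show {pattern string} {
    # Preset the variables; regexp leaves them alone when nothing matches.
    set q unchanged; set w unchanged; set x unchanged
    set ok [regexp $pattern $string q w x]
    return [list $ok $q $w $x]
}

puts [show {^dog(.*)([0-9]$)} dog.6.7.8]        ;# 1 dog.6.7.8 .6.7. 8
puts [show {^dog.*[0-9]$} dog.6.7.8]            ;# 1 dog.6.7.8 {} {}
puts [show {^(dog|cat)\.(.*[0-9])$} cat.7.7.7]  ;# 1 cat.7.7.7 cat 7.7.7
puts [show {(z+)[0-9]$} zzz.5]                  ;# 0 unchanged unchanged unchanged
```

The {} entries in the second line are how Tcl prints empty strings inside a list.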

...now I'm done! Hopefully this will be used for future "jog-your-memory" days.
 
Yeah, I've seen it, but it's always good to have many examples ...

Now, another question ...

How to substitute a variable in a regexp?

> set x "dog"
> regexp {$x} dog123


...I've tried many permutations

> regexp {[$x]} dog123
> regexp {($x)} dog123
> regexp {"$x"} dog123
escaping the $, etc. to no avail. Can this be done?
 
Lose the curly braces. That is:
set exp "^dog"
regexp $exp dog123

Bob Rashkin
rrashkin@csc.com
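
A quick sketch of why the braces matter here: inside {} Tcl performs no substitution, so regexp receives the literal characters $x (an end-of-string anchor followed by x) rather than the value of the variable.

```tcl
set x "dog"

# Braces: the pattern is literally $x, which can never match.
puts [regexp {$x} dog123]    ;# 0

# No braces: Tcl substitutes $x first, so the pattern is dog.
puts [regexp $x dog123]      ;# 1

# Double quotes also allow substitution; the pattern is ^dog.
puts [regexp "^$x" dog123]   ;# 1
```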
 
But I can't lose those curly braces if I have multiple expressions (refer to my examples #1-4 above)
 
Sure you can. I meant lose them in the final statement:
set exp {^dog.*([^a-z][0-9]$)}
regexp $exp dog123

Bob Rashkin
rrashkin@csc.com
 
We're still not in sync ...

I want to substitute a variable, say x, within my multiple regexp cmd so that the value of "x" can change (i.e. - say from "dog" to "cat")

example:
set x "dog"
regexp {^$x.*([^a-z][0-9]$)} dog123

I'm not too worried about the leading caret for now (I can always include that value while setting the variable.)
 
I think you need to do that substitution before the regexp statement. So for instance, let's make your regexp be

^$x.*([0-9]$), where x can be, say, "dog". We'll put that into a variable, exp, for later substitution:
set exp {\{^$x.*(\[0-9\]$)\}}
I've wrapped it all in curly braces to avoid substitution now, and explicitly escaped the { and [ brackets so they won't be evaluated when I do perform the substitution. At this point, exp = \{^$x.*(\[0-9\]$)\}. Now make x what you want:
set x dog
and evaluate the exp with an explicit substitution call within the regexp statement:
regexp [subst $exp] dog.123

Bob Rashkin
rrashkin@csc.com
 
Thanks, but still problems ... Strange, check this out! ...

1> set x "dog"
dog

2> set q {\{^$x.*\[^a-z\]\[0-9\]$\}}
\{^$x.*\[^a-z\]\[0-9\]$\}

3> regexp [subst $q] dog.3.1.4.26
0 (wasn't successful!)

However, ...
4> subst $q
{^dog.*[^a-z][0-9]$}

5> regexp {^dog.*[^a-z][0-9]$} dog.3.1.4.26
1 (successful!)

Upshot: it seems line# 3> "subst" didn't quite work, even though the return value on line# 4> appears to look good.
(Note: I took out the parentheses - didn't need them.)
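
One possible explanation, sketched below with the same values: the braces typed on line 5> are Tcl quoting and are stripped before regexp runs, while the \{ and \} stored in q survive subst as real brace characters inside the pattern, so the pattern on line 3> is not the same as the one on line 5>. The names pat and q2 are only for illustration.

```tcl
set x "dog"
set q {\{^$x.*\[^a-z\]\[0-9\]$\}}

# After subst, the escaped braces are ordinary characters in the pattern.
set pat [subst $q]
puts $pat                         ;# {^dog.*[^a-z][0-9]$}
puts [regexp $pat dog.3.1.4.26]   ;# 0

# Leave the braces out of the stored pattern and the same call matches.
set q2 {^$x.*\[^a-z\]\[0-9\]$}
puts [regexp [subst $q2] dog.3.1.4.26]   ;# 1
```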
 
Since I got it to work in my tclsh experiment, all I can think is that the parenthesis might be needed for the subst nested command.

Bob Rashkin
rrashkin@csc.com
 
Makes no difference (for me) with or without the parentheses.
My tcl version is 8.3

Well, I think we exhausted this thread, huh?
 
Where's Ken Jones (Avia) when we need him?

Bob Rashkin
rrashkin@csc.com
 
I'm missing something.
No need for subst and it would be subst -novariable anyway.
The list {} syntax is WRONG.

Try this:
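Code:
set x "dog"
# Unbraced pattern: $x is substituted before regexp sees it
regexp ($x) "dog123" all in
puts $in
#or: in double quotes, escape [ and ] so they are not run as commands
regexp "($x)(\[0-9\]+)" "dog123" all x num
puts "$x != $num"
#or: the submatch overwrites x, so save the old value first
set oldx $x
regexp "$x\(\[a-z\]+\[0-9\]+\)$" "doghasbeendefunct200" all x
puts "x was $oldx but it is now $x"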

You guys are making it too hard. ;)




 
Thanks "marsd" for the reply ...

You're right, using the list {} syntax is WRONG for this!!

When I do the following, it works:

> set x "dog"
dog
> regexp ^$x.*\[^a-z\]\[0-9\]$ dog.3.1.4.26
1

Which is interpreted as follows:
1.) ^$x -- find at the beginning of string the value of variable $x (i.e. - dog)
2.) .* -- any number of characters
3.) [^a-z] -- the 2nd to last character of the string must not be a lower case letter
4.) [0-9]$ -- last character must be a number

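The list {} syntax itself is fine for a fixed pattern: nothing inside the braces is substituted, so the pattern (including any "\") reaches regexp exactly as typed.

However, that same rule is what causes the headache with variable substitution: inside {} the $x is never replaced by its value. So for a pattern that needs a variable, leave the braces off and escape the square brackets with "\" instead.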
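
As a rough sketch of the options for a pattern that needs a variable (pat1 and pat2 are illustrative names): either write it in double quotes and escape the square brackets and the trailing $, or keep the fixed part in braces and splice the value in with format. Both assume x holds plain text; any regexp metacharacters inside x would be treated as part of the pattern.

```tcl
set x "dog"

# Double quotes: $x is substituted; the backslashes keep Tcl from
# treating the brackets and the final $ as substitutions.
set pat1 "^$x.*\[^a-z\]\[0-9\]\$"

# format: the braced template is left as typed and %s receives the value.
set pat2 [format {^%s.*[^a-z][0-9]$} $x]

puts $pat1                          ;# ^dog.*[^a-z][0-9]$
puts [string equal $pat1 $pat2]     ;# 1
puts [regexp $pat1 dog.3.1.4.26]    ;# 1
puts [regexp $pat2 cat.3.1.4.26]    ;# 0
```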

 