How can regexp choose the smallest possible area?

Status
Not open for further replies.

firelex

Programmer
Jan 10, 2002
118
DE
Hello, all!
My problem:
I have a string like "Title:zuiop:asddf:asdf:dd".
I want to split it with regexp so that [tt]Title[/tt] and the rest of the string end up in two separate variables.
The catch is that regexp always tries to match the largest possible range of the string. Here is my regexp:
[tt]
set temp "Title:zuiop:asddf:asdf:dd"
regexp {^(.*)\:(.*)$} $temp all v1 v2
[/tt]
It splits the string into "Title:zuiop:asddf:asdf" and "dd" instead of [tt]Title[/tt] and [tt]zuiop:asddf:asdf:dd[/tt]!

This one doesn't work either:
[tt]regexp {^([^\:].*)\:(.*)$} $temp all v1 v2[/tt]

Any ideas? Thanks in advance
 
Code:
regexp "^(\[a-zA-Z\]+):(.*)" $name all one two
1
(bin) 15 % puts $one ; puts $two
Title
zuiop:asddf:asdf:dd

regexp "(\[a-zA-Z0-9\]+):(.*)" $name all one two
1
(bin) 37 % puts $one ; puts $two
Title
zuiop:asddf:asdf:dd
 
You almost had it with your second attempt. The problem was that your pattern [tt][ignore]([^\:].*)[/ignore][/tt] matches a single non-colon character followed by 0 or more of any characters, colons included. You wanted [tt][ignore]([^:]*)[/ignore][/tt], which matches 0 or more non-colon characters. As a minor note, the trailing [tt]$[/tt] is superfluous, because the preceding [tt](.*)[/tt] already matches as many characters as possible, all the way to the end of the string. The leading [tt]^[/tt] is superfluous too, because a regular expression matches as early in the string as possible. Thus, your streamlined regexp pattern is:

Code:
regexp {([^:]*):(.*)} $str all v1 v2
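
As a quick check of the streamlined pattern, here is a minimal sketch that wraps it in a helper proc. The proc name [tt]splitFirstColon[/tt] and the no-colon fallback are assumptions made for illustration; they are not from the thread.

```tcl
# Hypothetical helper (the name is an assumption, not from the thread):
# split a string at its first colon using the negated-class pattern.
proc splitFirstColon {str} {
    # ([^:]*) takes everything up to the first colon; (.*) takes the rest.
    if {[regexp {([^:]*):(.*)} $str all head tail]} {
        return [list $head $tail]
    }
    # No colon present: treat the whole string as the head (assumed behaviour).
    return [list $str ""]
}

set parts [splitFirstColon "Title:zuiop:asddf:asdf:dd"]
puts [lindex $parts 0]   ;# Title
puts [lindex $parts 1]   ;# zuiop:asddf:asdf:dd
```

Because the first group cannot contain a colon, it has to stop at the first one, however greedy the quantifier is.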

By the way, some people might suggest using the non-greedy quantifiers, introduced in Tcl 8.1, which match as few characters as possible while still fulfilling the entire match. The only problem is that in Tcl, mixing greedy and non-greedy quantifiers is very tricky and therefore advised against. For example, the following fails, because the first non-greedy quantifier ends up making the entire regular expression non-greedy, so the trailing [tt](.*)[/tt] matches nothing:

[tt]% regexp {(.*?):(.*)} $str all v1 v2
1
% set v1
Title
% set v2
% [/tt]

With a non-greedy quantifier, this case turns out to need the [tt]$[/tt] anchor, which forces the match to extend to the end of the string:

[tt]% regexp {(.*?):(.*)$} $str all v1 v2
1
% set v1
Title
% set v2
zuiop:asddf:asdf:dd[/tt]
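
To see the three behaviours side by side, the patterns discussed in this thread can be run in one short script. This is only a sketch: the variable names ([tt]g1[/tt], [tt]n1[/tt], [tt]a1[/tt] and so on) are made up for illustration, and the results noted in the comments are the ones reported in the posts above.

```tcl
set str "Title:zuiop:asddf:asdf:dd"

# Greedy first group: (.*) runs up to the LAST colon.
regexp {^(.*):(.*)$} $str all g1 g2

# Non-greedy first group, no anchor: the whole expression turns
# non-greedy, so the second group comes back empty.
regexp {(.*?):(.*)} $str all n1 n2

# Non-greedy first group plus the $ anchor: the match has to reach
# the end of the string, so the second group gets the remainder.
regexp {(.*?):(.*)$} $str all a1 a2

puts "greedy:            '$g1' | '$g2'"
puts "non-greedy:        '$n1' | '$n2'"
puts "non-greedy with \$: '$a1' | '$a2'"
```

Only the anchored non-greedy form and the negated character class give the split that was asked for; the negated class is the simpler of the two because it does not depend on how Tcl resolves mixed greediness.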


- Ken Jones, President, ken@avia-training.com
Avia Training and Consulting, 866-TCL-HELP (866-825-4357) US Toll free
415-643-8692 Voice
415-643-8697 Fax
 
Thanks Avia!
Thanks marsd. Your regexp would also work, but I don't know whether characters such as [\;\-\.] could also appear in that string.
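
The concern about punctuation can be checked with a small sketch. The sample string below is invented for illustration (it is not from the thread); it puts [tt];[/tt], [tt]-[/tt] and [tt].[/tt] in front of the first colon.

```tcl
# Made-up sample with punctuation before the first colon (an assumption).
set sample "My-Title;v1.0:zuiop:asddf:asdf:dd"

# Negated class: accepts anything except a colon, punctuation included.
set ok [regexp {([^:]*):(.*)} $sample all head tail]
puts "negated class: matched=$ok head='$head' tail='$tail'"

# Anchored letters-only class: the '-' after "My" stops the run of
# letters before any colon is reached, so the match fails.
set matched [regexp {^([a-zA-Z]+):(.*)} $sample all one two]
puts "letters only:  matched=$matched"
```

So a pattern that lists the allowed characters only works while the title sticks to that list, whereas [tt][ignore][^:]*[/ignore][/tt] only cares about where the first colon is.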
 