Help with RegEX Match issue 2

ljsmith91 · Feb 29, 2008

Hi,

I am trying to pull only the 1st comment from a field that can have 1 or many comments enclosed within {} curly brackets. So the field can look like this:

$field={10-20-2007 My comment1}{My comment2}{comment3}

or

$field={10-20-2007 My comment1}

and I always want to grab only the contents of the 1st set of {} curly brackets.

I tried variations of this without success:

$field = $1 if ($field =~ /^(\{.+\}).+$/);

It just grabs the entire $field. I need some Regex expertise. Thanks so much.

ljs
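
A minimal sketch of why the attempt above does not isolate the first comment. The greedy .+ inside the parentheses runs to the end of the string and only backs off as far as it must, and the trailing .+ demands at least one more character after the captured group. The helper name original_attempt is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the original attempt leaves in $field.
sub original_attempt {
    my ($field) = @_;
    $field = $1 if $field =~ /^(\{.+\}).+$/;
    return $field;
}

# Three comments: the greedy .+ backs off only to the second closing brace.
print original_attempt('{10-20-2007 My comment1}{My comment2}{comment3}'), "\n";
# prints {10-20-2007 My comment1}{My comment2}

# One comment: nothing is left for the trailing .+ so the match fails
# and the field is returned unchanged.
print original_attempt('{10-20-2007 My comment1}'), "\n";
# prints {10-20-2007 My comment1}
```

In neither case is the first comment isolated, which is consistent with the field appearing to come back whole.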

max1x · Feb 29, 2008

What does your data structure look like:

$field={10-20-2007 My comment1}{My comment2}{comment3}

OR

{10-20-2007 My comment1}{My comment2}{comment3}

$field =~ /^(\{ --> assumes your data structure starts with a { and not $field={

ljsmith91 · Feb 29, 2008

max1x,

The structure is in scalar $field...so the structure starts with { or open curly bracket.

ljs

KevinADC · Feb 29, 2008

Code:

$field = '{10-20-2007 My comment1}{My comment2}{comment3}';
# Braces are escaped so they are always read as literals; [^}]+ stops at the first }.
($comment) = $field =~ /^\{([^}]+)\}/;
print $comment;

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

ljsmith91 · Feb 29, 2008

Thanks KevinADC,

Just what I needed. Took me a few to determine what you were doing. Excellent and thanks. -ljs
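
For anyone else who needs a moment with that pattern, here is a commented breakdown using the /x flag, which lets a regex carry whitespace and comments. It is a sketch of the same negated character class idea; the sub name first_comment is an assumption made for the example, and the braces are escaped to be safe on newer Perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the negated character class pattern.
sub first_comment {
    my ($field) = @_;
    my ($comment) = $field =~ /
        ^           # match only at the start of the string
        \{          # a literal opening brace
        ( [^}]+ )   # capture one or more characters that are not a closing brace
        \}          # the literal closing brace that ends the first comment
    /x;
    return $comment;    # undef when the field does not start with a braced comment
}

print first_comment('{10-20-2007 My comment1}{My comment2}{comment3}'), "\n";
# prints 10-20-2007 My comment1
```

Because [^}] can never consume a closing brace, the capture cannot run past the end of the first comment.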

wxb2744 · Feb 29, 2008

I know this has already been answered, but I think there is a more elegant solution.

The problem is that regular expression quantifiers are greedy by default, meaning that they match the longest possible string. This can, however, be changed by adding a ? after a quantifier (?, +, * or {}) to make it miserly (non-greedy).

For example:

Code:

$field = '{10-20-2007 My comment1}{My comment2}{comment3}';
($comment) = $field =~ /^(\{.+?\})/;   # the capture keeps the surrounding braces
print $comment;
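
To see the effect of the extra ?, here is a small side-by-side sketch of the greedy and miserly forms on the same input. The sub names greedy_match and lazy_match are hypothetical, and the braces are escaped so the pattern reads the same on newer Perl versions.

```perl
use strict;
use warnings;

# Hypothetical helpers: identical patterns apart from the ? after the quantifier.
sub greedy_match { my ($f) = @_; my ($m) = $f =~ /^(\{.+\})/;  return $m }
sub lazy_match   { my ($f) = @_; my ($m) = $f =~ /^(\{.+?\})/; return $m }

my $field = '{10-20-2007 My comment1}{My comment2}{comment3}';

print greedy_match($field), "\n";
# prints {10-20-2007 My comment1}{My comment2}{comment3}
print lazy_match($field), "\n";
# prints {10-20-2007 My comment1}
```

Both captures include the braces because the parentheses enclose them; moving the parentheses inside the braces would drop them.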

ljsmith91 · Feb 29, 2008

wxb2744,

Excellent info...and routine. Thanks. -ljs

KevinADC · Feb 29, 2008

Why is it more elegant?


brigmar · Mar 1, 2008

TIMTOWTDI,
Everybody has their own definition of 'elegant'.

wxb's regex is easier to 'scan' on the eyes in this instance; fewer parentheses and square brackets. Plus you don't have to remember to change the negated character class if the surrounding characters change (say from {} to <>).

Just personal preference.
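
The point above about changing the surrounding characters can be softened by building the pattern from variables, so the negated character class follows the delimiter automatically. This is only a sketch: the sub first_enclosed and its parameters are hypothetical, and it assumes single-character delimiters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of the first $open...$close pair
# at the start of $text. Assumes $open and $close are single characters.
sub first_enclosed {
    my ($text, $open, $close) = @_;
    # \Q...\E quotes the delimiters so characters such as { are taken literally.
    my $re = qr/^\Q$open\E([^\Q$close\E]+)\Q$close\E/;
    my ($inner) = $text =~ $re;
    return $inner;
}

print first_enclosed('{10-20-2007 My comment1}{My comment2}', '{', '}'), "\n";
# prints 10-20-2007 My comment1
print first_enclosed('<10-20-2007 My comment1><My comment2>', '<', '>'), "\n";
# prints 10-20-2007 My comment1
```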

Having said that, I have a feeling Kevin's explicit negated character class may be faster in execution.

Horses for courses.
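
The speed question can be measured rather than guessed at with the core Benchmark module. The sketch below is one possible setup, not a definitive test: the sub names and the 5,000-character first comment are assumptions chosen for illustration, and results will vary with Perl version and input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test input: a long first comment followed by two short ones.
my $field = '{' . ('x' x 5_000) . '}{My comment2}{comment3}';

# Hypothetical helpers wrapping the two patterns from this thread.
sub via_negated_class { my ($c) = $_[0] =~ /^\{([^}]+)\}/; return $c }
sub via_lazy_dot      { my ($c) = $_[0] =~ /^\{(.+?)\}/;   return $c }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    negated_class => sub { via_negated_class($field) },
    lazy_dot      => sub { via_lazy_dot($field) },
});
```

On short fields like the one in the original question, any difference is likely to be too small to matter.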

ishnid · Mar 2, 2008

As far as 'elegance' is concerned, I'm personally prepared to put up with a bit of visual ugliness for a better solution. Have a read of this.

KevinADC · Mar 2, 2008

If everyone has their own definition of elegance, then the word has no meaning.

This is a perfect example of why TIMTOWTDI is not always helpful. Posting worse ways to do something is counterproductive unless the object is to show it's worse. But in this case the poster was implying it's better. TIMTOWTDI needs to be replaced with TIMTOGWTDI and TIMTOBWTDI.

G = Good
B = Bad

In this case the "elegant" suggestion falls into the B category. No big deal, but I think it was worth pointing out why it was not more elegant instead of just saying it is more elegant and offering no reasons why.


wxb2744 · Mar 2, 2008

Personally, I think one of the most important aspects of a script is readability. If it is easier to read then it is easier to understand and therefore easier to maintain.

The .*? approach is easier and simpler to understand and still effective and therefore elegant. Opinions may differ, but that is no reason to criticize and call it bad.

ishnid · Mar 2, 2008

Readability is secondary to function though. If one piece of code looks nicer but doesn't do its job as well as the other, then readability shouldn't really come into consideration. Sure, for short strings, there won't be any noticeable performance difference between using a negated character class and the dot-star notation. But if you're trying to parse a sufficiently long and complex piece of text, the advantages of coding this in a slightly less readable style become more pronounced.

In any event, aren't "readability" and "regular expressions" mutually exclusive? ;-)
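
Beyond speed, the two styles can also behave differently once more pattern follows the closing brace. As a sketch, suppose the goal were to accept a field only when it holds exactly one comment, by adding a $ anchor (an assumed variation, not something asked for in this thread). The sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers: both try to match a field made of a single comment.
sub only_comment_negated { my ($c) = $_[0] =~ /^\{([^}]+)\}$/; return $c }
sub only_comment_lazy    { my ($c) = $_[0] =~ /^\{(.+?)\}$/;   return $c }

my $field = '{10-20-2007 My comment1}{My comment2}{comment3}';

# The negated class cannot cross a closing brace, so this correctly fails.
print only_comment_negated($field) // '(no match)', "\n";
# prints (no match)

# The lazy dot is allowed to keep expanding until the whole pattern fits.
print only_comment_lazy($field) // '(no match)', "\n";
# prints 10-20-2007 My comment1}{My comment2}{comment3
```

A miserly quantifier only prefers the shortest match; it does not forbid the dot from matching a closing brace when the rest of the pattern requires it.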

KevinADC · Mar 2, 2008

There is a good reason to call it bad, but don't take it personally. The criticism is of the method, not you. I hope you read the link ishnid posted if you are unaware of how inefficient that type of matching is. If you know that and prefer to continue using it, that's fine, to each his own. But calling it more elegant is to ignore the objective fact that it is not. But everyone has their own opinion, and it proves that opinions don't matter much, my own included.


ljsmith91 · Mar 3, 2008

All,

Thanks for the postings and replies. I have certainly learned much and recognize I've still much to learn. You've all pushed that process along. Great information. I DID favor readability over performance, and actually gave performance little thought unless it was noticeably poor. However, this discussion has changed the way I will attempt to code regexes going forward. Thanks all.

ljs
