Help with RegEX Match issue 2

Status
Not open for further replies.

ljsmith91

Hi,

I am trying to pull only the 1st comment from a field that can have 1 or many comments enclosed within {} curly brackets. So the field can look like this:

$field={10-20-2007 My comment1}{My comment2}{comment3}

or

$field={10-20-2007 My comment1}

and I always want to grab only the contents of the 1st set of {} curly brackets.

I tried variations of this without success:

$field = $1 if ($field =~ /^(\{.+\}).+$/);

It just grabs the entire $field. I need some Regex expertise. Thanks so much.

ljs
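
For anyone reading along, here is a small sketch of what the pattern in the question does on both sample inputs. The sub name original_attempt is made up for illustration; only the pattern itself comes from the post above.

```perl
use strict;
use warnings;

# Apply the pattern from the question to a copy of the input.
# Returns the captured text, or undef when the pattern does not match.
sub original_attempt {
    my ($field) = @_;
    my ($captured) = $field =~ /^(\{.+\}).+$/;
    return $captured;
}

my $many = '{10-20-2007 My comment1}{My comment2}{comment3}';
my $one  = '{10-20-2007 My comment1}';

# The greedy .+ runs to the end of the string, then backs off only far
# enough for the trailing .+ to match, so the capture ends at the
# second-to-last closing brace.
print original_attempt($many), "\n";
# prints {10-20-2007 My comment1}{My comment2}

# With a single comment nothing follows the only closing brace, so the
# trailing .+ cannot match and the original statement leaves $field as it was.
print defined(original_attempt($one)) ? "match\n" : "no match\n";
# prints no match
```

So in the single-comment case the assignment never runs, which is why $field still holds the whole string.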
 
What does your data structure look like:

$field={10-20-2007 My comment1}{My comment2}{comment3}

OR

{10-20-2007 My comment1}{My comment2}{comment3}

$field =~ /^(\{ --> assumes your data structure starts with a { and not with $field={
 
max1x,

The structure is in scalar $field...so the structure starts with { or open curly bracket.

ljs
 
Code:
$field = '{10-20-2007 My comment1}{My comment2}{comment3}';
# Capture everything between the first { and the first } (braces not included).
($comment) = $field =~ /^\{([^}]+)\}/;
print $comment;

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
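
A short breakdown of the pattern above may help, since it is compact. This is only a sketch: first_comment is a hypothetical helper name, and the braces are escaped here as an assumption that some newer Perl versions are stricter about unescaped literal braces.

```perl
use strict;
use warnings;

# Return the text inside the first {...} group, or undef if the field
# does not start with one.
sub first_comment {
    my ($field) = @_;
    # ^        anchor at the start of the string
    # \{       a literal opening brace
    # ([^}]+)  capture one or more characters that are not a closing brace
    # \}       a literal closing brace
    my ($comment) = $field =~ /^\{([^}]+)\}/;
    return $comment;
}

for my $field ('{10-20-2007 My comment1}{My comment2}{comment3}',
               '{10-20-2007 My comment1}') {
    print first_comment($field), "\n";   # prints 10-20-2007 My comment1 both times
}
```

Because [^}] can never step over a closing brace, the capture has to stop at the end of the first group no matter how many groups follow.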
 
Thanks KevinADC,

Just what I needed. Took me a few to determine what you were doing. Excellent and thanks. -ljs
 
I know this has already been answered, but I think there is a more elegant solution.

The problem is that regular expression quantifiers are greedy by default, meaning that they match the longest possible string. This can, however, be changed by adding a ? after a quantifier (?, +, * or {}) to make it non-greedy (miserly).

For example: -

Code:
$field = '{10-20-2007 My comment1}{My comment2}{comment3}';
# The lazy .+? stops at the first }, so $comment is the first {...} group, braces included.
($comment) = $field =~ /^(\{.+?\})/;
print $comment;
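
To make the greedy versus non-greedy difference concrete, here is a minimal side-by-side sketch on the same sample string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $field = '{10-20-2007 My comment1}{My comment2}{comment3}';

# Greedy: .+ takes as much as it can, so the match ends at the LAST }.
my ($greedy) = $field =~ /^(\{.+\})/;

# Non-greedy: .+? takes as little as it can, so the match ends at the FIRST }.
my ($lazy) = $field =~ /^(\{.+?\})/;

print "greedy: $greedy\n";  # greedy: {10-20-2007 My comment1}{My comment2}{comment3}
print "lazy:   $lazy\n";    # lazy:   {10-20-2007 My comment1}
```

One behavioural difference from a negated character class is worth keeping in mind: without the /s modifier the dot does not match a newline, whereas [^}] does, so the two styles can give different results on multi-line comments.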

 
wxb2744,

Excellent info...and routine. Thanks. -ljs
 
Why is it more elegant?




------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
TIMTOWTDI,
Everybody has their own definition of 'elegant'.

wxb's regex is easier on the eyes to 'scan' in this instance; fewer parentheses and square brackets. Plus you don't have to remember to change the negated character class if the surrounding characters change (say from {} to <>).

Just personal preference.
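
The point about changing delimiters can be sketched like this. The helper first_delimited is hypothetical, and it assumes the closing delimiter is a single character, since a negated character class can only exclude single characters.

```perl
use strict;
use warnings;

# Return the first delimited chunk twice: once via the lazy quantifier,
# once via a negated character class built from the closing delimiter.
sub first_delimited {
    my ($text, $open, $close) = @_;
    my ($o, $c) = (quotemeta $open, quotemeta $close);

    # Lazy version: only the two delimiters appear in the pattern.
    my ($lazy) = $text =~ /^${o}(.+?)${c}/;

    # Negated-class version: the closing delimiter appears twice,
    # once inside the class and once after it.
    my ($negated) = $text =~ /^${o}([^${c}]+)${c}/;

    return ($lazy, $negated);
}

my @braces = first_delimited('{10-20-2007 My comment1}{My comment2}', '{', '}');
my @angles = first_delimited('<10-20-2007 My comment1><My comment2>', '<', '>');

print "$braces[0] | $braces[1]\n";  # prints 10-20-2007 My comment1 | 10-20-2007 My comment1
print "$angles[0] | $angles[1]\n";  # prints 10-20-2007 My comment1 | 10-20-2007 My comment1
```

Once the delimiters are parameters, the "remember to update the class" concern mostly goes away for either style.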

Having said that, I have a feeling Kevin's explicit negated character class may be faster in execution.

Horses for courses.
 
As far as 'elegance' is concerned, I'm personally prepared to put up with a bit of visual ugliness for a better solution. Have a read of this.
 
If everyone has their own definition of elegance, then the word has no meaning.

This is a perfect example of why TIMTOWTDI is not always helpful. Posting worse ways to do something is counterproductive unless the object is to show it's worse. But in this case the poster was implying it's better. TIMTOWTDI needs to be replaced with TIMTOGWTDI and TIMTOBWTDI.

G = Good
B = Bad

In this case the "elegant" suggestion falls into the B category. No big deal, but I think it was worth pointing out why it was not more elegant instead of just saying it is more elegant and offering no reasons why.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Personally, I think one of the most important aspects of a script is readability. If it is easier to read then it is easier to understand and therefore easier to maintain.

The .*? approach is easier/simpler to understand and still effective and therefore elegant. Opinions may differ, but that is no reason to criticize and call it bad.
 
Readability is secondary to function though. If one piece of code looks nicer but doesn't do its job as well as the other, then readability shouldn't really come into consideration. Sure, for short strings of trivial length, there won't be any noticeable performance difference between using a negated character class and the dot-star notation. But if you're trying to parse a sufficiently long and complex piece of text, the advantages of coding this in a slightly less readable style become more pronounced.
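
Rather than take anyone's word on speed, the two styles can be timed with the core Benchmark module. This is a rough sketch on made-up test data (one long first comment), and the numbers will vary with Perl version and input, so treat it as a way to measure rather than as a result.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a 10,000-character first comment, then a short one.
my $field = '{' . ('x' x 10_000) . '}{second comment}';

# Negated character class, as in Kevin's post.
sub with_negated_class { my ($c) = $field =~ /^\{([^}]+)\}/; return $c }

# Lazy quantifier, as in wxb2744's post (capturing the contents only).
sub with_lazy_dot      { my ($c) = $field =~ /^\{(.+?)\}/;   return $c }

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    negated_class => \&with_negated_class,
    lazy_dot      => \&with_lazy_dot,
});
```

Swapping in your own typical field values for $field gives a better picture than any synthetic string.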

In any event, aren't "readability" and "regular expressions" mutually exclusive? ;-)
 
There is a good reason to call it bad, but don't take it personally. The criticism is of the method, not you. I hope you read the link ishnid posted if you are unaware of how inefficient that type of matching is. If you know that and prefer to continue using it, that's fine, to each his own. But calling it more elegant is to ignore the objective fact that it is not. But everyone has their own opinion, and it proves that opinions don't matter much, my own included.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
All,

Thanks for the postings and replies. I have certainly learned much and recognize I've still much to learn. You've all pushed that process along. Great information. I DID favor readability over performance, and actually gave performance little thought unless it was noticeably poor. However, this discussion has changed the way I will attempt to code regexes going forward. Thanks all.

ljs
 