Regular Expressions, and Back Reference Matching...

Status
Not open for further replies.

PogoWolf

Programmer
Mar 2, 2001
351
US
Afternoon all,
I've got this expression:
\[(?<Date>\S+)\s{1}(?<Time>\S+)\]\s{1}(?<ToonName>.*)\s{1}\-\>\s{1}(?<MobName>.*) : (?<Ability>.*)\n

with this sample data:
01) [01/23/05 00:04:00] Foxfire -> Platinum Boulder Golem
02) [01/23/05 00:04:02] Foxfire -> Platinum Boulder Golem
03) [01/23/05 00:04:04] Foxfire -> Platinum Boulder Golem
04) [01/23/05 00:04:06] Foxfire -> Platinum Boulder Golem
05) [01/23/05 00:04:48] Foxfire -> Foxfire : Gift of Speed III [ Casting... ]
06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III
07) [01/23/05 00:04:55] Foxfire -> Foxfire : Gift of Health III [ Casting... ]
08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III
09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast
10) [01/23/05 00:13:47] Foxfire -> Foxfire : Recall [ Casting... ]
11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall
12) [01/23/05 00:18:17] Foxfire -> Foxfire : Recall [ Casting... ]

The problem is that the expression matches lines 5 through 11.
What I need is for the expression NOT to match any line with "[ Casting... ]" in it.

I know this can be done with a back reference.. but I can't seem to find any information
on the NET that can explain HOW to use a back reference (in a way that I can understand..hehe)

I'm wondering then if anyone can help with this expression and, more to the point, either
explain how the back reference works, or point me to some good resources?
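
For reference, a back reference re-matches whatever an earlier group captured; on its own it does not exclude lines. A minimal sketch in Python's re module (not the .NET engine the expression above targets), using a hypothetical pattern that only matches when the toon and the target are the same name:

```python
import re

# \1 is a back reference: it must match the same text that group 1 captured.
# The pattern name and the pattern itself are illustrative assumptions.
self_cast = re.compile(r"(\w+) -> \1\b")

lines = [
    "[01/23/05 00:04:06] Foxfire -> Platinum Boulder Golem",
    "[01/23/05 00:14:00] Foxfire -> Foxfire : Recall",
]

for line in lines:
    m = self_cast.search(line)
    print(line, "=>", m.group(1) if m else None)
```

The first line prints None because "Platinum" is not a repeat of "Foxfire"; the second prints Foxfire.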

The PogoWolf
 
Hope this will help, but here goes.

I used the block of text you gave me as my test string and the following regular expression:

/(^(\d\d\))+\s+(\[([^\]])*\])+\s+([^\n\f\r\[]*)$)/gim

Here were the results (there were 8):

Match 1
Group 1: "01) [01/23/05 00:04:00] Foxfire -> Platinum Boulder Golem"
Group 2: "01)"
Group 3: "[01/23/05 00:04:00]"
Group 4: "0"
Group 5: "Foxfire -> Platinum Boulder Golem"
Match 2
Group 1: "02) [01/23/05 00:04:02] Foxfire -> Platinum Boulder Golem"
Group 2: "02)"
Group 3: "[01/23/05 00:04:02]"
Group 4: "2"
Group 5: "Foxfire -> Platinum Boulder Golem"
Match 3
Group 1: "03) [01/23/05 00:04:04] Foxfire -> Platinum Boulder Golem"
Group 2: "03)"
Group 3: "[01/23/05 00:04:04]"
Group 4: "4"
Group 5: "Foxfire -> Platinum Boulder Golem"
Match 4
Group 1: "04) [01/23/05 00:04:06] Foxfire -> Platinum Boulder Golem"
Group 2: "04)"
Group 3: "[01/23/05 00:04:06]"
Group 4: "6"
Group 5: "Foxfire -> Platinum Boulder Golem"
Match 5
Group 1: "06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III "
Group 2: "06)"
Group 3: "[01/23/05 00:04:51]"
Group 4: "1"
Group 5: "Foxfire -> Foxfire : Gift of Speed III "
Match 6
Group 1: "08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III "
Group 2: "08)"
Group 3: "[01/23/05 00:04:58]"
Group 4: "8"
Group 5: "Foxfire -> Foxfire : Gift of Health III "
Match 7
Group 1: "09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast "
Group 2: "09)"
Group 3: "[01/23/05 00:12:07]"
Group 4: "7"
Group 5: "Foxfire -> Foxfire : Multicast "
Match 8
Group 1: "11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall "
Group 2: "11)"
Group 3: "[01/23/05 00:14:00]"
Group 4: "0"
Group 5: "Foxfire -> Foxfire : Recall "
 
Thanks Borvik. It's close to what I need..

01) [01/23/05 00:04:00] Foxfire -> Platinum Boulder Golem
02) [01/23/05 00:04:02] Foxfire -> Platinum Boulder Golem
03) [01/23/05 00:04:04] Foxfire -> Platinum Boulder Golem
04) [01/23/05 00:04:06] Foxfire -> Platinum Boulder Golem
05) [01/23/05 00:04:48] Foxfire -> Foxfire : Gift of Speed III [ Casting... ]
06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III
07) [01/23/05 00:04:55] Foxfire -> Foxfire : Gift of Health III [ Casting... ]
08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III
09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast
10) [01/23/05 00:13:47] Foxfire -> Foxfire : Recall [ Casting... ]
11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall
12) [01/23/05 00:18:17] Foxfire -> Foxfire : Recall [ Casting... ]

The expression should ONLY match lines 6, 8, 9, and 11,
but return the Date, Time, Toon, Mob, and Ability.

An example of THAT breakdown looks like this:
[01/23/05 00:14:00] Foxfire -> Foxfire : Recall
'01/23/05' = Date
'00:14:00' = Time
'Foxfire' = Toon
'Foxfire' = Mob
'Recall' = Ability




The PogoWolf
 
Ok then we'll add the requirement of the : between Mob and Ability like so:

/(^(\d\d\))+\s+(\[([^\]])*\])+\s+([^\n\f\r\[]*[\:][^\n\f\r\[]*)$)/gim

This produced the result (4):
Match 1
Group 1: "06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III "
Group 2: "06)"
Group 3: "[01/23/05 00:04:51]"
Group 4: "1"
Group 5: "Foxfire -> Foxfire : Gift of Speed III "
Match 2
Group 1: "08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III "
Group 2: "08)"
Group 3: "[01/23/05 00:04:58]"
Group 4: "8"
Group 5: "Foxfire -> Foxfire : Gift of Health III "
Match 3
Group 1: "09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast "
Group 2: "09)"
Group 3: "[01/23/05 00:12:07]"
Group 4: "7"
Group 5: "Foxfire -> Foxfire : Multicast "
Match 4
Group 1: "11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall "
Group 2: "11)"
Group 3: "[01/23/05 00:14:00]"
Group 4: "0"
Group 5: "Foxfire -> Foxfire : Recall "

That should do it - from here it should be a matter of parsing.
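
A rough sketch of that parsing step, assuming Python's re module rather than the PHP builder used above, and assuming the test string is the 12 sample lines with no trailing spaces. The /gim flags are approximated with finditer() plus re.IGNORECASE and re.MULTILINE; the splitting on " -> " and " : " is just one possible way to pull the fields apart:

```python
import re

# Assumed test data: the 12 sample lines from the first post.
SAMPLE = """\
01) [01/23/05 00:04:00] Foxfire -> Platinum Boulder Golem
02) [01/23/05 00:04:02] Foxfire -> Platinum Boulder Golem
03) [01/23/05 00:04:04] Foxfire -> Platinum Boulder Golem
04) [01/23/05 00:04:06] Foxfire -> Platinum Boulder Golem
05) [01/23/05 00:04:48] Foxfire -> Foxfire : Gift of Speed III [ Casting... ]
06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III
07) [01/23/05 00:04:55] Foxfire -> Foxfire : Gift of Health III [ Casting... ]
08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III
09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast
10) [01/23/05 00:13:47] Foxfire -> Foxfire : Recall [ Casting... ]
11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall
12) [01/23/05 00:18:17] Foxfire -> Foxfire : Recall [ Casting... ]
"""

# The expression from the post, unchanged; MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(
    r"(^(\d\d\))+\s+(\[([^\]])*\])+\s+([^\n\f\r\[]*[\:][^\n\f\r\[]*)$)",
    re.IGNORECASE | re.MULTILINE,
)

rows = []
for m in PATTERN.finditer(SAMPLE):
    # Group 3 is the bracketed timestamp, group 5 is everything after it.
    date, time = m.group(3).strip("[]").split(" ", 1)
    toon, rest = m.group(5).split(" -> ", 1)
    mob, ability = rest.split(" : ", 1)
    rows.append((m.group(2), date, time, toon, mob, ability.strip()))

for row in rows:
    print(row)
```

Only lines 06, 08, 09 and 11 come through, since the casting lines fail at the [ and lines 01-04 have no colon after the timestamp.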
 
oops....

I used my PHP regex builder - here is one I built using a dotnet builder:

\d\d\)\s\[([^\s]*)\s([^\]]*)\]\s([^(->)]*)\-\>\s([^\:]*)\:\s([^(\[)]*)\n

This yields the following results:
Match 1: 06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III

Group 1: 06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III

Group 2: 01/23/05
Group 3: 00:04:51
Group 4: Foxfire
Group 5: Foxfire
Group 6: Gift of Speed III

Match 2: 08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III

Group 1: 08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III

Group 2: 01/23/05
Group 3: 00:04:58
Group 4: Foxfire
Group 5: Foxfire
Group 6: Gift of Health III

Match 3: 09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast

Group 1: 09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast

Group 2: 01/23/05
Group 3: 00:12:07
Group 4: Foxfire
Group 5: Foxfire
Group 6: Multicast

Match 4: 11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall

Group 1: 11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall

Group 2: 01/23/05
Group 3: 00:14:00
Group 4: Foxfire
Group 5: Foxfire
Group 6: Recall

Here is a link to the DotNet builder:
That should work for ya.
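
The same expression can be tried outside the builder. This is a sketch in Python's re module, not .NET, so treat it as an approximation: Python numbers the parenthesized groups 1-5 (the builder output above shows them as Groups 2-6), and the field names in FIELDS are taken from the original question. The strip() call is my addition, because groups 3 and 4 capture the space in front of -> and : as well:

```python
import re

# Assumed test data: the 12 sample lines from the first post.
SAMPLE = """\
01) [01/23/05 00:04:00] Foxfire -> Platinum Boulder Golem
02) [01/23/05 00:04:02] Foxfire -> Platinum Boulder Golem
03) [01/23/05 00:04:04] Foxfire -> Platinum Boulder Golem
04) [01/23/05 00:04:06] Foxfire -> Platinum Boulder Golem
05) [01/23/05 00:04:48] Foxfire -> Foxfire : Gift of Speed III [ Casting... ]
06) [01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III
07) [01/23/05 00:04:55] Foxfire -> Foxfire : Gift of Health III [ Casting... ]
08) [01/23/05 00:04:58] Foxfire -> Foxfire : Gift of Health III
09) [01/23/05 00:12:07] Foxfire -> Foxfire : Multicast
10) [01/23/05 00:13:47] Foxfire -> Foxfire : Recall [ Casting... ]
11) [01/23/05 00:14:00] Foxfire -> Foxfire : Recall
12) [01/23/05 00:18:17] Foxfire -> Foxfire : Recall [ Casting... ]
"""

# The expression from the post, unchanged.
PATTERN = re.compile(
    r"\d\d\)\s\[([^\s]*)\s([^\]]*)\]\s([^(->)]*)\-\>\s([^\:]*)\:\s([^(\[)]*)\n"
)

FIELDS = ("Date", "Time", "Toon", "Mob", "Ability")

# One dict per matching line, with surrounding spaces trimmed from each group.
records = [
    dict(zip(FIELDS, (g.strip() for g in m.groups())))
    for m in PATTERN.finditer(SAMPLE)
]

for record in records:
    print(record)
```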
 
Well looks like that's perfect.. =)

What program did you use? And if at all possible, could you explain the expression to me? I look at mine, and then yours, and realize I've still got a LOT to learn about Reg. Exp.'s.


The PogoWolf
 
I didn't really use a program.

I found a website that helps me build Regex - the one above is for .NET. You just enter your regex string, your test string and hit the test button.

Now for the explaining...(still learning too - that was just playing around lol):

Let's break this down into parts.

First - putting parentheses around parts of the expression. The parts that are matched inside the parentheses are grouped off (hence groups 2-6). In the results above, Group 1 is the entire matched text; in .NET code the entire match is Groups(0) and the parenthesized groups start at Groups(1). Those can be accessed via various methods:
Code:
Imports System
Imports System.Text.RegularExpressions

Module Example
    Sub Main()
        ' Look for a match
        Dim m1 As Match = Regex.Match("Our phone number is 508-888-8888.", "\d\d\d-\d\d\d-\d\d\d\d")

        ' Output match information if a match was found
        If m1.Success Then
            Console.WriteLine("The value '{0}' was found at index {1}, and is {2} characters long.", m1.Value, m1.Index, m1.Length)
        End If
    End Sub
End Module
In this instance m1.Value contains the whole match as a string. The grouped matches are not in m1.Value; they are in the m1.Groups collection (m1.Groups(1).Value and so on).
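
To see the difference between the whole match and its groups, here is a small sketch of the same phone number lookup in Python's re module (an assumption on my part; it is not the .NET API used in the code above), with parentheses added around the three parts of the number:

```python
import re

text = "Our phone number is 508-888-8888."

# Parentheses added so each part of the number becomes its own group.
m = re.search(r"(\d\d\d)-(\d\d\d)-(\d\d\d\d)", text)

if m:
    print(m.group(0))               # the whole match
    print(m.groups())               # the parenthesized groups, in order
    print(m.start(), len(m.group(0)))  # where it was found and how long it is
```

Group 0 is always the entire match; groups 1 and up are the parenthesized parts.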

Now let's break down the actual expression.

First:
\d\d\)\s\[

This is saying it needs to start with 2 digits (\d\d) followed by a closing parenthesis (\)) and then a white space (\s) and an open bracket (\[).

\d stands for digit
\) and \[ are escaped literals
\s is for all white spaces

Second:
([^\s]*)

Our first group. You notice the white space. The brackets [] define a character class, and the caret ^ at the start negates it, so [^\s] means any character that is NOT white space. The * after the bracket means match as many of those in a row as possible. It stops at the first white space it finds - the space between the date and time.

Third:
\s

Ok, include the white space between the date and time.

Fourth:
([^\]]*)

Can you figure that out? It keeps going until it hits a ]. Same syntax as the other one.
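
A quick way to check parts two through four is to run just that piece against a timestamp. This is a sketch in Python's re module (not the .NET builder), and the test string is a hypothetical fragment cut from one of the sample lines:

```python
import re

# Hypothetical fragment: what follows the opening [ on a sample line.
fragment = "01/23/05 00:04:51] Foxfire -> Foxfire : Gift of Speed III"

# ([^\s]*) runs up to the first white space, ([^\]]*) runs up to the first ].
m = re.match(r"([^\s]*)\s([^\]]*)", fragment)

print(m.group(1))  # the date part
print(m.group(2))  # the time part
```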

Fifth:
\]\s

Include both the literal ] and a white space.

Sixth:
([^(->)]*)

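Okay, here I was just playing around with it. Inside [] the parentheses don't group anything; they are literal characters, and (-> is read as a range running from ( to >. That range happens to include -, so the group keeps going until it hits the - of the ->. It also excludes digits and colons, so a name containing one of those would cut the match short; ([^-]*) would be a cleaner way to say "up to the first -".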
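
The effect of that class is easy to check. A small sketch, assuming Python's re module (character classes behave the same way here as in .NET, as far as I know); the name "Fox2fire" is a made-up example to show where the class gives out:

```python
import re

as_written = re.compile(r"([^(->)]*)\-\>")
simpler = re.compile(r"([^-]*)\-\>")

# Characters the class as written refuses to match, out of a few samples.
blocked = [c for c in "(-)>:5F " if not re.fullmatch(r"[^(->)]", c)]
print(blocked)

print(as_written.match("Foxfire -> Foxfire").group(1))   # works on the sample data
print(as_written.match("Fox2fire -> Foxfire"))           # a digit in the name breaks it
print(simpler.match("Fox2fire -> Foxfire").group(1))     # the simpler class copes
```

The blocked list contains the parentheses, the hyphen, >, the colon and the digit, but not the letter or the space.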

Seventh:
\-\>\s

Include the literals - and > and a white space.

Eighth:
([^\:]*)

You should be catching on by now - keep going until the literal :.

Ninth:
\:\s

Adding those in.

Tenth:
([^(\[)]*)

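This I probably could have left as ([^\[]*), since the parentheses inside the brackets are just literal characters. It is another go-until-literal-character group: keep going until you hit a [.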

Last:
\n

Include a line break. Because part 10 stops at the first [, placing \n right after it means the line break must come at that point. On the casting lines the next character there is the [ of "[ Casting... ]" rather than a line break, so those lines are skipped.
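
A cut-down sketch of just the last three parts shows the skipping in action. It assumes Python's re module and uses the simpler ([^\[]*) form of part 10; the two test strings are trimmed versions of sample lines 11 and 10:

```python
import re

# Parts nine, ten and last: ": ", then everything up to a [, then a line break.
tail = re.compile(r"\:\s([^\[]*)\n")

normal = "Foxfire : Recall\n"
casting = "Foxfire : Recall [ Casting... ]\n"

hit = tail.search(normal)
miss = tail.search(casting)

print(hit.group(1))  # the ability on a normal line
print(miss)          # nothing: the [ comes before the line break
```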

Whew! Hope that helps you understand it.
 