Advanced (RegExp) 1

Status
Not open for further replies.

Rydel

Programmer
Feb 5, 2001
CZ
I have a large XML file, and I would like to match all the content between each <event>...</event> pair (the content usually spans several lines). Somehow RegExp does not want to work.


So, for testing/debugging purposes I use this function:

Code:
Function RegExpTest(patrn, strng)
   Dim regEx, Match, Matches, RetStr   ' Declare variables.
   Set regEx = New RegExp   ' Create regular expression.
   regEx.Pattern = patrn   ' Set pattern.
   regEx.IgnoreCase = True   ' Set case insensitivity.
   regEx.Global = True   ' Set global applicability.
   Set Matches = regEx.Execute(strng)   ' Execute search.
   For Each Match in Matches   ' Iterate Matches collection.
      RetStr = RetStr & "Match found at position "
      RetStr = RetStr & Match.FirstIndex & ". Match Value is '"
      RetStr = RetStr & Match.Value & "'." & vbCRLF
   Next
   RegExpTest = RetStr
End Function

If I pass "<event>" as a pattern, it finds the "<event>" occurrences correctly.

Similarly, if I pass "</event>" as a pattern, it finds all the "</event>" occurrences correctly as well.

If I pass "<event>.*</event>" then it finds only ONE-LINERS.

If I pass "<event>[.\n]*</event>" then it does NOT find anything.

Your help is greatly appreciated.






regards,
rydel n23
 
how about
<event>(.|\n)+?</event>

_________________________________________________________
$str = "sleep is good for you. sleep gives you the energy you need to function";
$Nstr = ereg_replace("sleep","coffee",$str); echo $Nstr;
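For anyone wondering why the two original patterns fail: in VBScript's RegExp the dot matches any character except a newline, and inside square brackets the dot is only a literal period, so [.\n]* matches nothing but runs of periods and newlines. Here is a minimal sketch that checks each pattern against a small two-line sample. The sample string and the CountMatches helper are made up for illustration, and WScript.Echo assumes the script runs under Windows Script Host (cscript) rather than ASP. [\s\S] is shown as a common alternative to (.|\n).

```vbscript
Option Explicit

' Hypothetical helper: returns how many times patrn matches in strng.
Function CountMatches(patrn, strng)
   Dim regEx
   Set regEx = New RegExp
   regEx.Pattern = patrn
   regEx.IgnoreCase = True
   regEx.Global = True
   CountMatches = regEx.Execute(strng).Count
End Function

Dim sample
' Made-up sample: one event whose content spans two lines.
sample = "<event>line one" & vbLf & "line two</event>"

' The dot stops at the line break, so nothing matches.
WScript.Echo CountMatches("<event>.*</event>", sample)
' Inside [...] the dot is a literal period, so nothing matches either.
WScript.Echo CountMatches("<event>[.\n]*</event>", sample)
' The alternation lets the match cross the line break.
WScript.Echo CountMatches("<event>(.|\n)+?</event>", sample)
' [\s\S] means "whitespace or non-whitespace", i.e. any character.
WScript.Echo CountMatches("<event>[\s\S]+?</event>", sample)
```

The first two calls should report 0 matches and the last two should report 1.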
 
or
<event>(.|\n)*</event>
may be more reliable


 
Thank you very much (star is a must :))! I really have big trouble with RegExp syntax. So, this is a major step forward. There is still a slight problem. It matches the longest <event>...</event> span. E.g. if there is a file with

<something>123</something>
<event>first event</event>
<other_tag>1.01</other_tag>
<event>second</event>

Then it matches the whole thing:

<event>first event</event>
<other_tag>1.01</other_tag>
<event>second</event>

While I'd like to have two separate matches in the Matches collection. Is that doable?

I vaguely remember from school that it had something to do with "greedy" or "non-greedy" matching, but how to control it in VB or VBScript I have no clue... :(

I've tried switching regEx.Global between False and True, but it does not seem to matter.




regards,
rydel n23
 
Whoops! That was actually kind of stupid of me. Try
<event>(.|\n)*?</event>

Adding the ? makes the * non-greedy, so the match stops at the next </event> and will not jump to a following (or the last) one.

Just to double check, I ran your code on this variable:
str = "<event>event 1</event> <event>event 2</event> <event>event 3</event> <event>event 4</event>"
str = str & "<event>event 5</event> <other_tag>1.01</other_tag> <event>event 6</event>"

and got

Match found at position 0. Match Value is event 1
Match found at position 23. Match Value is event 2
Match found at position 46. Match Value is event 3
Match found at position 69. Match Value is event 4
Match found at position 91. Match Value is event 5
Match found at position 142. Match Value is event 6

seems to be running fine but if you have further problems let us know

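To make the greedy versus non-greedy difference concrete, here is a small sketch that runs both patterns over the four-line sample from the question. The ListMatches helper is hypothetical, and WScript.Echo assumes Windows Script Host (cscript).

```vbscript
Option Explicit

' Hypothetical helper: returns each match value in brackets, one per line.
Function ListMatches(patrn, strng)
   Dim regEx, m, result
   Set regEx = New RegExp
   regEx.Pattern = patrn
   regEx.IgnoreCase = True
   regEx.Global = True
   result = ""
   For Each m In regEx.Execute(strng)
      result = result & "[" & m.Value & "]" & vbCrLf
   Next
   ListMatches = result
End Function

Dim xml
xml = "<something>123</something>" & vbLf & _
      "<event>first event</event>" & vbLf & _
      "<other_tag>1.01</other_tag>" & vbLf & _
      "<event>second</event>"

' Greedy: * takes as much as it can, so a single match runs from the
' first <event> all the way to the last </event>.
WScript.Echo ListMatches("<event>(.|\n)*</event>", xml)

' Non-greedy: *? takes as little as it can, so every <event>...</event>
' pair becomes its own match.
WScript.Echo ListMatches("<event>(.|\n)*?</event>", xml)
```

If only the text between the tags is wanted, one option is to put the lazy part in its own capturing group, for example <event>([\s\S]*?)</event>, and read m.SubMatches(0) instead of m.Value.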
 
You know, you could use the XMLDOM object and XPath to get these nodes.
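A rough sketch of that approach, assuming the MSXML2.DOMDocument ProgID is available on the machine and the script runs under Windows Script Host. The sample document is made up, and the single root element is there only because well-formed XML needs one.

```vbscript
Option Explicit

Dim doc, node, loaded

' Assumption: MSXML 3.0 or later is installed and registered.
Set doc = CreateObject("MSXML2.DOMDocument")
doc.async = False                                ' Parse synchronously.
doc.setProperty "SelectionLanguage", "XPath"     ' Use XPath for selectNodes.

' Made-up sample; a real file could be read with doc.load(path) instead.
loaded = doc.loadXML("<root><something>123</something>" & _
                     "<event>first event</event>" & _
                     "<other_tag>1.01</other_tag>" & _
                     "<event>second</event></root>")

If loaded Then
   ' //event selects every <event> element at any depth.
   For Each node In doc.selectNodes("//event")
      WScript.Echo node.text
   Next
Else
   WScript.Echo "Parse error: " & doc.parseError.reason
End If
```

Unlike the regular expression, this copes with attributes on the tag and with nested markup, at the cost of requiring the whole file to be well-formed XML.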
 
Could you set regEx.MultiLine = True and just use "<event>.*?</event>"? At least in Perl, the /s switch (single-line mode, despite what I'd call it) allows the . to match newlines, too, which isn't normal behavior.

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
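A quick way to check that suggestion in VBScript: as far as I know, RegExp.MultiLine only changes whether ^ and $ also match at line breaks. It does not make the dot match a newline, and the VBScript RegExp object has no counterpart to Perl's /s. The sketch below uses a made-up two-line sample, assumes Windows Script Host for WScript.Echo, and compares the dot-based pattern with MultiLine switched on against [\s\S].

```vbscript
Option Explicit

Dim regEx, sample, dotResult, classResult

' Made-up sample: one event whose content spans two lines.
sample = "<event>line one" & vbLf & "line two</event>"

Set regEx = New RegExp
regEx.Global = True
regEx.MultiLine = True   ' Affects only the ^ and $ anchors.

' The dot still stops at the line break, so Test reports no match.
regEx.Pattern = "<event>.*?</event>"
dotResult = regEx.Test(sample)
WScript.Echo "dot with MultiLine: " & dotResult

' [\s\S] matches any character, newline included, so Test reports a match.
regEx.Pattern = "<event>[\s\S]*?</event>"
classResult = regEx.Test(sample)
WScript.Echo "[\s\S] class: " & classResult
```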
 