Regular Expression 1

Status
Not open for further replies.

inexperienced1

IS-IT--Management
Aug 9, 2002
49
US
I need some help building a regular expression.

Simply put, I need to find a section of text with a specific start text and a specific end text, while allowing for repetition in between.

I.e.

Find START and stop at the END that is at the same nesting level.


If the text is

START adslhf START sfdlhadsa END END START sjdkhfa END

I would want
1) START adslhf START sfdlhadsa END END
2) START sjdkhfa END

Not
1) START adslhf START sfdlhadsa END
2) START sjdkhfa END

or

START adslhf START sfdlhadsa END END

I have found the following pattern, but it does not quite work, as it returns all values:

(((?<Open>START)[^START|END]*)+((?<Close-Open>END)[^START|END]*?)+)*(?(<Open>)(?!))$


Can anyone help me identify the correct pattern?
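
A side note on the pattern above, offered as a hedged observation rather than a full diagnosis: inside square brackets, [^START|END] is a character class. It excludes the single characters S, T, A, R, E, N, D and |, not the words START and END. The sketch below uses Python's re module only because it is easy to run (Python has no balancing groups, so the full pattern cannot be reproduced there); the helper name class_run is hypothetical.

```python
import re

# [^START|END] is a negated character class: it rejects the individual
# characters S, T, A, R, E, N, D and '|', not the words START or END.
char_class = re.compile(r"[^START|END]*")

def class_run(text):
    """Return the leading run of text that the character class accepts."""
    return char_class.match(text).group()

# Lowercase filler contains none of the excluded characters.
print(repr(class_run("adslhf ")))
# An uppercase S ends the run, whatever word it belongs to.
print(repr(class_run("abc SAND")))
# A word that merely starts with E is rejected from its first character.
print(repr(class_run("EAST")))
```

So filler text containing any of those uppercase letters would be cut short, even when it contains neither START nor END.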





 
I don't think this can be solved in "one" pattern. A simple algorithm should be used alongside it to finish the job.

The idea is that the nesting is of finite depth. Hence the procedure would be something like this.
[1] Set a maximum depth (it can even be reasonably large, say 5, 6, 7... or even 100, without affecting performance much).
[2] Use a loop (no need to hand-code) to build a list of patterns, with each entry of the form:
[tt]
START.*?END
START.*?START.*?END.*?END
START.*?START.*?START.*?END.*?END.*?END
etc... until the max depth chosen is reached
[/tt]
[3] Start testing with the longest pattern and work downward. Run a plain test first (IsMatch()), since that is cheaper than retrieving the matches.
[4] Most "unreasonably" deep patterns won't match. Keep looping downward until the first match is found.
[5] You have then found the matches at a certain nesting depth per [4]. After showing and storing the matches, use replace to replace each match with an empty string. (This alters the string, hence the whole looping process works on a copy of the original string.) You replace those matches with empty strings so that you won't pick them up again with the shallower patterns.
[6] Continue the process downward until the last pattern.
[7] Since there is only a finite number of steps, you have thereby found all the substrings matching START...END with a varying number of matching pairs inside.
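
For readers who want to try steps [1] to [7] quickly, here is a minimal sketch in Python (not the vbs version offered below). The function names build_patterns and find_nested are hypothetical, and the default of 5 levels is only an assumption for illustration. Note that "." does not cross line breaks by default in Python's re module.

```python
import re

def build_patterns(max_depth, open_tok="START", close_tok="END"):
    """Step [2]: return the patterns for depth 1..max_depth, shallowest first."""
    o, c = re.escape(open_tok), re.escape(close_tok)
    patterns = []
    pattern = f"{o}.*?{c}"
    for _ in range(max_depth):
        patterns.append(pattern)
        # Wrap one more START ... END level around the previous pattern.
        pattern = f"{o}.*?{pattern}.*?{c}"
    return patterns

def find_nested(text, max_depth=5):
    """Steps [3] to [6]: try the deepest pattern first and work downward."""
    work = text  # process a copy; the original string stays intact
    found = []
    for pattern in reversed(build_patterns(max_depth)):
        rx = re.compile(pattern)
        matches = rx.findall(work)
        if matches:
            found.extend(matches)
            # Remove what was matched so shallower patterns skip it.
            work = rx.sub("", work)
    return found

sample = "START [1] START [2] END [1] END | START [21] END"
for block in find_nested(sample):
    print(block)
```

On this sample the sketch reports the two-level block first and then the single-level block. It inherits the limits of the pattern list itself, so it is only meant for simple chains of nesting.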

I can show you how to put it into action in vbs, if you like, so you can port it to VB.NET.
 
I would like to take you up on your offer to see the code, please.
 
[8] Since vbs is my favorite rapid-dev tool, this is what I came up with to demo the concept.
[tt]
'givens
b_subnested=false 'true: also list the blocks nested inside each match; false: list only the outermost matches
max_nest_level=5

s="START [1] START [2] START [3] START [4] END [3] END [2] END [1] END [0] END | START [21] END" '[0] END is off-balance part

set rx=new regexp
with rx
    .ignorecase=true
    .global=true
end with

rxu="START.*?END" 'unit pattern: a single START...END pair

dim a_pattern() 'array of pattern genealogy: a_pattern(0) is the deepest, the last entry is rxu
redim a_pattern(max_nest_level)

spattern=rxu
a_pattern(ubound(a_pattern))=spattern
for i=ubound(a_pattern)-1 to 0 step -1
    spattern="START.*?" & spattern & ".*?END" 'wrap one more level around the previous pattern
    a_pattern(i)=spattern
next

sproc=s 'make a copy of s and leave s intact
dim a() 'store the matches
redim a(-1)
for i= 0 to ubound(a_pattern) 'from the deepest pattern down to the shallowest
    rx.pattern=a_pattern(i)
    if rx.test(sproc) then
        set cm=rx.execute(sproc)
        for each m in cm
            redim preserve a(ubound(a)+1)
            a(ubound(a))=m

            if b_subnested then
                m_temp=m
                for j=i+1 to ubound(a_pattern)
                    rx.pattern=a_pattern(j)
                    'rx.test is not necessary, it must contain matches
                    'strip the outermost START and END, then match one level shallower
                    m_temp=right(m_temp,len(m_temp)-instr(1,m_temp,"START",1)-len("START")+1)
                    m_temp=left(m_temp,instrrev(m_temp,"END",-1,1)-1)
                    set cmm=rx.execute(m_temp)
                    for each mm in cmm
                        redim preserve a(ubound(a)+1)
                        a(ubound(a))=mm
                    next
                    set cmm=nothing
                next
            end if
        next
        rx.pattern=a_pattern(i) 'restore to the pattern at the start
        sproc=rx.replace(sproc,"") 'eliminate matched part
    end if
next

wscript.echo "string: " & vbcrlf & s & vbcrlf & vbcrlf & _
    "b_subnested: " & b_subnested & vbcrlf & vbcrlf & "matches:" & vbcrlf & join(a,vbcrlf)
[/tt]
[9] The approach is limited to Russian-doll nesting, that is, a single chain of blocks each nested directly inside the previous one. A string pattern is a complicated free-group object; unfortunately, its complexity defies simple expression.
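
To illustrate the limitation in [9], here is a hedged sketch of a different technique, not the pattern-list approach above: a plain depth counter that scans the tokens from left to right. It also copes with sibling blocks inside a nest. It is written in Python purely for convenience. The function name balanced_blocks is hypothetical, and the sketch assumes the tokens are matched case-sensitively as plain substrings, so a word such as "SEND" would count as an END.

```python
import re

def balanced_blocks(text, open_tok="START", close_tok="END"):
    """Return the outermost balanced open_tok ... close_tok substrings."""
    token = re.compile(f"{re.escape(open_tok)}|{re.escape(close_tok)}")
    blocks = []
    depth = 0
    start = None
    for m in token.finditer(text):
        if m.group() == open_tok:
            if depth == 0:
                start = m.start()  # remember where the outermost block opens
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                blocks.append(text[start:m.end()])
        # A closing token seen at depth 0 is off-balance and is skipped.
    return blocks

sample = "START a END START b START c END END stray END"
for block in balanced_blocks(sample):
    print(block)
```

On this sample the counter yields "START a END" and "START b START c END END" as separate blocks and ignores the stray END. An opening token that is never closed produces no block.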
 