RegExp to match several lines

Status
Not open for further replies.

Rydel

Programmer
Feb 5, 2001
376
CZ
I apologize if this is a beginner's question, but I couldn't find it in the help file (or figure it out myself). I'd like to match all blockquotes inside an HTML file, e.g. <blockquote>any text</blockquote>. I constructed the following regular expression:
regEx.Pattern = "<blockquote>.*<\/blockquote>"
Unfortunately, .* does not match the newline character, so I only get what I want when both tags are on the same line. Otherwise, niente. Are there any gurus who could give a novice a hand? :)
---
---
 
Hello, Rydel.

I do not think there is a magic pattern that meets the requirement when the match is bounded by two fixed outer strings (here <blockquote> and </blockquote>). At least, none that I can see at the moment from looking at your posted problem.

The script below is how I would resolve this type of problem.

A note in this particular context: <blockquote> </blockquote> elements can be nested. The script will not resolve nesting, and it is not desirable to resolve it without a reason.

To help you test the script: make a testdoc.html (or whatever name you set in tfile), and if it is not in the same folder as the script, supply the full path.

Try it out and see how you like it.

regards - tsuji

'--------------------twoside_regexp.vbs-----/tsuji/----------------
Option Explicit

Const tfile = "testdoc.html" '<<<input testing file here

Const sB = "<blockquote>"
Const eB = "<\/blockquote>"

'-------------------------reading text stream------------------
Dim fso, tf, sBase
Set fso = CreateObject("Scripting.FileSystemObject")

Set tf = fso.OpenTextFile(tfile, 1) '1 = ForReading
sBase = tf.ReadAll
WScript.Echo sBase
tf.Close

Set tf = Nothing
Set fso = Nothing
'-------------------------reading text stream ended------------

'------------------------regexp operation----------------------
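Const attachM = ".*\n.*"
Const midM = ".*"
Const delimiter = "<>"
Const sRpl = "--"

Dim startM, endM, sMTake, MCount, i

startM = sB & midM
endM = midM & eB
sMTake = ""

'Try blocks spanning 1 line, then 2, 3, ... until no opening tag is left
i = -1
Do While InStr(sBase, sB) <> 0
    i = i + 1
    Call regexp_matches(startM, endM, i, sBase, sMTake)
Loop

'Drop the trailing delimiter, then split the matches into an array
If Len(sMTake) > 0 Then
    sMTake = Left(sMTake, InStrRev(sMTake, delimiter) - 1)
    sMTake = Split(sMTake, delimiter)
Else
    sMTake = Array() 'no matches: empty array instead of a runtime error in Left
End If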
'-----regexp operation ended : Result stored in sMTake array------

'-----display Matched Results only----------------------------------
MCount = UBound(sMTake) + 1
For i = 0 To MCount - 1
    WScript.Echo "Match(" & CStr(i) & ") :- " & vbCrLf & sMTake(i)
Next
'--------------------------------------------------------------------
WScript.Quit

Sub regexp_matches(sM, eM, iter, strBase, strMTake)

    Dim Matches, Match, i, strMatch

    'Build a pattern that spans exactly iter line breaks
    strMatch = ""
    For i = 1 To iter
        strMatch = strMatch & attachM
    Next
    strMatch = sM & strMatch & eM

    Dim oBN
    Set oBN = New RegExp

    With oBN
        .Global = True
        .IgnoreCase = True
        .Pattern = strMatch
    End With

    Set Matches = oBN.Execute(strBase)

    If Matches.Count <> 0 Then
        For Each Match In Matches
            strMTake = strMTake & Match & delimiter
            strBase = oBN.Replace(strBase, sRpl)
        Next
        Set Matches = Nothing
    End If

End Sub
'-------------end----twoside_regexp.vbs-----/tsuji/----------------
 
Actually, there is. I got a wonderful reply from one guy on the ASP forum. It's a matter of one line. We both just had to read the RegExp object description more carefully: all you need is to set the Global property to True and ".*" will start matching multiple lines. Here is the thread:
Anyway, thank you very much for your help!
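
For comparison with the one-line approach mentioned above, here is a minimal sketch of a single pattern that crosses line breaks. It is not the code from the linked thread (the link did not survive); it assumes a RegExp engine with non-greedy quantifiers (VBScript 5.5 or later, as far as I know), and the function name CountBlockquotes and the sample text are made up for illustration. The sketch relies on [\s\S] rather than . to cross line breaks, and uses Global only to collect every match instead of just the first.

```vbscript
Option Explicit

' Hypothetical helper (not from the thread): counts <blockquote> blocks,
' including blocks that span several lines.
Function CountBlockquotes(html)
    Dim re
    Set re = New RegExp
    re.Global = True       ' return every match, not only the first
    re.IgnoreCase = True
    ' [\s\S] matches any character, line breaks included;
    ' *? is non-greedy, so each match stops at the nearest closing tag.
    re.Pattern = "<blockquote>[\s\S]*?</blockquote>"
    CountBlockquotes = re.Execute(html).Count
End Function

' Made-up sample: one block spanning two lines, one block on a single line.
Dim sample
sample = "<blockquote>one" & vbCrLf & "two</blockquote>" & vbCrLf & _
         "<p>x</p><blockquote>three</blockquote>"

WScript.Echo CountBlockquotes(sample)
```

Like the script above, this does not handle nested blockquotes: an outer block would be cut off at the first closing tag.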
---
---
 
Hello again.

Thanks for the info. Just one note: it applies to VBScript version 5.5. Unfortunately, my box runs 5.01. Global should be set in any case to scan the whole file, which I did. I don't think 5.01 supports the extended greediness of .*, nor the original BN syntax. I will take a look at that. In the meantime, the final version of v5.6 should be out any time now, or may already be.

Thanks for the feedback again.

regards - tsuji
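
The point above about setting Global to scan the whole file can be illustrated with a small sketch. The helper name CountMatches and the sample text are assumptions made up for this example; only the RegExp properties Global, IgnoreCase and Pattern and the Execute method come from the thread.

```vbscript
Option Explicit

' Hypothetical helper: counts matches of pat in text with Global on or off.
Function CountMatches(text, pat, useGlobal)
    Dim re
    Set re = New RegExp
    re.Global = useGlobal  ' False: stop after the first match; True: scan the whole string
    re.IgnoreCase = True
    re.Pattern = pat
    CountMatches = re.Execute(text).Count
End Function

' Made-up sample: two single-line blockquotes on separate lines.
Dim sample
sample = "<blockquote>a</blockquote>" & vbCrLf & "<blockquote>b</blockquote>"

WScript.Echo CountMatches(sample, "<blockquote>.*</blockquote>", False)  ' first match only
WScript.Echo CountMatches(sample, "<blockquote>.*</blockquote>", True)   ' every match
```

With Global off, Execute stops after the first block; with Global on, both single-line blocks are returned.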
 