
VBScript Reg Exp help

Status
Not open for further replies.

JimFL

Programmer
Jun 17, 2005
131
GB
Hi,

I am trying to build a simple search function in VBScript that reads in a text file (i.e. HTML or ASP) and then strips the markup out of the page content using the function below.

<%

Function clearAllTags(s)
    Dim re
    Set re = New RegExp
    re.Pattern = "(<[^>]*>)"
    re.Global = True
    re.IgnoreCase = True
    clearAllTags = re.Replace(s, "")
End Function

%>
It works fine for some pages but not for all of them. It doesn't remove the JavaScript, and on certain pages it fails to remove some of the ASP and HTML as well.
Does anybody have another solution, or know how to adapt the code so that it removes HTML, JavaScript and ASP?

Can anybody help?
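
A small illustration of the symptom described above, offered only as a sketch: the pattern <[^>]*> removes each tag but leaves whatever sits between <script> and </script>, and inside an ASP block it stops at the first > it meets. The sample strings are made up, and the snippet assumes it is run as a standalone .vbs file under cscript (hence WScript.Echo rather than Response.Write).

```vbscript
' Same function as in the post above.
Function clearAllTags(s)
    Dim re
    Set re = New RegExp
    re.Pattern = "(<[^>]*>)"
    re.Global = True
    re.IgnoreCase = True
    clearAllTags = re.Replace(s, "")
End Function

Dim htmlSample, aspSample

' Hypothetical input: a paragraph followed by an inline script.
htmlSample = "<p>Hello</p><script type=""text/javascript"">alert('hi');</script>"

' Hypothetical input: an ASP block containing a ">" comparison.
' The delimiters are split so the snippet is also safe inside an ASP page.
aspSample = "<" & "% If a > b Then %" & ">Text"

' The tags go, but the script body stays: Helloalert('hi');
WScript.Echo clearAllTags(htmlSample)

' The match ends at the first ">", leaving:  b Then %>Text
WScript.Echo clearAllTags(aspSample)
```

Both leftovers come from the same cause: [^>]* can never cross a > character, so the pattern only ever removes one tag-shaped piece at a time.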

 
I apologise. I think the above function did work; I had just copied the wrong pattern in.

For a working solution I don't seem to need the bottom three regular expressions. Thanks for all your help, tsuji.

function striptags_asp(s)
    dim rx, stmp
    stmp=s 'moved here
    set rx=new regexp
    'first pass: remove ASP blocks (delimiters split so the ASP parser does not see them)
    with rx
        .pattern="<" & "%(.|\n)*?%" & ">"
        .global=true
    end with
    stmp=rx.replace(stmp,"")
    'second pass: remove script blocks together with their contents
    with rx
        .pattern="<script(.|\n)*?script>"
        .global=true
        .ignorecase=true
    end with
    stmp=rx.replace(stmp,"")
    'third pass: remove any remaining tags (global and ignorecase are still set from above)
    with rx
        .pattern="<(.|\n)*?>"
    end with
    stmp=rx.replace(stmp,"")
    'the rest is just cleaning up
    'with rx
    ' .pattern="^\s*(.*?)\s*$"
    ' .global=true
    'end with
    'stmp=rx.replace(stmp,"$1")
    'with rx
    ' .pattern="^\s*$"
    ' .global=true
    ' .multiline=true
    'end with
    'stmp=rx.replace(stmp,"")
    striptags_asp=stmp
end function
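
To see why the order of the three passes matters, here is a minimal sketch that applies the same three patterns in the same order to a made-up page. stripPattern is a hypothetical helper introduced only for this illustration (it sets ignorecase on every pass, which the function above does only from the second pass on), and the snippet assumes a standalone .vbs file run under cscript.

```vbscript
' Hypothetical helper: remove every match of pat from s.
Function stripPattern(s, pat)
    Dim rx
    Set rx = New RegExp
    rx.Pattern = pat
    rx.Global = True
    rx.IgnoreCase = True
    stripPattern = rx.Replace(s, "")
End Function

Dim page, text

' Made-up sample page; the ASP delimiters are split as in the thread.
page = "<html><body>" & _
       "<" & "% If a > b Then %" & ">" & _
       "<script>if (x > 1) { go(); }</script>" & _
       "<p>Visible text</p></body></html>"

' 1. ASP blocks first, so a ">" inside the ASP code cannot end a match early.
text = stripPattern(page, "<" & "%(.|\n)*?%" & ">")

' 2. Script blocks next, body included, before their tags are stripped.
text = stripPattern(text, "<script(.|\n)*?script>")

' 3. Whatever tags remain.
text = stripPattern(text, "<(.|\n)*?>")

' Prints: Visible text
WScript.Echo text
```

If the generic tag pattern ran first, the <script> tags would already be gone by the time the script pattern ran, and the JavaScript body would stay in the output.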

 
[1] I just tested it in an ASP page with a simulated html+asp string, always avoiding the direct combination of <% and %> by splitting it through concatenation, as in ...<" & "%... and ...%" & ">..., as Tarwn noted.
[2] The collision in the pattern can alternatively be avoided with backslashes:
.pattern="\<\%(.|\n)*?\%\>"
or by separating it into string concatenation, as Tarwn noted.
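
A quick way to compare the two spellings side by side, as a sketch rather than a definitive test: the sample string is made up, and the snippet assumes a standalone .vbs file run under cscript. In a .vbs file there is no ASP parser to collide with, so this only checks that the two patterns match the same text.

```vbscript
Dim rxConcat, rxEscaped, sample

' Made-up input: one ASP block between two words.
sample = "before<" & "% Response.Write 1 %" & ">after"

' Concatenated form, as used in striptags_asp.
Set rxConcat = New RegExp
rxConcat.Pattern = "<" & "%(.|\n)*?%" & ">"
rxConcat.Global = True

' Backslash-escaped form from note [2].
Set rxEscaped = New RegExp
rxEscaped.Pattern = "\<\%(.|\n)*?\%\>"
rxEscaped.Global = True

' If the two forms are equivalent, as note [2] says, both lines are identical.
WScript.Echo rxConcat.Replace(sample, "")
WScript.Echo rxEscaped.Replace(sample, "")
```

Either form keeps the literal sequences <% and %> out of the page source, which is what the ASP parser would otherwise trip over.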

With the above, it works, as you reported too. The part I marked as cleaning up is for formatting output to stdout or in a standalone application. In a browser, that irrelevant whitespace won't show, so you can certainly comment it out as you've done. Glad we've come to a good starting point, at least for the moment.
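
For the stdout or standalone case, one simple way to tidy the leftover whitespace is sketched below. Note that this is a swapped-in technique, not the commented-out patterns from the function above: it collapses every run of whitespace into a single space and trims the ends. collapseWhitespace is a hypothetical name, and the snippet assumes a standalone .vbs file run under cscript.

```vbscript
' Hypothetical helper: squeeze all whitespace runs (spaces, tabs,
' line breaks) into single spaces, then trim both ends.
Function collapseWhitespace(s)
    Dim rx
    Set rx = New RegExp
    rx.Pattern = "\s+"
    rx.Global = True
    collapseWhitespace = Trim(rx.Replace(s, " "))
End Function

Dim stripped

' Made-up example of what tag stripping tends to leave behind.
stripped = vbCrLf & "  Visible   text" & vbCrLf & vbCrLf & "  more " & vbCrLf

' Prints: [Visible text more]
WScript.Echo "[" & collapseWhitespace(stripped) & "]"
```

This loses the line structure entirely, which is usually fine for building a search index but not if the line breaks themselves carry meaning.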
 