regular expression to strip <script> sections

Status
Not open for further replies.

anorakgirl

Programmer
Jun 5, 2001
103
GB
hi,
i'm trying to write an asp driven search engine, and to get the keywords for the page, i'm using regular expressions to strip out html tags etc.

i'm using the following to strip out html tags:
Code:
var reg = /<[^>]*>/ig;
strKeywords = strKeywords.replace(reg,"");
which works fine but obviously this leaves the bits between the html tags, which means if there is a
Code:
<script language="javascript">
blah blah
</script>
section in the page, the blah blah gets left in.
i tried doing something like this before stripping the html tags:
Code:
reg = /<script[^>]*>[.|\s|\n]*<\/script>/ig;
strKeywords = strKeywords.replace(reg,"");
but it doesn't seem to work. i'm guessing it's something to do with carriage returns or < characters or something within the javascript (ie the blah blah) but i'm not a regular expression expert - can anyone give me a clue why this isn't working?

thank you! ~ ~
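
A note on why that pattern is likely failing: inside square brackets, "." is a literal dot and "|" is a literal pipe, so [.|\s|\n] only matches dots, pipes and whitespace, never the letters of the script body. The sketch below shows this on a made-up sample string (the sample text and variable names are assumptions, not from the original pages).

```javascript
// Hypothetical sample input, not taken from the original pages.
var sample = '<p>hello</p><script language="javascript">\nvar x = a < b;\n</script><p>world</p>';

// The pattern from the post, without the g flag so test() is stateless.
// Inside [...], "." and "|" lose their special meaning.
var narrow = /<script[^>]*>[.|\s|\n]*<\/script>/i;

// Fails on a realistic script body: "v", "a", "r" ... are not in the class.
var narrowMatched = narrow.test(sample);

// Succeeds only when the body holds nothing but dots, pipes and whitespace.
var trivialBody = '<script>\n . | \n</script>';
var narrowMatchedTrivial = narrow.test(trivialBody);
```

So the carriage returns and < characters are not the problem; the character class simply does not cover ordinary text.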
 
What is this, Perl syntax?
Regards, gsc1ugs
"Cant see wood for tree's...!"
 
Here you go:


:)
paul
 
thanks for the tip link9 - unfortunately that article is about stripping html tags, which i've already done. that method basically replaces anything between < and > with an empty string. but because there are javascript sections in the web pages i'm working on, the function would only strip the <script language="javascript"> and </script> tags and leave the javascript in the middle, which isn't what i want to do. anyone else tried this??
cheers anyway! ~ ~
 
Modify to delete from < to >
Regards, gsc1ugs
"Cant see wood for tree's...!"
 
i'm already deleting between < and > which doesn't get the javascript because it's not inside the tag. i seem to have cracked it though - this works:
Code:
reg = /<script[^>]*>[\s\S]*<\/script>/ig;
strKeywords = strKeywords.replace(reg,"");
cheers for help anyway ~ ~
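
One caveat with the [\s\S]* version: the * is greedy, so on a page with more than one script block it may swallow everything from the first <script> to the last </script>, including the text in between. A lazy quantifier ([\s\S]*?) stops at the nearest closing tag instead. A minimal sketch, assuming a made-up page string and a hypothetical helper name stripScripts:

```javascript
// Hypothetical page with two script blocks and real content between them.
var page = '<script>one()</script>keep me<script>two()</script>';

// Greedy, as posted above: matches from the first <script> to the LAST </script>.
var greedy = page.replace(/<script[^>]*>[\s\S]*<\/script>/ig, "");

// Lazy: *? stops at the first </script> it can reach, so each block is removed separately.
function stripScripts(html) {
  return html.replace(/<script[^>]*>[\s\S]*?<\/script>/ig, "");
}
var lazy = stripScripts(page);

// greedy is "" (the text between the blocks is lost); lazy is "keep me".
```

Running stripScripts before the general tag-stripping regex keeps the order used in the thread: script sections first, remaining tags second.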
 