Tek-Tips is the largest IT community on the Internet today!

strip html entities from a string

Status
Not open for further replies.

davemarsh

Technical User
Feb 11, 2003
12
IN
hi - i have a function that strips html entities such as '&#9;' etc:

Code:
option explicit
dim tmp
tmp=request.querystring("s")
response.write strip(tmp)

' removes every entity of the form &name; or &#number; from tmp
function strip(tmp)
    dim x
    set x=new regexp
    x.global=true        ' replace all matches, not just the first
    x.ignorecase=true
    x.pattern="\&([#a-zA-Z0-9]+);"
    strip=x.replace(tmp,"")
    set x=nothing
end function

..but is there a way to convert the entities back to normal chars? eg: '&#97;' would be converted to 'a'

cheers - dave
 
Rather than strip, replace....

myStr = " This is a "Test"©"

myStr = replace(myStr, " ", " ")
myStr = replace(myStr, "©", "©")


What would probably be best would be to put all of the codes/decodes into a 2-dimensional array and loop through them...
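
A minimal sketch of that lookup-table idea, assuming classic ASP/VBScript. The function name DecodeNamedEntities and the four entities in the table are illustrative choices, not something from the thread; extend the table as needed.

```vbscript
' Hypothetical helper: decodes a few named entities using a 2-dimensional array.
Function DecodeNamedEntities(ByVal s)
    Dim map(3, 1), i

    ' Column 0 holds the entity, column 1 holds its replacement.
    map(0, 0) = "&nbsp;" : map(0, 1) = " "
    map(1, 0) = "&quot;" : map(1, 1) = """"       ' four quotes = one literal quote
    map(2, 0) = "&copy;" : map(2, 1) = ChrW(169)  ' copyright sign
    map(3, 0) = "&amp;"  : map(3, 1) = "&"        ' kept last so "&amp;nbsp;" becomes "&nbsp;", not a space

    For i = 0 To UBound(map, 1)
        s = Replace(s, map(i, 0), map(i, 1))
    Next

    DecodeNamedEntities = s
End Function
```

Something like `Response.Write DecodeNamedEntities(myStr)` would then replace the individual replace calls. Note that Replace is case-sensitive by default, so "&NBSP;" would not be touched by this sketch.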

Programming today is a race between software engineers striving to build better and bigger idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning. - Rick Cook
 
Why not just find each one and replace it with its character using the Chr function? You could create a regular expression and loop through everything that matches it, replacing each match with chr(Mid(match,3,len(match)-3)). Basically, the ASCII number is everything starting from the 3rd character, for a length of the string minus the 3 characters of the &#; wrapper. For example, "&#169;" has length 6, so Mid(match,3,3) returns "169".

-Tarwn





01010100 01101001 01100101 01110010 01101110 01101111 01101011 00101110 01100011 01101111 01101101
29 3K 10 3D 3L 3J 3K 10 32 35 10 3E 39 33 35 10 3K 3F 10 38 31 3M 35 10 36 3I 35 35 10 3K 39 3D 35 10 1Q 19
Do you know how hot your computer is running at home? I do
 
DOH! Mine didn't post! I should've known....

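' (spaces inside the entities are only there so the forum displays them; remove them in real code)
myStr = "& nbsp ;This is a & quot ;Test& quot ;& copy ;"

myStr = replace(myStr, "& nbsp ;", " ")
myStr = replace(myStr, "& quot ;", """")
myStr = replace(myStr, "& copy ;", "©")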

 
ok - thanks for the help guys - this is what i came up with

function test(tmp)
    dim ent,i,j
    set ent=new regexp
    ent.global=true
    ent.ignorecase=true
    ent.pattern="\&([#a-zA-Z0-9]+);"
    set i=ent.execute(tmp)    ' i = collection of all matches
    if i.count=0 then
        test=tmp
        exit function
    else
        for each j in i
            ' j looks like &#NNN; so Mid picks out NNN and chr turns it into a character
            tmp=replace(tmp,j,chr(Mid(j,3,len(j)-3)))
        next
    end if
    test=tmp
end function
 
Nice. Except that I would remove the letters from your regular expression. I don't think things such as &amp; will work with that (chr needs a number), but they will still get returned as a match.
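
A rough sketch of what a digits-only version might look like, assuming VBScript 5.5 or later (for SubMatches). The name DecodeNumericEntities is made up for illustration, and using ChrW instead of chr is a swapped-in choice so that codes above 255 do not raise an error.

```vbscript
' Hypothetical variant of test(): decodes only decimal references such as &#169;
' and leaves named entities like &amp; untouched.
Function DecodeNumericEntities(ByVal tmp)
    Dim re, matches, m, code

    Set re = New RegExp
    re.Global = True
    re.Pattern = "&#([0-9]{1,5});"   ' digits only, captured in SubMatches(0)

    Set matches = re.Execute(tmp)
    For Each m In matches
        code = CLng(m.SubMatches(0))
        ' ChrW only accepts values up to 65535, so skip anything larger (and zero)
        If code > 0 And code <= 65535 Then
            tmp = Replace(tmp, m.Value, ChrW(code))
        End If
    Next

    Set re = Nothing
    DecodeNumericEntities = tmp
End Function
```

Hex references such as &#xA9; are not handled here. Also, because Replace works on the whole string, doubly-encoded input (for example an encoded "&#38;" sitting in front of "#169;") could end up decoded twice if the same reference also appears elsewhere in the string.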

-Tarwn

 