regsub and html entity

Status
Not open for further replies.

cubicle4

Programmer
Jun 18, 2001
18
US
Hi all,

I am trying to replace HTML special characters with their encoded entities. More specifically, I am trying to replace " and & with their encoded entities &quot; and &amp;.

The problem is that the character & is a special sequence in the regsub replacement text and is replaced with the string that matched the pattern (i.e. replacing a " returns "quot; instead of &quot;). I would like the proc to return the proper encoding of the character, and unfortunately this has proven quite difficult. Any help on this, or perhaps another way of doing it, would be great. TIA.

Here is my test code:
[
set MyContent1 {this is "1" & 2}

proc HTML_ENCODE {inString} {
regsub -all {&} $inString {&amp;} outString
regsub -all {"} $outString {&quot;} outString
# since the ampersand will be encoded incorrectly
# we need to regsub it again, *ungh*
regsub -all {amp;amp;} $outString {amp;} outString
return $outString
}

proc HTML_DECODE {inString} {
regsub -all {&amp;} $inString {&} outString
regsub -all {&quot;} $outString {"} outString
return $outString
}

]
Encode:
[set MyEncode1 [HTML_ENCODE $MyContent1]]
<br>

Decode:
[set MyContent1 [HTML_DECODE $MyEncode1]]

<script language=javascript>
alert ('[HTML_ENCODE $MyContent1]');
</script>

 
To prevent the "&" in your regsub substitution value from being replaced with the match string, simply escape it with a backslash ("\&" becomes a literal "&" in the replacement text).
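
A quick way to see the difference, using the sample string from the original post. This is only a minimal sketch; the variable names wrong and right are made up for illustration:

```tcl
# Sample input taken from the original post.
set input {this is "1" & 2}

# An unescaped & in the replacement stands for the matched text,
# so every double quote comes back as a double quote followed by quot;
regsub -all {"} $input {&quot;} wrong

# An escaped \& inserts a literal ampersand, giving the intended entity.
regsub -all {"} $input {\&quot;} right

puts $wrong
puts $right
```

The first puts should show the broken "quot; form described in the question, and the second the proper &quot; entity.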

The following snippet of code is from the html package in Tcllib 0.80. You might want to change the name of the procedure if you decide to implement it yourself instead of using the entire package (note that the procedure is defined within the "html" namespace):

Code:
proc html::quoteFormValue {value} {
    regsub -all {&} $value {\&amp;} value
    regsub -all {"} $value {\&#34;} value
    regsub -all {'} $value {\&#39;} value
    regsub -all {<} $value {\&lt;} value
    regsub -all {>} $value {\&gt;} value
    return $value
}

By the way, you might ask, "What is Tcllib?" Well, the official description is:

"Tcllib is a collection of utility modules for Tcl. These modules provide a wide variety of functionality, from implementations of standard data structures to implementations of common networking protocols. The intent is to collect commonly used function into a single library, which users can rely on to be available and stable."

Tcllib now is distributed standard with Tcl. But you can download the most recent version from the Tcl Developer Xchange site:

Oh, another way to solve the substitution problem is with the string map command, which was introduced in Tcl 8.1.1. You give string map a "character map" list of key-value pairs and the string to process. Each instance of a "key" in the input string is replaced by the corresponding "value". So, we could handle your substitution problem from above with the following:

Code:
set htmlEncodings {
    &   &amp;
    {"} &#34;
    '   &#39;
    <   &lt;
    >   &gt;
}
set value [string map $htmlEncodings $value]
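
For what it's worth, here is a rough sketch of a full encode/decode round trip with string map. The htmlDecodings list and the sample value are assumptions for illustration and are not part of Tcllib. Since string map makes a single pass over the input and does not rescan the text it has just inserted, the ampersands in the entities are not encoded a second time:

```tcl
# Encoding map: key/value pairs, same idea as the list above.
set htmlEncodings {
    &   &amp;
    {"} &#34;
    '   &#39;
    <   &lt;
    >   &gt;
}

# Hypothetical decoding map, built by swapping each key/value pair.
set htmlDecodings {}
foreach {char entity} $htmlEncodings {
    lappend htmlDecodings $entity $char
}

# Sample input (made up for this example).
set value {this is "1" & 2 <b>}

set encoded [string map $htmlEncodings $value]
set decoded [string map $htmlDecodings $encoded]

puts $encoded
puts $decoded
```

The decoded string should come back identical to the original value.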
- Ken Jones, President
Avia Training and Consulting
866-TCL-HELP (866-825-4357) US Toll free
415-643-8692 Voice
415-643-8697 Fax
 
Ken,

Thx =P Fortunately, that is the conclusion we came to about escaping the character. However, I was unaware of the string map command. Will have to give that one a spin. Anyhow, thanks for the great information related to this problem.

Harold
 
*sigh* I just looked at the way my post turned out on this message board. You wouldn't believe the effort I went through to try to ensure that the various escape characters appeared correctly when posted. I went through several iterations with the "Preview Post" button, adding more and more HTML escapes and character references. Only to have the final result munged.

Anyway, in the examples shown in my post, the replacement text was supposed to contain the HTML escape sequences/references for the special characters (for example, replacing the "&" with an "&", "amp", and ";" all concatenated together).

Hope everyone got the general idea...
- Ken Jones, President
Avia Training and Consulting
866-TCL-HELP (866-825-4357) US Toll free
415-643-8692 Voice
415-643-8697 Fax
 