A possible bug in escape()?

Status
Not open for further replies.

lcs01

Programmer
Aug 2, 2006
182
US
First of all, I don't know if someone has asked this question before. The search functionality has not been available for some time now.

It seems to me that escape() cannot handle extended (non-ASCII) characters correctly. For example,

Code:
var str1 = 'faßt';
var str2 = escape(str1);

The value of 'str2' becomes 'fa%DFt', which means JavaScript treats 'ß' as a 1-byte character, when it is actually a 2-byte character in UTF-8.

I also tested it using Perl's CGI.pm:

Code:
use CGI;

my $str1 = 'faßt';
my $str2 = CGI::escape($str1);

The value of '$str2' is 'fa%C3%9Ft', which indicates that Perl treats 'ß' as a 2-byte character. And I think this is right!
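For comparison, the two results above can be reproduced side by side in JavaScript alone. This is only a minimal sketch: it assumes a runtime where the legacy global escape() is still available (browsers and Node.js both provide it), and the variable names are made up for illustration.

```javascript
// 'ß' is U+00DF; it is written as an escape sequence here so the
// result does not depend on how this file itself is encoded.
const input = 'fa\u00DFt';

// Legacy escape() writes the code point U+00DF as a single %DF.
const legacyEncoded = escape(input);

// encodeURIComponent() writes the two UTF-8 bytes of 'ß' instead,
// which is the form the CGI::escape test above reports.
const utf8Encoded = encodeURIComponent(input);

console.log(legacyEncoded); // fa%DFt
console.log(utf8Encoded);   // fa%C3%9Ft
```

So the difference is in which encoding each function assumes, not in the length of the character itself.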

What I need to do is this:

1) Take user input from a web interface;
2) Massage it first using JavaScript, including escape();
3) Then pass it as a JavaScript variable to Perl.

Because JavaScript's escape() corrupts extended characters, Perl does not know how to unescape the result.

Could someone here tell me how to solve this problem? For instance, in JavaScript, can we test whether a string contains extended characters?

Many thanks!
 
To Jeff,

Thank you for your reply. And no, I have not set the character set of the page. Actually, I have no way to set the character set of the page: user input can be anything, such as German, Greek, Chinese, etc.
 
Thank you, Dan. Your results are exciting. But it does not escape/encode single quotes. At our site, we must escape single quotes in user input.

Thank you again.
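One possible way around the single-quote gap is to percent-encode the quote explicitly after the UTF-8 step, since encodeURI and encodeURIComponent both leave ' untouched. The sketch below is an assumption, not something proposed in this thread: the helper name encodeWithQuotes is hypothetical, and it uses encodeURIComponent rather than encodeURI.

```javascript
// Hypothetical helper: UTF-8 percent-encoding, then the single quote
// is encoded by hand because encodeURIComponent() leaves it as-is.
function encodeWithQuotes(text) {
  return encodeURIComponent(text).replace(/'/g, '%27');
}

const quoted = encodeWithQuotes("it's fa\u00DFt");
console.log(quoted);                     // it%27s%20fa%C3%9Ft
console.log(decodeURIComponent(quoted)); // it's faßt
```

Because %27 is an ordinary percent escape, a single standard URL-decoding pass on the receiving side should restore the quote together with everything else.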
 
To Dan,

It would not work using both, because escape would corrupt the extended characters. Currently, we are doing it the way you suggested: "search and replace single quotes with some other character". The problem is, how many "other" characters are there?

The best way would be like this: if a user input contains any extended characters, then apply encodeURI to it; otherwise, escape it.

But how can I test whether a string from user input contains those extended characters or not?
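The test asked about here can be sketched with a regular expression that looks for anything outside the 7-bit ASCII range. The helper names hasNonAscii and encodeInput are hypothetical, and "extended" is assumed to mean "any non-ASCII character".

```javascript
// Hypothetical helper: true if any UTF-16 code unit in the string
// falls outside the 7-bit ASCII range (0x00 to 0x7F).
function hasNonAscii(text) {
  return /[^\x00-\x7F]/.test(text);
}

// The strategy described above: encodeURI() for input with extended
// characters, escape() for everything else.
function encodeInput(text) {
  return hasNonAscii(text) ? encodeURI(text) : escape(text);
}

console.log(hasNonAscii('fast'));       // false
console.log(hasNonAscii('fa\u00DFt'));  // true
console.log(encodeInput("it's"));       // it%27s
console.log(encodeInput('fa\u00DFt'));  // fa%C3%9Ft
```

Note that this branching has a gap: an input containing both a single quote and an extended character takes the encodeURI path, so the quote is still left unencoded.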
 
Dan, the only problem is that the % symbols will fall prey to the escape function, which means either way you'll be required to go back through and replace all the %25s with % signs.
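To make the %25 effect concrete, here is a small sketch of what happens when escape() is applied on top of encodeURI(), and how two decoding passes undo it. The variable names are illustrative only.

```javascript
// First pass: UTF-8 percent-encoding of 'ß' (U+00DF).
const once = encodeURI('fa\u00DFt');

// Second pass: escape() re-encodes every '%' from the first pass as '%25'.
const twice = escape(once);

// Decoding has to run in the opposite order: unescape() first, then decodeURI().
const restored = decodeURI(unescape(twice));

console.log(once);     // fa%C3%9Ft
console.log(twice);    // fa%25C3%259Ft
console.log(restored); // faßt
```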

Interestingly, when running with the utf-8 charset, the escape function produces a 5-character code to represent the ß character. Additionally, when running with the ISO-8859-1 charset, the function has no problem escaping the character and then restoring it on an unescape, but the character is lost to unescape when using the utf-8 charset:

Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title test</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<script type="text/javascript">
alert("escape: " + escape('faßt'));
alert("unescape: " + unescape(escape('faßt')));
alert("encodeURI: " + encodeURI('faßt'));
alert("both: " + escape(encodeURI('faßt')));
</script>
<style type="text/css"></style>
</head>
<body>
</body>
</html>

Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript">
alert("escape: " + escape('faßt'));
alert("unescape: " + unescape(escape('faßt')));
alert("encodeURI: " + encodeURI('faßt'));
alert("both: " + escape(encodeURI('faßt')));
</script>
<style type="text/css"></style>
</head>
<body>
</body>
</html>

-kaht

Looking for a puppy? (Silky Terriers are hypoallergenic dogs that make great indoor pets due to their lack of shedding and small size)
 
the % symbols will fall prey to the escape function, which means either way you'll be required to go back through and replace all the %25s with % signs

True - although I had assumed that the OP would realise this, and run two relevant decoding functions from the Perl end... but I guess it makes no difference whether both encodeURI and escape are used, or just encodeURI, and then the ' characters replaced with some other arbitrary character.

Interesting to see how the UTF-8 and ISO-8859-1 charsets behave differently, though. If anything, I would have expected the results to be the other way around. Curious!



Coedit Limited - Delivering standards compliant, accessible web solutions

Dan's Page @ Code Couch
 
Interesting to see how the UTF-8 and ISO-8859-1 charsets behave differently, though. If anything, I would have expected the results to be the other way around.

Same here!

-kaht

 
Thank you, Dan & kaht!

This works very well for me:

By using escape(encodeURI('faßt')), both single quotes and extended characters are processed nicely without setting the charset.

Thank you!
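As a rough check of that combination, the sketch below wraps it in a pair of helpers and round-trips a few inputs, including a single quote and non-Latin scripts. The names encodeForServer and decodeFromServer are hypothetical, and the decoder is written in JavaScript only to show the order of the two decoding passes.

```javascript
// Hypothetical helpers. encodeURI() turns every non-ASCII character into
// UTF-8 percent escapes, so escape() afterwards only ever sees ASCII and
// takes care of the characters encodeURI() skips, such as the single quote.
function encodeForServer(text) {
  return escape(encodeURI(text));
}

// Decoding mirrors the encoding in reverse order.
function decodeFromServer(encoded) {
  return decodeURI(unescape(encoded));
}

const samples = ["it's", 'fa\u00DFt', 'Ελληνικά', '中文'];
for (const sample of samples) {
  const encoded = encodeForServer(sample);
  // Print the encoded form and whether the round trip restored the input.
  console.log(encoded, decodeFromServer(encoded) === sample);
}
```

On the Perl side this implies two decoding passes as well, as discussed earlier in the thread.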

 