A possible bug in escape()?

lcs01 · Apr 3, 2007

First of all, I don't know whether someone has asked this question before; the search function has been unavailable for some time now.

It seems to me that escape() cannot handle extended fonts correctly. For example:

Code:

var str1 = 'faßt';
var str2 = escape(str1);

The value of 'str2' becomes 'fa%DFt', which means JavaScript treats 'ß' as a 1-byte character, when it is actually a 2-byte character in UTF-8.

I also tested it using Perl's CGI.pm:

Code:

use CGI;

my $str1 = 'faßt';
my $str2 = CGI::escape($str1);

The value of '$str2' is 'fa%C3%9Ft', which indicates that Perl treats 'ß' as a 2-byte character. And I think this is right!!
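
The two results differ because escape() encodes the string's code units, while the Perl output (judging by the result above) is the UTF-8 byte sequence. Below is a minimal sketch of that difference. It assumes a runtime that provides TextEncoder (current browsers and Node.js; it was not available in 2007), and `utf8PercentEncode` is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function utf8PercentEncode(str) {
  var bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var hex = bytes[i].toString(16).toUpperCase();
    out += '%' + (hex.length < 2 ? '0' + hex : hex);
  }
  return out;
}

// 'ß' written as an escape sequence so the file encoding does not matter.
var sharpS = '\u00DF';

var codeUnit = sharpS.charCodeAt(0).toString(16).toUpperCase(); // "DF"
var legacy = escape(sharpS);          // "%DF": the single code unit U+00DF
var utf8 = utf8PercentEncode(sharpS); // "%C3%9F": the two UTF-8 bytes of U+00DF
```

So neither side is wrong about the character; they simply use different encodings, and the receiver has to know which one to undo.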

What I need to do is this:

1) Take user input from a web interface;
2) Massage it first using JavaScript, including escape();
3) Then pass it as a JavaScript variable to Perl.

Because JavaScript's escape() corrupts extended fonts, Perl does not know how to unescape it.

Could someone here tell me how to solve this problem? For instance, in JavaScript, can we test whether a string contains extended fonts?

Many thanks!
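
On the narrower question of testing for such characters, one possible approach is a regular expression that looks for anything outside 7-bit ASCII. This is only a sketch: `hasNonAscii` is a hypothetical name, and "extended" is assumed here to mean "any code unit above 0x7F".

```javascript
// Hypothetical helper: true if the string contains any code unit above 0x7F.
function hasNonAscii(str) {
  return /[^\x00-\x7F]/.test(str);
}

var plain = hasNonAscii("it's plain");     // false
var german = hasNonAscii('fa\u00DFt');     // true  ('faßt')
var chinese = hasNonAscii('\u4E2D\u6587'); // true  (two CJK characters)
```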

BabyJeffy · Apr 3, 2007

Could you confirm that you have set the character set of the page (and the doctype and meta details) to UTF-8, or something similar that supports multi-byte characters?

Cheers,
Jeff

Jeff's Page @ Code Couch

http://www.codecouch.com/jeff/blog/

http://www.coedit.co.uk/

What is Javascript? FAQ216-6094

BillyRayPreachersSon · Apr 3, 2007

While I don't know if it will help, take a look here:

http://xkr.us/articles/javascript/encode-compare/

Give the encodeURI method a try - it might offer more than escape.

Hope this helps,
Dan

Coedit Limited - Delivering standards compliant, accessible web solutions

Dan's Page @ Code Couch

http://www.codecouch.com/dan/

BillyRayPreachersSon · Apr 3, 2007

Well... I've just tried:

Code:

encodeURI('faßt')

and I end up with a double-byte result:

Code:

fa%C3%9Ft

So it looks promising.

Dan
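
For comparison, here is a small sketch of how the three built-in encoders treat the same input. The sample string is made up for illustration; the outputs in the comments follow from the sets of characters each function leaves unescaped.

```javascript
// 'faßt & co?' with ß written as an escape sequence.
var input = 'fa\u00DFt & co?';

var viaEscape = escape(input);                // "fa%DFt%20%26%20co%3F"
var viaEncodeURI = encodeURI(input);          // "fa%C3%9Ft%20&%20co?"
var viaComponent = encodeURIComponent(input); // "fa%C3%9Ft%20%26%20co%3F"
```

escape() emits the single code unit %DF, while both encodeURI and encodeURIComponent emit the UTF-8 bytes %C3%9F. encodeURI leaves URI delimiters such as & and ? alone; encodeURIComponent escapes them.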

lcs01 · Apr 3, 2007

To Jeff,

Thank you for your reply. And no, I cannot confirm that I have set the character set of the page. Actually, I have no way to set the character set of the page. User input can be anything: German, Greek, Chinese, etc.

lcs01 · Apr 3, 2007

Thank you, Dan. Your results are exciting. But encodeURI does not escape/encode single quotes. At our site, we must escape single quotes in user input.

Thank you again.
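
One way to cover the single-quote gap is to post-process the encodeURI result. This is a sketch, not the method the thread settles on; `encodeWithQuotes` is a hypothetical helper name.

```javascript
// Hypothetical helper: UTF-8 percent-encoding that also escapes single quotes.
function encodeWithQuotes(str) {
  return encodeURI(str).replace(/'/g, '%27');
}

// "faßt's" with ß written as an escape sequence.
var sample = "fa\u00DFt's";

var withoutFix = encodeURI(sample);     // "fa%C3%9Ft's"   (quote left as-is)
var withFix = encodeWithQuotes(sample); // "fa%C3%9Ft%27s" (quote escaped)
var restored = decodeURI(withFix);      // back to the original string
```

Because %27 is an ordinary percent-escape, the receiving side needs no special handling for it.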

BillyRayPreachersSon · Apr 3, 2007

Why not use both escape and encodeURI, then? Or, search and replace single quotes with some other character.

Hope this helps,
Dan

lcs01 · Apr 3, 2007

To Dan,

It would not work using both, because escape would corrupt extended fonts. Currently, we are doing it the way you suggested: "search and replace single quotes with some other character". The problem is, how many "other" characters are there?

The best way would be like this: if a user input contains any extended fonts, then apply encodeURI to it; else escape it.

But how can I test whether a string from user input contains those extended fonts or not?

BillyRayPreachersSon · Apr 3, 2007

If you do the encodeURI first and then the escape, the extended fonts wouldn't be an issue, would they?

Dan
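
A quick sketch of what that ordering produces. After encodeURI the string is pure ASCII, so escape() no longer sees the ß at all; it does, however, re-encode each % as %25.

```javascript
// 'faßt' with ß written as an escape sequence.
var once = encodeURI('fa\u00DFt'); // "fa%C3%9Ft"
var twice = escape(once);          // "fa%25C3%259Ft"
```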

kaht · Apr 3, 2007

Dan, the only problem is that the % symbols will fall prey to the escape function, which means either way you'll be required to go back through and replace all the %25s with % signs.

Interestingly, when running with utf-8 charset the escape function provides a 5 character code to signify the ß character. Additionally, when running in ISO-8859-1 charset, the function has no problem escaping the character and then restoring the character on an unescape, but the character is lost to unescape using a utf-8 charset:

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title test</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<script type="text/javascript">
alert("escape: " + escape('faßt'));
alert("unescape: " + unescape(escape('faßt')));
alert("encodeURI: " + encodeURI('faßt'));
alert("both: " + escape(encodeURI('faßt')));
</script>
<style type="text/css"></style>
</head>
<body>
</body>
</html>

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript">
alert("escape: " + escape('faßt'));
alert("unescape: " + unescape(escape('faßt')));
alert("encodeURI: " + encodeURI('faßt'));
alert("both: " + escape(encodeURI('faßt')));
</script>
<style type="text/css"></style>
</head>
<body>
</body>
</html>

-kaht
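
The longer code seen under utf-8 may come from how the browser decoded the test file rather than from escape() itself. That part is an assumption: if the file was saved as ISO-8859-1 but declared as utf-8, the lone byte 0xDF is not valid UTF-8, and the browser would likely substitute U+FFFD before the script ever runs. What is certain is that escape() switches to a %uXXXX form for code units above 0xFF, as this sketch shows:

```javascript
var latin1Range = escape('\u00DF'); // "%DF"    (code unit up to 0xFF)
var euro = escape('\u20AC');        // "%u20AC" (code unit above 0xFF)
var replacement = escape('\uFFFD'); // "%uFFFD" (the replacement character)

// unescape() reverses the %uXXXX form, but only back to U+FFFD;
// the original ß cannot be recovered once it has been replaced.
var roundTrip = unescape(replacement) === '\uFFFD'; // true
```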

Looking for a puppy?

http://www.silkypups.com

(Silky Terriers are hypoallergenic dogs that make great indoor pets due to their lack of shedding and small size)

BillyRayPreachersSon · Apr 3, 2007

"the % symbols will fall prey to the escape function, which means either way you'll be required to go back thru and replace all the %25s with % signs"

True - although I had assumed that the OP would realise this, and run two relevant decoding functions from the Perl end... but I guess it makes no difference whether both encodeURI and escape are used, or just encodeURI, and then the ' characters replaced with some other arbitrary character.

Interesting to see how the UTF-8 and ISO-8859-1 charsets behave differently, though. If anything, I would have expected the results to be the other way around. Curious!

kaht · Apr 3, 2007

"Interesting to see how the UTF-8 and ISO-8859-1 charsets behave differently, though. If anything, I would have expected the results to be the other way around."

Same here!

-kaht

lcs01 · Apr 4, 2007

Thank you, Dan & kaht!

This works very well for me:

Using escape(encodeURI('faßt')), both single quotes and extended fonts are processed nicely without setting the charset.

Thank you!
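
For reference, the combined approach can be wrapped in a pair of helpers. This is a sketch under assumptions: the names `encodeForTransport` and `decodeFromTransport` are hypothetical, and the decoder is a JavaScript mirror of what the receiving side has to do (undo escape first, then undo encodeURI); the Perl side is not shown.

```javascript
// Hypothetical helpers wrapping the combination used in this thread.
function encodeForTransport(str) {
  return escape(encodeURI(str)); // UTF-8 bytes first, then quotes and % signs
}

function decodeFromTransport(str) {
  return decodeURI(unescape(str)); // reverse order of the encoder
}

// "faßt's" with ß written as an escape sequence.
var sample = "fa\u00DFt's";

var wire = encodeForTransport(sample); // "fa%25C3%259Ft%27s"
var back = decodeFromTransport(wire);  // same as sample
```

The encoded form contains only ASCII characters and no single quotes, at the cost of the receiver having to decode twice.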

BillyRayPreachersSon · Apr 4, 2007

Excellent news. That link I posted is a good read. I'd no idea myself about the difference between the methods - just that they existed.

Dan
