XMLHttpRequest and Charsets

Status
Not open for further replies.

pbb72

Programmer
Mar 27, 2004
Hi,

I'm using XMLHttpRequest in JavaScript to retrieve information from remote websites into Internet Explorer. Everything works great except for two websites whose servers don't return any charset information in the headers. As a result, XMLHttp interprets the responses as UTF-8, while I know they are ISO-8859-1, so all extended characters (for example å, ø, æ) are displayed as question marks.

I do not have access to those servers, or to the documents on them. Is there any way I can force XMLHttpRequest to interpret the documents as ISO-8859-1, or to display the UTF-8 results properly in another way?

Thanks, Peter
 
I do not have access to those servers, or to the documents on them.

So you're scraping other people's websites without their permission, presumably?

Is there any way I can force XMLHttpRequest to interpret the documents as ISO-8859-1, or to display the UTF-8 results properly in another way?

Possibly. You might be able to do a regexp replacement on the returned data, switching known characters for HTML entities. Give that a whirl.
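A minimal sketch of that regexp replacement, assuming the Norwegian letters are still intact in the string. The entity table and the name `toEntities` are illustrative, not from any library.

```javascript
// Hypothetical lookup table: Norwegian letters to named HTML entities.
const ENTITIES = {
  "æ": "&aelig;", "ø": "&oslash;", "å": "&aring;",
  "Æ": "&AElig;", "Ø": "&Oslash;", "Å": "&Aring;"
};

// Replace every known letter with its entity. If a decoder has already
// turned the letter into a placeholder character, there is nothing to match.
function toEntities(text) {
  return text.replace(/[æøåÆØÅ]/g, ch => ENTITIES[ch]);
}

console.log(toEntities("blåbærsyltetøy")); // bl&aring;b&aelig;rsyltet&oslash;y
```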

Hope this helps,
Dan

 
Thanks for the reply, Dan.

It didn't help, however. All the special characters come back as character code 65535 when I read them... :-(
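A single fixed code for every special character suggests the letters were already replaced during decoding, before any script sees the text, which would explain why a replacement on the returned string cannot recover them. The standard Unicode replacement character is U+FFFD (65533); the 65535 (U+FFFF) reported here is left as observed. A small, hypothetical diagnostic that lists where such placeholders occur:

```javascript
// Hypothetical helper: return the indices of characters that decoders use
// as placeholders for undecodable bytes. U+FFFD is the standard replacement
// character; U+FFFF is included because that is the value reported above.
function findUndecodable(text) {
  const positions = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === 0xfffd || code === 0xffff) {
      positions.push(i);
    }
  }
  return positions;
}

console.log(findUndecodable("f\uFFFD og bl\uFFFDb\uFFFDr")); // positions 1, 8 and 10
```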
 
To the OP:
If you can give more detail about the user environment your page runs in... If the lowest common denominator rules there and users are just mouse-clickers, maybe you have to live with it.
 
Well, apart from half of the text being unreadable (the texts are Norwegian, not English, with lots of mangled å, ø and æ), the main problem is that the mangling also takes the next 2 or 3 characters with it.
For example: "<div>få</div>" gets converted to "<div>f?iv>".
This makes proper interpretation of the results quite hard.
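The lost neighbouring characters fit a UTF-8 decoder that trusts the length announced by a lead byte: å is 0xE5 in ISO-8859-1, which in UTF-8 starts a three-byte sequence, so the bytes after it are consumed too. The sketch below is a deliberately naive decoder written only to illustrate that failure mode. It is not IE's actual decoder, and the number of characters lost varies between decoders (this one loses two, `</`).

```javascript
// Deliberately naive UTF-8 decoder, for illustration only.
function naiveUtf8Decode(bytes) {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    // Sequence length as announced by the lead byte.
    const len = lead < 0x80 ? 1 : lead >= 0xf0 ? 4 : lead >= 0xe0 ? 3 : lead >= 0xc0 ? 2 : 1;
    if (len === 1) {
      // ASCII byte, or a stray continuation byte shown as "?".
      out += lead < 0x80 ? String.fromCharCode(lead) : "?";
    } else {
      const tail = Array.from(bytes.slice(i + 1, i + len));
      const valid = tail.length === len - 1 && tail.every(b => (b & 0xc0) === 0x80);
      if (valid) {
        // Combine the payload bits of the lead and continuation bytes.
        let cp = lead & (0xff >> (len + 1));
        for (const b of tail) cp = (cp << 6) | (b & 0x3f);
        out += String.fromCodePoint(cp);
      } else {
        out += "?";
      }
    }
    // The flaw being illustrated: always skip the announced length, even
    // when the following bytes were not continuation bytes.
    i += len;
  }
  return out;
}

// "<div>få</div>" as ISO-8859-1 bytes (å = 0xE5).
const sample = [0x3c, 0x64, 0x69, 0x76, 0x3e, 0x66, 0xe5, 0x3c, 0x2f, 0x64, 0x69, 0x76, 0x3e];
console.log(naiveUtf8Decode(sample)); // <div>f?div>
```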

And besides, "they just have to live with it", that can't be a programmer saying that? ;-)
 
>"they just have to live with it", that can't be a programmer saying that?
You'd be surprised if you knew the attitudes in the community of client-side page designers...

Here is a cryptic solution for you. (You say you use IE, and this only works in IE. But you have to make sure ADODB.Stream is not disabled in the browser environment; usually it is.)
[tt]
// oxmlhttp is your XMLHttpRequest object, and the response has arrived.
// Do the following after that point:
var ostream = new ActiveXObject("ADODB.Stream");
with (ostream) {
  type = 1;                     // adTypeBinary
  open();
  write(oxmlhttp.responseBody); // raw response bytes, not yet decoded
  type = 2;                     // adTypeText
  charset = "iso-8859-1";       // decode the bytes as ISO-8859-1
  var s = readText(-1);         // -1 = adReadAll
  close();
}
ostream = null;
// From here on, work with s: it holds the response text in place of .responseText.
[/tt]
 
Thanks, that worked like a charm!!

Apparently, you forgot 1 line in the code. After the write command, I needed to add "position = 0;", and then everything worked as required.
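As I understand the ADO documentation, a stream's `Type` can only be changed while `Position` is 0, and `ReadText` reads from the current position, so the stream must be rewound after `Write`. The decoding itself does not strictly need ADODB: in ISO-8859-1 every byte value 0x00 to 0xFF maps to the Unicode code point with the same number. The sketch below does that conversion in plain JavaScript, assuming the response is available as an array of byte values. In IE's JScript that would presumably be `new VBArray(oxmlhttp.responseBody).toArray()`, and in modern browsers `new Uint8Array(xhr.response)` with `xhr.responseType = "arraybuffer"`. Both are untested assumptions here, and `decodeLatin1` is a hypothetical name.

```javascript
// Decode ISO-8859-1 bytes: byte value n becomes the character U+00nn.
// Works in chunks so large responses do not exceed the engine's limit on
// the number of arguments passed to String.fromCharCode.
function decodeLatin1(bytes) {
  const CHUNK = 0x8000;
  let out = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    const slice = Array.prototype.slice.call(bytes, i, i + CHUNK);
    out += String.fromCharCode.apply(null, slice);
  }
  return out;
}

// "<div>få</div>" as ISO-8859-1 bytes (å = 0xE5).
const sampleLatin1 = [0x3c, 0x64, 0x69, 0x76, 0x3e, 0x66, 0xe5, 0x3c, 0x2f, 0x64, 0x69, 0x76, 0x3e];
console.log(decodeLatin1(sampleLatin1)); // <div>få</div>
```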

tsuji, you are da man! :-D
 
>After the write command, I needed to add "position = 0;",
You have to! (I sometimes persist the data first, and then there is no need to reposition.) Glad you know your business; that saved me thousands of words (I'm exaggerating).

 
Hehehe, yeah, with the help of your code I was able to Google the final details together.
Thanks again man!
 