UTF-8, Unicode, HTML & CSV - no consistency?

Status
Not open for further replies.

1DMF

Programmer
Jan 18, 2005
8,795
GB
Hi,

I seem to be going round in circles trying to understand when I need to encode to UTF-8 to get special characters to show correctly (specifically the GBP symbol £).

I have a reporting class that formats monetary values with the 'Locale::Currency::Format' module...
Code:
$my_value = currency_format('gbp', $my_value, FMT_SYMBOL);

My understanding is that this module automatically converts the currency symbol to Unicode "\x{00A3}", which according to the codepoint.net site (if I'm reading it correctly) is UTF-16 (00A3).

But when I tried to decode as UTF-16, the Encode module just bombs with
"UTF-16:Unrecognised BOM 2249 at C:/Perl/site/lib/Encode.pm line 175."
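
The distinction being asked about can be sketched in a few lines. U+00A3 is a code point (a character number), not an encoding; the same character becomes different bytes depending on the encoding chosen at output time. The "Unrecognised BOM" error is what Encode's plain 'UTF-16' decoder reports when the input does not start with a byte order mark. This is a minimal illustration, assuming the string from the module is an ordinary Perl character string; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00A3 is a code point (an abstract character number), not an encoding.
my $pound = "\x{00A3}";

# The Perl string holds one character...
my $chars = length $pound;

# ...but the bytes depend on which encoding is applied on output.
my $utf8    = encode('UTF-8',      $pound);   # two bytes
my $utf16be = encode('UTF-16BE',   $pound);   # two bytes, different values
my $latin1  = encode('ISO-8859-1', $pound);   # one byte

printf "chars=%d utf8=%s utf16be=%s latin1=%s\n",
    $chars,
    unpack('H*', $utf8),
    unpack('H*', $utf16be),
    unpack('H*', $latin1);
# prints: chars=1 utf8=c2a3 utf16be=00a3 latin1=a3
```

The "00A3" on codepoint.net only looks like UTF-16 because, for this character, the UTF-16BE bytes happen to equal the code point number.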

So I tried simply encoding to UTF-8 and outputting to HTML, and it displays correctly:
Code:
$self->_encode($my_string, 'UTF-8')
Great. However, if I then try to output to a CSV text file, the browser just hangs and no file is downloaded.

So I removed this from the output code:
Code:
$iof->binmode(":encoding(UTF-8)");
and I get the CSV output, but with funny characters...
So I removed the encode to UTF-8 but kept the binmode output encoding as UTF-8, but that outputs
Which according to codepoint.net is Perl.

So I removed both the encode call and the binmode output layer and, bingo, I get a GBP pound sign in my CSV.
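
The four combinations tried above (encode or not, binmode layer or not) can be compared byte for byte. This is a rough sketch, not the reporting class itself: it writes to an in-memory filehandle, and the helper name bytes_written is hypothetical. It assumes the value is a character string containing U+00A3.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Write $text to an in-memory file through the given PerlIO layer and
# return the bytes that were produced, as a hex string.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer or die "open failed: $!";
    print {$fh} $text;
    close $fh;
    return unpack 'H*', $buffer;
}

my $pound = "\x{00A3}";   # character string, one code point

my $layer_only  = bytes_written($pound,                  ':encoding(UTF-8)');
my $encode_only = bytes_written(encode('UTF-8', $pound), ':raw');
my $both        = bytes_written(encode('UTF-8', $pound), ':encoding(UTF-8)');
my $neither     = bytes_written($pound,                  ':raw');

print "layer only : $layer_only\n";    # c2a3      valid UTF-8
print "encode only: $encode_only\n";   # c2a3      valid UTF-8
print "both       : $both\n";          # c382c2a3  encoded twice
print "neither    : $neither\n";       # a3        a single Latin-1 byte
```

Encoding with both Encode and the layer produces double-encoded output, which is a common source of "funny characters". Doing neither emits the single byte A3, which a program assuming a Windows/Latin-1 code page would show as a pound sign, while a page declared as UTF-8 would show it as an invalid character. Whether that matches the exact symptoms here is an assumption, since the thread does not show the raw bytes.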

However, if I remove the encode to UTF-8 before outputting the HTML I get

I'm baffled. Do I or don't I need to encode before outputting? What encoding is my string in currently, and what am I meant to convert it to, and when?

Your help is appreciated.
1DMF

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music
 
Sounds like the hassle I had a while back with a similar issue.

I had 2 individual websites with the same ISP and I used a common Perl template on both sites.
One site displayed £ signs correctly and the other displayed a question mark in a diamond. The ISP said it was an encoding problem, and I insisted it was a server setup problem, since otherwise both sites would behave exactly the same.

They kept coming back to the same solution, which was to use the &pound; entity, but I wanted to get to the root cause. I got no joy from them, so I moved to another ISP, and I am pleased to report I have not encountered the problem since.
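
For reference, the entity workaround the ISP suggested can be sketched as below. It sidesteps the encoding question for HTML because the output stays pure ASCII. The function name to_ascii_html is made up for this example, and it assumes the input is a decoded character string; fed UTF-8 bytes instead, it would emit one reference per byte. It does not escape &, < or >.

```perl
use strict;
use warnings;

# Replace every non-ASCII character with a decimal numeric character
# reference, so the HTML is plain ASCII whatever charset the server declares.
sub to_ascii_html {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

my $html = to_ascii_html("Total: \x{00A3}12.50");
print "$html\n";   # prints: Total: &#163;12.50
```

This only helps for HTML output; a CSV opened in a spreadsheet would show the reference literally.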

Keith
 
Hi Keith,

Well the IIS server is ours, so if you think it is a configuration issue, do you have any idea what needs changing?

What I don't understand is the inconsistency of needing and not needing to encode and the differing types of symbol you end up with.

Perhaps it depends on the application opening the content, IE vs Excel?

On the local development server (part of Catalyst), when you encode to UTF-8 with the Encode module and output via IO::File using binmode ':encoding(UTF-8)', the CSV is delivered and the pound sign shows correctly.
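
A minimal sketch of the IO::File route, under the assumption that the row values are character strings and that only the binmode layer does the encoding (no separate Encode call). The row data and the temporary file are placeholders for illustration, and the naive join stands in for proper CSV quoting.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical report row; the values are character strings, not UTF-8 bytes.
my @row = ('Widget', "\x{00A3}12.50");

# Temporary file standing in for the real CSV target.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
close $tmp_fh;

my $iof = IO::File->new($path, 'w') or die "Cannot open $path: $!";
$iof->binmode(':encoding(UTF-8)');      # the layer does the encoding, once
$iof->print(join(',', @row), "\n");     # no encode() call before printing
$iof->close;

# Read the file back as raw bytes to see what actually landed on disk.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

print unpack('H*', $bytes), "\n";       # the pound sign appears as c2a3
```

Dumping the bytes like this on both the development server and the IIS box would show whether the two environments really produce the same output before it reaches the browser.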

However, this mechanism crashes IIS7.5: the download freezes, the document shows zero bytes, and it eventually fails to be delivered.

I feel this could be an IIS7.5 file delivery issue, but I'm not sure why or what is causing it. Normal static CSV/XLS files work fine, so perhaps it is Catalyst and the way it prints to STDOUT using UTF-8 that IIS is having trouble with. It only seems to affect this dynamically created CSV file delivery, though; as mentioned, outputting HTML as UTF-8 works perfectly fine.


