use Encode: I need to change the encoding of a file to UTF-8

pkskytektip · Oct 16, 2011

I have some HTML files that use at least three different languages and are generated by Perl from both a MySQL database and original HTML files.

This large collection of HTML files needs to be edited to convert the characters that are not valid UTF-8.

I have made a little progress with Perl's Encode module. If I apply

Code:

$_ = encode( "UTF8", $_);

to the lines read from a file handle on a test file, all of my characters are converted to correct UTF-8 except the ISO-8859-2 (Latin-2) characters, which are mangled. I get "??or??evi??" instead of "?or?evi?".
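One likely cause, offered as a sketch rather than a confirmed diagnosis: `encode` expects Perl characters, not raw bytes. If the bytes read from the file are never decoded, Perl treats each one as a Latin-1 character, so every Latin-2 byte becomes a two-byte UTF-8 sequence for the wrong character. The example below compares the two paths. The helper name `to_utf8`, the sample string, and the choice of ISO-8859-2 as the source encoding are assumptions for illustration, not details taken from the post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw bytes in an assumed source encoding into UTF-8 bytes.
sub to_utf8 {
    my ($bytes, $source_encoding) = @_;
    my $chars = decode($source_encoding, $bytes);   # raw bytes -> Perl characters
    return encode('UTF-8', $chars);                 # Perl characters -> UTF-8 bytes
}

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

# Assumed sample: U+0110 "or" U+0111 "evi" U+0107 stored as ISO-8859-2 bytes.
my $latin2 = "\xD0or\xF0evi\xE6";

# Encoding without decoding first: each byte is read as a Latin-1 character.
my $mangled = encode('UTF-8', $latin2);

# Decoding from the real source encoding first gives the intended result.
my $utf8 = to_utf8($latin2, 'iso-8859-2');

print hex_dump($mangled), "\n";   # C3 90 6F 72 C3 B0 65 76 69 C3 A6
print hex_dump($utf8), "\n";      # C4 90 6F 72 C4 91 65 76 69 C4 87
```

Both results are valid UTF-8, but only the second contains the intended Latin-2 letters. This is why the output can look "converted" and still display the wrong characters.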

When I open the original sample file in Notepad++, the Encoding menu shows it as ANSI.

I don't see ANSI listed as a supported encoding in the Encode module. Is there another way to do this?
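For context, "ANSI" in Notepad++ is not a single encoding. It means the default Windows code page of the machine, for example Windows-1252 on Western European systems or Windows-1250 on Central European ones, which Encode knows as `cp1252` and `cp1250`. The sketch below re-encodes a whole file using PerlIO encoding layers. The helper name `convert_file` and the file names in the usage note are hypothetical, and the right source encoding for these files is an assumption that has to be checked against the data.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a file, decoding from an assumed source encoding
# on input and encoding to UTF-8 on output.
sub convert_file {
    my ($in_path, $out_path, $source_encoding) = @_;

    # The :encoding layer decodes bytes into characters as lines are read.
    open my $in, "<:encoding($source_encoding)", $in_path
        or die "Cannot read $in_path: $!";

    # The output layer encodes characters to UTF-8 as lines are written.
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

A call might look like `convert_file('sample.html', 'sample.utf8.html', 'cp1250');`. If the collection mixes several source encodings, each file needs its own encoding name. `Encode->encodings(':all')` lists the names that the installed Encode accepts.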

Any and all tips or clues would be appreciated.
