converting codepage 437 to standard ASCII

Status
Not open for further replies.

spperl

Programmer
Mar 29, 2005
34
GB
I'm struggling with a character encoding issue. The text I am trying to manipulate contains extended ASCII characters that are invalid in the database encoding (unfortunately, I have no control over this).

Without applying a brute-force search and replace, does anyone know of any neat ways to convert extended ASCII characters into their ASCII equivalents?

e.g. é to e

Any help or pointers would be greatly appreciated.
 
Brute force is the only way that I know to do it, since there are no rules for such a translation. Simply create a hash that maps each character to whatever you want it translated to, and then use a regex to convert the strings. You can add a second regex to filter out any extended ASCII characters you didn't choose to translate.

Here's a snippet of code I use to translate such characters into HTML entities. You can adapt it to your purpose:

Code:
my %html_ext_chars = (
	chr(192) => '&Agrave;',
	chr(224) => '&agrave;',
	chr(193) => '&Aacute;',
	chr(225) => '&aacute;',
	chr(194) => '&Acirc;',
	chr(226) => '&acirc;',
	chr(195) => '&Atilde;',
	chr(227) => '&atilde;',
	chr(196) => '&Auml;',
	chr(228) => '&auml;',
	chr(197) => '&Aring;',
	chr(229) => '&aring;',

	chr(200) => '&Egrave;',
	chr(232) => '&egrave;',
	chr(201) => '&Eacute;',
	chr(233) => '&eacute;',
	chr(202) => '&Ecirc;',
	chr(234) => '&ecirc;',
	chr(203) => '&Euml;',
	chr(235) => '&euml;',

	chr(204) => '&Igrave;',
	chr(236) => '&igrave;',
	chr(205) => '&Iacute;',
	chr(237) => '&iacute;',
	chr(206) => '&Icirc;',
	chr(238) => '&icirc;',
	chr(207) => '&Iuml;',
	chr(239) => '&iuml;',

	chr(210) => '&Ograve;',
	chr(242) => '&ograve;',
	chr(211) => '&Oacute;',
	chr(243) => '&oacute;',
	chr(212) => '&Ocirc;',
	chr(244) => '&ocirc;',
	chr(213) => '&Otilde;',
	chr(245) => '&otilde;',
	chr(214) => '&Ouml;',
	chr(246) => '&ouml;',

	chr(217) => '&Ugrave;',
	chr(249) => '&ugrave;',
	chr(218) => '&Uacute;',
	chr(250) => '&uacute;',
	chr(219) => '&Ucirc;',
	chr(251) => '&ucirc;',
	chr(220) => '&Uuml;',
	chr(252) => '&uuml;',

	chr(223) => '&szlig;',
	chr(199) => '&Ccedil;',
	chr(231) => '&ccedil;',
	chr(209) => '&Ntilde;',
	chr(241) => '&ntilde;',
	chr(253) => '&yacute;',
	chr(255) => '&yuml;',
	chr(191) => '&iquest;',
	chr(161) => '&iexcl;',
);

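# Build an alternation such as (\xc0|\xe0|...) from the hash keys.
my $html_ext_chars_re = '(' . join('|', map { sprintf('\\x%02x', ord $_) } keys %html_ext_chars) . ')';

# Replaces the mapped characters in place in each argument passed in.
sub escape_html_ext_chars {
	s/$html_ext_chars_re/$html_ext_chars{$1}/g for (@_);
}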
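
Adapted to the original question, the same idea might look roughly like the sketch below. It is only an illustration: the names %ascii_equiv and to_plain_ascii are made up here, the table is deliberately short, and it assumes the input uses the same Latin-1 style code points as the table above. Real codepage 437 data uses different byte values (é is 0x82 there, for example), so the keys would need adjusting for that case.

```perl
use strict;
use warnings;

# Hypothetical table: Latin-1 code points mapped to a plain ASCII stand-in.
# Extend it with whatever replacements suit your data.
my %ascii_equiv = (
	chr(192) => 'A', chr(224) => 'a',
	chr(201) => 'E', chr(233) => 'e',
	chr(209) => 'N', chr(241) => 'n',
	chr(220) => 'U', chr(252) => 'u',
	chr(223) => 'ss',
);

# Character class built from the keys, e.g. [\xc0\xe0...]
my $class = join '', map { sprintf('\\x%02x', ord($_)) } keys %ascii_equiv;
my $ascii_equiv_re = qr/([$class])/;

# Returns a converted copy; the argument itself is left untouched.
sub to_plain_ascii {
	my ($text) = @_;
	$text =~ s/$ascii_equiv_re/$ascii_equiv{$1}/g;   # translate the known characters
	$text =~ s/[^\x00-\x7f]//g;                       # drop anything else above 127
	return $text;
}

print to_plain_ascii("caf" . chr(233) . " ma" . chr(241) . "ana" . chr(169)), "\n";
```

The second substitution is the extra filter mentioned earlier: any extended character that has no entry in the hash (chr(169) in the example) is simply removed. Swap the empty replacement for '?' if you would rather see where something was dropped.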

- Miller
 