Removing French chars from string

dmazzini

Programmer
Jan 20, 2004
480
US
Hi guys

I have been looking for an easy way to remove French characters from a string and did not find one.

So I came up with this. It works OK, so it may be useful as a reference. Maybe someone on the forum has a better idea.

Code:
$wbts_location_iub =~ s/Â|Ä|È|É|Ê|Ë|Î|Ï|Ô|Œ|Ù|Û|Ü|Ÿ|à|â|ä|è|é|ê|ë|î|ï|ô|œ|ù|û|ü|ÿ|Ç|ç|«|»|€|\'//g;
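
For reference, the same removal can be written with a single character class instead of a long alternation. This is only a minimal sketch, not the poster's code: the helper name remove_french_chars is made up for illustration, and it assumes the script is saved as UTF-8 and the input is a decoded character string rather than raw bytes.

```perl
use strict;
use warnings;
use utf8;    # the literals in this source file are UTF-8

# Hypothetical helper: deletes the same characters as the alternation above.
sub remove_french_chars {
    my ($text) = @_;
    # One character class; every listed character is simply removed.
    $text =~ s/[ÂÄÈÉÊËÎÏÔŒÙÛÜŸàâäèéêëîïôœùûüÿÇç«»€']//g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print remove_french_chars("Hôtel de l'Île, café «Noël»"), "\n";
# prints: Htel de lle, caf Nol
```

As the output shows, deleting the characters outright mangles the words, which is the main drawback of this approach.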

dmazzini
GSM/UMTS System and Telecomm Consultant

 
On reflection, I don't really want to remove the French characters; what I really want is to "convert" them to their English (unaccented) equivalents.
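
One way to do that kind of conversion without a hand-written table is to decompose each character and drop the combining marks. The sketch below uses the core Unicode::Normalize module. The function name to_ascii_letters and the replacements chosen for Œ, œ, «, » and € are assumptions for illustration (those characters have no canonical decomposition, so they need explicit handling), and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;
use utf8;                          # the literals in this source file are UTF-8
use Unicode::Normalize qw(NFD);    # core module

# Characters that NFD does not decompose; these replacements are
# illustrative choices, not a standard.
my %special = (
    'Œ' => 'OE', 'œ' => 'oe',
    '«' => '"',  '»' => '"',
    '€' => 'EUR',
);

# Hypothetical helper: strips accents, keeps the base letters.
sub to_ascii_letters {
    my ($text) = @_;
    $text = NFD($text);        # e.g. é becomes e followed by U+0301
    $text =~ s/\p{Mn}//g;      # remove the combining (nonspacing) marks
    $text =~ s/([Œœ«»€])/$special{$1}/g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_ascii_letters("Hôtel de l'Île, cœur, français"), "\n";
# prints: Hotel de l'Ile, coeur, francais
```

Unlike a fixed lookup table, this handles any accented letter that decomposes into a base letter plus marks, at the cost of also stripping marks from text in other scripts.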

I have seen that there is an old module called Text::StripAccents. See the link below:

http://search.cpan.org/~ccolbourn/Text-StripAccents-0.11/lib/Text/StripAccents.pm

So I just added more characters to the StripAccents.pm file.

The new version is here:

Code:
##############################################################                                                                                                                                                                                                                   
# Text::StripAccents - remove non a-z chars from a string                                                                                                                                                                                                                        
#  and replace them with their a-z counterparts                                                                                                                                                                                                                                  
##############################################################                                                                                                                                                                                                                   
#                                                                                                                                                                                                                                                                                
# Version information                                                                                                                                                                                                                                                            
# ===================                                                                                                                                                                                                                                                            
#                                                                                                                                                                                                                                                                                
# 0.1	CC	Apr 05		New module                                                                                                                                                                                                                                                   
#                                                                                                                                                                                                                                                                                
# 0.11	CC	Jun 05		After feedback in cpanrating,                                                                                                                                                                                                                              
#				documented that the module is                                                                                                                                                                                                                                            
#				latin1 only, and pp with no                                                                                                                                                                                                                                              
#				prereqs                                                                                                                                                                                                                                                                  
# 0.12 dmazzini added more characters                                                                                                                                                                                                                                                                               
##############################################################                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                 
package Text::StripAccents;                                                                                                                                                                                                                                                      
use strict;                                                                                                                                                                                                                                                                      
use vars qw (@ISA $VERSION @EXPORT);                                                                                                                                                                                                                                             
use Exporter ();                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                 
@ISA = qw(Exporter);                                                                                                                                                                                                                                                             
@EXPORT = qw(stripaccents);                                                                                                                                                                                                                                                      
$VERSION="0.12";
                                                                                                                                                                                                                                                                                 
##############################################################                                                                                                                                                                                                                   
=pod                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                 
=head1 NAME                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                 
 Text::StripAccents - removes accented & special characters from strings                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                 
=head1 SYNOPSIS                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                 
 use Text::StripAccents;                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                 
 my $StripAccents = Text::StripAccents->new();

 my $convertedString = $StripAccents->strip($unconvertedString);
                                                                                                                                                                                                                                                                                 
OR                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                 
 use Text::StripAccents;                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                 
 stripaccents($string);                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                 
=head1 DESCRIPTION                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                 
This simple module takes accented characters and replaces them with their anglicised ASCII counterparts, e.g. Ü becomes U. It currently ONLY supports Latin1. If there are any characters I've missed out that you think should be included, please mail me and I'll add them in.
                                                                                                                                                                                                                                                                                 
This is a pure perl module with no prerequisites.                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
=head1 PREREQS                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                 
None.                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                 
=head1 SEE ALSO                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                 
Text::Unaccent is a much more advanced utility to do the same job, but with a C dependency.                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                 
=head1 CHANGES                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                 
0.11 - bugfix to clarify the documentation, as per Dobrica Pavlinusic's feedback.                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
=head1 LICENSE                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                 
Copyright 2005 by Charles Colbourn, all rights reserved. This program is free software, you can redistribute it and/or modify it under the same terms as Perl itself.                                                                                                            
                                                                                                                                                                                                                                                                                 
=head1 AUTHOR                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                 
Charles Colbourn - charlesc@g0n.net                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                 
(Character mapping hash supplied by Nigel Currie).                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                 
=cut                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                 
##############################################################                                                                                                                                                                                                                   
# Text::StripAccents::new - constructor
##############################################################                                                                                                                                                                                                                   
#                                                                                                                                                                                                                                                                                
# Takes as param the character set you are using. Latin1                                                                                                                                                                                                                         
# support only at present                                                                                                                                                                                                                                                        
#                                                                                                                                                                                                                                                                                
# returns a Text::StripAccents object
##############################################################                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                 
sub new                                                                                                                                                                                                                                                                          
{                                                                                                                                                                                                                                                                                
	my $class = shift;                                                                                                                                                                                                                                                             
	my $charset = shift;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                 
	my %object;                                                                                                                                                                                                                                                                    
	return bless \%object,$class;                                                                                                                                                                                                                                                  
}                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
###############################################################                                                                                                                                                                                                                  
# Text::StripAccents::strip
###############################################################                                                                                                                                                                                                                  
#                                                                                                                                                                                                                                                                                
# Removes all accented chars from a string and replaces them                                                                                                                                                                                                                     
# with their unaccented equivalents.                                                                                                                                                                                                                                             
#                                                                                                                                                                                                                                                                                
# takes a string as a param, returns a converted string                                                                                                                                                                                                                          
#                                                                                                                                                                                                                                                                                
###############################################################                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                 
sub strip                                                                                                                                                                                                                                                                        
{                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
	my $object = shift;                                                                                                                                                                                                                                                            
	my $string = shift;                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                 
	my %IsoLatin1ToASCIITable = ("A" => "A", "À" => "A", "Á" => "A", "Â" => "A",                                                                                                                                                                                                   
"Ã" => "A", "Ä" => "A", "Å" => "A",                                                                                                                                                                                                                                              
			     "B" => "B",                                                                                                                                                                                                                                                           
    			     "C" => "C", "Ç" => "C",                                                                                                                                                                                                                                           
    			     "D" => "D",                                                                                                                                                                                                                                                       
    			     "E" => "E", "È" => "E", "É" => "E", "Ê" => "E",                                                                                                                                                                                                                   
"Ë" => "E",                                                                                                                                                                                                                                                                      
    			     "F" => "F",                                                                                                                                                                                                                                                       
    			     "G" => "G",                                                                                                                                                                                                                                                       
    			     "H" => "H",                                                                                                                                                                                                                                                       
    			     "I" => "I", "Ì" => "I", "Í" => "I", "Î" => "I",                                                                                                                                                                                                                   
"Ï" => "I",                                                                                                                                                                                                                                                                      
    			     "J" => "J",                                                                                                                                                                                                                                                       
    			     "K" => "K",                                                                                                                                                                                                                                                       
    			     "L" => "L",                                                                                                                                                                                                                                                       
    			     "M" => "M",                                                                                                                                                                                                                                                       
    			     "N" => "N", "Ñ" => "N",                                                                                                                                                                                                                                           
    			     "O" => "O", "Ò" => "O", "Ó" => "O", "Ô" => "O",                                                                                                                                                                                                                   
"Õ" => "O", "Ö" => "O",                                                                                                                                                                                                                                                          
    			     "P" => "P",                                                                                                                                                                                                                                                       
    			     "Q" => "Q",                                                                                                                                                                                                                                                       
    			     "R" => "R",                                                                                                                                                                                                                                                       
    			     "S" => "S",                                                                                                                                                                                                                                                       
    			     "T" => "T",                                                                                                                                                                                                                                                       
    			     "U" => "U", "Ù" => "U", "Ú" => "U", "Û" => "U",                                                                                                                                                                                                                   
"Ü" => "U",                                                                                                                                                                                                                                                                      
    			     "V" => "V",                                                                                                                                                                                                                                                       
    			     "W" => "W",                                                                                                                                                                                                                                                       
    			     "X" => "X",                                                                                                                                                                                                                                                       
    			     "Y" => "Y", "Y" => "Y",  "Ÿ"=>"Y","ÿ" =>"y",                                                                                                                                                                                                                                           
    			     "Z" => "Z",                                                                                                                                                                                                                                                       
    			     "a" => "a", "à" => "a", "á" => "a", "â" => "a",                                                                                                                                                                                                                   
"ã" => "a", "ä" => "a", "å" => "a",                                                                                                                                                                                                                                              
    			     "b" => "b",                                                                                                                                                                                                                                                       
    			     "c" => "c", "ç" => "c",                                                                                                                                                                                                                                           
    			     "d" => "d",                                                                                                                                                                                                                                                       
    			     "e" => "e", "è" => "e", "é" => "e", "ê" => "e",                                                                                                                                                                                                                   
"ë" => "e",                                                                                                                                                                                                                                                                      
    			     "f" => "f",                                                                                                                                                                                                                                                       
    			     "g" => "g",                                                                                                                                                                                                                                                       
    			     "h" => "h",                                                                                                                                                                                                                                                       
    			     "i" => "i", "ì" => "i", "í" => "i", "î" => "i", "î" => "i",                                                                                                                                                                                                                
"ï" => "i",                                                                                                                                                                                                                                                                      
    			     "j" => "j",                                                                                                                                                                                                                                                       
    			     "k" => "k",                                                                                                                                                                                                                                                       
    			     "l" => "l",                                                                                                                                                                                                                                                       
    			     "m" => "m",                                                                                                                                                                                                                                                       
    			     "n" => "n", "ñ" => "n",                                                                                                                                                                                                                                           
    			     "o" => "o", "ò" => "o", "ó" => "o", "ô" => "o",                                                                                                                                                                                                                   
"õ" => "o", "ö" => "o",                                                                                                                                                                                                                                                          
    			     "p" => "p",                                                                                                                                                                                                                                                       
    			     "q" => "q",                                                                                                                                                                                                                                                       
    			     "r" => "r",                                                                                                                                                                                                                                                       
    			     "s" => "s",                                                                                                                                                                                                                                                       
    			     "t" => "t",                                                                                                                                                                                                                                                       
    			     "u" => "u", "ù" => "u", "ú" => "u", "û" => "u",                                                                                                                                                                                                                   
"ü" => "u",                                                                                                                                                                                                                                                                      
    			     "v" => "v",                                                                                                                                                                                                                                                       
    			     "w" => "w",                                                                                                                                                                                                                                                       
    			     "x" => "x",                                                                                                                                                                                                                                                       
    			     "y" => "y", "y" => "y", "ý" => "y",                                                                                                                                                                                                                               
    			     "z" => "z", 
    			     "Œ" => "CE",
    			     "œ" => "ce",
    			     "Ç" => "C",
    			     "ç" => "c",
    			     "«" => "_", 
    			     "»" =>  "_",
    			     "€" => "C",
    			     
    			                                                                                                                                                                                                                                                          
"ß"=>"ss");                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                 
	my @stringArray = split //,$string;                                                                                                                                                                                                                                            
	foreach (@stringArray)                                                                                                                                                                                                                                                         
	{                                                                                                                                                                                                                                                                              
		if ($IsoLatin1ToASCIITable{$_})                                                                                                                                                                                                                                              
		{                                                                                                                                                                                                                                                                            
			$_ = $IsoLatin1ToASCIITable{$_};                                                                                                                                                                                                                                           
		}                                                                                                                                                                                                                                                                            
	}                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                 
	my $returnString = join '',@stringArray;                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                 
	return $returnString;                                                                                                                                                                                                                                                          
}                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
################################################################                                                                                                                                                                                                                 
# stripaccent - function to call ::strip in non OO mode                                                                                                                                                                                                                          
################################################################                                                                                                                                                                                                                 
sub stripaccents                                                                                                                                                                                                                                                                 
{                                                                                                                                                                                                                                                                                
	my $string = shift;                                                                                                                                                                                                                                                            
	return __PACKAGE__->strip($string);                                                                                                                                                                                                                                            
}

and you use it in your script:

Code:
use Text::StripAccents;

# OO interface; $Stripaccent->strip($string) would work as well
my $Stripaccent = Text::StripAccents->new();

my $unconvertedString = "Â_Ä_È_É_Ê_Ë_Î_Ï_Ô_Œ_Ù_Û_Ü_Ÿ_à_â_ä_è_é_ê_ë_î_ï_ô_œ_ù_û_ü_ÿ_Ç_ç_«_»_€_";
my $convertedString   = stripaccents($unconvertedString);
# "\n", not "$\n": $\ is the output record separator, so "$\n" printed a stray "n"
print "$unconvertedString   ->   $convertedString\n";


result is

Code:
Â_Ä_È_É_Ê_Ë_Î_Ï_Ô_Œ_Ù_Û_Ü_Ÿ_à_â_ä_è_é_ê_ë_î_ï_ô_œ_ù_û_ü_ÿ_Ç_ç_«_»_€_   ->   A_A_E_E_E_E_I_I_O_CE_U_U_U_Y_a_a_a_e_e_e_e_i_i_o_ce_u_u_u_y_C_c_____C_
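For comparison, here is a minimal sketch of a different technique that avoids maintaining the table by hand: decompose each accented letter with the core Unicode::Normalize module and drop the combining marks. This is not how Text::StripAccents works. The helper name strip_marks is hypothetical, and the sketch assumes the script is saved as UTF-8 and that the strings are already decoded character strings. Ligatures such as Œ have no decomposition, so they are mapped by hand first.

```perl
use strict;
use warnings;
use utf8;                        # assumption: this source file is saved as UTF-8
use Unicode::Normalize qw(NFD);  # core module

# Hypothetical helper, not part of Text::StripAccents.
sub strip_marks {
    my ($text) = @_;

    # Ligatures and sharp s do not decompose, so replace them explicitly.
    my %special = ("Œ" => "OE", "œ" => "oe", "ß" => "ss");
    $text =~ s/([Œœß])/$special{$1}/g;

    # NFD splits "é" into "e" plus a combining acute accent;
    # \p{Mn} then matches and removes the combining marks.
    $text = NFD($text);
    $text =~ s/\p{Mn}//g;

    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_marks("Éléphant, garçon, œuvre, Noël"), "\n";
```

Symbols such as «, » and € are left untouched by this sketch, so they would still need their own substitution if they have to go.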



1;

dmazzini
GSM/UMTS System and Telecomm Consultant
 
What about
Code:
$wbts_location_iub=~tr/ÂÄÈÉÊËÎÏÔŒÙÛÜŸàâäèéêëîïôœùûüÿÇç€/AAEEEEIIOOUUUYaaaeeeeiioouuuyCc/d;
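Of course this assumes you are using a compatible character set: tr/// maps character by character, so the script and the data must agree on the encoding.
I don't know what you want to do with the euro and the other symbols: in the example above the euro is deleted, because it has no counterpart in the replacement list and the /d modifier removes such characters.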
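To make the character set assumption concrete, here is a minimal sketch that assumes the script is saved as UTF-8 and declares it with use utf8, so each accented letter is a single character for tr///. The helper name to_ascii is hypothetical. Since tr/// can only map one character to one character, the sketch expands the ligatures with s/// first, which is a variation on the one-liner above rather than the same thing.

```perl
use strict;
use warnings;
use utf8;   # assumption: this source file is saved as UTF-8

# Hypothetical helper name, for illustration only.
sub to_ascii {
    my ($s) = @_;

    # tr/// is strictly one-to-one, so expand the ligatures beforehand.
    $s =~ s/Œ/OE/g;
    $s =~ s/œ/oe/g;

    # 30 search characters, 29 replacements: the trailing euro sign
    # has no counterpart, so the /d modifier deletes it.
    $s =~ tr/ÂÄÈÉÊËÎÏÔÙÛÜŸàâäèéêëîïôùûüÿÇç€/AAEEEEIIOUUUYaaaeeeeiiouuuyCc/d;

    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_ascii("Hôtel à 50€, Œuvre"), "\n";   # prints "Hotel a 50, OEuvre"
```

Strings read from a file or a database would have to be decoded first (for example with the Encode module) for the same tr/// to match them.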

Franco
: Online engineering calculations
: Magnetic brakes for fun rides
: Air bearing pads
 
Thanks prex1. It works great, and the syntax is very short. Cool!

dmazzini
GSM/UMTS System and Telecomm Consultant

 