gsub not replacing unicode characters

Status
Not open for further replies.

joelewis1910

Programmer
Nov 19, 2007
1
TW
Hi,
I am a newbie here. My problem is that I have a Chinese character string mixed with some ASCII characters, and I want to remove all the ASCII characters from it and get back only the Chinese characters. The command awk '{ gsub(/[\041-\177]/, ""); printf $0 }' does remove all the ASCII characters, but it also changes some of the existing Chinese characters in the string into other characters. Any ideas?

thanks in advance,
joe
 
What happens when you use the POSIX character class [:alpha:] in gsub and set your locale to C? (Search for "POSIX character class" if you are unfamiliar with character classes, and see locale(1) for the locale settings.)
 
I found the thread above from joelewis1910 and the response from fpmurphy regarding special foreign characters, but I am not clear on how this would work for replacing the characters of one language with those of another.

I need to be able to convert a French "e" with an accent mark over the letter into a plain English "e" without the accent mark.

Any information you can provide will be greatly appreciated. Thank you.
 
Yours is a very different scenario. I guess you'll just need to set up a translation table of some sort. tr could be useful for this, e.g.:

[tt]echo "Les français me disent que c'est l'été là-bas" | tr 'éèàâç' 'eeaac'[/tt]

You may have difficulty with the character set depending on your version of Unix. For example, I can paste the above command into a Linux shell with no problems, but not into an HP-UX one.
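If tr gives trouble with these characters on your system, the same translation table can be sketched with one sed substitution per accented letter. This is only an illustration: the function name unaccent is hypothetical, the table covers just the five letters from the tr example, and it assumes the script and the input use the same encoding with precomposed accented letters (not a base letter followed by a combining accent).

```shell
# Hypothetical helper: replace a few accented French letters with their
# unaccented equivalents. Assumes script and input share one encoding.
unaccent() {
    # LC_ALL=C makes sed match the raw bytes of each accented letter,
    # so it does not matter whether they are one byte or several.
    LC_ALL=C sed -e 's/é/e/g' -e 's/è/e/g' \
                 -e 's/à/a/g' -e 's/â/a/g' \
                 -e 's/ç/c/g'
}

# Example use with the sentence from the tr command.
printf '%s\n' "Les français me disent que c'est l'été là-bas" | unaccent
```

Further letters (ê, ù, É and so on) would each need their own -e expression.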



Annihilannic.
 