Gsub with Unicode Characters

Status
Not open for further replies.

Malistryx

Programmer
Jan 9, 2008
2
CA
Hi

I have an application that needs to take UTF-8 encoded data from a database, which includes French text, and replace any special characters (e.g. é) with their HTML representations.

So for that example, should
Code:
french_name.gsub!(/é/, '&eacute;')
be doing the trick? I've spent about a day trying different options, to no avail, and I haven't been able to find anything online. I'm pretty stuck on this one.
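For what it's worth, here is a minimal sketch of one way to do this kind of replacement with a lookup table instead of one gsub per character. It assumes Ruby 1.9 or later (where gsub accepts a hash as its second argument) and a source file saved as UTF-8. The names FRENCH_ENTITIES and french_to_entities are made up for illustration, and the table only covers a few characters.

```ruby
# encoding: utf-8

# Hypothetical lookup table; extend it with whatever characters the data contains.
FRENCH_ENTITIES = {
  'é' => '&eacute;',
  'è' => '&egrave;',
  'à' => '&agrave;',
  'ç' => '&ccedil;'
}.freeze

# Builds one regex from the table keys and swaps each match for its entity.
# Returns a new string; the argument is left untouched.
def french_to_entities(text)
  text.gsub(Regexp.union(FRENCH_ENTITIES.keys), FRENCH_ENTITIES)
end

puts french_to_entities('Hélène a reçu le café')
```

Characters that are not in the table pass through unchanged, so the table has to list every character you care about.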

Thanks for the help
 
I'm not actually a Ruby guy, so I apologize if this is a waste of your time. At any rate, here is my shot in the dark: in all the languages that I work with, you need to assign the parsed value back to the original variable, i.e.

french_name = french_name.gsub!(/é/, '&eacute;')

Syntactically that may not be correct, but I'm sure you get the idea.



Kevin Davie
Consultant
Sogeti USA
 
Thanks for the reply. In Ruby, however, the ! essentially makes gsub modify the string in place, so
Code:
french_name = french_name.gsub(/é/, '&eacute;')
has the same effect as
Code:
french_name.gsub!(/é/, '&eacute;')

Thanks though :)
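One caveat when mixing the two forms: gsub always returns a string, while gsub! returns nil when nothing was replaced, so assigning the result of gsub! back to the variable can wipe it out. A small sketch to show the difference (the method names replaced_copy and replaced_in_place are just for illustration):

```ruby
# encoding: utf-8

# Non-bang version: always returns a string (a modified copy).
def replaced_copy(name)
  name.gsub(/é/, '&eacute;')
end

# Bang version: modifies name itself, but returns nil when nothing matched.
def replaced_in_place(name)
  name.gsub!(/é/, '&eacute;')
end

p replaced_copy('Rene')          # no match: the unchanged copy comes back
p replaced_in_place('Rene'.dup)  # no match: nil, not the string
p replaced_in_place('René'.dup)  # match: the modified string
```

So either call gsub! on its own line and ignore the return value, or use gsub together with the assignment.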
 
I see. Well, I have a friend who is a very talented Ruby developer, and I sent him an email with your question. Additionally, I did a little research and found this article. Maybe it will help too.


When my friend replies, I will post his response in here as well. Good luck!

Kevin Davie
Consultant
Sogeti USA
 
Heyas:

I have dealt with this exact issue with the Chinese language.

Here is how I dealt with it:

At the top of my model(s) I put the following:

<code>
$KCODE = 'u'
require 'jcode'
</code>

The jcode library updates a number of methods on the String class: ‘chop!’, ‘chop’, ‘delete!’, ‘delete’, ‘squeeze!’, ‘squeeze’, ‘succ!’, ‘succ’, ‘tr!’, ‘tr’, ‘tr_s!’, and ‘tr_s’. It also adds ‘jlength’ and ‘jcount’.

The encoding assumed for strings is set globally through the variable ‘$KCODE’.

So now I can take a 'sentence' of all Chinese characters and do something like this:

<code>
words = self.simplified.split('')
words.each do |w|
  # do something with each character w
end
</code>
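The $KCODE / jcode setup above is the Ruby 1.8 way of doing it. As far as I know, on Ruby 1.9 and later strings carry their own encoding and jcode is no longer part of the standard library, so the same per-character split can be sketched like this (assuming the string is UTF-8; the method name characters is just for illustration):

```ruby
# encoding: utf-8

# Splits a UTF-8 string into an array of characters, not bytes.
def characters(text)
  text.chars
end

sentence = '你好世界'
characters(sentence).each do |w|
  puts w  # one Chinese character per line
end

puts sentence.length    # counts characters
puts sentence.bytesize  # counts bytes, which is larger for multibyte text
```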

Also, I noticed that when I did a gsub like you're doing, with something like '&#233;', the h() helper outputs the literal text '&#233;' and NOT the special character, because h() escapes the ampersand. To get around this I didn't use h() to format my HTML output. However, this leaves you open to injection attacks, so make sure you aren't showing output entered by the user without the h().
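One possible way to keep the escaping and still get entities is to escape first and convert the special characters afterwards, so the ampersands you add are never run through the escaper. A rough sketch, assuming Ruby 1.9 or later and UTF-8 strings: it uses ERB::Util.html_escape from the standard library (h is an alias for it in ERB), and the method name html_safe_entities is made up for this example.

```ruby
# encoding: utf-8
require 'erb'

# Escapes <, >, & and quotes first, then turns every non-ASCII character
# into a numeric HTML entity such as &#233;.
def html_safe_entities(text)
  escaped = ERB::Util.html_escape(text)
  escaped.gsub(/[^\x00-\x7F]/) { |char| format('&#%d;', char.ord) }
end

puts html_safe_entities('<b>café</b>')
```

The result should not be passed through h() a second time, or the entities get escaped again.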

If this doesn't help or if you're still having problems, let me know. I'll be happy to help out.

Thanks!
Nathan

Want to learn Chinese?
 