Gsub with Unicode Characters

Malistryx · Jan 10, 2008

Hi

I have an application that needs to take UTF-8 encoded data (including French text) from a database and replace any special characters (e.g. é) with their HTML representations.

So for that example should

Code:

french_name.gsub!(/é/, '&#233;')

be doing the trick? I've spent about a day trying different options, to no avail, and haven't been able to find anything online, so I'm pretty stuck on this one.

Thanks for the help

KDavie · Jan 10, 2008

I'm not actually a Ruby guy, so I apologize if this is a waste of your time...At any rate... Here is my shot in the dark.... In all the languages that I work with you need to assign the parsed value back to the original variable... i.e.

french_name = french_name.gsub!(/é/, '&#233;')

Syntactically that may not be correct, but I'm sure you get the idea.

Kevin Davie
Consultant
Sogeti USA

http://www.us.sogeti.com

Malistryx · Jan 10, 2008

Thanks for the reply. However, in Ruby the ! makes gsub modify the string in place, so

Code:

french_name = french_name.gsub(/é/, '&#233;')

is the same as

Code:

french_name.gsub!(/é/, '&#233;')

Thanks though
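
One caveat on the two forms above: gsub! returns nil when nothing was replaced, so assigning its result back to the variable can wipe out the string. A minimal sketch of the difference, assuming Ruby 1.9 or later and made-up sample values (the names below are illustrative, not from the original application):

```ruby
# encoding: utf-8
# Hypothetical sample value standing in for the database field ("Hélène").
french_name = "H\u00E9l\u00E8ne"

# gsub returns a new string and leaves the receiver untouched.
copy = french_name.gsub(/\u00E9/, '&#233;')

# gsub! changes the receiver in place and returns that same object
# when at least one substitution was made...
in_place = french_name.dup
returned = in_place.gsub!(/\u00E9/, '&#233;')

# ...but it returns nil when nothing matched, so writing
# name = name.gsub!(...) would replace the string with nil here.
no_accent = "Paul".dup
unmatched = no_accent.gsub!(/\u00E9/, '&#233;')
```

So either call gsub! for its side effect alone, or assign the result of the non-bang gsub.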

KDavie · Jan 10, 2008

I see... Well, I have a friend who is a very talented Ruby developer. I sent him an email with your question. Additionally, I did a little research and found this article; maybe it will help too:

http://wiki.rubyonrails.org/rails/pages/HowToUseUnicodeStrings

When my friend replies, I will post his response in here as well. Good luck!

nrasch · Apr 16, 2008

Heyas:

I have dealt with this exact issue with the Chinese language.

Here is how I dealt with it:

At the top of my model(s) I put the following:

<code>
$KCODE = 'u'
require 'jcode'
</code>

The jcode library updates a number of methods on the String class: ‘chop!’, ‘chop’, ‘delete!’, ‘delete’, ‘squeeze!’, ‘squeeze’, ‘succ!’, ‘succ’, ‘tr!’, ‘tr’, ‘tr_s!’, and ‘tr_s’. It also adds ‘jlength’ and ‘jcount’.

The encoding assumed in a string is globally defined in the global variable ‘$KCODE’.

So now I can take a 'sentence' of all Chinese characters and do something like this:

<code>
words = self.simplified.split('')
words.each do |w|
  # do something with each character w
end
</code>
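
The $KCODE and jcode setup above is specific to Ruby 1.8. As a rough sketch of the same idea on Ruby 1.9 or later (an assumption, since the thread predates it), strings carry their own encoding, so a UTF-8 string splits into whole characters without any global setting. The sample sentence and variable names here are made up for illustration:

```ruby
# encoding: utf-8
# Hypothetical sample sentence; in the post this would be self.simplified.
simplified = "\u6211\u5B66\u4E2D\u6587"

# chars yields whole characters, not bytes, for a UTF-8 string.
words = simplified.chars.to_a

# split('') gives the same character-by-character result.
split_words = simplified.split('')

# Each of these characters takes three bytes in UTF-8,
# so the byte count and the character count differ.
byte_count = simplified.bytesize
char_count = simplified.length

words.each do |w|
  # do something with each character w, e.g. look at its code point
  w.ord
end
```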

Also, I noticed that when I did the gsub like you're doing, with something like '&#233;', the h() command outputs the literal text '&#233;' and NOT the special character, because h() escapes the ampersand. To get around this I didn't use h() to format my HTML output. However, this leaves you open to injection attacks, so make sure you aren't showing output entered by the user without the h().
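
One way to keep the escaping and still get the entities is to do the two steps in the opposite order: escape the HTML first, then convert the non-ASCII characters. The following is only a sketch, assuming Ruby 1.9 or later and a string that is already tagged as UTF-8. The helper name html_with_entities is hypothetical; ERB::Util.html_escape is the standard-library method that h() is an alias for.

```ruby
# encoding: utf-8
require 'erb'

# Hypothetical helper, not part of Rails or the original application.
def html_with_entities(text)
  # Step 1: escape &, <, >, and quotes, exactly as h() would.
  escaped = ERB::Util.html_escape(text)

  # Step 2: replace every remaining non-ASCII character with its
  # numeric character reference, e.g. U+00E9 becomes &#233;.
  escaped.gsub(/[^\x00-\x7F]/) { |char| format('&#%d;', char.ord) }
end

# Sample inputs (made up for illustration).
cafe  = html_with_entities("Caf\u00E9 <b>")
plain = html_with_entities("R&D")
```

Because the ampersands introduced in step 2 are added after escaping, they are not turned into &amp;, while any markup in the original text is still neutralised. The result must then be output without a second pass through h().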

If this doesn't help or if you're still having problems let me know. I'll be happy to help out.

Thanks!
Nathan

Want to learn Chinese?

http://www.gatewaychina.net

nrasch · Apr 16, 2008

Oh yah, here are some helpful links:

http://www.fngtps.com/2006/01/encoding-in-rails

http://wiki.rubyonrails.com/rails/pages/HowToUseUnicodeStrings

Nathan
