Converting latin characters

Status
Not open for further replies.

cfsponge

Programmer
Feb 22, 2006
44
US
I'm having trouble getting Latin characters, converted from either an HTML entity or Word, to stay converted. I'm using a Replace that works the first time, but subsequent form submissions corrupt the characters. A few letters, like á and é, always work. I'll use the letter ú as my example.

I have two possible filters for converting the text for database storage:
sText = Replace(sText, "ú", "ú")
sText = Replace(sText, "ú", "ú")

However, the character never keeps its HTML entity value when retrieved from the database. I get lots of additional question marks and other odd characters. Can anyone provide some assistance with this?

Oh, and I'm using UTF-8 so the charset isn't the problem.
 
This is an issue full of intricacy. Here are some notes for further investigation.

[1] Stability of ú (as an instance)
Stability here means that a form element containing ú is sent to the server and then written back to the user agent, with response.write (or some equivalent device), unchanged.

[1.1] On the client side, this stability can be achieved by having the form's html page carry the meta data
[tt]<meta http-equiv="content-type" content="text/html; charset=utf-8">[/tt]

[1.2] On the server side, the page the form data is submitted to should have its codepage set to, for instance, 65001 (the code page for utf-8)
[tt]session.codepage=65001[/tt]

[1.3] When doing response.write request("input_name"), the server should have in its asp (or some equivalent server-side tech)
[tt]response.charset="utf-8"[/tt]

[1.4] With the above client-->server-->client communication, the ú should be stable and show up on the user agent correctly.
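
To see why the matching settings in [1.1] to [1.3] matter, here is a minimal sketch in Python (not ASP, purely to make the bytes visible). It assumes the mismatch is a utf-8 page being read by a server under Windows-1252; the actual code page on a given server may differ, and `submit` is a hypothetical helper, not an ASP call.

```python
def submit(text: str, page_charset: str, server_codepage: str) -> str:
    """Simulate one form round trip: the browser encodes the text with the
    page charset, and the server decodes those bytes with its code page."""
    return text.encode(page_charset).decode(server_codepage)


original = "ú"

# Matching settings (utf-8 on both sides): the character survives any
# number of submissions.
stable = original
for _ in range(3):
    stable = submit(stable, "utf-8", "utf-8")

# Mismatched settings (assumed: utf-8 page, Windows-1252 server): every
# submission re-interprets the bytes, so the text keeps growing.
corrupted = original
lengths = []
for _ in range(3):
    corrupted = submit(corrupted, "utf-8", "cp1252")
    lengths.append(len(corrupted))

print(repr(stable), lengths)
```

Under these assumptions the stable value stays a single character, while the mismatched one doubles in length on each pass, which is consistent with text that looks fine once and degrades on later submissions.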

[2] If the server side involves processing with some dbase, that brings a new complication.

[2.1] In that case, the request("input_name") data containing ú should be mapped to ucs-2. My naive way of doing it is this.
[tt]x=unescape(replace(escape(request("input_name")),"%FA","%U77C7"))[/tt]
With x, the data is ready for querying/interacting with most dbases.

[2.2] If the ú is what we pull out of one of the dbases described in the article in [3], the same happens but in reverse order.
[tt] y=unescape(replace(escape(rs("some_field")),"%U77C7","%FA"))[/tt]
With y, the data is ready to be written with response.write to the page being served to the client, with the charset again set to utf-8.

[2.3] While it all sounds tangled, that is only because I'm not good enough at using high-powered jargon to explain it. It may sound more involved than it really is.
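
The %FA in [2.1] is the code point of ú (U+00FA) written as a single percent-escaped byte. The following Python sketch uses the standard `urllib.parse.quote` as a stand-in for the idea (it is not the same function as VBScript's escape) to show how the escaped form depends on which encoding is assumed:

```python
from urllib.parse import quote, unquote

ch = "ú"

as_latin1 = quote(ch, encoding="latin-1")  # one byte: the code point itself
as_utf8 = quote(ch, encoding="utf-8")      # two bytes in utf-8

print(hex(ord(ch)), as_latin1, as_utf8)

# Unescaping with the wrong encoding does not give ú back;
# unescaping with the encoding that produced the escape does.
wrong = unquote(as_utf8, encoding="latin-1")
right = unquote(as_utf8, encoding="utf-8")
```

The same character therefore has two different escaped spellings, and mixing them up is one way a value can change between being written and being read back.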

[3] This msdn article may help to see the big picture for the case involving a db on the server side.

You are probably using cf rather than asp, but the essentials are probably still the same.
 
Thank you for the excellent resource. When you say "dbase", are you referring to using any database for storage of the information? I want to make sure you don't specifically mean the dBase type.
 
I just meant those listed in the reference article:
[tt] "Microsoft Windows NT, SQL Server, Java, COM, and the SQL Server ODBC driver and OLEDB provider all internally represent Unicode data as UCS-2."[/tt]
It's impossible to know every possible tool; I just don't have enough resources at my disposal... somebody may know better.
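
For a concrete view of the UCS-2 point in that quote, here is a short Python sketch comparing the bytes of ú in utf-8 and in UTF-16, which coincides with UCS-2 for characters in the Basic Multilingual Plane such as ú. Little-endian byte order is an assumption here.

```python
ch = "ú"

utf8_bytes = ch.encode("utf-8")      # what a utf-8 page sends over the wire
ucs2_bytes = ch.encode("utf-16-le")  # two-byte form, little-endian assumed

print(utf8_bytes.hex(), ucs2_bytes.hex())

# Both byte sequences give back the same character, provided each is read
# with the encoding that produced it.
same = utf8_bytes.decode("utf-8") == ucs2_bytes.decode("utf-16-le") == ch
```

The two representations share no bytes, so a component expecting one and receiving the other cannot recover the character without an explicit conversion.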
 
After doing all the above research, all I had to do was add session.codepage=65001 to the ASP pages that handle the submissions. Since I was already using utf-8 for my character set, this was great. Again, thanks for your help.
 