How to support unicode 1

Status
Not open for further replies.

Lorey

Programmer
Feb 16, 2003
88
SG
Hi Experts,

We are going to modify our system (built with VC++, SQL Server 8, and ODBC) to support Unicode characters.

Please help me identify what needs to change besides wrapping all string literals in _T.

I'm concerned about our database.

 
Any functions you call in the CRT may need to be replaced with their TCHAR versions:

strcpy - _tcscpy
strcat - _tcscat
strchr - _tcschr

etc
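
As a rough illustration of that mapping, the sketch below builds a string using only the generic names. The helper BuildPair and the non-Windows fallback macros are assumptions made for this example so that it compiles anywhere; on Windows the real definitions come from tchar.h and follow _UNICODE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

#ifdef _WIN32
#include <tchar.h>   // real generic-text mappings, selected by _UNICODE
#else
// Assumed stand-ins so the sketch also builds off Windows.
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define _T(s)   L##s
#define _tcscpy std::wcscpy
#define _tcscat std::wcscat
#define _tcschr std::wcschr
#define _tcslen std::wcslen
#else
typedef char TCHAR;
#define _T(s)   s
#define _tcscpy std::strcpy
#define _tcscat std::strcat
#define _tcschr std::strchr
#define _tcslen std::strlen
#endif
#endif

// Hypothetical helper: writes "name=value" into dst.
// dstCount is the capacity of dst in characters, not bytes.
// Returns false (and leaves dst untouched) if the result would not fit.
bool BuildPair(TCHAR* dst, std::size_t dstCount,
               const TCHAR* name, const TCHAR* value)
{
    assert(dst && name && value);
    const std::size_t needed = _tcslen(name) + 1 + _tcslen(value) + 1;
    if (needed > dstCount) {
        return false;
    }
    _tcscpy(dst, name);       // strcpy or wcscpy, depending on the build
    _tcscat(dst, _T("="));    // "=" or L"="
    _tcscat(dst, value);
    return true;
}
```

The same source compiles as char code or wchar_t code; calling BuildPair(buf, 16, _T("key"), _T("val")) leaves the seven characters key=val in buf in either build.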

Skute

"There are 10 types of people in this World, those that understand binary, and those that don't!"
 
Hi,

Check all the Windows APIs that have been used. You should use the APIs with a W suffix; otherwise the ANSI versions will produce wrong output for Unicode text.

cheers
C
 
You don't need to do that.

Just use TCHARs everywhere. The Windows headers pick which function to call (i.e., MessageBoxA or MessageBoxW) based on whether UNICODE is defined; the CRT's tchar.h mappings are controlled by _UNICODE, so define both.

You should never hard-code a specific character type unless you have a good reason to.
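
The compile-time switch described above can be pictured with a small stand-in. TextBytesA, TextBytesW, TextBytes, XCHAR and XT are hypothetical names invented for this sketch (they mimic the MessageBoxA/MessageBoxW, TCHAR and _T pattern without needing windows.h); only the macro-dispatch idea is taken from the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical "A" and "W" versions of one API, like MessageBoxA/MessageBoxW.
// Each returns the size of the text in bytes, terminator excluded.
std::size_t TextBytesA(const char* text)
{
    assert(text);
    return std::strlen(text) * sizeof(char);
}

std::size_t TextBytesW(const wchar_t* text)
{
    assert(text);
    return std::wcslen(text) * sizeof(wchar_t);
}

// One generic name, resolved by the preprocessor before the compiler
// ever sees it. The caller writes TextBytes(XT("...")) in both builds.
#ifdef UNICODE
typedef wchar_t XCHAR;
#define XT(s)     L##s
#define TextBytes TextBytesW
#else
typedef char XCHAR;
#define XT(s)     s
#define TextBytes TextBytesA
#endif
```

Nothing is decided at run time: a call such as TextBytes(XT("Hello")) is rewritten to one of the two concrete functions during preprocessing, which is why the choice depends entirely on the build settings.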

Skute

"There are 10 types of people in this World, those that understand binary, and those that don't!"
 
>You should never hard code for a specific character type unless you have a good reason to.

I would say the opposite:

You should only have ambiguous character types if you have a good reason to.

/Per
www.perfnurt.se
 
That's not exactly what I meant by my comment, Per.

What I meant was that you should never actually write this in your code:

MessageBoxA(hWnd, "Hello World", "ANSI MessageBox", MB_OK);

Unless you've got a good reason to.


And in response to your comment, why would you intentionally support only ANSI and not Unicode? It is no more effort to support Unicode; it just involves typing TCHAR instead of char.
The only area you need to be careful with is pointer arithmetic.
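
A minimal sketch of the pointer-arithmetic pitfall, assuming a wide (wchar_t) build: sizes and offsets have to be counted in characters, not bytes. CountOf, CopyWide and ByteOffset are hypothetical helpers written for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Number of elements in a fixed-size array, whatever the element width.
// sizeof(buf) alone would give bytes, which is only equal for char.
template <typename T, std::size_t N>
std::size_t CountOf(const T (&)[N])
{
    return N;
}

// Copies src into a fixed wide buffer, truncating if necessary.
// The limit passed to wcsncpy is a character count, so N - 1, not sizeof(dst) - 1.
template <std::size_t N>
void CopyWide(wchar_t (&dst)[N], const wchar_t* src)
{
    assert(src);
    std::wcsncpy(dst, src, N - 1);
    dst[N - 1] = L'\0';           // wcsncpy does not terminate on truncation
}

// Distance in bytes from base to base[index].
// With char this equals index; with wchar_t it is index * sizeof(wchar_t).
std::size_t ByteOffset(const wchar_t* base, std::size_t index)
{
    assert(base);
    const char* first  = reinterpret_cast<const char*>(base);
    const char* target = reinterpret_cast<const char*>(base + index);
    return static_cast<std::size_t>(target - first);
}
```

Code that passes sizeof(buf) as a length, or adds a byte count to a TCHAR pointer, keeps working in an ANSI build and breaks only once the characters become wider than one byte.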

Skute

"There are 10 types of people in this World, those that understand binary, and those that don't!"
 
As I said, only have ambiguous character types if you have a good reason to. I would count switching a codebase from ANSI to Unicode as a good reason.

When producing new code (targeted at Unicode), however, I try to avoid the ambiguity of the T macros.

For example: just by looking at _T("Foo") I can't tell whether it is ANSI or Unicode; I have to go look at the compiler settings, while "Foo" and L"Foo" are crystal clear.



/Per
www.perfnurt.se
 
Experts,

I just want your ideas.

We're still arguing over whether we need to change every varchar column in our SQL Server database to nvarchar (to support Unicode), or only those whose values change (like user input).

Can we leave the other, system-generated fields as varchar?

I want to know the impact of changing everything to nvarchar on speed, memory space, or any other relevant issues.
If there's no difference, it will be much easier for us to change everything to nvarchar than to be selective.

Please help.


 
I think it is better to be selective: just do the output strings. A global change, although easier, can really mess up silly things like dates. It can also double the size of your database, since nvarchar stores two bytes per character. You will always end up with some bum routine that only takes varchar instead of nvarchar.

Quite often, instead of arguing about it, it is a lot quicker to just try it. Changing everything from varchar to nvarchar isn't difficult, and you'll know straight away whether or not it works.

We used a lot of unsigned chars because we were converting MBCS to wchar. Changing everything to TCHARs didn't really help, because all the strcpy arguments were cast from unsigned char* to char*, so the compiler didn't spot any problems with the code. It wasn't until we ran the code and only got half the string that we realized all these unsigned char* had been cast to char*.
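
The truncated-string symptom can be reproduced in a few lines. This is a sketch, assuming wchar_t text made of ASCII-range characters; LengthThroughCharCast and LengthAsWide are hypothetical helpers. The first does what those hidden casts did: it treats wide data as a narrow string, so the first zero byte ends it early.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Correct: counts wide characters up to the wide terminator.
std::size_t LengthAsWide(const wchar_t* wide)
{
    assert(wide);
    return std::wcslen(wide);
}

// The bug: the cast silences the compiler, and strlen then stops at the
// first zero byte. For ASCII-range text every wide character contains at
// least one zero byte, so the scan ends inside the first character.
std::size_t LengthThroughCharCast(const wchar_t* wide)
{
    assert(wide);
    return std::strlen(reinterpret_cast<const char*>(wide));
}
```

Without the cast, passing a wchar_t* to strlen is a compile error, which is why the post notes that the conversion is simple when no casting is involved.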

It took 7 developers about 4 days to do about 500 files. It took about 3 months to weed out most of the casting and sizeof problems. If there is no casting, it is dead simple.

Beware of things like RTF import - they only take chars. RTF doesn't like TCHARs.
 
Thanks, xwb, for such generous and substantial ideas.
 