How to read multibyte character files, such as Chinese, Japanese, and Korean content files?
I am just new to C programming. Any help is highly appreciated!
I am working in Visual C++ 6.0, but I also need to test this on Unix using the gcc compiler. I tried to find out about the mbstowcs and mbtowc functions. Are they built-in functions, or do they come from an external library?
From a file, I am able to read a char (getc()) or an int (getw()). But how can I read a short? Can you post a sample?
Please advise.
Well, I was going through the Solaris man pages and found the following APIs related to wide characters. (Sorry for the format; I just copied them from the net.) Maybe you can go through their man pages for details.
watoll(3C) - convert wide character string to long integer
wcrtomb(3C) - convert a wide-character code to a character (restartable)
wcscat(3C) - wide-character string operations
wcschr(3C) - wide-character string operations
wcscmp(3C) - wide-character string operations
wcscoll(3C) - wide character string comparison using collating information
wcscpy(3C) - wide-character string operations
wcscspn(3C) - wide-character string operations
wcsetno(3C) - get information on EUC codesets
wcsftime(3C) - convert date and time to wide character string
wcslen(3C) - wide-character string operations
wcsncat(3C) - wide-character string operations
wcsncmp(3C) - wide-character string operations
wcsncpy(3C) - wide-character string operations
wcspbrk(3C) - wide-character string operations
wcsrchr(3C) - wide-character string operations
wcsrtombs(3C) - convert a wide-character string to a character string (restartable)
wcsspn(3C) - wide-character string operations
wcsstr(3C) - find a wide-character substring
wcstod(3C) - convert wide character string to double-precision number
wcstok(3C) - wide-character string operations
wcstol(3C) - convert wide character string to long integer
wcstombs(3C) - convert a wide-character string to a character string
wcstoul(3C) - convert wide character string to unsigned long
wcstring(3C) - wide-character string operations
wcswcs(3C) - wide-character string operations
wcswidth(3C) - number of column positions of a wide-character string
wcsxfrm(3C) - wide character string transformation
wctob(3C) - wide-character to single-byte conversion
wctomb(3C) - convert a wide-character code to a character
wctrans(3C) - define character mapping
wctype(3C) - define character class
wcwidth(3C) - number of column positions of a wide-character code
WIFEXITED(3UCB) - wait for process to terminate or stop
WIFSIGNALED(3UCB) - wait for process to terminate or stop
WIFSTOPPED(3UCB) - wait for process to terminate or stop
windex(3C) - wide-character string operations
wmemchr(3C) - find a wide-character in memory
wmemcmp(3C) - compare wide-characters in memory
wmemcpy(3C) - copy wide-characters in memory
wmemmove(3C) - copy wide-characters in memory with overlapping areas
wmemset(3C) - set wide-characters in memory
wordexp(3C) - perform word expansions
wordfree(3C) - perform word expansions
wprintf(3C) - print formatted wide-character output
wrindex(3C) - wide-character string operations
wscanf(3C) - convert formatted wide-character input
wscasecmp(3C) - Process Code string operations
wscat(3C) - wide-character string operations
wschr(3C) - wide-character string operations
wscmp(3C) - wide-character string operations
wscol(3C) - Process Code string operations
wscoll(3C) - wide character string comparison using collating information
wscpy(3C) - wide-character string operations
wscspn(3C) - wide-character string operations
wsdup(3C) - Process Code string operations
wslen(3C) - wide-character string operations
wsncasecmp(3C) - Process Code string operations
wsncat(3C) - wide-character string operations
wsncmp(3C) - wide-character string operations
wsncpy(3C) - wide-character string operations
wspbrk(3C) - wide-character string operations
wsprintf(3C) - formatted output conversion
wsrchr(3C) - wide-character string operations
wsscanf(3C) - formatted input conversion
wsspn(3C) - wide-character string operations
wstod(3C) - convert wide character string to double-precision number
wstok(3C) - wide-character string operations
wstol(3C) - convert wide character string to long integer
wstostr(3C) - code conversion for Process Code and File Code
wstring(3C) - Process Code string operations
wsxfrm(3C) - wide character string transformation
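The mbstowcs and mbtowc functions are Standard C and have been part of the C library since C89/C90, so they come with the standard C library on both Windows and Unix rather than from an external library. They are declared in <stdlib.h>. Most of the wcs* functions in <wchar.h> came later, with the 1995 Amendment 1, so older libraries might not have all of those.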
On a UNIX-ish machine, type "man mbtowc" on the command line to get info about it.
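If I understand correctly, all the wc stuff is what you use to manipulate wide-character (wchar_t) strings within your program; on many systems those hold Unicode values, although the C standard does not require it. A multibyte character file stores text in an external encoding such as Shift-JIS, EUC-JP, EUC-KR, Big5, GB2312 or UTF-8, where one character can take one or more bytes. Some of these encodings (ISO-2022-JP, for example) also use escape codes that change the "shift state" of the file, and the state determines the meaning of each following byte. In short, it's usually a more compact representation than a fixed-width wide string, especially when your data has large sections of plain ASCII or of a single character set.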
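So you read the multibyte character file into a char buffer, call setlocale (for example setlocale(LC_ALL, "")) so the library knows which encoding to expect, and then use mbstowcs to convert the multibyte character string (char*) into a wide character string (wchar_t*).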
Once you have a wchar_t representation of your file, you can work with it in your program using the wc library stuff mentioned above (a lot of which is Standard).
And, of course, there are the wcstombs and wctomb functions for changing a wchar_t string (or a single wchar_t, in the case of wctomb) back into multibyte characters you can write out.
I have tried the mblen() function in Visual C++ on Windows.
It compiles and builds, but the program throws a runtime exception (exits abruptly) exactly when the mblen() call is executed.
Does this function work on Windows?
Here is the code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    char v;
    int vi = 0;
    FILE *fp;
    printf("\n Multi Bytes operation starts here.. \n");
    fp = fopen("test.htm", "r");
    while ((v = getc(fp)) != EOF)
    {
        vi = mblen((const char *)v, MB_CUR_MAX);
        if (vi == 0) printf("\n String Error \n");
        else if (vi > 0) printf("\n Length: %d\n", vi);
        else if (vi == -1) printf("\n Invalid multibyte char \n");
    }
    fclose(fp);
    return 0;
}
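You're casting your char to a char* and passing it to mblen. That turns the byte's value into a meaningless address, and trying to read through it normally gives a runtime error. mblen expects a pointer to the bytes of a multibyte character, so you need to keep the bytes in a buffer and pass the buffer's address.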