
std::wstring

Status
Not open for further replies.

Lorey (Programmer, SG)

Feb 16, 2003
Hi experts,

I have a std::wstring variable and I want to loop through it with an iterator.

What is the size in bytes of each character in a std::wstring?

 
You have to be careful about this. wchar is only 2 bytes with Microsoft compilers. If you port to something like Linux or Solaris, it is 4 bytes. I don't know what it is on the 64-bit MS OSs. If you stick to sizeof, you're safe. If you assume 2 bytes or 4 bytes, you may hit problems.
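To illustrate the "stick to sizeof" advice, here is a minimal sketch that derives byte counts from sizeof(wchar_t) instead of hard-coding 2 or 4. The names wcharBits and wideByteCount are hypothetical helpers, not part of any library:

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Width of one wchar_t in bits: typically 16 with Microsoft compilers,
// 32 on Linux and Solaris.
const std::size_t wcharBits = sizeof(wchar_t) * CHAR_BIT;

// Bytes occupied by the characters of a wstring (terminating null excluded).
// No #ifdef is needed: the multiplier follows the platform's sizeof(wchar_t).
std::size_t wideByteCount(const std::wstring& s)
{
    return s.size() * sizeof(wchar_t);
}
```

Note that iterating over the wstring itself never needs this number: begin()/end() step one wchar_t at a time whatever its size. It only matters once you start copying raw bytes.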
 
Yes, you're right. It is wchar_t.
 
Thanks for all your replies.

I'm very new to VC++. Can you please help me with the code for iterating using this wchar_t? The problem is that I'm porting my program to both Windows and Solaris, and I get different results on the two platforms. So far, the Windows output is correct.
 
What does the code you're porting look like (the part that doesn't work)? If I remember correctly, on Solaris, wchar_t is 32 bits. Also, are you working in Unicode, UTF-8, SBCS, DBCS or MBCS?
 
Why do you need the size of the character for iteration?

Code:
#include <iostream>
#include <string>

int main()
{
    std::wstring widestring = L"Hello World.\n";

    std::wstring::iterator pos = widestring.begin();
    while (pos != widestring.end())
    {
        if (*pos == L'H')
            *pos = L'J';
        ++pos;
    }

    std::wcout << widestring;
}
 
Please see below. This is run on Solaris.

When I call the 2nd ::sendTextMessage directly, the output is correct. But if I call the 1st ::sendTextMessage, the output is wrong. So I assume the problem is the wstring handling in the first ::sendTextMessage? The first function converts the std::string to a vector of unsigned char and then passes it to the 2nd ::sendTextMessage.



******************
bool RadioSession::sendTextMessage(const std::string& destTSI, const std::string& message, ESdsServiceSelection serviceSelection, const std::string& sessionId, std::string & status)
{
    FUNCTION_ENTRY("sendTextMessage - internal");

    // page and convert the message to ucs2
    TA_IRS_Bus::MultiPartSdsMessage multiPartMessage = m_sdsUtils.buildMultiPartMessage(message);

    MessageListType messageList;

    // get each page
    std::vector<TA_IRS_Bus::SdsMessageComponent>& messages = multiPartMessage.getMessageParts();

    // for each page
    for ( std::vector<TA_IRS_Bus::SdsMessageComponent>::const_iterator messageIter = messages.begin();
          messageIter != messages.end(); ++messageIter)
    {
        MessageTextType messageText;
        for ( std::wstring::const_iterator textIter = messageIter->m_messageContent.begin();
              textIter != messageIter->m_messageContent.end(); ++textIter)
        {
            messageText.push_back( static_cast<unsigned char>((*textIter & 0xFF00) >> 8) ); // high byte
            messageText.push_back( static_cast<unsigned char>( *textIter & 0xFF) );         // low byte
        }

        messageList.push_back(messageText);
    }
    FUNCTION_EXIT;
    return sendTextMessage(destTSI, messageList, serviceSelection, sessionId, status);
}

bool RadioSession::sendTextMessage(const std::string & destTSI, const TA_IRS_App::MessageListType& messages, ESdsServiceSelection serviceSelection, const std::string& sessionId, std::string & status, bool logHistory)
{
    FUNCTION_ENTRY("sendTextMessage");

    if ( Unknown == serviceSelection )
    {
        RadioSessionHelper helper(getSessionRef(), m_radioTcpServer);
        std::string k_type = helper.getSubscriberType(destTSI);
        if ( ("P" == k_type) || ("G" == k_type) )
        {
            serviceSelection = Group;
        }
        else
        {
            serviceSelection = Individual;
        }
    }

    for (MessageListType::const_iterator messageIter = messages.begin(); messageIter != messages.end(); ++messageIter)
    {
        if (false == sendTextDataMessage(destTSI, *messageIter, serviceSelection, status) )
        {
            FUNCTION_EXIT;
            return false;
        }
        if(messages.size() > 1)
        {
            TA_Base_Core::Thread::sleep(1000);
        }
    }

    // every page was sent successfully
    FUNCTION_EXIT;
    return true;
}
******************
 
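Since wchar_t is 4 bytes on Solaris, you need to add the two upper bytes too, before the existing high-byte and low-byte lines:
Code:
#ifdef SOLARIS  // define SOLARIS in your build; compilers on Solaris predefine __sun instead
            messageText.push_back( static_cast<unsigned char>((*textIter & 0xFF000000) >> 24) ); // byte 3 (most significant)
            messageText.push_back( static_cast<unsigned char>(( *textIter & 0xFF0000) >> 16) );  // byte 2
#endif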
 
You can also replace the #ifdef with
Code:
if ( sizeof( wchar_t ) > 2 ) {
and the #endif with }
 
Hi Miros,

The last reply you gave didn't work. It still gives me garbage data.

Is there an equivalent of the Windows MultiByteToWideChar function on Solaris?

Thanks!
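As far as I know, the closest standard counterpart is std::mbstowcs from <cstdlib> (C's mbstowcs). It decodes according to the current C locale rather than taking a code page argument. A minimal sketch, assuming the program calls std::setlocale(LC_ALL, "") once at startup and runs under a locale that matches the encoding of the std::string (the helper name toWide is hypothetical):

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Rough counterpart of MultiByteToWideChar: converts a multibyte string in the
// current C locale (LC_CTYPE) to a wstring with std::mbstowcs.
// Assumes std::setlocale(LC_ALL, "") was called at startup; otherwise the
// default "C" locale is used and only plain ASCII converts reliably.
std::wstring toWide(const std::string& narrow)
{
    // A multibyte string never decodes to more wide characters than it has
    // bytes, so size() + 1 leaves room for the terminating null.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    std::size_t count = std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (count == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    return std::wstring(buffer.data(), count);
}
```

On Solaris the value mbstowcs stores in each wchar_t depends on the locale (it is a Unicode code point in UTF-8 locales), so check what you actually get before extracting bytes. iconv(3C) is the other option when the input encoding differs from the locale's.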
 

Hi experts,

I have a std::string which was converted to a std::wstring.
On Windows, each wchar_t in a wstring is 2 bytes, so I can easily get the high and low bytes and push them into a vector<unsigned char>.

But how can I get the high and low bytes of a 4-byte wchar_t on Solaris and put them into my vector<unsigned char>?

Thanks very much!
 
I think this should work on all platforms (note: I haven't tested it):
Code:
#include <climits>

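unsigned char LowByte( wchar_t  wc )
{
    // least-significant byte (bits 0-7)
    return static_cast<unsigned char>( wc );
}

unsigned char HighByte( wchar_t  wc )
{
    // most-significant byte: bits 8-15 for a 2-byte wchar_t, bits 24-31 for a 4-byte one
    return static_cast<unsigned char>( wc >> ((sizeof(wchar_t) * CHAR_BIT) - CHAR_BIT) );
}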
 
Hi cpjust, thanks for this.

When the argument is non-Chinese (e.g. "ABC"), it displays correctly, but if the argument is in Chinese, the display is incorrect :(
 
If the wchar_t is 4 bytes, then why would you send only the high and the low bytes? What about the other two?

This is just a guess and is not tested with Chinese or other multibyte characters:
Code:
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    std::vector<unsigned char> messageText;
    std::wstring messageContent(L"ABC");

    // Get a pointer to the data:
    const wchar_t* wstr_data = messageContent.data();

    // Cast to an unsigned char*:
    const unsigned char* uchar_data = reinterpret_cast<const unsigned char*>(wstr_data);

    // Determine size of data in bytes:
    std::size_t data_size = messageContent.size() * sizeof(wchar_t);

    // Add to vector (the bytes stay in the machine's native order):
    messageText.insert(messageText.end(), uchar_data, uchar_data + data_size);

    // Prove the vector holds the data, one byte per line in hex:
    std::cout << std::hex;
    std::copy(messageText.begin(), messageText.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

By the way, I forget whether this data is being sent over the network or something. If you are converting to unsigned char and sending the data to a server that turns it back into a wstring, then you might have problems if the server machine has a different definition of wchar_t. If that is the case, you should include information in the data you send about the format of the bytes.

For example, you would send a header saying each character is 2 or 4 bytes, then send each byte in a specified order. To get the bytes out you would use a different method on Windows and Solaris because there are a different number of bytes. (I guess you could use sizeof(wchar_t) to control a generic for loop, but I don't feel like writing that code unless you say that's what you need and are having trouble writing it yourself. :))
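A generic loop of that kind might look like this sketch. The wire format (a one-byte width header, then every character most-significant byte first) is an assumption for illustration, as is the name serializeWide; the receiver would read the header to know how many bytes make up each character:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wire format: first byte = bytes per character, then each
// character written most-significant byte first (big-endian), so the receiver
// does not depend on the sender's wchar_t size or endianness.
// Assumes 8-bit bytes on the wire.
std::vector<unsigned char> serializeWide(const std::wstring& text)
{
    const std::size_t width = sizeof(wchar_t);
    std::vector<unsigned char> out;
    out.reserve(1 + text.size() * width);
    out.push_back(static_cast<unsigned char>(width));

    for (std::wstring::const_iterator it = text.begin(); it != text.end(); ++it)
    {
        unsigned long value = static_cast<unsigned long>(*it);
        // sizeof(wchar_t) controls the loop: 2 iterations on Windows, 4 on Solaris.
        for (std::size_t i = width; i > 0; --i)
            out.push_back(static_cast<unsigned char>((value >> (8 * (i - 1))) & 0xFF));
    }
    return out;
}
```

Writing the bytes by shifting, rather than copying memory, is what makes the byte order fixed regardless of whether the sender is little-endian x86 or big-endian SPARC.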
 
Let me explain my scenario.

In my Windows application, I have a std::string which I convert to a wstring with MultiByteToWideChar(). The high and low bytes of each wchar_t are put into a vector of unsigned char that is passed to my Solaris application. Since wchar_t on Windows is 2 bytes, getting the high and low bytes is fine.

I also have another flow where, inside Solaris, I have a std::string which has to be converted to a wstring, and I don't know how to do that conversion in the first place, now that I know that wchar_t on Solaris is 4 bytes.
This wstring should be stored in a vector of unsigned char (the same vector of unsigned char as above, because this vector will be passed to the hardware protocol via tcpserver).

The good thing is that on Solaris, when I populate my std::string with non-Chinese text, the hardware can read it correctly, but with Chinese, it's garbage.

Maybe what I want to know is:

1. How to convert a std::string to a wstring inside Solaris.
2. How to put the wstring into my vector of unsigned char correctly.

I need to pass a vector of unsigned char to the hardware, 2 bytes per Chinese character. If it's an ASCII character, the high byte is just 0x00.

I hope I didn't confuse you all :(
 
So what does the wchar_t look like on Solaris when it holds a Chinese character? Does it have 0x00 in two of its bytes and non-zero data in the other two?
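One way to find out is to dump each wchar_t in hex at its full width and compare a Chinese string against "ABC". A small sketch (dumpWide is a hypothetical helper):

```cpp
#include <climits>
#include <iomanip>
#include <sstream>
#include <string>

// Formats each wchar_t of text as "0x" plus hex digits padded to the full width
// of wchar_t (4 digits when it is 2 bytes, 8 when it is 4 bytes), separated by
// spaces, so you can see which bytes of each character are non-zero.
std::string dumpWide(const std::wstring& text)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::wstring::const_iterator it = text.begin(); it != text.end(); ++it)
    {
        if (it != text.begin())
            out << ' ';
        unsigned long long value = static_cast<unsigned long long>(*it);
        // Keep only the bits that belong to a wchar_t, in case wchar_t is signed
        // and a negative value was sign-extended into the wider type.
        if (sizeof(wchar_t) < sizeof(unsigned long long))
            value &= (1ULL << (sizeof(wchar_t) * CHAR_BIT)) - 1;
        out << "0x" << std::setw(static_cast<int>(sizeof(wchar_t) * 2)) << value;
    }
    return out.str();
}
```

Printing dumpWide of the converted string on both machines shows directly whether the Solaris values are Unicode code points (e.g. 0x00004e2d) or something locale-specific.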

You say you need to know how to convert std::string to std::wstring inside Solaris; I would guess you have to do that first? If so, then maybe ask in the C++: Unix forum.

Since you are passing the data over a network to another machine, I would imagine that you want a specific byte order and number of bytes every time, defined by your application. Otherwise you might have endian issues and byte-count issues between machine types. If you only need two unsigned chars per character, then to put the wstring into your vector you would figure out which two bytes in the wchar_t are important and extract the values of those bytes with a method similar to what cpjust showed.
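Following that approach, and assuming each wchar_t holds a Unicode code point (a UTF-16 unit on Windows, a UTF-32 value on Solaris in a UTF-8 locale), the two important bytes are bits 8-15 and bits 0-7. That also gives the 0x00 high byte for ASCII that the hardware expects. Note that cpjust's HighByte shifts by 24 when wchar_t is 4 bytes, so for a character like U+4E2D it returns 0x00 instead of 0x4E, which would explain ASCII working and Chinese failing. A sketch of packing a wstring this way, with toUcs2BigEndian as a hypothetical name and '?' as an arbitrary substitute for characters that do not fit in two bytes:

```cpp
#include <string>
#include <vector>

// Packs text as UCS-2 big-endian: exactly two bytes per character, high byte
// first, so 'A' becomes 0x00 0x41 on both 2-byte and 4-byte wchar_t platforms.
// Assumes each wchar_t holds a Unicode code point. Values above U+FFFF cannot
// be represented in two bytes; this sketch replaces them with '?' (0x3F).
std::vector<unsigned char> toUcs2BigEndian(const std::wstring& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (std::wstring::const_iterator it = text.begin(); it != text.end(); ++it)
    {
        unsigned long code = static_cast<unsigned long>(*it);
        if (code > 0xFFFF)
            code = 0x3F;
        bytes.push_back(static_cast<unsigned char>((code >> 8) & 0xFF)); // high byte (bits 8-15)
        bytes.push_back(static_cast<unsigned char>(code & 0xFF));        // low byte (bits 0-7)
    }
    return bytes;
}
```

On Windows, characters outside the BMP arrive as surrogate pairs and pass through as two 2-byte units; whether the hardware accepts those is something to check against its protocol.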
 