
help debugging my iconv command

Status
Not open for further replies.

jez

Programmer
Apr 24, 2001
370
VN
Hi everyone,

I have an MSSQL 2005 database, with some data in it, some of which is in arabic characters.

Using the MS tools to manage the DB i can see these no problem. I am trying to output this data to a web page.

In the page itself i am just getting ? instead of the arabic.

So far i have found that the arabic character set is ISO-8859-6, or CP1256 in the windows version.

In my HTML i have
Code:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


Then in my action that gets the actual data i am filling up an array for each row found, as follows;-
Code:
$resultArr[$j]['Series_Id'] = odbc_result($result, 'Series_Id');
$resultArr[$j]['Title_Id'] = odbc_result($result, 'Title_Id');
$resultArr[$j]['Series_Name'] = iconv("CP1256", "UTF-8", (odbc_result($result, 'Series_Name')));


As you can see on the Series_Name field i am trying to use iconv, but it is not working. I think it is right to use iconv (and i have checked i have it installed and enabled).

Can anyone suggest how to fix this line of code to output the right character set?
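A minimal sketch of what that iconv call can and cannot do, assuming the bytes reaching PHP really are CP1256. The sample word and the variable names are illustrative, not taken from the database in question. If the driver has already replaced the Arabic with "?" before PHP sees it, no iconv call can bring the characters back.

```php
<?php
// Illustrative sample only: the Arabic word "marhaba", written as UTF-8 escapes.
$utf8 = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}";

// What a CP1256 source would hand over: one byte per Arabic letter.
$cp1256 = iconv("UTF-8", "CP1256", $utf8);

// The conversion from the question works when the input really is CP1256.
$roundTrip = iconv("CP1256", "UTF-8", $cp1256);

// If the driver already substituted "?" (byte 0x3F), there is nothing left to recover.
$alreadyLost = iconv("CP1256", "UTF-8", "?????");

echo bin2hex($cp1256), "\n";
echo $roundTrip, "\n";
echo $alreadyLost, "\n";
```

So question marks on the page can mean the data was damaged before iconv ran, rather than that the iconv arguments are wrong.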

Thanks,
 
have you set the character set on the odbc connection manager (maybe using COM rather than the vanilla library)

and are you using a display font that has the necessary characters for the UTF that you are outputting?
 
have you set the character set on the odbc connection manager

... i will take a look at that, do you mean that i should be telling the connection to give me the right character set (i.e. the same one as is in the database).

and are you using a display font that has the necessary characters for the UTF that you are outputting?

Not sure what you mean by a display font in this case? The font displaying it to the web? The pages should be served as UTF-8, but it is possible that UTF-8 does not have the characters needed. It is for an IE-only audience (intranet) so windows-1256 could work, but i would like to find a more universal solution since i dev in ff.
 
second question first.

2. utf-8 will have the characters available. but not all font-sets will have glyphs for every entry in the utf-8 symbol table. where a glyph is missing you will normally get a question mark or such other default character. so debug with standard arial/times fontsets.

1. if it's not a simple font issue then ensuring that all elements in the data chain are utf-8 will help. this means the database tables and the database connection. i do not think that odbc does this directly but you can set the flag in the odbc connection applet in control panel (for windows) or use ADODB via COM. I'm not sure how it works with MS Sql but it is def. advisable for mysql to have the database and connection set to the same charset.
 
Thanks for the reply.

I had already found the setting in the ODBC connector that sets the flag on the connection, also as mentioned i can see the right characters in the DB with the SQL server management studio.

I have also ensured that the font is set to arial in the CSS, and the pages are showing in the browser properties as being in UTF-8.

I have managed to copy and paste a sample of the arabic into my php editor (Komodo) and it does appear there, and if i print that to the browser it does show up correctly.

So, i think there is a break somewhere in my PHP between it coming from the DB and getting to the page.
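One way to locate such a break is to look at the raw bytes instead of the rendered text. The sketch below is only an illustration: `hex_bytes` is a hypothetical helper, and the letter meem (U+0645) is an arbitrary sample. It shows how the same letter looks in each encoding mentioned in this thread.

```php
<?php
// Hypothetical helper: show a string as space-separated hex bytes.
function hex_bytes(string $raw): string
{
    return implode(' ', str_split(bin2hex($raw), 2));
}

// The Arabic letter meem (U+0645) in the encodings discussed here.
$meem = "\u{0645}";
echo hex_bytes($meem), "\n";                              // UTF-8: two bytes
echo hex_bytes(iconv("UTF-8", "UTF-16LE", $meem)), "\n";  // UTF-16LE: two bytes, low byte first
echo hex_bytes(iconv("UTF-8", "CP1256", $meem)), "\n";    // CP1256: a single byte
echo hex_bytes("?"), "\n";                                // 3f: the character was lost upstream
```

Passing the value returned by `odbc_result($result, 'Series_Name')` to the same helper should show which of these patterns, if any, the driver is actually delivering.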


After looking into this further, i have found that MS SQL does NOT support UTF-8. Instead it stores Unicode data in an encoding called UCS-2LE, which is confusing the issue.

So, i now have a lot of random characters on the screen, which are no longer just ???, but they are nothing like arabic either.

Needless to say i am very confused now as i have not been able to find a single example on the web and my boss is telling me it can be done in java and .net, and if php can't do it then time to ditch php (which does not look good for me).

What i really need to know is what collation i need to set on the DB and how to get info out of the db without being mangled. I spoke to a DBA here and he advised he solved this by switching to Oracle...obviously not ideal.

 
are you able to switch to COM rather than using ODBC?
 
the alternatives to com are, i believe:

1. stipulate the datatypes of the relevant columns as binary. (disadvantage is that you lose searching/sorting).
2. iconv to/from UCS-2 for storage/output.


in both cases make sure that your forms specify the relevant charset too.
Code:
<form accept-charset="UTF-8" method="post" enctype="application/x-www-form-urlencoded" action="somewebpage.php">

so far as I can work out, the problem is nothing to do with php: it's purely that MS does _not_ accept native UTF-8 in mssql. alternative databases might make life a lot easier for you (sqlite/mysql etc). i guess that's not practical though?
 
Hi,

I can change my connection to COM, and i have tried that, but i cannot see how that changes things.
I also tried ADODB but i could not find how that changed things either, since most of the documentation talks about just connecting, or about using MySql.
I am sure that it is possible with com or ado db, but i could not find the relevant 'how to'.

Unfortunately i cannot really change either the database in use to another type (mysql), or change the data in it to be a binary field.

From what i can find on the internet, the reason .NET and Java do better with MSSQL is because they can convert from UCS-2LE to UTF-8 internally (as they natively support UCS-2LE). Is this what i would be trying to do with com?
 
it changes things because the adodb library is, i believe, utf-8 aware and so spins the data into the right flavour for MS on the fly. the process is then transparent for you.

to keep with odbc try encoding the data to UCS-2LE before storage and back before display

Code:
$resultArr[$j]['Series_Id'] = odbc_result($result, 'Series_Id');
$resultArr[$j]['Title_Id'] = odbc_result($result, 'Title_Id');
// on read, convert back for display; this assumes the driver hands over raw UCS-2LE bytes
$resultArr[$j]['Series_Name'] = iconv("UCS-2LE", "UTF-8", odbc_result($result, 'Series_Name'));
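The "to UCS-2LE before storage and back before display" idea can be sketched as a pair of helpers. The function names are hypothetical and no database is involved. The sketch only shows the direction of each iconv call, on the assumption that the column holds raw UCS-2LE bytes.

```php
<?php
// Hypothetical helper: UTF-8 from the web page -> UCS-2LE bytes for storage.
function to_ucs2le(string $utf8): string
{
    $bytes = iconv("UTF-8", "UCS-2LE", $utf8);
    if ($bytes === false) {
        // Invalid UTF-8, or a character that UCS-2 cannot represent.
        throw new InvalidArgumentException("cannot be encoded as UCS-2LE");
    }
    return $bytes;
}

// Hypothetical helper: UCS-2LE bytes from storage -> UTF-8 for display.
function from_ucs2le(string $bytes): string
{
    $utf8 = iconv("UCS-2LE", "UTF-8", $bytes);
    if ($utf8 === false) {
        throw new InvalidArgumentException("not valid UCS-2LE");
    }
    return $utf8;
}

$title  = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}"; // sample Arabic word
$stored = to_ucs2le($title);                          // two bytes per letter
echo from_ucs2le($stored), "\n";
```

The write path and the read path use opposite argument orders. Applying the storage direction to data that is being read back is an easy mistake to make here.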
 
[SOLVED]
Thanks again for the help.
I tried to use ADODB, but ran into errors relating to length of Unicode field, which led to info telling me that ADO uses the ODBC driver 6.5, so again a non-starter.


So, after going through all this, i have learned that when it comes to wide characters, each different character set has its own problems.

The fix for this was to use a new driver from Microsoft for PHP!
Yes i was surprised about this too, but in the interests of interoperability they have created a new dll ext for PHP.

Here is a link.



Basically, you use the newer driver as you would the mssql functions, but for unicode data you can retrieve a single field as binary data and iconv it from UTF-16LE to UTF-8.
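A rough sketch of the decoding step follows, with a hard-coded byte string standing in for the binary field so that it runs without a database. The helper name is hypothetical. The post does not name the driver, so the sqlsrv call mentioned in the comment is an assumption, not a confirmed detail of this fix.

```php
<?php
// Hypothetical helper: decode a field fetched as raw UTF-16LE bytes.
function utf16le_field_to_utf8(?string $raw): ?string
{
    if ($raw === null) {
        return null; // a SQL NULL stays null
    }
    if (strlen($raw) % 2 !== 0) {
        throw new InvalidArgumentException("UTF-16LE data must have an even byte count");
    }
    $utf8 = iconv("UTF-16LE", "UTF-8", $raw);
    if ($utf8 === false) {
        throw new InvalidArgumentException("not valid UTF-16LE");
    }
    return $utf8;
}

// Stand-in for the driver's bytes: a five-letter Arabic word as UTF-16LE.
// Assumption: with the sqlsrv extension these bytes might come from
// sqlsrv_get_field($stmt, $index, SQLSRV_PHPTYPE_STRING(SQLSRV_ENC_BINARY)).
$raw = "\x45\x06\x31\x06\x2D\x06\x28\x06\x27\x06";
echo utf16le_field_to_utf8($raw), "\n";
```

Fetching the field as binary keeps the driver from applying any code page conversion of its own, so the only conversion is the explicit iconv call.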

The specific details of this can be found here;-



Thanks very much for all the help, i understand all this much more now.

:):)
 
Well the field in the database is converted from a string to binary and then passed to php and deconverted, so if it shows up in the database then it can get to the PHP.

I suppose that if the collation supports unicode characters in the db then you are ok to use this method.

Is that what you meant?
 
sorry, no. i was thinking that doing sorting on a binary field will not get you the same result as sorting on a text field.
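To illustrate the sorting concern, the sketch below sorts the same two sample strings once as UTF-8 text and once as raw UTF-16LE bytes. It is PHP-side only and uses plain byte comparison as a stand-in. A real SQL Server text sort follows the column collation, which this does not model.

```php
<?php
// Two sample values as UTF-8: Latin "a" and the Arabic letter alef (U+0627).
$titles = ["a", "\u{0627}"];

// Byte-wise sort of UTF-8 follows code point order: "a" comes first.
$asText = $titles;
sort($asText, SORT_STRING);

// The same values as raw UTF-16LE bytes, sorted byte-wise.
$asBinary = array_map(fn($s) => iconv("UTF-8", "UTF-16LE", $s), $titles);
sort($asBinary, SORT_STRING);

// Decode again to see the resulting order: alef now comes first,
// because its low byte 0x27 is smaller than 0x61 for "a".
$decoded = array_map(fn($s) => iconv("UTF-16LE", "UTF-8", $s), $asBinary);

echo implode(", ", $asText), "\n";
echo implode(", ", $decoded), "\n";
```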
 
Oh i see what you mean, sorting is made difficult, but according to the MSDN site i think it is still possible, because the data is only converted to binary when you retrieve it, and the sorting has already happened as part of the 'fetch'.

From my point of view, it is only title and text data in arabic and sorting will be done on other identifying fields which are all some form of int.
 