XML: Text is not what the server sent

Status
Not open for further replies.

OnQueIT

Programmer
Nov 12, 2003
US
I am receiving a EULA from a server. When I get it and look at what I got, the text has been changed. Some of the EULA is in French, which has words with accented letters (such as é), but when I get it, those letters are changed to a question mark (?). Example: pr?ntes should be présentes. Any clues as to why this is happening would be appreciated.

I am coding this in FoxPro. I am creating the xml manually before "calling" the server to request data. Here is similar code I am using to POST and receive data back from the server:
* Build the request XML by hand
lcXML = [<?xml version="1.0" encoding = "ISO-8859-1"?>]
lcXML = lcXML + CHR(10) + [<Request>]
lcXML = lcXML + CHR(10) + [<TransactionReference>]
etc, etc, etc...
lcXML = lcXML + CHR(10) + [</TransactionReference>]
lcXML = lcXML + CHR(10) + [</Request>]

* POST the request synchronously (.F.) and read the reply as text
oHTTP = CREATEOBJECT("Microsoft.XMLHTTP")
oHTTP.Open("POST", {Server Address} ,.F.)
oHTTP.setRequestHeader("Content-Type","text/xml")
oHTTP.Send(lcXML)
orXML = oHTTP.responseText

Am I making the request correctly? If so, what should I do to get the exact text that the server is sending me?

Thanks in advance for any help...
Chris
 
ISO-8859-1 encoding can have problems with accented characters. Try switching to UTF-8.
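
A minimal sketch of what an encoding mismatch looks like, written in Python rather than FoxPro purely for illustration. The sample word is from the question; the variable names are made up for this example. It encodes the same text both ways and then decodes each with the wrong codec, which is one common way an accented letter ends up as a placeholder character. Whether this is what happens between this particular server and client is an assumption.

```python
# Illustration only: the sample word comes from the question above.
text = "présentes"

latin1_bytes = text.encode("iso-8859-1")  # é is the single byte 0xE9
utf8_bytes = text.encode("utf-8")         # é is the two bytes 0xC3 0xA9

# ISO-8859-1 bytes read as UTF-8: 0xE9 is not valid on its own, so the
# decoder substitutes U+FFFD (often displayed as "?").
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as ISO-8859-1: each byte becomes its own character.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

print(repr(latin1_read_as_utf8))
print(repr(utf8_read_as_latin1))
```

Either direction garbles the accented letter, so the declared encoding, the bytes actually sent, and the decoder on the receiving side all have to agree.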

Also, there's no need to include the linefeed character (CHR(10)) in the XML; the W3C XML spec doesn't require it. If the receiving site requires it, then it's broken and needs to be fixed. Usually this is a sign that someone is reading the file line-by-line as if it were a CSV file.
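
As a rough check of the linefeed point, here is a small Python sketch (not FoxPro) that parses the same request with and without linefeeds between the elements. The element names follow the request in the question; the value "abc" is invented for the example.

```python
import xml.etree.ElementTree as ET

# Element names follow the question; the value "abc" is made up.
compact = "<Request><TransactionReference>abc</TransactionReference></Request>"
spaced = "<Request>\n<TransactionReference>abc</TransactionReference>\n</Request>"

# A real XML parser finds the same element content either way.
compact_ref = ET.fromstring(compact).find("TransactionReference").text
spaced_ref = ET.fromstring(spaced).find("TransactionReference").text

print(compact_ref == spaced_ref)
```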

Chip H.


If you want to get the best response to a question, please check out FAQ222-2244 first
 
One other encoding-related issue: when receiving the XML from the remote server, make sure you load it into a DOM (or possibly use SAX to parse it). If you use a typical "get between text" kind of function (Mid$, etc.; I forget the name of it in FoxPro), you will have problems reading the accented characters (since UTF-8 is a multi-byte encoding). By using a DOM with the selectSingleNode calls to get individual nodes, you'll avoid having code-page problems.
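
To sketch the idea in Python (standing in for an MSXML DOM, so treat it as an analogy rather than the FoxPro code): when the raw bytes go to a parser, the parser reads the encoding from the XML declaration and hands back proper text. The Response and Eula element names and the eula_text helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical responses: the element names are assumptions.
latin1_response = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<Response><Eula>pr\xe9sentes</Eula></Response>"
)
utf8_response = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<Response><Eula>pr\xc3\xa9sentes</Eula></Response>"
)

def eula_text(raw_bytes):
    """Parse the raw bytes and return the text of the Eula element."""
    root = ET.fromstring(raw_bytes)  # the parser honors the declared encoding
    return root.find("Eula").text

print(eula_text(latin1_response) == eula_text(utf8_response))
```

The bytes differ between the two responses, but the text that comes out of the parser is the same, which is the point of not slicing the raw string by hand.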

Chip H.


 
I tried UTF-8, but this did not solve my problem. Parsing the received XML is easy; it's what happens to it between being sent and my program receiving it. I'm getting "&quot;" and "&apos;" in place of quote marks and apostrophes. The server "people" said that the encoding they use and test with is ISO-8859-1.

My question is why the data is getting changed. Is this what encoding does, or is it something else? I'm not trying to be difficult, just to understand how this all works. Thanks again for helping.

Chris
 
Either the server's parser or your parser changes those characters to the XML general entities (an ampersand, a name, and a semicolon, as in &quot;).

Why is this a problem for you? When the data is retrieved from the parser, the general entities are replaced with the corresponding characters so the data will look correct to your program.
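
A quick Python illustration of that replacement, assuming a hypothetical Eula element and made-up sample text: the two documents differ as raw strings, but a parser returns the same text for both.

```python
import xml.etree.ElementTree as ET

# Same content, once with general entities and once with literal quotes.
escaped = "<Eula>&quot;It&apos;s agreed&quot;</Eula>"
literal = "<Eula>\"It's agreed\"</Eula>"

escaped_text = ET.fromstring(escaped).text
literal_text = ET.fromstring(literal).text

print(escaped == literal)            # raw strings differ
print(escaped_text == literal_text)  # parsed text is identical
```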
 
I receive this text, display it to the end user and get some additional information. The additional information along with the text (EULA) gets sent back to the server that originally sent it to me. It's then verified to make sure the text has not been altered from what they originally sent. The problem is that the server is sending a hard error back stating that the EULA does not match the original, and I'm not changing anything in the EULA text. I don't know how or why the server thinks the text has changed...
 
As far as XML is concerned, the characters in the set [><&'"] are the same as their corresponding general entities. Of this set, only two characters (&<) must be converted to general entities to ensure that the XML is well-formed. Different parsers are more/less aggressive on the remaining subset.
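
Here is a hedged Python sketch of "more/less aggressive" escaping, using the standard library's xml.sax.saxutils.escape as a stand-in for whatever the two parsers in this thread actually do. The sample text, the Eula wrapper element and the round_trip helper are invented for the example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

eula = "\"Terms\" & 'conditions' < 1 page"

# Default behavior escapes only &, < and >.
minimal = escape(eula)

# Passing extra mappings also turns quotes and apostrophes into entities.
aggressive = escape(eula, {'"': "&quot;", "'": "&apos;"})

def round_trip(escaped_body):
    """Wrap the escaped text in an element and parse it back to plain text."""
    return ET.fromstring("<Eula>" + escaped_body + "</Eula>").text

print(minimal)
print(aggressive)
print(round_trip(minimal) == round_trip(aggressive))
```

The two escaped strings are different byte for byte, yet both parse back to the original text.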

Your problem is:

1. Your parser is more aggressive than the server's parser in converting characters to general entities
2. The server's post-processing comparison is performed as though the inputs are plain text, rather than XML.

If you're on a UNIX platform, you can easily use sed on your XML output to resubstitute quote characters for the general entities before returning the EULA to the server.
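
For anyone not on UNIX, the same re-substitution can be sketched in a few lines of Python. The restore_quotes name and the sample string are hypothetical, and this assumes the entities appear only in element text.

```python
def restore_quotes(xml_text):
    """Swap &quot; and &apos; back to literal quote characters.

    Leaves &amp; and &lt; alone, since those must stay escaped.
    """
    return xml_text.replace("&quot;", '"').replace("&apos;", "'")

# Made-up outgoing fragment for illustration.
outgoing = "<Eula>&quot;It&apos;s agreed&quot; &amp; signed</Eula>"
restored = restore_quotes(outgoing)

print(restored)
```

Note that a blanket replace like this is only safe in element content; inside attribute values, a quote character may need to stay escaped.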

You could also get the folks who do the comparison to fix their bloody compare routine, which is broken: in XML by definition, general entities are the same as the characters they stand in for.
 