PHP + XML->XSL transformation encoding of 0225 character

Status
Not open for further replies.

csbdeady

Programmer
May 18, 2002
119
GB
Hi

I am trying to accept the character á (0225 - an 'a' with an accent) in a web form that is processed by PHP. I am using PHP 5.0.4 with Apache 2.0.54 on Windows (although this will later be moved to Fedora).

In my form, I have a text entry field. If I enter standard ASCII, for example "a", in the field and submit, then an XML file is created successfully containing this value. If instead I enter "á" (0225 - an 'a' with an accent) into the field and submit, I get the following:

Code:
Warning: DOMDocument::load() [function.load]: Input is not proper UTF-8, indicate encoding ! in file:///F%3A/www/htdocs/at/xml/isotest12.xml, line: 2 in F:\www\htdocs\at\modules\mdlxslt.php on line 47

Warning: DOMDocument::load() [function.load]: Bytes: 0xE1 0x26 0x23 0x31 in file:///F%3A/www/htdocs/at/xml/isotest12.xml, line: 2 in F:\www\htdocs\at\modules\mdlxslt.php on line 47

Warning: ..\xpath.c:11046 Internal error: document without root in F:\www\htdocs\at\modules\mdlxslt.php on line 88
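
For anyone reading along, the byte dump in the second warning is a useful clue. 0xE1 is á in ISO-8859-1, and 0x26 0x23 0x31 are the ASCII characters "&#1". In UTF-8, 0xE1 opens a three-byte sequence, so the parser expects continuation bytes next and rejects the ampersand. A minimal sketch that reproduces this, assuming the mbstring extension is loaded (the variable names are illustrative and not from the original code):

```php
<?php
// Assumption: the form value arrived as ISO-8859-1, so "á" is the single byte 0xE1.
$latin1 = "\xE1";

// The bytes reported in the warning: 0xE1 followed by "&#1".
$fragment = $latin1 . '&#1';
var_dump(mb_check_encoding($fragment, 'UTF-8')); // bool(false)

// The same character converted to UTF-8 becomes two bytes.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), "\n"; // c3a1

// With the converted bytes in place, the fragment is valid UTF-8.
var_dump(mb_check_encoding($utf8 . '&#1', 'UTF-8')); // bool(true)
```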

My XSL Transformation file does however include the encoding:

Code:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="xml" encoding="UTF-8" standalone="yes" indent="yes" />

If I change the encoding to ISO-8859-1 I get into even more of a pickle:

Code:
Warning: output conversion failed due to conv error in F:\www\htdocs\at\modules\mdlxslt.php on line 88

Warning: Bytes: 0xE1 0x3C 0x2F 0x74 in F:\www\htdocs\at\modules\mdlxslt.php on line 88

Warning: DOMDocument::load() [function.load]: encoder errorPremature end of data in tag testcomments line 2 in file:///F%3A/www/htdocs/at/xml/isotest13.xml, line: 2 in F:\www\htdocs\at\modules\mdlxslt.php on line 47

Warning: DOMDocument::load() [function.load]: Premature end of data in tag testcase line 2 in file:///F%3A/www/htdocs/at/xml/isotest13.xml, line: 2 in F:\www\htdocs\at\modules\mdlxslt.php on line 47

Warning: DOMDocument::load() [function.load]: Premature end of data in tag testsuite line 2 in file:///F%3A/www/htdocs/at/xml/isotest13.xml, line: 2 in F:\www\htdocs\at\modules\mdlxslt.php on line 47

Warning: ..\xpath.c:11046 Internal error: document without root in F:\www\htdocs\at\modules\mdlxslt.php on line 88

UTF-16 is not supported (it just generates a blank XML file).

The code that actually performs the transformation in PHP is as follows:

Code:
	// Load the XML and XSL sources
	$xml = new DomDocument;
	$xml->load( 'xml/'.$_GET['filename'] );

	$xsl = new DomDocument;
	$xsl->load( "xsl/".$_GET['xslfile'] );

	// Configure the transformation
	$proc = new xsltprocessor;

	// for each variable in the $_GET array we need to set a parameter to be used in the XSL transformation
	foreach( $_GET as $lnstrVal )
	{
		$proc->setParameter( "", key( $_GET ) , $lnstrVal );
		next($_GET);
	}

	$proc->importStyleSheet( $xsl );

	// execute the transformation and save the XML
	$blnWriteSuccess = writefile( "xml/", $_GET['filename'], $proc->transformToXML( $xml ) );
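
A side note on the parameter loop above: pairing foreach with key() and next() depends on the array's internal pointer staying in step with the loop, and how foreach treats that pointer has differed between PHP versions. A possibly safer pattern is to let foreach hand over the name and value together. The sketch below is only an illustration; collectXsltParams is a hypothetical helper, not part of the original module:

```php
<?php
// Hypothetical helper: gathers name => value pairs from a query array
// without relying on the array's internal pointer.
function collectXsltParams(array $query)
{
    $params = array();
    foreach ($query as $name => $value) {
        // Skip nested arrays (e.g. ?a[]=1); only plain strings are kept.
        if (is_string($value)) {
            $params[$name] = $value;
        }
    }
    return $params;
}

// Sample input standing in for $_GET (values are made up for illustration).
$params = collectXsltParams(array(
    'filename'     => 'isotest12.xml',
    'testcomments' => 'abc',
    'ignored'      => array('x'),
));
print_r(array_keys($params));
```

The resulting array could then be handed to the processor in one call, since XSLTProcessor::setParameter() also accepts an array of name => value pairs as its second argument.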

Please can someone suggest where I need to perform the encoding, and what encoding I should use for special characters such as these (i.e. non-Western European, e.g. Hungarian)?
 
I think I stumbled onto the solution to the problem - documenting it here in case anyone else has this.

The correct encoding is UTF-8; however, the text from the form needs to be converted to UTF-8 before it is passed to the XSL transformation that produces the XML. To do this I simply change the code:
Code:
    {
      $proc->setParameter( "", key( $_GET ) , $lnstrVal );
      next($_GET);
    }
To:
Code:
    {
      $proc->setParameter( "", key( $_GET ) , utf8_encode( $lnstrVal ) );
      next($_GET);
    }
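
One caveat for later readers: utf8_encode() always treats its input as ISO-8859-1, so it would not map Hungarian letters such as ő correctly if the form were submitted as ISO-8859-2, and it would double-encode input that is already UTF-8. A rough alternative, assuming the mbstring extension is available and that the form's real charset is known, is to name the source encoding explicitly. The helper toUtf8 below is hypothetical and only a sketch:

```php
<?php
// Hypothetical helper: convert a form value to UTF-8 from a stated source encoding.
// Assumption: $sourceEncoding matches the charset the browser actually submitted.
function toUtf8($value, $sourceEncoding = 'ISO-8859-1')
{
    // Values that are already valid UTF-8 are returned unchanged,
    // which avoids encoding the same text twice.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
}

echo bin2hex(toUtf8("\xE1")), "\n";               // c3a1 (á from ISO-8859-1)
echo bin2hex(toUtf8("\xC3\xA1")), "\n";           // c3a1 (already UTF-8, untouched)
echo bin2hex(toUtf8("\xF5", 'ISO-8859-2')), "\n"; // c591 (ő from ISO-8859-2)
```

The validity check is a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so declaring the form's charset (for example with accept-charset on the form) is the more reliable route where that is possible.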

Simple solutions are the best :)
 