
Validating XML (with entities) using PHP

Status
Not open for further replies.

opengavel

Programmer
Sep 18, 2006
US
I am having a problem dealing with named entities while trying to validate an XML file against a schema.

I have some XML files that include named entities (e.g., &ndash; and &mdash;) in several places. I am trying to build a PHP script that validates these XML files against a schema file and then transforms them with an XSL stylesheet.

From what I understand, schemas can't declare character entities, so the XML file also has a small internal DTD at the top to declare these special characters.

Here is a simplified version of the type of XML I am talking about:


<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE case [
<!ENTITY mdash "--">
]>
<case xmlns:case="xxx">
  <dockets>
    <docnum>05&mdash;1789</docnum>
    <docnum>05&mdash;1790</docnum>
    <docnum>05&mdash;1791</docnum>
  </dockets>
</case>


Here is the part of the XSD file that deals with the dockets and docnum elements:


<xs:element name="dockets">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="docnum" type="xs:string" minOccurs="1" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:element>


If I leave the DTD out of the XML file, I get errors because of the undeclared entities. If I leave the DTD in, I get an "Unimplemented block" error and, for some reason, an error implying that there is an element inside the docnum tag (see below).

Errors:


Warning: DOMDocument::schemaValidate() [function.DOMDocument-schemaValidate]: Unimplemented block at xmlschemas.c:23832 in /home/edomatic/ on line 17

Warning: DOMDocument::schemaValidate() [function.DOMDocument-schemaValidate]: Element 'docnum': Element content is not allowed, because the type definition is simple. in /home/edomatic/ on line 17


I don't really care about the entities during validation, since they will be changed to numeric entities when the file is transformed anyway, but I want to be able to validate the document before transformation. From what I understand, the DTD should have replaced the mdash entities with double hyphens just before the actual validation, so why does it cause an error? How would you normally validate a file against a schema when it contains entities that the schema itself cannot declare?

Thanks for any help
 
Bump, please help... someone must have encountered this issue already.
 
Bumps are not appreciated... however...

One way to find out whether someone has encountered this is to Google "Unimplemented block at xmlschemas.c 23832 bug" and discover that this is an unimplemented area in libxml2, which is what you must be using. One of the documents found (near the end of the list) indicates little enthusiasm (in 2005) for an implementation. But it is free, right?

Perhaps you can find a PHP plug-in that uses a different XML processor.
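A possible alternative to switching processors, offered as a sketch under assumptions: if the unimplemented branch is the one libxml2 reaches when it meets an entity-reference node in the parsed tree (newer libxml2 releases report this case explicitly and ask for entities to be substituted before validation), then asking the parser to expand entities while loading should avoid it. PHP's `LIBXML_NOENT` load option does that, so each `docnum` holds plain text such as `05--1789`. The helper name `validate_with_entities()`, the inline strings, and the complete schema (including a `case` root element that the excerpt in the question does not show) are hypothetical illustrations, not the poster's actual files.

```php
<?php
// Sketch (assumption): expand DTD-declared entities at parse time so the
// schema validator only sees plain text nodes. validate_with_entities() is a
// hypothetical helper; it returns an array of error messages, empty if valid.
function validate_with_entities($xml, $xsd)
{
    // Collect libxml errors in a buffer instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    // LIBXML_NOENT replaces &mdash; with "--" while parsing, so no
    // entity-reference nodes are left in the tree for the validator.
    $ok = $dom->loadXML($xml, LIBXML_NOENT) && $dom->schemaValidateSource($xsd);

    $messages = array();
    if (!$ok) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = trim($error->message) . " (line {$error->line})";
        }
        if (count($messages) === 0) {
            $messages[] = 'Validation failed without a libxml error message.';
        }
    }

    // Restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Hypothetical inline data modelled on the simplified files in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE case [
<!ENTITY mdash "--">
]>
<case>
  <dockets>
    <docnum>05&mdash;1789</docnum>
    <docnum>05&mdash;1790</docnum>
    <docnum>05&mdash;1791</docnum>
  </dockets>
</case>
XML;

$xsd = <<<'XSD'
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="case">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="dockets">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="docnum" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

$errors = validate_with_entities($xml, $xsd);
echo count($errors) === 0 ? "valid\n" : implode("\n", $errors) . "\n";
```

When reading from disk, the same option can be passed as the second argument to `DOMDocument::load()`; setting `$dom->substituteEntities = true` before loading is another way to request the substitution.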

Tom Morrison
 