I am having a problem dealing with named entities while trying to validate an XML file against a schema.
I have some XML files that include named entities (e.g., &ndash;, &mdash;) in several places. I am trying to build a PHP script that will validate these XML files against a schema file and then transform them with an XSL stylesheet.
From what I understand, an XML Schema cannot declare named entities, so each XML file also has a small internal DTD at the top that declares them.
Here is a simplified version of the type of XML I am talking about:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE case [
<!ENTITY mdash "--">
]>
<case xmlns:case="xxx">
<dockets>
<docnum>05&mdash;1789</docnum>
<docnum>05&mdash;1790</docnum>
<docnum>05&mdash;1791</docnum>
</dockets>
</case>
Here is the part of the XSD file that deals with the dockets and docnum elements:
<xs:element name="dockets">
<xs:complexType>
<xs:sequence>
<xs:element name="docnum" type="xs:string" minOccurs="1" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
If I leave the DTD portion of the XML file out, I get errors because of the undeclared entities. If I leave the DTD portion in, I get an "Unimplemented block" error and, for some reason, a second error implying that I am using a child element inside the docnum tag (see below).
Errors:
Warning: DOMDocument::schemaValidate() [function.DOMDocument-schemaValidate]: Unimplemented block at xmlschemas.c:23832 in /home/edomatic/ on line 17
Warning: DOMDocument::schemaValidate() [function.DOMDocument-schemaValidate]: Element 'docnum': Element content is not allowed, because the type definition is simple. in /home/edomatic/ on line 17
I don't really care about the entities during validation, since they will be converted to numeric character references when the file is transformed anyway, but I want to be able to validate the document before the transformation. From what I understand, the mdash entity references should be replaced with the double hyphen declared in the DTD before the actual validation takes place, so why is this causing an error? How would you normally validate a file against a schema when it contains named entities like these?
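For reference, here is a minimal sketch of the validation step I have in mind. The inline strings stand in for my real XML and XSD files, the wrapper schema around dockets is made up for the example, and the LIBXML_NOENT flag is only my guess at how to ask libxml to substitute the entities while loading. I have not confirmed that this is the right approach.

```php
<?php
// Sketch only: inline strings replace the real files, and the schema
// wrapper around <dockets> is a hypothetical stand-in for my full XSD.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE case [
<!ENTITY mdash "--">
]>
<case xmlns:case="xxx">
<dockets>
<docnum>05&mdash;1789</docnum>
<docnum>05&mdash;1790</docnum>
</dockets>
</case>
XML;

$xsd = <<<'XSD'
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="case">
<xs:complexType>
<xs:sequence>
<xs:element name="dockets">
<xs:complexType>
<xs:sequence>
<xs:element name="docnum" type="xs:string" minOccurs="1" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
XSD;

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Assumption: LIBXML_NOENT makes the parser replace entity references
// with their declared text, so no entity reference nodes remain in the tree.
$doc->loadXML($xml, LIBXML_NOENT);

$valid = $doc->schemaValidateSource($xsd);

// Print whatever the parser and the validator reported.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();

echo $valid ? "valid" : "invalid", PHP_EOL;
```

My real script calls load() and schemaValidate() with file paths rather than the string variants used here.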
Thanks for any help.