Can't parse "<xyz type="string">hiho</xyz>

Status
Not open for further replies.

gpsmitty

Programmer
Dec 5, 2014
7
US
I'm new to XML, and I'm having a problem parsing generated XML that's driving me crazy.
My application uses C++ to generate an XML file and then uses Java to read it back in and extract content. It fails when it tries to read. I'm using xercesImpl.jar.

The generated XML contains lines like this: <xyz type="string">hiho</xyz>
Everything works fine if I change these lines to look like this: <xyz type="string" value="hiho" />

Any ideas will be VERY appreciated!

Following is an example of XML that's generated and failing to be read/parsed:
<?xml version="1.0" ?>
<!DOCTYPE file_parameters[]= >
<file_parameters>
<compression_flag type="string">false</compression_flag>
<delivery_category type="string">deploy</delivery_category>
.
.
</file_parameters>
 
What failure diagnostic do you get?

Can a simple XML editor load/parse your XML document?

Can you show the entire document instead of an excerpt? For example, it is possible that one of the text nodes (which your workaround causes to be absent) has a value that makes the document invalid.

Are you also using Xerces C++ on the C++ side? If not, what is your mechanism/library for creating the XML document?
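
For the first question, a minimal Java sketch of how parser diagnostics can be surfaced instead of swallowed. It uses only the JAXP and SAX interfaces that ship with the JDK; the class name XmlDiagnostics and its methods are hypothetical and not part of the poster's application.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper for illustration only.
final class XmlDiagnostics {

    /** Parses the text and returns every diagnostic the parser reported (empty if none). */
    static List<String> diagnose(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add(describe("warning", e));
                }
                @Override public void error(SAXParseException e) {
                    problems.add(describe("error", e));
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    problems.add(describe("fatal", e));
                    throw e; // not well-formed: stop parsing here
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Normally already recorded by fatalError(); record it if it was not.
            if (problems.isEmpty()) {
                problems.add("fatal: " + e.getMessage());
            }
        } catch (IOException | ParserConfigurationException e) {
            problems.add("setup/io: " + e);
        }
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        // Deliberately broken sample: the end tag does not match the start tag.
        for (String problem : diagnose("<a>\n<b>text</c>\n</a>")) {
            System.out.println(problem);
        }
    }
}
```

An empty list only means the parser did not complain. It says nothing about whether the application later finds the values it expects.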

Tom Morrison
Hill Country Software
 
Thanks for the questions, and sorry for the delay in responding. Following is a complete listing of the XML. The C++ software that's trying to read the XML is failing, but any XML errors produced are captured by the code and not displayed.

<?xml version="1.0" ?>
<!DOCTYPE file_parameters[]= >
<file_parameters>
<compression_flag type="string">false</compression_flag>
<delivery_category type="string">deploy</delivery_category>
<destinations type="stringlist">
<item>XYZ</item>
</destinations>

<exercise_flag type="string">false</exercise_flag>
<filenames type="stringlist">
</filenames>

<message_type type="string">Xtype</messagetype>
<source_name types="string">NWtrans</source_name>
<subject type="string">Stonybrook</subject>
</file_parameters>

 
The opening and closing element names (a.k.a. tag names) are mismatched here:
<message_type type="string">Xtype</messagetype>

Tom Morrison
Hill Country Software
 
I am not sure about the embedded DOCTYPE instruction.

Tom Morrison
Hill Country Software
 
</messagetype> is my typo. My bad, I am on two different machines and I can't cut & paste.

It should be </message_type>

Thanks,
 
I tried it with and without embedded DOCTYPE instruction.

No change in behavior
 
gpsmitty said:
Thanks for questions - sorry for delay in responding.

Ok, instead of answering questions, you provided a diversion. Sorry for being a bit terse, but please answer the questions. They were asked as a means to determine your problem.

In any future posts, please provide a high-fidelity representation of the troubled XML document, so that typos may be avoided.

If you are lacking a free XML editor, try here.

Tom Morrison
Hill Country Software
 
"Terse" is OK.

I did not try to bring the xml up in an editor, but I did check the syntax of both versions of the XML via xmllint - xmllint reported both were good.

Based on the build.xml that is used to build the C++ side, version ACE_TAO_6.2.0 of the ACE_TAO parser is being used, i.e. libACEXML_Parser.so.

Because both versions of the XML are apparently valid as far as syntax goes, I think we can ignore how the XML is generated at this point, can't we? But for the record, the Java side is using the xercesImpl.jar from an older version of jboss that's installed on my system, i.e. it's found in ../jboss-4.2.2/lib/endorsed/xercesImpl.jar.

Thanks again,


 
Hi there,

when I paste this into firstobject XML Editor (free), it complains, as expected, about this line:
<!DOCTYPE file_parameters[]= >

a) there should not be whitespace before the closing angle bracket.
b) where there is an equals sign, something (an attribute) should follow it; otherwise there should be no = sign.

Try changing the XML head to this and see if it can be parsed:
<!DOCTYPE file_parameters>
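
Rather than guessing which DOCTYPE spelling is acceptable, each variant can be handed to a parser. The following is a minimal sketch using the SAX parser bundled with the JDK; the class name DoctypeCheck is hypothetical, and the three declarations are the ones that appear in this thread, each followed by an empty root element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
final class DoctypeCheck {

    /** Returns true if a non-validating parse of the text completes without a fatal error. */
    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler ignores content and rethrows fatal errors as SAXException.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] doctypes = {
            "<!DOCTYPE file_parameters[]= >",
            "<!DOCTYPE file_parameters [] >",
            "<!DOCTYPE file_parameters>"
        };
        for (String doctype : doctypes) {
            String document = "<?xml version=\"1.0\" ?>\n" + doctype + "\n<file_parameters/>";
            System.out.println(doctype + " -> well-formed: " + isWellFormed(document));
        }
    }
}
```

This checks well-formedness only. Different parsers can be more or less forgiving, so a result from one parser is a hint about another, not proof.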

Cheers,
MakeItSo

"Knowledge is power. Information is liberating. Education is the premise of progress, in every society, in every family." (Kofi Annan)
Oppose SOPA, PIPA, ACTA; measures to curb freedom of information under whatever name whatsoever.
 
Sorry for the delay in responding to you. My bad again as far as introducing typos.

The XML I posted was typed manually while looking at two machines that were yards apart - my excuse ;) That's how the errors you caught were introduced.

The only difference between the two valid versions of the XML is that one has lines with just one attribute and the other has lines with two attributes. The Linux XML format checker says both versions are valid.

The code I'm working on will successfully process xml with lines like the following - i.e. when it looks for the value of compression_flag it will return "false":
<compression_flag type="string" value="false" />

The code I'm working on will fail to handle lines like the following - i.e. it will not find a value for compression_flag. But the parser does not complain about anything:
<compression_flag type="string">false</compression_flag>

The xml that parses correctly has two attributes where the 2nd attribute, the value= attribute, contains the value and the program successfully retrieves the value.
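
The two forms store the value in different places, so a reader written for one will come back empty on the other. A minimal DOM sketch of that difference, using the JDK's JAXP classes; the class ValueLookup and its method names are hypothetical and are not the application's code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
final class ValueLookup {

    /** Parses a one-element document and returns its root element. */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the value= attribute; DOM returns an empty string when it is absent. */
    static String fromAttribute(String xml) {
        return root(xml).getAttribute("value");
    }

    /** Reads the text between the start and end tags; empty for an empty element. */
    static String fromText(String xml) {
        return root(xml).getTextContent();
    }

    public static void main(String[] args) {
        String attrForm = "<compression_flag type=\"string\" value=\"false\" />";
        String textForm = "<compression_flag type=\"string\">false</compression_flag>";
        System.out.println("attribute lookup on attribute form: [" + fromAttribute(attrForm) + "]");
        System.out.println("attribute lookup on text form:      [" + fromAttribute(textForm) + "]");
        System.out.println("text lookup on text form:           [" + fromText(textForm) + "]");
    }
}
```

If the reading code only ever asks for the value attribute, the text form will look like a missing value even though the parser reports no error.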

Thanks for your input.

 
Sorry but it is really difficult trying to help if the material you provide here does not match the actual error-causing material!

If the XML truly parses when the value is provided as an attribute rather than as element content, then I suspect that there is a schema or DTD behind it in which the compression_flag element is defined as having no content, only attributes.
The XML code you provided does not show any linked schema.
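
Whether a document points at an external DTD or carries a schema hint can be checked mechanically. A minimal sketch with the JDK's DOM API follows; the class name GrammarLinks is hypothetical. It looks only at links inside the document, so it cannot detect a schema that the application applies in its own code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
final class GrammarLinks {

    private static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    /** Summarises whether the document references an external DTD or an XML Schema. */
    static String describe(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the xsi: attribute lookups below
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        DocumentType doctype = doc.getDoctype();
        Element root = doc.getDocumentElement();
        // A DOCTYPE with only an internal subset has no system identifier.
        boolean externalDtd = doctype != null && doctype.getSystemId() != null;
        boolean schemaHint = root.hasAttributeNS(XSI, "schemaLocation")
                || root.hasAttributeNS(XSI, "noNamespaceSchemaLocation");
        return "externalDtd=" + externalDtd + ", schemaHint=" + schemaHint;
    }

    public static void main(String[] args) {
        System.out.println(describe("<!DOCTYPE file_parameters [] >\n<file_parameters/>"));
    }
}
```

Note that a document which does name an external DTD may cause the parser to try to fetch it, so run this against files whose references can be resolved.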

Please provide EXACT information. We are all busy people, helping voluntarily, and I for one am not willing to waste my time racking my brain over incorrect material!

"Knowledge is power. Information is liberating. Education is the premise of progress, in every society, in every family." (Kofi Annan)
Oppose SOPA, PIPA, ACTA; measures to curb freedom of information under whatever name whatsoever.
 
I hope the following provides all the details we need to shed some light on my problem. Thanks for your time.

The following files, send1.xml and send2.xml, both appear to contain valid XML, but
send1.xml is processed correctly and send2.xml is not.

File send1.xml is processed correctly:
-------------------------------------
<?xml version="1.0"?>
<!DOCTYPE file_parameters [] >
<file_parameters>
<compression_flag type="string" value="false"/>
<delivery_category type="string" value="Priority"/>
<destinations type="stringlist">
<item type="string" value="XYZ1"/>
</destinations>

<exercise_flag type="string" value="false"/>
<filenames type="stringlist">
</filenames>

<message_type type="string" value="encripted"/>
<source_name type="string" value="gpsmitty"/>
<subject type="string" value="GPS"/>

</file_parameters>



File send2.xml is not processed correctly:
------------------------------------
<?xml version="1.0"?>
<!DOCTYPE file_parameters [] >
<file_parameters>
<compression_flag type="string">false</compression_flag>
<delivery_category type="string">Priority</delivery_category>
<destinations type="stringlist">
<item>XYZ1</item>
</destinations>

<exercise_flag type="string">false</exercise_flag>
<filenames type="stringlist">
</filenames>

<message_type type="string">encripted</message_type>
<source_name type="string">gpsmitty</source_name>
<subject type="string">GPS</subject>

</file_parameters>

// The code that I'm having a problem with uses ACE+TAO
// to parse the files. Following is an abbreviated sequence
// of code that is run, once with filename = send1.xml and
// once again with filename = send2.xml.
// The constructor for ACEXMLConfigLoader(...) turns the input .xml
// file into a file stream named fileStream and then invokes the ACE
// constructor for ACEXML_InputSource(fileStream) as follows:
// ACEXML_InputSource* _input = new ACEXML_InputSource(fileStream);

// Following is the abbreviated sequence of my code:
ACEXMLConfigLoader* loader = new ACEXMLConfigLoader(filename);
std::cout << "loader.toString() = " << loader->toString() << std::endl;

When I run the code with send1.xml, toString() puts out the following:
loader.toString() =
<file_parameters>
<compression_flag type="string" value="false"/>
<delivery_category type="string" value="Priority"/>
<destinations type="stringlist">
<item type="string" value="XYZ1"/>
</destinations>

<exercise_flag type="string" value="false"/>
<filenames type="stringlist">
</filenames>

<message_type type="string" value="encripted"/>
<source_name type="string" value="gpsmitty"/>
<subject type="string" value="GPS"/>
</file_parameters>

But when I run the code with send2.xml, toString() puts out the following;
most of the XML is missing:
loader.toString() =
<file_parameters>
<destinations type="stringlist">
</destinations>

<exercise_flag type="string" value="false"/>
</filenames>

</file_parameters>


The problem presents itself to the application when
code subsequently tries to extract tag values from the xml.
With send1.xml all the tag values can be retrieved successfully but that's not
the case when I use send2.xml. And I can't find a schema or
DTD that's used by the code!
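
One possible explanation, which nothing in this thread confirms, is that the loader builds its internal tree from start-element callbacks and their attributes only, and never stores the character data delivered between tags. The sketch below shows, in Java SAX terms, a handler that captures both forms. The class ValueCollector is hypothetical, and the assumption that the C++ loader is written against a similar callback interface is mine.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only.
final class ValueCollector extends DefaultHandler {

    private final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for this element
        String value = attrs.getValue("value");
        if (value != null) {
            values.put(qName, value); // attribute form: value="..."
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String content = text.toString().trim();
        if (!content.isEmpty()) {
            values.put(qName, content); // text form: <tag>...</tag>
        }
        text.setLength(0);
    }

    /** Parses the text and returns element name to value, whichever form was used. */
    static Map<String, String> collect(String xml) {
        ValueCollector handler = new ValueCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String sample = "<file_parameters>\n"
                + "<compression_flag type=\"string\">false</compression_flag>\n"
                + "<subject type=\"string\" value=\"GPS\"/>\n"
                + "</file_parameters>";
        System.out.println(collect(sample));
    }
}
```

This is a simplification: repeated element names such as several item entries overwrite one another in the map, and mixed content is not handled.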








 
Have you tried this with something a bit more (ahem!) standard than ACEXML? It does not seem to be exceptionally well documented, and what documentation I can find talks a lot about a non-standard parser, and known bugs, and...

Since you are in Java, how about trying Xerces?
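
A quick way to try that is to round-trip the document through JAXP, whose default implementation in the JDK is derived from Xerces, and compare the result with what the loader's toString() prints. This is a minimal sketch; the class name RoundTrip is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
final class RoundTrip {

    /** Parses the text into a DOM tree and serialises that tree back to a string. */
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // A Transformer created without a stylesheet performs an identity copy.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (SAXException | IOException | ParserConfigurationException
                | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "<file_parameters>\n"
                + "<compression_flag type=\"string\">false</compression_flag>\n"
                + "</file_parameters>";
        System.out.println(roundTrip(sample));
    }
}
```

If the text content survives this round trip but not the ACEXML-based loader, the difference lies in the loader or its parser rather than in the XML.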

Tom Morrison
Hill Country Software
 