
BUG: XMLToCursor has a parse bug when a node value is only a period.

Status
Not open for further replies.

Olaf Doschke

Programmer
Oct 13, 2004
14,847
DE
For everybody's attention.

In thread thread184-1805853 I discussed a lot, and one essential point should become more generally known: David Higgs came across a bug in the XML parsing of XMLToCursor.
Here's a reproduction of the bug. It involves XML that is valid in itself and has enough nesting levels that XMLToCursor doesn't reject it with parse errors about the XML root node or with "unable to infer XML schema". It still causes a parse error, because a node value that would become a field value in a record of the cursor XMLToCursor creates is just a period:
Code:
Local lcXML

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
EndText

On Error ? Message(), "in Line", Lineno()

XMLToCursor(lcXML,"crsFromXMLfails")
lcXML = Strtran(lcXML,">.<",">.a<")
XMLToCursor(lcXML,"crsFromXMLworks")
Select crsFromXMLworks
Browse

This demonstrates that only the exact value "." fails with a parse error. I couldn't reproduce this parse problem with MSXML2.SAXXMLReader.3.0 or MSXML2.SAXXMLReader.4.0, so I'm still not sure whether the problem is within the MSXML library VFP uses or in VFP and its runtime itself.

[highlight #FCE94F]Edit: It turns out the problem disappears when you create the cursor in advance and use nFlags=8192, so type inference is skipped. It also disappears when there are multiple records and the other values are not just ".": then type inference decides on a char field again. So, more precisely, the problem is that a numeric field type is inferred and then evaluating "." as a number fails. [/highlight]

So the bug must be on the VFP side: "." is interpreted as a numeric value even though it can't be converted to one. A period is allowed within numeric values as the decimal point, but standing alone it's not a number, just like a lone minus or plus sign isn't. And, just checking, a single "+" or "-" is not inferred as a numeric type.
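That check can be repeated with a small loop. This is a minimal sketch, assuming VFP 9 (for TRY...CATCH); the cursor name crsProbe and the #VALUE# placeholder are made up for illustration. It builds the same one-record XML for each of ".", "+" and "-" and reports either the inferred field type or the error message:

```foxpro
* Sketch: probe which lone "number-like" characters break XMLToCursor's type inference.
* crsProbe and the #VALUE# placeholder are hypothetical names used only here.
Local lcTemplate, lcXML, lcValue, lnI, loErr

Text To lcTemplate NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>#VALUE#</fieldname>
   </tablename>
</VFPData>
EndText

* Try each of the three characters ".", "+" and "-" as the only node value
For lnI = 1 To 3
   lcValue = Substr(".+-", lnI, 1)
   lcXML = Strtran(lcTemplate, "#VALUE#", lcValue)
   Try
      XMLToCursor(lcXML, "crsProbe")
      * Report the inferred field type (C = character, N = numeric, ...)
      ? lcValue, "-> inferred type:", Type("crsProbe.fieldname")
      Use In Select("crsProbe")
   Catch To loErr
      ? lcValue, "-> error:", loErr.Message
   EndTry
EndFor
```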

XMLToCursor isn't a universal XML conversion anyway, and the XMLAdapter also can only convert XML that adheres to a structure of XML nodes that can be parsed into a table with rows and their fields. You need a repeating structure of nodes with the same names and data types to be able to create records (with the corner case of a single record, which obviously needs no repeats).

XMLToCursor() mainly exists as the inverse operation of CursorToXML(), and XML that doesn't work can sometimes be modified easily enough for XMLToCursor() to work. A string replacement of ">.<" before any XMLToCursor() call could be considered as a workaround. But in general you don't know whether the data type that fits the other nodes resembling table records should be string, numeric or something else, so you can only mend XML if you know more about its structure.

There are alternatives, all of which can also convert XML to an object in a totally different way, and I point to them as replacements for and extensions of VFP's XML functionality. wwXML even existed before VFP7 introduced the first few XML functions in VFP.

wwXML: or more directly: with docs at nfXML: atlopes/xml:
And for simple cases there's StrExtract(), which extracts strings between begin and end delimiters you specify; of course, these delimiters can also be HTML or XML tags:
Code:
Clear
lcPartialXML = '<tag>nodevalue</tag>'
? 'Extracting node values'
lcNodevalue = StrExtract(lcPartialXML,'<tag>','</tag>')
? lcNodevalue

* problem case: The opening tag has attributes:
lcPartialXML = '<tag attribute="attributevalue">nodevalue</tag><othertag />'
lcTag = StrExtract(lcPartialXML,'<tag','</tag>')
lcNodevalue = StrExtract(lcTag,'>','',1,2)
? lcNodevalue

?
? 'Extracting attribute values'
* extracting an attributevalue:
? StrExtract(lcPartialXML,'attribute="','"')
* within a specific tag:
lcTag = StrExtract(lcPartialXML,'<tag','</tag>')
? StrExtract(lcTag,'attribute="','"') 

?
? 'Extracting all tags'
* extracting all tags
For nCount = 1 to Occurs('<',lcPartialXML)
    ? StrExtract(lcPartialXML,'<','>',nCount,4)
Endfor

Don't assume this covers all cases, though. For example, attribute values might be delimited with single instead of double quotes, or not at all: attribute=value is working HTML, and only an XML parser insists on quotes around attribute values.
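A quick way to see that limitation, as a small sketch with made-up sample strings: the quote-based extraction from the code above finds the quoted attribute value but silently returns an empty string for the unquoted one, because StrExtract() returns an empty string when the begin delimiter isn't found.

```foxpro
* Sketch: quote-based attribute extraction misses an unquoted attribute value.
* Both sample strings are invented for this illustration.
Local lcQuoted, lcUnquoted
lcQuoted   = '<tag attribute="attributevalue">nodevalue</tag>'
lcUnquoted = '<tag attribute=attributevalue>nodevalue</tag>'

? StrExtract(lcQuoted, 'attribute="', '"')      && attributevalue
? StrExtract(lcUnquoted, 'attribute="', '"')    && empty, begin delimiter not found
```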

I wouldn't recommend getting much more complex and writing your own XML parser; that has already been done in the libraries mentioned. One of their advantages is that you can also turn XML into objects, as the HTML/XML DOM (document object model) does in general. Then you need to know even more about which XML nodes are nested in each other to know the full object path to a specific inner node, which in the bug reproduction example would be oXMLobject.VFPData.tablename.fieldname. XPath queries help with finding nodes in XML at varying positions (nesting levels), but that's another topic.
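Just to give an idea of the DOM/XPath route, here is a minimal sketch using the MSXML DOM directly rather than one of the libraries above. It assumes MSXML 6 is installed (ProgID MSXML2.DOMDocument.6.0); the XPath expression follows the node nesting of the bug reproduction XML. Since the DOM only treats node values as text and does no type inference, the lone period should simply come back as ".":

```foxpro
* Sketch: read the inner node of the bug reproduction XML via MSXML DOM and XPath.
* Assumes MSXML 6 is installed (ProgID MSXML2.DOMDocument.6.0).
Local lcXML, loDOM, loNode

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
EndText

loDOM = CreateObject("MSXML2.DOMDocument.6.0")
loDOM.async = .F.
If Not loDOM.loadXML(lcXML)
   ? "Parse error:", loDOM.parseError.reason
   Return
EndIf

* XPath path to the inner node: VFPData -> tablename -> fieldname
loNode = loDOM.selectSingleNode("/VFPData/tablename/fieldname")
If Not IsNull(loNode)
   * The DOM does no type inference, the period is plain text here
   ? "Node value:", loNode.text
EndIf
```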

[highlight #FCE94F]Edit: In conclusion, we now know the problem lies in type inference, followed by the failure to convert the node value from its string form "." into a number. Type inference seems to parse the full XML and decide the data type by what fits all record values, so if you don't have a cursor or dbf in advance, a less practical workaround is to add another record to the XML with a value that isn't misinterpreted as a number.

And, last but not least, that should help track down the bug in the XMLToCursor implementation.[/highlight]
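The extra-record workaround could look roughly like this. It is a sketch, not a tested recipe: the dummy value "x" and the cursor name crsWithDummy are arbitrary choices, and it relies on the observation above that type inference settles on a char field once another record holds a non-numeric value. The dummy record is inserted right before the closing root tag, so it ends up as the last record and can be deleted again:

```foxpro
* Sketch: add a dummy record with a clearly non-numeric value so type inference
* picks a char field, then hide that dummy record again.
* The value "x" and the name crsWithDummy are arbitrary, for illustration only.
Local lcXML, lcDummy

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
EndText

lcDummy = "<tablename><fieldname>x</fieldname></tablename>"
* Put the dummy record right before the closing root tag, i.e. as the last record
lcXML = Strtran(lcXML, "</VFPData>", lcDummy + "</VFPData>")

XMLToCursor(lcXML, "crsWithDummy")

* The dummy record came last, so mark it deleted and hide deleted records
Select crsWithDummy
Go Bottom
Delete
Set Deleted On
Go Top
? crsWithDummy.fieldname
```

Note the trade-off: if the real column was supposed to be numeric, this forces it to character, so it only makes sense when you already know the field is meant to hold text.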

Bye, Olaf.

Olaf Doschke Software Engineering
 
Another workaround by atlopes:

First, create the result cursor, then use nFlags=8192 on XMLToCursor so that it skips the data type inference step. This narrows the problem down to that step of XMLToCursor.
This, however, requires knowing exactly what to expect from the XML in advance.
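As a sketch of that workaround: the cursor name crsKnown and the field definition C(10) are assumptions standing in for whatever you know about the XML beforehand, and the meaning of nFlags=8192 is taken from the description above (use the existing cursor, skip inference):

```foxpro
* Sketch of atlopes' workaround: define the target cursor yourself so that
* XMLToCursor skips type inference. crsKnown and C(10) are assumptions that
* must match what you know about the XML in advance.
Local lcXML

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
EndText

* Create the result cursor first; its structure takes the place of type inference
Create Cursor crsKnown (fieldname C(10))

* nFlags = 8192: load into the existing cursor crsKnown
XMLToCursor(lcXML, "crsKnown", 8192)

Select crsKnown
Go Top
? crsKnown.fieldname
```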

Bye, Olaf.

 
Pardon my ignorance, but isn't the problem that a fieldname can't be a period?
 
No, the inner text of nodes never becomes a field name; node names become field names. That's why the XML sample uses these node names.
Also see the output of CursorToXML: it is exactly the sample XML if you create a dbf or cursor called "tablename" with a field called "fieldname".

Besides that, when this field is a char(1) field or similar and has "." as its value, CursorToXML has no problem generating exactly that XML.
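A small sketch of that round trip, assuming the cursor and field names from the sample XML: CursorToXML with element-centric output (nOutputFormat = 1) writes the lone period without complaint, and reading the result back with type inference is where the bug from this thread is expected to show up. The TRY...CATCH just reports whichever outcome you get:

```foxpro
* Sketch: generate the sample XML with CursorToXML, then try to read it back.
* crsRoundTrip is a hypothetical cursor name for the read-back attempt.
Local lcXML, loErr

Create Cursor tablename (fieldname C(1))
Insert Into tablename (fieldname) Values (".")

* nOutputFormat = 1: element-centric XML, stored in the variable lcXML
CursorToXML("tablename", "lcXML", 1)
? lcXML

Try
   XMLToCursor(lcXML, "crsRoundTrip")
   ? "Read back OK"
Catch To loErr
   ? "Read back failed:", loErr.Message
EndTry
```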

Bye, Olaf.

 
isn't the problem that a fieldname can't be a period

It's a good point, Dan, but the problem arose because a data-entry form asked for the user's name, and the user - rightly or wrongly - entered a period instead.

As I mentioned in the other thread, I think that particular case was as much a user interface issue as anything else. If the business rules require a name to consist of a string of alphabetic characters, then that rule should be enforced at data entry.

However, as it happens, this case brought to light an apparent bug in VFP's XML parsing, by which a single period caused a parse error. David Higgs discovered the bug and Olaf is now exploring it in more detail.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I mean, you could also have data about a table structure, like what AFIELDS() or COPY STRUCTURE EXTENDED outputs; then you'd get this XML. But that would be very meta and contain not the table data, just its structure:

Code:
<?xml version = "1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
	<tablestructure>
		<field_name>FIELDNAME</field_name>
		<field_type>C</field_type>
		<field_len>1</field_len>
		<field_dec>0</field_dec>
		<field_null>false</field_null>
		<field_nocp>false</field_nocp>
		<field_defa/>
		<field_rule/>
		<field_err/>
		<table_rule/>
		<table_err/>
		<table_name/>
		<ins_trig/>
		<upd_trig/>
		<del_trig/>
		<table_cmt/>
		<field_next>0</field_next>
		<field_step>0</field_step>
	</tablestructure>
</VFPData>

And that's also not how XML stores its structure, because XML isn't about resembling tables in the first place. It is far closer to being an object notation, like JSON. An object that is a collection of records that always have the same structure is one very specific case that fits a table or cursor. So XMLToCursor is rarely enough to read in arbitrary XML.

And just by the way, most of the tags are self-closing, so XMLToCursor wouldn't detect from the field names that this is the structure of a DBF created with COPY STRUCTURE EXTENDED: those columns have no values and so give no clue about their data types. That's because I left out an inline schema, which would bloat the XML much more.
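For completeness, a sketch of how XML like the structure sample can be produced. The names tablename and tablestructure are chosen to match the sample; note that COPY STRUCTURE EXTENDED writes tablestructure.dbf to the current directory. The exact output may differ slightly depending on VFP version and settings:

```foxpro
* Sketch: turn a table's structure (not its data) into XML.
* tablestructure.dbf is written to the current directory.
Local lcXML

Create Cursor tablename (fieldname C(1))
* One record per field, with columns FIELD_NAME, FIELD_TYPE, FIELD_LEN, ...
Copy Structure Extended To tablestructure
Use tablestructure In 0
* nOutputFormat = 1: element-centric XML, stored in the variable lcXML
CursorToXML("tablestructure", "lcXML", 1)
? lcXML
Use In tablestructure
```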

Bye, Olaf.

 