Olaf Doschke
Programmer
To everybody's attention.
In thread184-1805853 I discussed a lot, but one essential point should become more widely known: David Higgs came across a bug in the XML parsing of XMLToCursor.
Here's a reproduction of the bug. It involves XML that is valid in itself and has enough nesting levels that XMLToCursor doesn't reject it with parse errors about the XML root node or "unable to infer XML schema". It still causes a parse error, because a node value that would become a field in a record of the cursor XMLToCursor creates is just a period:
Code:
Local lcXML
Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
<tablename>
<fieldname>.</fieldname>
</tablename>
</VFPData>
EndText
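* Print any error with its line number instead of halting
On Error ? Message(), "in Line", Lineno()
* Fails with a parse error: the only value is a lone period
XMLToCursor(lcXML,"crsFromXMLfails")
* Same XML with ".a" instead of ".": the conversion succeeds
lcXML = Strtran(lcXML,">.<",">.a<")
XMLToCursor(lcXML,"crsFromXMLworks")
Select crsFromXMLworks
Browse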
This demonstrates that only the exact value "." fails with a parse error. I couldn't reproduce this parse problem with either MSXML2.SAXXMLReader.3.0 or MSXML2.SAXXMLReader.4.0.
I'm still not sure whether the problem is within the MSXML library VFP uses or in VFP and its runtime itself.
[highlight #FCE94F]Edit: It turns out that the problem not only disappears when you create a cursor in advance and use nFlags=8192 so that type inference is skipped; when there are multiple records and the other values are not just ".", type inference also settles on a char field again. So, more precisely, the problem is inferring a numeric field type and then failing to evaluate "." as a number. [/highlight]
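The first workaround from the edit above could look like the following minimal sketch. The cursor name crsPrepared and the field width C(10) are assumptions made for illustration; the field name has to match the node name in the XML.
Code:
Local lcXML
Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
<tablename>
<fieldname>.</fieldname>
</tablename>
</VFPData>
EndText
* Assumption: a character field of width 10 is wide enough for the values
Create Cursor crsPrepared (fieldname C(10))
* nFlags=8192 appends to the existing cursor, so no type inference is needed
XMLToCursor(lcXML,"crsPrepared",8192)
Select crsPrepared
Browse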
So the bug must be on the VFP side: a value is interpreted as numeric even though it cannot be converted to a number. A period is one of the characters allowed within numeric values, as the decimal point, but standing alone it's not a number, just as a lone minus or plus sign isn't. And I checked: a single "+" or "-" is not inferred as a numeric type.
XMLToCursor isn't a universal XML conversion anyway, and the XMLAdapter, too, can only convert XML whose structure can be parsed into a table with rows and fields. You need a repeating structure of nodes with the same names and data types to be able to create records (with the corner case of a single record, which needs no repeats).
XMLToCursor mainly exists as the inverse operation of CursorToXML(), and XML that doesn't work can often be modified easily enough for XMLToCursor() to work. As a workaround, a string replacement of ">.<" could be considered before any XMLToCursor() call. But in general you won't know whether the field built from the other nodes should be a string, numeric or some other data type, so you can only mend XML when you know more about its structure.
There are alternatives, all of which can also perform a totally different conversion of XML into an object, and I refer to them as replacements for and extensions of VFP's XML functionality. I know wwXML even existed before VFP7 introduced the first few XML functions in VFP.
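To repeat the check for a lone ".", "+" and "-" yourself, a small probe like the following sketch could be used. The cursor name crsProbe and the variable names are made up for this example, and Try...Catch needs VFP8 or later.
Code:
Local lnI, lcXML, loError
Local Array laValues[3]
laValues[1] = "."
laValues[2] = "+"
laValues[3] = "-"
For lnI = 1 To Alen(laValues)
   * Build the same single-record XML as in the reproduction
   lcXML = '<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>' + ;
      "<VFPData><tablename><fieldname>" + laValues[lnI] + ;
      "</fieldname></tablename></VFPData>"
   Try
      XMLToCursor(lcXML,"crsProbe")
      * Type() reports the inferred field type, e.g. "C" or "N"
      ? laValues[lnI], "inferred as", Type("crsProbe.fieldname")
   Catch To loError
      ? laValues[lnI], "failed:", loError.Message
   Endtry
   * Close the probe cursor if it was created
   Use In Select("crsProbe")
Endfor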
wwXML: or more directly: with docs at nfXML: atlopes/xml:
And for simple cases there is StrExtract(), a function that extracts strings between begin and end delimiters you specify, which can, of course, also be HTML or XML tags:
Code:
Clear
lcPartialXML = '<tag>nodevalue</tag>'
? 'Extracting node values'
lcNodevalue = StrExtract(lcPartialXML,'<tag>','</tag>')
? lcNodevalue
* problem case: The opening tag has attributes:
lcPartialXML = '<tag attribute="attributevalue">nodevalue</tag><othertag />'
lcTag = StrExtract(lcPartialXML,'<tag','</tag>')
lcNodevalue = StrExtract(lcTag,'>','',1,2)
? lcNodevalue
?
? 'Extracting attribute values'
* extracting an attributevalue:
? StrExtract(lcPartialXML,'attribute="','"')
* within a specific tag:
lcTag = StrExtract(lcPartialXML,'<tag','</tag>')
? StrExtract(lcTag,'attribute="','"')
?
? 'Extracting all tags'
* extracting all tags
For nCount = 1 to Occurs('<',lcPartialXML)
   ? StrExtract(lcPartialXML,'<','>',nCount,4)
Endfor
I wouldn't recommend getting much more complex and writing your own XML parser; that has already been done in the libraries mentioned. One of their advantages is that they can also turn XML into objects, as the HTML/XML DOM (document object model) does in general. You then need to know even more precisely which XML nodes are nested in each other, in order to know the full object name path to a specific inner node; in the bug reproduction example that would be oXMLobject.VFPData.tablename.fieldname. XPath queries help with finding nodes at varying positions (nesting levels) in XML, but that's another topic.
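Just to illustrate the idea of an XPath query, here is a minimal sketch that uses the MSXML DOM directly instead of one of the libraries mentioned. It assumes MSXML 6.0 is installed; the variable names are made up for this example.
Code:
Local loDOM, loNode, lcXML
lcXML = "<VFPData><tablename><fieldname>.</fieldname></tablename></VFPData>"
loDOM = CreateObject("MSXML2.DOMDocument.6.0")
* Load synchronously so the document is complete when loadXML() returns
loDOM.async = .F.
If loDOM.loadXML(lcXML)
   * XPath with the full path from the root to the inner node
   loNode = loDOM.selectSingleNode("/VFPData/tablename/fieldname")
   If Not IsNull(loNode)
      ? loNode.text
   Endif
Else
   ? loDOM.parseError.reason
Endif
A query like "//fieldname" would find the node at any nesting level, which is where XPath helps when you don't know the exact path.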
[highlight #FCE94F]Edit: In conclusion, we now know that the problem lies in type inference followed by the inability to convert the node value from its string form "." into a number. Type inference seems to parse the full XML and decide the data type by what fits all record values. So if you don't have a cursor or dbf in advance, a less practical workaround is to add another record to the XML with a value that isn't misinterpreted as a number.
And, last but not least, that should help to track down the bug in the XMLToCursor implementation.[/highlight]
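The workaround with an additional record could be sketched like this. The value "abc" and the cursor name crsTwoRecords are assumptions made for illustration, and the extra record would have to be deleted from the cursor again afterwards.
Code:
Local lcXML
Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
<tablename>
<fieldname>.</fieldname>
</tablename>
<tablename>
<fieldname>abc</fieldname>
</tablename>
</VFPData>
EndText
XMLToCursor(lcXML,"crsTwoRecords")
* Show which field type was inferred for both records together
? Type("crsTwoRecords.fieldname")
Select crsTwoRecords
Browse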
Bye, Olaf.
Olaf Doschke Software Engineering