XMLToCursor Parse Error 3

Status
Not open for further replies.

David Higgs

Programmer
May 6, 2012
392
GB
My application queries a web-based database, which returns XML such as <fname> David </fname>

The following code had been working fine until I recently came across a <fname> . </fname>.

Code:
XMLToCursor(QRZ_Lookup,"cur_QRZ_Lookup")
Contacts_name = StrExtract(qrz_lookup,"<fname>","</fname>") + " " + StrExtract(qrz_lookup,"<name>","</name>")

The "period" in the First Name produced the following error.

[Image: QRZ_Lookup_zmeu1y.jpg, screenshot of the error message]


What would I need to do to my code to prevent this error occurring for any invalid values?


Regards,

David.

Recreational user of VFP.
 
Nice,

so this narrows it down to the schema inference phase? I'm not so sure; it's still reported as a parse error.
Clearly, parsing is a necessary step in type inference, which would explain that, but parsing has to be done anyway to get the values that go into the cursor records.

So what's your guess, does VFP even use the XML reader for type inference?
As far as I can see, DOMDocument's loadXML infers node types (element, endelement, ...), not data types, and nodeTypedValue is always of vartype character, just like the text property of a node. Does a DOM ever create object properties that are numeric or of any data type other than character? XML certainly has the concept of numeric nodes, with xs:float for example, but when and where would I see how an XML reader infers a data type, not a node type, for an XML node? Or is that the VFP-specific part?

Bye, Olaf.






Olaf Doschke Software Engineering
 
I see, David. When you're the end user of your own code, simple error handling like ON ERROR SET STEP ON would already be a nice add-on that gets you into debug mode the instant an error happens. There you'd still have access to the variable QRZ_Lookup and the XML it contains, for example, but also to all other current variables, the call stack and other things that are simply not available when you neither log nor otherwise handle errors.

Bye, Olaf.

Olaf Doschke Software Engineering
 
David,

May I pick up your point about being lax on error-handling?

The reason to build error-handling into your code is not to fix errors when you come across them in development. It's more to do with shielding the user from the consequences of an error. At the very least, a good error handler will notify the user of the error in a friendly way (rather than display a cryptic error message), and probably log the error and notify the developer.

In this particular case, using TRY / CATCH / ENDTRY would mean that your program can decide to ignore the problem (the dot in the name field), to report it to the user, or to log it in some way. It's all about graceful degradation. In other words, the code might not fully support the expected result (that is, the user expecting to be able to enter a dot in place of their name), but it doesn't crash the system either.
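As a minimal sketch of the TRY / CATCH idea, assuming a made-up XML response (the element layout below is hypothetical and only stands in for whatever QRZ_Lookup really contains), the import could be wrapped like this:

```foxpro
* Sketch only: lcXML is a hypothetical stand-in for the QRZ_Lookup response.
LOCAL lcXML, loErr, llImported

TEXT TO lcXML NOSHOW
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fname>.</fname>
      <name>Higgs</name>
   </tablename>
</VFPData>
ENDTEXT

llImported = .F.

TRY
   XMLTOCURSOR(lcXML, "cur_QRZ_Lookup")
   llImported = .T.
CATCH TO loErr
   * Decide here whether to ignore, report or log the problem
   ? "XML import failed:", loErr.ErrorNo, loErr.Message
ENDTRY

IF NOT llImported
   * A partially filled cursor may have been left behind, so close it
   USE IN SELECT("cur_QRZ_Lookup")
ENDIF
```

The flag llImported lets the calling code carry on with a default (an empty name, say) instead of stopping on the error.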

For these reasons, it would be worth familiarising yourself with the various ways of using ON ERROR and TRY / CATCH / FINALLY (and possibly the Error method in objects, although that is something that I have never used, rightly or wrongly).

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike,

as far as I can see, user-friendly error handling can be much simplified when the developer is the user. SET STEP ON as a handler could also become an annoyance, just as ON ERROR * can make you oblivious to any errors. And indeed SET STEP ON isn't necessary, as the default system handler has the Suspend button in the error messagebox. But it's a shortcut into the debugger.

Also, often enough I just print error messages, especially when programming something nonvisual, or send them to debugout. Those are all single-liners that don't even need an error handler prg or a function within main.prg.

I did write an extensive error handler for end users, though, and users accustomed to other VFP applications actually didn't like what would be the most responsible behavior: always quitting after any error to prevent any mischief. The worst case of a ZAP after a SELECT failed to select the work area was no argument either; it was too far-fetched for them, and too often they had seen a minor bug that could simply be ignored to continue.

I also used RETURN TO MASTER as a compromise between exiting the application and just cancelling what's currently being done. If you have an application object with a readevents method in which you, well, put the command READ EVENTS, then RETURN TO MASTER simply cancels anything on the stack that led to the error and you're back to waiting for the user to use a menu, a form, or whatever currently is the scope of the application. That leads to less frustration, as the startup of enterprise applications tends to be lengthy.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Olaf,

I don't want this thread to get too side-tracked on the subject of error handling. But of course I agree that it can be a nuisance when the developer is the user. I have a fairly sophisticated error-handler that does a lot of logging and notifying, and also deals with reverting buffers, rolling back transactions, etc. But I call it like this:

Code:
IF <we are in the run-time environment>
  DO ERROR WITH <etc.>
ENDIF

In the development environment, I agree that we want to suspend the program and get to either the debugger or the command window as easily as possible.
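A rough sketch of that split between run-time and development behaviour, assuming VERSION(2) is an acceptable test for "we are in the run-time environment" and using a hypothetical LogError procedure and errors.log file (neither is from Mike's actual handler):

```foxpro
* Sketch only: LogError and errors.log are hypothetical names.
* VERSION(2) returns 0 when running under the VFP runtime.
IF VERSION(2) = 0
   ON ERROR DO LogError WITH ERROR(), MESSAGE(), PROGRAM(), LINENO()
ELSE
   * Development environment: drop straight into the debugger
   ON ERROR SET STEP ON
ENDIF

PROCEDURE LogError(tnError, tcMessage, tcProgram, tnLine)
   LOCAL lcLine
   lcLine = TTOC(DATETIME()) + " Error " + TRANSFORM(tnError) + ;
      " in " + tcProgram + " line " + TRANSFORM(tnLine) + ": " + tcMessage
   * Third parameter 1 appends to the file instead of overwriting it
   STRTOFILE(lcLine + CHR(13) + CHR(10), "errors.log", 1)
ENDPROC
```

A real handler would do more (reverting buffers, rolling back transactions, notifying the user), but the branch itself stays this small.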

Regarding RETURN TO MASTER, I used to take that approach in pre-Visual days. And I started by trying to do something similar in VFP. But - rightly or wrongly - I decided that, if an error occurred (one that couldn't be handled at run time), then the best thing is to quit the application as gracefully as possible. You can't know if the error caused any side effects, such as closing a table or releasing a variable, which means that the application is inherently unstable and therefore needs to be closed.

(Sorry, I said I didn't want to get side-tracked, but that's exactly what I've done.)

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike Lewis said:
I don't want this thread to get too side-tracked on the subject of error handling.

I'll start another thread with a link to this one.

Regards,

David.

Recreational user of VFP.
 
Olaf said:
So what's your guess, does VFP even use the XML reader for type inference?

VFP uses the MSXML classes to parse XML documents and load them into cursors, but the determination of the data type of the resulting columns is VFP's complete responsibility, and it is based on the nodes' contents.

I think there are two levels of problems, here:
a. VFP decides that the presence of a single point indicates a numeric value (which is wrong, of course, since a single decimal point is not a valid number). Note that it's not this phase that raises the error; VFP will just create a column of type N(1).
b. When reading the actual data from the XML nodes, it does not gracefully degrade to an empty value as it does in other similar circumstances (an empty numeric content, or an invalid date), and it's at this moment that an error pops up.
 
Hm, shouldn't there be a cursor then, or is it just in some proto state?

OK, let's try preparing a cursor with a non-fitting data type, just as a wrong type inference would:
Code:
Local lcXML

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
EndText

CREATE CURSOR crsFromXMLfails (fieldname N(1))

On Error ? Message(), "in Line", Lineno()

XMLToCursor(lcXML,"crsFromXMLfails", 8192)

Okay, yes, VFP "blames" the parser because it can't eval("."). That means the error happens after the type inference stage, simply when the value doesn't fit the cursor field, whether VFP uses EVALUATE(), VAL() or something else to convert the string to the field's data type.

That makes me wonder how much of the XML is parsed for type inference. We know from the SQL engine quirks that inferring field width fails when the first result value is short or even empty, it's one of the buggy VFP behaviors you need to know. I have seen type inferring only taking a few rows. This sample shows VFP (or MSXML) will go through more than just the first record to infer the fieldtype.

Code:
Local lcXML

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
   <tablename>
      <fieldname></fieldname>
   </tablename>
   <tablename>
      <fieldname>+</fieldname>
   </tablename>
   <tablename>
      <fieldname>-</fieldname>
   </tablename>
   <tablename>
      <fieldname>0</fieldname>
   </tablename>
   <tablename>
      <fieldname>9</fieldname>
   </tablename>
   <tablename>
      <fieldname>A</fieldname>
   </tablename>
   <tablename>
      <fieldname>Z</fieldname>
   </tablename>
</VFPData>
EndText

On Error ? Message(), "in Line", Lineno()

XMLToCursor(lcXML,"crsFromXMLworks")

So it's not just the ".": VFP first infers a numeric type from it and then fails to convert "." to a numeric value. You can see I tried to make VFP infer numeric from several rows, but it makes a full pass: I tried 2048 nodes with "." followed by one "A", and it still doesn't error but infers char.

Which is good and bad news. It also means that a single out-of-the-ordinary value in what is really a numeric field can make VFP create a char field, just because the XML once contains "." or "e", perhaps, or some other unusual value. When you're used to getting a numeric field and VFP then creates a char field, XMLToCursor might work, but your own code then fails. So it might also be a good idea to use inference with sample XML during initial development, while it works, then store one sample result as a DBF and use that as a template for further XMLToCursor conversions. That way you fail less often, but you can still fail when the XML changes and has more or other fields.
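The template idea could look roughly like this. It is a sketch under assumptions: xmltemplate is a hypothetical file name, the sample XML reuses the thread's test layout, and flag 8192 is the same "use the existing cursor" flag as in the earlier example.

```foxpro
* Sketch only: xmltemplate.dbf is a hypothetical template file name.
LOCAL lcGoodXML, lcNewXML

TEXT TO lcGoodXML NOSHOW
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>Abc</fieldname>
   </tablename>
</VFPData>
ENDTEXT

* One-off, during development: let VFP infer the structure and keep it
XMLTOCURSOR(lcGoodXML, "crsSample")
SELECT crsSample
COPY STRUCTURE TO xmltemplate
USE IN crsSample

TEXT TO lcNewXML NOSHOW
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
ENDTEXT

* Later imports: start from an empty, writable copy of the template
USE IN SELECT("crsImport")
SELECT * FROM xmltemplate WHERE .F. INTO CURSOR crsImport READWRITE
USE IN SELECT("xmltemplate")

* 8192: fill the existing cursor instead of inferring a new structure
XMLTOCURSOR(lcNewXML, "crsImport", 8192)
```

Because the field types now come from the template, a lone "." no longer changes the inferred type, although XML with new or renamed elements would still need a refreshed template.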

Bye, Olaf.

Olaf Doschke Software Engineering
 
Olaf said:
Hm, shouldn't there be a cursor then, or is it just in some proto state?

XMLTOCURSOR() seems to follow these three steps, in case there is no schema or a target cursor in place:
1. MSXML.Load the document
2. Go through the nodes tree and build a mapping from XML elements to VFP columns, including the determination of the data type of the columns
3. Go through the nodes tree again, and fetch the contents from the XML nodes into the VFP columns

If something goes wrong with step 3, VFP will create a cursor nevertheless, and will fill it with as many rows as it's able to import without error.

That is, this will create a cursor with RECCOUNT() = 0
Code:
Local lcXML

CLEAR

USE IN SELECT("crsFromXMLfails")

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
EndText

On Error ? Message(), "in Line", Lineno()

XMLToCursor(lcXML,"crsFromXMLfails") 

ON ERROR

BROWSE

So will this (the import finishes at the first error):

Code:
Local lcXML

CLEAR

USE IN SELECT("crsFromXMLfails")

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
   <tablename>
      <fieldname>0</fieldname>
   </tablename>
</VFPData>
EndText

On Error ? Message(), "in Line", Lineno()

XMLToCursor(lcXML,"crsFromXMLfails") 

ON ERROR

BROWSE

But if the dot value comes after some other valid numeric values, the RECCOUNT() will reflect the number of rows validly imported (in this case, RECCOUNT() = 1):

Code:
Local lcXML

CLEAR

USE IN SELECT("crsFromXMLfails")

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>0</fieldname>
   </tablename>
   <tablename>
      <fieldname>.</fieldname>
   </tablename>
</VFPData>
EndText

On Error ? Message(), "in Line", Lineno()

XMLToCursor(lcXML,"crsFromXMLfails") 

ON ERROR

BROWSE

On its own, XMLTOCURSOR() is a great function to quickly import data from an XML document into a VFP cursor for inspection, but, because of the decisions it takes on column mapping, I never use it in production code without a previously prepared schema or cursor.

For instance, in the following example, the imported data may not be exactly what we would (or could) expect:

Code:
Local lcXML

CLEAR

USE IN SELECT("crsFromXMLfails")

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>0</fieldname>
      <other>2020-02-30</other>
   </tablename>
   <tablename>
      <fieldname>1</fieldname>
      <other>2020-12-00</other>
   </tablename>
</VFPData>
EndText

On Error ? Message(), "in Line", Lineno()

XMLToCursor(lcXML,"crsFromXMLfails") 

ON ERROR

BROWSE
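
A previously prepared cursor, as recommended above, might be sketched like this. The cursor name crsPrepared and the column widths are assumptions; the point is only that character columns take the node text as it comes, so the calling code can validate the values itself.

```foxpro
* Sketch only: crsPrepared and the C(10) widths are illustrative choices.
LOCAL lcXML

TEXT TO lcXML NOSHOW
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<VFPData>
   <tablename>
      <fieldname>.</fieldname>
      <other>2020-02-30</other>
   </tablename>
</VFPData>
ENDTEXT

USE IN SELECT("crsPrepared")

* Column names match the XML element names
CREATE CURSOR crsPrepared (fieldname C(10), other C(10))

* 8192: append into the existing cursor rather than inferring a structure
XMLTOCURSOR(lcXML, "crsPrepared", 8192)

* Show what was imported
SELECT crsPrepared
SCAN
   ? fieldname, other
ENDSCAN
```

Converting the text to numbers or dates afterwards then becomes an explicit step in your own code, where invalid values can be handled deliberately.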
 
Okay, I have to look into these code samples later. Nevertheless, XMLToCursor is not really any good unless your structure has the necessary kind of nesting. If your XML is only about a few single key-value pairs, XMLToCursor won't help, whether you let it infer a cursor or prepare one.

Code:
Local lcXML
CLEAR

USE IN SELECT("crsFromXMLfails")

Text To lcXML NoShow
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<person>
   <firstname>Olaf</firstname>
   <lastname>Doschke</lastname>
</person>
EndText

On Error ? Message(), "in Line", Lineno()
XMLToCursor(lcXML,"crsFromXMLfails") 
ON ERROR

So in such cases, you'd just add a layer by surrounding it with an extra node to get a record from it. But even that won't work for every XML document:
XML:
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<person>
   <firstname>Olaf</firstname>
   <lastname>Doschke</lastname>
   <speaks>
      <naturallanguage>German</naturallanguage>
      <naturallanguage>English</naturallanguage>
      <programminglanguage>Basic</programminglanguage>
      <programminglanguage>6502 assembler</programminglanguage>
      <programminglanguage>Pascal</programminglanguage>
      <programminglanguage>68000er Assembler</programminglanguage>
      <programminglanguage>C/C++</programminglanguage>
      <programminglanguage>...</programminglanguage>
   </speaks>
</person>

The default for reading in any XML should be an object, shouldn't it? Especially since we have the Empty class as a basis, you can have an object that allows any name as a property (unless you leave English or use two words, but XML disallows those as node names, too) and go to a cursor from there, if you already know the structure allows it.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Yes, Olaf, all these cases would require some kind of preprocessing. The XML ecosystem is great for that, because of its parsers and transformers, eventually leading to an XMLTOCURSOR() at the end.

 
OK, I see. I must have overlooked this all the time: the cursor is created even for the older examples. Bummer.

Anyway, at least the problem is now pinned down: specifically it's the interpretation of a period, and more generally it occurs when VFP not only infers an unsuitable type but also fails to convert the XML text to that data type.

I tried your XMLSerializer class's XMLtoVFP() method with my last XML example. To be fair, that's not necessarily how multiple fields with the same name would appear in XML, or is it? DOMDocument reads this in and I find all languages. I see your conversion puts collection objects in as VFP object nodes, but I only find the first naturallanguage and the first programminglanguage in the result. If it mattered for real-world cases, you'd have written that differently. I'll dig deeper into this; I may just be using it wrong or looking into the wrong nodes.
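
For comparison, the DOMDocument check mentioned above can be sketched as follows. This assumes MSXML 6 is installed (ProgID MSXML2.DOMDocument.6.0) and leaves out the XML declaration to keep the string-based loadXML call simple; the XPath expression is just one way to reach the repeated nodes.

```foxpro
* Sketch only: assumes MSXML 6 is available on the machine.
LOCAL lcXML, loDom, loNodes, loNode

TEXT TO lcXML NOSHOW
<person>
   <firstname>Olaf</firstname>
   <lastname>Doschke</lastname>
   <speaks>
      <naturallanguage>German</naturallanguage>
      <naturallanguage>English</naturallanguage>
      <programminglanguage>Basic</programminglanguage>
      <programminglanguage>Pascal</programminglanguage>
   </speaks>
</person>
ENDTEXT

loDom = CREATEOBJECT("MSXML2.DOMDocument.6.0")
loDom.async = .F.

IF loDom.loadXML(lcXML)
   * Every child element of <speaks>, repeated names included
   loNodes = loDom.selectNodes("/person/speaks/*")
   FOR EACH loNode IN loNodes
      ? loNode.nodeName, loNode.text
   ENDFOR
ELSE
   ? "XML not loaded:", loDom.parseError.reason
ENDIF
```

Note that every value arrives as text here; any conversion to numeric or date types is left to the calling code.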

Bye, Olaf.

Olaf Doschke Software Engineering
 
Olaf, using the serializer to ingest your document, it could be something like this:

Code:
LOCAL XMLS AS XMLSerializer
LOCAL XMLV AS Empty
LOCAL Source AS String

TEXT TO m.Source NOSHOW FLAGS 1
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<person>
   <firstname>Olaf</firstname>
   <lastname>Doschke</lastname>
   <speaks>
      <naturallanguage>German</naturallanguage>
      <naturallanguage>English</naturallanguage>
      <programminglanguage>Basic</programminglanguage>
      <programminglanguage>6502 assembler</programminglanguage>
      <programminglanguage>Pascal</programminglanguage>
      <programminglanguage>68000er Assembler</programminglanguage>
      <programminglanguage>C/C++</programminglanguage>
      <programminglanguage>...</programminglanguage>
   </speaks>
</person>
ENDTEXT

m.XMLS = CREATEOBJECT("XMLSerializer")

m.XMLV = m.XMLS.XMLtoVFP(m.Source)

* fetch the text from nodes directly

? "------- Natural languages --------"
? m.XMLV.person.Speaks.Naturallanguage(1).xmltext(1)
? m.XMLV.person.Speaks.Naturallanguage(2).xmltext(1)

* or programmatically, but assuming simple non-mixed text nodes

? "------- Programming languages --------"

LOCAL Cases AS Integer
LOCAL CaseIndex AS Integer

m.Cases = m.XMLS.GetArrayLength(m.XMLV.person.speaks.programminglanguage)

IF m.Cases != 0
	FOR m.CaseIndex = 1 TO m.Cases
		? m.XMLV.person.speaks.programminglanguage(m.CaseIndex).xmltext(1)
	ENDFOR
ELSE
	? m.XMLV.person.speaks.programminglanguage.xmltext(1)
ENDIF

Showing all text from a point in a tree in one go is offered by the DOM as a convenience, but it's not strictly XML.
 