How to transform XML into "canonical" W3C Recommendation C14N

memarques · Sep 28, 2017

I need to transform "regular" XML files into an XML "format" that complies with the "canonical" W3C Recommendation C14N. For example: "no white space", no tags like "<TAGS />", and so on.

Any ideas? Thx.

Mauro

TamarGranor · Sep 28, 2017

If you're trying to read XML into cursors or tables, check the XMLToCursor() function or the XMLAdapter class.

Tamar

memarques · Sep 28, 2017

Tamar, thx for the info. My need is to read an XML file and save a "canonical W3C Recommendation C14N" compliant XML to another file. Nothing to do with tables. I read up on the XMLAdapter class, but did not find a way to read and write a "generic" XML in C14N format.

Mauro

Dan Freeman · Sep 28, 2017

There's no magic about XML files. There's actually no such thing as "standard XML". They're text. You can read the whole thing using FileToStr(), you can read line by line using FREAD()/FGETS(), you can suck it into a memo field and use all the string parsing tools in the language.

Or you can load it in an XML parser and use the object hierarchy created by the parser to read node by node. Some XML parsers can read and write multiple XML formats.

This really isn't a Foxpro question at all. It sounds like you need to do some research.

memarques · Sep 28, 2017

Dan

First of all, the premise you implied, that I need to do "some research", without any knowledge about me or anybody, is at least impolite.

I am absolutely sure that everybody (including me) knows that this is a VFP forum (maybe for some decades already). And of course I asked for an idea of how to do what I need using VFP.

Your idea to read a file and parse the XML's contents myself is like "reinventing the wheel". I was hoping for a solution using msxml, for example.

BTW, you can also open a DBF file and a CDX file and manage the data and index content yourself using your suggestion (fopen, fread and fwrite); it only takes "some research" on how to do it.

No harm here (I'm too old for this), and I don't want to argue about anything but VFP.

mplaza · Sep 28, 2017

You can use http://www.freeformatter.com/xml-formatter.html and select "compact mode", or nfXml (https://github.com/VFPX/nfXML), but it supports only all-lower-case XML output.

Given the next XML:

XML:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
	<doc>
<field3>def</field3>
	<field1 attr1="123" attr2="456" attr3="789"/>
		<field2>abc</field2>
	</doc>
	<doc>
		<field2>abc</field2>
		<field1 attr3="789"  attr2="456" attr1="123" />
<field3>def</field3>
	</doc>
	<doc>
		<field1 attr2="456" attr3="789" attr1="123" />
<field3>def</field3>
		<field2>abc</field2>
		<tags />
	</doc>
</xml>

? nfXmlCreate( nfXmlRead( m.cXml ))

outputs:

XML:

<?xml version="1.0" encoding="utf-8"?>
   <doc>
      <field1 attr1="123" attr2="456" attr3="789"/>
      <field2>abc</field2>
      <field3>def</field3>
   </doc>
   <doc>
      <field1 attr1="123" attr2="456" attr3="789"/>
      <field2>abc</field2>
      <field3>def</field3>
   </doc>
   <doc>
      <field1 attr1="123" attr2="456" attr3="789"/>
      <field2>abc</field2>
      <field3>def</field3>
      <tags/>
   </doc>

Marco Plaza
@vfp2nofox

memarques · Sep 28, 2017

thx Marco, I'll check this.

Regards, Mauro

Olaf Doschke · Sep 29, 2017

I think you're in good hands with that library. You could of course reduce whitespace with STRTRAN() and turn everything lower case with LOWER(), but textual data within tags should stay as is. So it's clear you need some parsing.
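To illustrate why plain string functions are not enough, here is a minimal sketch (the sample string is made up for illustration). LOWER() and STRTRAN() cannot tell markup from content, so they change the data inside the tags as well:

```foxpro
* Sketch only: string functions applied to the whole XML text
* also alter the textual data between the tags.
LOCAL lcXML
lcXML = "<Name>John SMITH</Name>"
? LOWER(m.lcXML)              && prints <name>john smith</name>
? STRTRAN(m.lcXML, " ", "")   && prints <Name>JohnSMITH</Name>
```

In both cases the element content is no longer the original "John SMITH", which is why a parser is needed to treat tags and content separately.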

It also won't work to simply loadXML and then spit out whatever DOM the parser built up. Such code cleans up a bit, but it works no miracles:

Code:

TEXT TO lcXML noshow
<?xml version="1.0" encoding="UTF-8"?>
<xml>
   <DOC>
<field3>def</field3>
   <field1 attr1="123" attr2="456" attr3="789"/>
      <field2>abc</field2>
   </DOC>
   <DOC>
      <field2>abc</field2>
      <field1 attr3="789"  attr2="456" attr1="123" />
<field3>def</field3>
   </DOC>
   <DOC>
      <field1 attr2="456" attr3="789" attr1="123" />
<field3>def</field3>
      <field2>abc</field2>
      <tags />
   </DOC>
</xml> 
ENDTEXT

LOCAL oParser as MSXML2.DomDOCument
oParser = CreateObject( "MSXML2.DomDocument" )
oParser.loadXML(lcXML)
STRTOFILE(oParser.xml,AddBS(GetEnv("TEMP"))+"output.xml")
Modify File (AddBS(GetEnv("TEMP"))+"output.xml")

The output is a bit better indented, but that's it. You can tell the parser to check compliance with a certain schema, but I doubt you can tell it to write output that adheres to a given specification.

Bye, Olaf.

memarques · Sep 29, 2017

Olaf

Yes, I guess that you are right. I can't find a way to do 100% of the "conversion" from "regular" XML to "canonical c14n compliant".

I think that maybe this can be done with .NET. When we need to sign an XML document, it must first be "translated" into "c14n format" (to guarantee that the XML's signature data is the same when it is signed and when it is checked). I'm not a .NET guy, but I'll do some research for something in this area (.NET). If I find a way, then I can create a DLL to be used by VFP.

Thx, regards, Mauro

Olaf Doschke · Sep 29, 2017

On the topic of a signature (embedded within the XML): signing does not alter the XML aside from adding a new signature node. So the signature can be recomputed and checked whether tags are lower or UPPER case, whether or not whitespace is limited, and whether or not short tags are used. In the end it's just text instead of binary bytes.

What's true is that even slight changes (e.g. removing a space) cause another signature. But signing is not compressing or encrypting; it only adds the signature to whatever data there is, it doesn't alter it.
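A rough way to see this effect in VFP, as a sketch only: the two sample strings below are made up, and the CRC32 checksum from SYS(2007) is merely a stand-in for a real signature digest (a real XML signature uses a cryptographic hash, not a CRC). It assumes VFP 9, where the third parameter of SYS(2007) selects CRC32.

```foxpro
* Sketch only: the same data in two different text layouts.
* SYS(2007, text, 0, 1) returns a CRC32 checksum as a string;
* it stands in here for the digest a real signature would use.
LOCAL lcA, lcB
lcA = '<doc><field1 attr1="123" attr2="456"/></doc>'
lcB = '<doc><field1 attr2="456" attr1="123" /></doc>'
? "Layout A:", SYS(2007, m.lcA, 0, 1)
? "Layout B:", SYS(2007, m.lcB, 0, 1)
? "Same text?", m.lcA == m.lcB
```

Both strings carry the same element and attributes, yet as text they are not identical, so any checksum or digest computed over the raw text should be expected to differ.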

A particular formatting would only be necessary if some set of data is output with a certain XML structure and known signatures (test cases) are given for certain sets of data in a raw or binary format, not yet XML. I would doubt that such a setup is sensible.

The idea I had can also be extended. Using the xml property of the MSXML2.DomDocument is just one straightforward way of getting some reformatted XML. You may try to influence how the XML stored there can be brought closer to this specification. One thing is for sure: it's not the original lcXML value. What LoadXML does is build up an object tree of nodes, siblings, and child nodes, a DOM (document object model). You can also recurse through that object: the root node properties are stored in the object itself after LoadXML, and as any XML is a straightforward tree structure, it is sufficient to have a routine for a single node and let it call itself recursively for all child nodes.

This will retrieve some basic information from each node and display it nested by indentation:

Code:

TEXT TO lcXML noshow
<?xml version="1.0" encoding="UTF-8"?>
<xml>
   <DOC>
<field3>def</field3>
   <field1 attr1="123" attr2="456" attr3="789"/>
      <field2>abc</field2>
   </DOC>
   <DOC>
      <field2>abc</field2>
      <field1 attr3="789"  attr2="456" attr1="123" />
<field3>def</field3>
   </DOC>
   <DOC>
      <field1 attr2="456" attr3="789" attr1="123" />
<field3>def</field3>
      <field2>abc</field2>
      <tags />
   </DOC>
</xml>
ENDTEXT

Local oParser As MSXML2.DomDOCument
oParser = Createobject( "MSXML2.DomDocument" )
oParser.LoadXML(lcXML)

Clear
outputxmlinfo(oParser)

Procedure outputxmlinfo()
   Lparameters toNode, tnLevel
   tnLevel = Evl(tnLevel,0)
   ?  Space(tnLevel*3)+"node type:"+toNode.nodeTypeString+;
      ", node name:"+Evl(toNode.baseName,"NONE")+;
      ", node value:"+Nvl(toNode.nodeValue,"NONE")+;
      ", node attributes:"+Evl(outputattributes(toNode),"NONE")+;
      ", has child nodes:"+Iif(toNode.hasChildNodes,"Yes","No")
   Try
      For Each loNode In toNode.childNodes
         outputxmlinfo(loNode, tnLevel+1)
      Endfor
   Catch
      *
   Endtry
Endproc

Procedure outputattributes
   Lparameters toNode
   lcAttributes = ""
   Try
      For Each loAttribute In toNode.Attributes
         lcAttributes = lcAttributes +' '+ loAttribute.Name+'="'+loAttribute.Value+'"'
      Endfor
   Catch
      *
   Endtry
   Return Evl(lcAttributes,"")
Endproc

Set a breakpoint, then use IntelliSense by typing toNode. in the Command Window, or inspect the local variables in the debugger. Either will reveal the properties available at runtime that you can use when building the XML text output.

And this will output simple XML based on what I learned from iterating over the DOM nodes' basic information:

Code:

TEXT TO lcXML noshow
<?xml version="1.0" encoding="UTF-8"?>
<xml>
   <DOC>
<field3>def</field3>
   <field1 attr1="123" attr2="456" attr3="789"/>
      <field2>abc</field2>
   </DOC>
   <DOC>
      <field2>abc</field2>
      <field1 attr3="789"  attr2="456" attr1="123" />
<field3>def</field3>
   </DOC>
   <DOC>
      <field1 attr2="456" attr3="789" attr1="123" />
<field3>def</field3>
      <field2>abc</field2>
      <tags />
   </DOC>
</xml>
ENDTEXT

Local oParser As MSXML2.DomDOCument
oParser = Createobject( "MSXML2.DomDocument" )
oParser.LoadXML(lcXML)

Clear
outputxml(oParser)

Procedure outputxml
   Lparameters toNode, tnLevel
   tnLevel = Evl(tnLevel,0)
   Local llElement, llInstruction, lnCountChildnodes, loNode
   lnCountChildnodes = 0
   llInstruction = (toNode.nodeTypeString=="processinginstruction")
   llElement = (toNode.nodeTypeString=="element")
   If llInstruction
      * open the processing instruction; its content follows from nodeValue
      ?? "<?"+Lower(toNode.baseName)+" "
   Endif
   If llElement
      * start tag on a new line, indented by nesting level
      ? Space(tnLevel*3)+"<"+Lower(toNode.baseName)+outputattributes(toNode)+">"
   Endif
   If !Empty(Nvl(toNode.nodeValue,""))
      * text (or instruction content) is output as is
      ?? toNode.nodeValue
   Else
      Try
         For Each loNode In toNode.childNodes
            lnCountChildnodes = lnCountChildnodes + 1
            outputxml(loNode, tnLevel+1)
         Endfor
      Catch
         * nothing to iterate for this node
      Endtry
   Endif
   If llElement
      If lnCountChildnodes>1
         ? Space(tnLevel*3)
      Endif
      * always an explicit end tag, never the short "/>" form
      ?? "</"+Lower(toNode.baseName)+">"
   Endif
   If llInstruction
      ?? "?>"
   Endif

Endproc

Procedure outputattributes
   Lparameters toNode
   lcAttributes = ""
   Try
      For Each loAttribute In toNode.Attributes
         lcAttributes = lcAttributes +' '+ loAttribute.Name+'="'+loAttribute.Value+'"'
      Endfor
   Catch
      *
   Endtry
   Return Evl(lcAttributes,"")
Endproc

As long as your initial XML can be parsed and read into a DOMDocument, you have the basis for outputting it as you like and need. It's still a long way to applying all the specifications, but as one example, my code does not re-output the field1 tags as short tags with a "/>" closing, but with a </field1> closing tag.
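Another step in the same direction, as a hedged sketch: as far as I understand Canonical XML, attributes have to be written in sorted order rather than document order (the nfXml output earlier in this thread already shows them sorted). SortedAttributes is a hypothetical helper name, not part of MSXML or any library. The sketch assumes attributes without namespaces, values that need no escaping, and the default SET COLLATE "MACHINE" so that ASORT() compares by character code.

```foxpro
* Sketch only: emit the attributes of a DOM element sorted by name.
LOCAL loDoc
loDoc = CREATEOBJECT("MSXML2.DOMDocument")
loDoc.async = .F.
loDoc.loadXML('<field1 attr3="789" attr2="456" attr1="123"/>')
? SortedAttributes(loDoc.documentElement)

FUNCTION SortedAttributes
   LPARAMETERS toNode
   LOCAL lnCount, lnI, lcResult, loAttr
   LOCAL ARRAY laAttr[1]
   lcResult = ""
   * nodes other than elements have no attribute collection
   IF ISNULL(toNode.attributes)
      RETURN m.lcResult
   ENDIF
   lnCount = toNode.attributes.length
   IF m.lnCount = 0
      RETURN m.lcResult
   ENDIF
   * column 1 = attribute name, column 2 = attribute value
   DIMENSION laAttr[m.lnCount, 2]
   FOR lnI = 1 TO m.lnCount
      loAttr = toNode.attributes.item(m.lnI - 1)
      laAttr[m.lnI, 1] = loAttr.name
      laAttr[m.lnI, 2] = loAttr.value
   ENDFOR
   * sort the rows by the first column (the name)
   ASORT(laAttr, 1)
   FOR lnI = 1 TO m.lnCount
      lcResult = m.lcResult + ' ' + laAttr[m.lnI, 1] + '="' + laAttr[m.lnI, 2] + '"'
   ENDFOR
   RETURN m.lcResult
ENDFUNC
```

A helper like this could take the place of outputattributes in the routine above. Namespace declarations and escaping of special characters in values would still have to be handled separately.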

You could also turn attributes into inner nodes or whatever is necessary to comply. I don't see a library around MSXML and surely not within VFP, but I also don't know all of VFPX.
Yes, some .NET assembly doing that might be available. I didn't deep dive into specifications, but you might be able to get what you want with such an approach, just ensure all your input XML can be read in at least.
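
To make sure the input can be read in at all, the return value of loadXML and the parseError object of the document can be checked before walking the DOM. A minimal sketch, with a deliberately malformed sample string made up for illustration:

```foxpro
* Sketch only: verify that the XML parsed before processing it.
LOCAL loDoc, lcXML
lcXML = "<doc><field1>abc</field2></doc>"   && deliberately malformed
loDoc = CREATEOBJECT("MSXML2.DOMDocument")
loDoc.async = .F.
IF loDoc.loadXML(m.lcXML)
   ? "Parsed, root element: " + loDoc.documentElement.nodeName
ELSE
   * parseError describes why and where loading failed
   ? "Parse error at line " + TRANSFORM(loDoc.parseError.line) + ": " + loDoc.parseError.reason
ENDIF
```

Without such a check, a failed load simply leaves an empty document, and the recursive output routine prints nothing without reporting any problem.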

Bye, Olaf.

memarques · Sep 29, 2017

Olaf

Thx a lot.

First I'll give .NET a try. I "think" it will be a lot easier to cover all the "canonical c14n" compliance considerations with it. I'll let you know how it goes.

If I can't do it with .NET, I'll use your idea (as a starting point), which solves a lot, and implement the other things, although there is a lot to do.

Regards, Mauro

atlopes · Oct 5, 2017

Coming here a few days later, hoping that this may be still useful to you, Mauro.

I have already put together some of the bits required for a VFP-based XML Canonicalizer, using MSXML2 (I will probably go away from it in the future, because of the way MSXML2 handles the ingestion of line feeds as entities). But, for now, it can pass the basic W3C tests.

It is available at https://bitbucket.org/atlopes/xml.

Unfortunately, I haven't written external documentation for it yet, but if you want to give it a try, it is fairly basic to set up. Download the XML classes sources (there is an external dependency on another class, at https://bitbucket.org/atlopes/names), and, for a quick test:

Code:

DO LOCFILE("xml-canonicalizer.prg")
m.Canon = CREATEOBJECT("XMLCanonicalizer")
? m.Canon.Canonicalize("http://producthelp.sdl.com/sdl%20trados%20studio/client_en/sample.xml")

My purpose in writing the canonicalizer is the same as yours, but since I'm nowadays exploring Chilkat's XML signing capabilities (which include an XML canonicalizer), I have put further development of this on hold.

memarques · Oct 5, 2017

@atlopes

Thx so much for sharing this. For sure I'll check "your site" upside-down.

Regards, Mauro
