Hi,
I need to compare two XML files and determine whether they are similar. By similar I mean that they share the same structure, regardless of their actual content.
The idea is to discard files before further processing based on this criterion. For example, suppose I've mirrored freshmeat.net and only want to store or process the article pages: I'd run every page through Xerces to balance the tags, and then through this "tool" to determine which pages are similar to a given example.
I do not want to reinvent the wheel, so I am looking for algorithms, code snippets, or tools to help me.
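To make "same structure" concrete, here is a rough Python sketch of the kind of check I have in mind. It is only an illustration, not an existing tool: the function names tag_paths and structural_similarity are made up, and treating structure as "the set of root-to-element tag paths" is just one possible definition (it ignores text, attributes, element order, and how often an element repeats).

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths, ignoring text and attributes."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        paths.add(path)
        # Iterating an Element yields its direct children.
        for child in elem:
            stack.append((child, path + "/" + child.tag))
    return paths


def structural_similarity(xml_a, xml_b):
    """Jaccard similarity of the two path sets (assumed metric, 0.0 to 1.0)."""
    a, b = tag_paths(xml_a), tag_paths(xml_b)
    # The union is never empty because every document has a root element.
    return len(a & b) / len(a | b)
```

With something like this, a page would be kept when its score against the example page is above some threshold I would still have to pick by experiment.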
Any ideas?