
Get 'nice' XML from a Word 2003/2007 doc?

Status
Not open for further replies.

StuckInTheMiddle

Programmer
Mar 3, 2002
269
US

I've asked this in the Office forum but didn't get any response, so I'm hoping you guys can shed a little light on this for me, as this will eventually be a VB.NET project.

Does anyone know the easiest way of getting good XML from a Word doc, either 2003 or 2007? Eventually I want to be able to do this for Excel, PowerPoint and Access too, but for now tidy, 'nice'-looking XML stripped from a Word doc will do.
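For the 2007 formats at least, a possible starting point is that .docx, .xlsx and .pptx files are all ZIP packages of XML parts, so the text can be read per format without automating Office. A minimal Python sketch, assuming the part names Word, Excel and PowerPoint normally write (a strict reader would follow the package relationships instead); `FORMATS`, `collect_text` and `office_text` are made-up names for illustration. It only covers text: Excel numbers and inline strings live in the sheet parts rather than in sharedStrings.xml, and Access .mdb/.accdb files are not ZIP/XML packages at all, so they would need a different route.

```python
import os
import re
import zipfile
import xml.etree.ElementTree as ET

# Per format: a pattern for the parts that hold the text, and the
# namespace-qualified element that carries each piece of text.
FORMATS = {
    ".docx": (re.compile(r"^word/document\.xml$"),
              "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t"),
    ".xlsx": (re.compile(r"^xl/sharedStrings\.xml$"),
              "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}t"),
    ".pptx": (re.compile(r"^ppt/slides/slide\d+\.xml$"),
              "{http://schemas.openxmlformats.org/drawingml/2006/main}t"),
}


def _natural(name):
    """Sort key so that slide2.xml comes before slide10.xml."""
    return [int(s) if s.isdigit() else s for s in re.split(r"(\d+)", name)]


def collect_text(ext, parts):
    """Return the text strings in `parts` (part name -> XML) for format `ext`."""
    pattern, text_tag = FORMATS[ext]
    texts = []
    for name in sorted((n for n in parts if pattern.match(n)), key=_natural):
        root = ET.fromstring(parts[name])
        texts.extend(t.text for t in root.iter(text_tag) if t.text)
    return texts


def office_text(path):
    """Open a .docx/.xlsx/.pptx package and return its text strings."""
    ext = os.path.splitext(path)[1].lower()
    pattern, _ = FORMATS[ext]
    with zipfile.ZipFile(path) as pkg:
        # Only read the parts that can contain text for this format.
        parts = {n: pkg.read(n) for n in pkg.namelist() if pattern.match(n)}
    return collect_text(ext, parts)
```

Something like `office_text("budget.xlsx")` (a hypothetical file) would return the workbook's shared strings in order. Keeping `collect_text` separate from the ZIP handling means it can be tried on XML strings without a real file.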

I know there's the Save as XML option in 2003, and 2007 natively supports XML, but both of these options wrap the usual amount of MS rubbish tags around the content. I simply want tags that tell me the text in the document and the formatting of that text, most likely in paragraphs.
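One way to get the 'text plus formatting, in paragraphs' shape is to skip Save as XML and read word/document.xml out of the .docx directly, building your own tags from it. A rough Python sketch, assuming only the main document story matters; the output tags (`document`, `para`, `run`, `bold`/`italic`/`underline`) and the function names are invented for illustration, and things like text boxes, fields and numbering are not handled. The same steps (open the ZIP, read the part, walk the XML) should carry over to VB.NET.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used in Word 2007 .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
OFF = {"0", "false", "none"}  # attribute values that switch a toggle off


def q(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def _is_on(rpr, flag):
    """True if run property `flag` (e.g. 'b') is present and not switched off."""
    el = rpr.find(q(flag)) if rpr is not None else None
    return el is not None and el.get(q("val"), "true") not in OFF


def simplify_document_xml(xml):
    """Turn word/document.xml into <document><para><run>...</run></para></document>."""
    root = ET.fromstring(xml)
    out = ET.Element("document")
    for p in root.iter(q("p")):
        para = ET.SubElement(out, "para")
        style = p.find(f"{q('pPr')}/{q('pStyle')}")
        if style is not None:
            para.set("style", style.get(q("val"), ""))
        # iter() also reaches runs nested inside hyperlinks and similar wrappers.
        for r in p.iter(q("r")):
            text = "".join(t.text or "" for t in r.iter(q("t")))
            if not text:
                continue
            run = ET.SubElement(para, "run")
            rpr = r.find(q("rPr"))
            for flag, attr in (("b", "bold"), ("i", "italic"), ("u", "underline")):
                if _is_on(rpr, flag):
                    run.set(attr, "true")
            run.text = text
    return ET.tostring(out, encoding="unicode")


def docx_to_simple_xml(path):
    """Read word/document.xml from a .docx package and simplify it."""
    with zipfile.ZipFile(path) as pkg:
        return simplify_document_xml(pkg.read("word/document.xml"))
```

A bold heading followed by a plain run would come out roughly as `<document><para style="Heading1"><run bold="true">Hello</run><run> world</run></para></document>`.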

I considered using Word's document object model in VBA, looping through the StoryRanges for the properties I want and adding my own XML tags, but this seems very cumbersome, slow and not extensible.
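In the XML world, the rough counterpart of walking StoryRanges is reading each story's own part of the .docx package: headers, footers, footnotes, endnotes and comments are stored separately from word/document.xml. A small sketch, assuming the part names Word usually writes (the reliable way is to resolve them through word/_rels/document.xml.rels); `STORY_PART` and `story_parts` are hypothetical names.

```python
import re

# Part names Word conventionally uses for each story in a .docx package.
# A strict reader would look these up via word/_rels/document.xml.rels.
STORY_PART = re.compile(
    r"^word/(document|header\d+|footer\d+|footnotes|endnotes|comments)\.xml$"
)


def story_parts(names):
    """Filter package part names down to story parts, main document first."""
    stories = [n for n in names if STORY_PART.match(n)]
    return sorted(stories, key=lambda n: (n != "word/document.xml", n))
```

It takes a plain list of names, so it could be fed with something like `zipfile.ZipFile("report.docx").namelist()` (file name hypothetical), and each returned part can then be parsed like word/document.xml.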

Any ideas appreciated; there must be someone else out there who needs XML from Word docs :)

A,

"If you can stay calm, while all around you is chaos...then you probably haven't completely understood the seriousness of the situation."
 
Hi Rick

I'm looking to get OOXML: an XML representation of the Word document's basic text and formatting.

A,
 
I think there is an OOXML plug-in for 2003, but I'm not sure. Alternatively, you could look into OpenOffice. It's open source, and I think a lot of its API is accessible. It's not OOXML, though; it's ODF, which will hopefully become the new ISO standard for document XML formats.
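For comparison, ODF works the same way at the file level: an .odt is a ZIP whose body text lives in content.xml, with headings as `text:h` and paragraphs as `text:p`. A minimal Python sketch of listing them, assuming plain text is enough (`itertext()` flattens spans, and empty elements such as `text:s` spacers and `text:tab` are simply dropped); the function names are made up for illustration, and paragraphs nested inside other paragraphs (frames, notes) would be reported twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF text namespace; headings are text:h, ordinary paragraphs are text:p.
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
H = f"{{{TEXT}}}h"
P = f"{{{TEXT}}}p"


def odf_paragraphs(content_xml):
    """Return (kind, text) pairs for headings and paragraphs, in document order."""
    root = ET.fromstring(content_xml)
    out = []
    for el in root.iter():
        if el.tag == H:
            out.append(("heading", "".join(el.itertext())))
        elif el.tag == P:
            out.append(("para", "".join(el.itertext())))
    return out


def odt_paragraphs(path):
    """Read content.xml from an .odt package and list its paragraphs."""
    with zipfile.ZipFile(path) as pkg:
        return odf_paragraphs(pkg.read("content.xml"))
```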

-Rick

[monkey]I believe in killer coding ninja monkeys.[monkey]
 
Hi Rick,

OpenOffice definitely sounds like an improvement. Unfortunately, my client is a Microsoft-only house with a load of Word 2003/2007 documents that they want to catalog and scan through electronically, and for that they need an XML version of all their Office docs. Microsoft's 'Save as XML' in 2007 comes close, but it wraps too many of its own tags in there. Maybe I need to come up with some schema or DTD that can strip the 'crap' (pardon my French) from MS XML for them.
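A schema or DTD describes and validates a format rather than transforming it, so the stripping itself could be done as a whitelist filter instead: copy the handful of elements you care about, pass straight through a few wrapper elements, and drop everything else. A sketch in Python, assuming paragraphs, runs, text and bold/italic/underline are all that's wanted; since it matches on local names only, the same filter should accept Word 2003 'Save as XML' output as well as the word/document.xml part of a 2007 .docx. The `KEEP`/`UNWRAP` sets and `strip_word_xml` are illustrative choices, not anything Office defines, and paragraph properties (styles, alignment) are deliberately thrown away.

```python
import xml.etree.ElementTree as ET

# Elements copied to the output, by local name (namespace dropped).
KEEP = {"p", "r", "t", "b", "i", "u"}
# Wrappers that are removed but whose children are still visited. Covers
# Word 2003 XML (wx:sect, wx:sub-section, w:hlink) and 2007 (w:hyperlink).
UNWRAP = {"body", "sect", "sub-section", "hlink", "hyperlink", "rPr"}
# Anything else (fonts, styles, document properties, pPr, sectPr with its
# headers/footers, ...) is dropped together with its whole subtree.


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return tag.rsplit("}", 1)[-1]


def _copy(src, dest):
    """Copy whitelisted descendants of `src` under `dest`."""
    for child in src:
        name = local(child.tag)
        if name in KEEP:
            el = ET.SubElement(dest, name)
            for key, value in child.attrib.items():
                if local(key) == "val":  # e.g. w:b w:val="off"
                    el.set("val", value)
            if name == "t":
                el.text = child.text
            _copy(child, el)
        elif name in UNWRAP:
            _copy(child, dest)


def strip_word_xml(xml_text):
    """Return a stripped-down <document> built from Word 2003/2007 XML."""
    root = ET.fromstring(xml_text)
    out = ET.Element("document")
    _copy(root, out)
    return ET.tostring(out, encoding="unicode")
```

An XSLT stylesheet with an identity-style template plus a few empty templates would be the more traditional way to express the same whitelist. Either way, the stripped output is a separate file for cataloguing, not something Word would be expected to reopen.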

Thanks anyway,

A,
 
I'm pretty sure MS Office will open ODF files. The thing that would concern me about stripping the MS crap out of the XML is that MS Office might not like trying to reopen the trimmed-up XML.

Another option would be to use an off-the-shelf document storage system like Kwiktag to scan, store, and OCR your docs. The last time I worked with Kwiktag, though, I wasn't really impressed with its performance or interface, so I would suggest looking into a competing product or, if you have the time and budget, just writing your own.

-Rick

 