Hi all!
I need to develop a cross-platform app that reads ONLY the text from an MS Word .DOC file into a text box.
Inspecting a .DOC file, I found that the text begins at offset 600h. So I seek to this offset and read the file as a binary stream until I encounter 3 or more consecutive zero bytes.
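For reference, here is a minimal sketch of that heuristic in Python. The function name `read_doc_text`, its parameters, and the choice of cp1252 as the text encoding are assumptions made for illustration only; the fixed 600h start offset is just the value observed above, not something guaranteed by the file format.

```python
def read_doc_text(path, start=0x600, encoding="cp1252"):
    """Heuristic text grab from a .DOC file (hypothetical helper).

    Seeks to a fixed offset and keeps bytes up to the first run of
    three zero bytes. Assumes 8-bit text; cp1252 is a guess.
    """
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read()

    # Stop at the first run of three consecutive zero bytes, if any.
    end = data.find(b"\x00\x00\x00")
    if end != -1:
        data = data[:end]

    # Undecodable bytes are replaced rather than raising an error.
    return data.decode(encoding, errors="replace")
```

Keeping the offset as a parameter makes it easy to try other start positions on the files where 600h is wrong.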
In most cases this works. But there are problems with some MS Word docs: the text DOES NOT BEGIN at 600h but somewhere else.
Maybe someone has run into this problem and found out how to locate these elusive offsets. I've heard that the .DOC file header (or even the footer) contains this information, but I haven't yet found my way through the garbage stored there.
Thanks!