Converting Word documents

Status
Not open for further replies.

Glasgow

IS-IT--Management
Jul 30, 2001
1,669
GB
I have a number of Word documents to convert. They are not rocket science, nor are they particularly large. They do have different headings and numbered lists.

I can save them as HTML from Word, but it seems to generate all sorts of 'unnecessary baggage' in the HTML. If I simply take the text and convert it to HTML by hand, I end up with a much leaner file (about a quarter of the size of Word's attempt).

Is there a way of getting a 'reasonably lean' HTML version without doing things the hard way?

Thanks in advance.
 
If you have Perl installed, I have a program written in Perl that will strip out MOST of the junk that Word puts in generated HTML. The result is pretty basic HTML, but easier to modify than the original.
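
For anyone without Perl, here is a rough idea of what this kind of cleanup involves. This is only a minimal sketch in Python, not the program mentioned above: the function name strip_word_html and the list of things it removes are my own assumptions about typical Word output (comments, style and xml blocks, Office namespace tags such as o:p, and class/style/lang attributes). It is regex-based, so it is crude and may miss or over-match on unusual markup.

```python
import re

def strip_word_html(html: str) -> str:
    """Remove the most common Word-specific markup (hypothetical helper)."""
    # Drop all comments, including conditional ones like <!--[if gte mso 9]>...
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    # Drop <style> and <xml> blocks together with their contents
    html = re.sub(r"<(style|xml)\b.*?</\1>", "", html, flags=re.S | re.I)
    # Drop <meta> and <link> tags
    html = re.sub(r"<(meta|link)\b[^>]*>", "", html, flags=re.I)
    # Drop Office namespace tags such as <o:p> and </o:p>, keeping any text inside
    html = re.sub(r"</?\w+:\w+[^>]*>", "", html)
    # Drop class, style, and lang attributes (quoted or unquoted values)
    html = re.sub(
        r"""\s+(class|style|lang)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
        "", html, flags=re.I)
    # Unwrap <span> and <font> tags, keeping the text they enclose
    html = re.sub(r"</?(span|font)\b[^>]*>", "", html, flags=re.I)
    return html
```

Read the file Word saved, pass its contents through the function, and write the result to a new file. Check the output by eye before relying on it.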


Tracy Dryden

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard. [dragon]
 
Thanks folks.

Gus - thanks. It looked quite exciting initially but didn't actually strip out that much: the file came down to about three quarters of its size, versus one quarter when done manually.
Tracy - thanks also. I don't have Perl though.

The manual route I am taking is pasting text only from Word then re-generating the numbered lists and paragraphs. A bit painful but the files are small so it doesn't take too long.
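
The paste-and-rebuild route above could be partly automated. The following is a minimal sketch in Python under some assumptions of mine: blocks are separated by blank lines, lines starting with "1." or "1)" are numbered list items, and a short single line with no closing punctuation is a heading. The function name text_to_html and the 60-character heading limit are hypothetical; adjust them to suit the documents.

```python
import html
import re

# A numbered list item looks like "1. text" or "2) text" (assumed convention)
ITEM = re.compile(r"^\s*\d+[.)]\s+(.*)$")

def text_to_html(text: str) -> str:
    """Turn plain text pasted from Word into lean HTML (hypothetical helper)."""
    out = []
    # Blocks are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        items = [ITEM.match(ln) for ln in lines]
        if all(items):
            # Every line is numbered: emit an ordered list
            out.append("<ol>")
            out.extend(f"  <li>{html.escape(m.group(1))}</li>" for m in items)
            out.append("</ol>")
        elif len(lines) == 1 and len(lines[0]) <= 60 and lines[0][-1] not in ".:;?!":
            # Heuristic: a short single line without end punctuation is a heading
            out.append(f"<h2>{html.escape(lines[0])}</h2>")
        else:
            # Anything else becomes one paragraph
            out.append(f"<p>{html.escape(' '.join(lines))}</p>")
    return "\n".join(out)
```

The heading test is only a guess and will misfire on short one-line paragraphs, so the result still needs a quick read-through.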
 
Look at Aurelia Reporter. This is a virtual printer that does an outstanding job of converting any printable document to HTML. It has a free 21-day trial version, but if it works for you, it only costs $50 for a single-user license.

I have converted about a dozen documents and as far as I can tell, it works flawlessly.

A note about how it works: If the documents contain fonts that browser users wouldn't have, Aurelia makes them into graphic backgrounds, so they present exactly as they were created.

Mike Krausnick
Dublin, California
 
Dreamweaver does a pretty good job stripping Word HTML. You may also try HTML Tidy (I haven't tried it). You can also google on 'clean up word html' and find a slew of choices.

Greg
"Personally, I am always ready to learn, although I do not always like being taught." - Winston Churchill
 