Converting a Word doc to a web page, nicely?

warby1212

Programmer
Jun 9, 2003
183
AU
Hi, I've got a bi-monthly newsletter (in Word) which needs to be put on a web site. It's too big to do manually. If I save it in Word as a web page, it does the job but pretty clumsily. Is there a neater way to do it?

I should write something here.
 

As far as I know, you cannot change how Word chooses to save its HTML output (i.e. as bloated as a big bloated thing), aside from the things you can change on the "Save Options" settings page.

If you want something very streamlined, and as good as hand-coding (assuming your hand-coding is good, of course), then you will have to do the job by hand.

Of course, rather than stripping all the junk out of your HTML files, you could go the other way: save the newsletter as plain text files, and put only the minimum formatting back in.
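For anyone wanting to try the plain-text route, here is a minimal sketch in Python of what "putting the minimum formatting back in" could look like. It assumes paragraphs are separated by blank lines; the function name text_to_html and the default title are made up for illustration, and headings, lists and images would still need adding by hand.

```python
import html


def text_to_html(text, title="Newsletter"):
    """Wrap plain text in minimal HTML; blank lines separate paragraphs."""
    paragraphs = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        block = block.strip()
        if not block:
            continue
        # Escape &, < and > so the text cannot be mistaken for markup,
        # and keep single line breaks inside a paragraph as <br />.
        lines = [html.escape(line.strip(), quote=False) for line in block.split("\n")]
        paragraphs.append("<p>" + "<br />\n".join(lines) + "</p>")
    body = "\n".join(paragraphs)
    return (
        "<html>\n<head><title>" + html.escape(title, quote=False) + "</title></head>\n"
        "<body>\n" + body + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(text_to_html("Hello & welcome\n\nSecond paragraph\nwith two lines"))
```

Escaping first and adding tags afterwards keeps the output small and means stray "<" or "&" characters in the newsletter text cannot break the page.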

The other option is to look on Google to see if anyone has produced a decent HTML saving filter for Word.

Hope this helps,
Dan

 
I've also noticed that OpenOffice does a much better job than Word of saving a Word document as HTML. I'd recommend opening the document in OpenOffice and, if everything imported OK, saving it as HTML from there for significantly smaller files.
 
If you have Dreamweaver, it does a pretty good job of stripping out the nonsense from Word-generated HTML.

traingamer
 
Word creates pretty bloated files whether you convert them to HTML or not.

I would convert them to plain text or, better yet, create them as plain text, using Word as a spell-checker only.

Very few HTML tags would be required for a newsletter.
See an example of converting text to HTML at:
This will give you a template to use if you press the button to create the book.

You could probably get by with just the following few tags:

Code:
<h1></h1> thru <h6></h6>
<p></p>
<br />
<b></b>
<i></i>
<img src="picture.jpg" alt="My Picture" />
<table><tr><th>  </th></tr><tr><td>  </td></tr></table>
<ul><li> </li></ul>
<ol><li> </li></ol>

A bit of CSS would enable you to wrap text around images and change the font to a sans-serif:

<style>
img  {float: left}
body {font-family: Arial, sans-serif}
</style>


Clive
 
And, as always on a project like this, the first time you do it will take a long time, but then you'll have a template to use the next time.

Good luck.

traingamer
 
Fantastic, I'll look into all of this, because I think clean Word conversions could be very handy. I'll post a response here when I have tried the various suggestions.
Cheers, Stephen (warby1212)

 
It seems that everyone has forgotten about exporting the Word document to PDF, which retains all its original formatting, fonts, images etc. and allows pretty much anyone to read your document.

Hope this provides some food for thought.

Jeffy
 
"(PDFs) allowing pretty much anyone to read your document"
Anybody but the blind, and some search engines (Google seems to be able to cope with PDFs). Why make your information less accessible than it needs to be?

Steer clear of PDF unless you really need it. Nine times out of ten it's a waste of bandwidth.

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
I saved it as a web page in Word and got a 544 KB file; then I saved it as a filtered web page in Word and it went down to 249 KB; then I imported it into Dreamweaver and used its filters to remove unnecessary stuff, which got it to around 180 KB; then I spent two and a half hours fixing the remaining wacky bits by hand. I'm happy. I couldn't try OpenOffice as it's not installed at the moment, but next time I'll go through it first. Thanks for your advice, all. Cheers, Stephen
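To put those three sizes in perspective, here is a quick Python sketch that works out how much each step saved. The figures are the ones quoted in this post (the last is "around" 180 KB, so treat the result as approximate); the step labels and the function name reductions are just illustrative.

```python
# Sizes in KB as reported in the post above; the last one is approximate.
sizes_kb = [
    ("saved as web page", 544),
    ("saved as filtered web page", 249),
    ("after Dreamweaver cleanup", 180),
]


def reductions(sizes):
    """Return (label, % saved by this step, % saved overall) for each step."""
    first = sizes[0][1]
    rows = []
    for (_, prev), (label, size) in zip(sizes, sizes[1:]):
        step = 100 * (prev - size) / prev      # saving relative to the previous step
        total = 100 * (first - size) / first   # saving relative to the first file
        rows.append((label, round(step, 1), round(total, 1)))
    return rows


if __name__ == "__main__":
    for label, step, total in reductions(sizes_kb):
        print(f"{label}: {step}% smaller than the previous step, {total}% overall")
```

So the filtered save alone removes a little over half of the file, and the Dreamweaver pass brings the overall saving to roughly two thirds before any hand editing.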

 
I'd bet that this is a page that actually has less than 10K needed to display the content as you want. Recently I worked with a site that had pages created by MS Publisher, the same package that creates the Word pages. The HTML content of the pages ran between 200K and 500K. I used simple HTML, and the largest page I came up with was 5K, the smallest around 2K. The page looks better than the original, sizes to various screen resolutions (rather important nowadays), and the pictures aren't skewed out of proportion.

I ended up not even messing with the original code, but copying the text from the browser display and writing the pages from scratch to match the original layout. The pages also take FAR less time to load, even locally, and the scrolling is much smoother because the browser has so much less to process to move up or down a line or more.

I'm not sure if there's a way to get someone a copy of this page to show you an example of the difference between program-generated code and human-created, but if you have a URL for one page, you might post it to see if there are any takers.

Lee
 
If anyone is interested, I have a Perl program that will filter a Word-generated HTML doc down to just the basics (and I do mean BASICS). You need to have Perl installed on your machine to run it, but you can have it if you want.


Tracy Dryden

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard. [dragon]
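The Perl program itself isn't posted in this thread, so the following is not that program, only a rough Python sketch of the same general idea: keep a short whitelist of basic tags, drop every attribute, and throw away the head, style and script sections. The names KEEP, SKIP_CONTENT, WordHtmlFilter and strip_word_html are invented for the example, and the whitelist is an assumption based on the tag list earlier in the thread. Images and links are dropped too, because their attributes are discarded here, so they would need to be put back by hand.

```python
from html import escape
from html.parser import HTMLParser

# Tags to keep (without attributes); other tags are dropped but their text stays.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "br", "b", "i",
        "ul", "ol", "li", "table", "tr", "th", "td"}
# Elements whose whole content is thrown away.
SKIP_CONTENT = {"head", "style", "script"}


class WordHtmlFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP:
            # Attributes (class, style, lang, ...) are deliberately dropped.
            self.out.append("<br />" if tag == "br" else "<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP and tag != "br":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))


def strip_word_html(source):
    """Return the body content of Word-style HTML reduced to basic tags."""
    parser = WordHtmlFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

For example, passing it a paragraph such as <p class=MsoNormal><span style="font-family:Arial"><b>Hi</b></span></p> should give back just <p><b>Hi</b></p>. The result is a body fragment, so it still needs to be dropped into a page template, and anything unusual in the Word output will want checking by eye.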
 
This sounds a LOT like a project I inherited several years ago. Another thing to try (if you're not doing so already) is to break up the newsletter into multiple pages, rather than one long document. It makes it much more readable (and faster to load 'cause it's smaller).
I replaced lots of background images with background colors (or with very small tiled images for some dithered colors). The 2 MB of Word HTML became under 15K with a stylesheet (under 50K with images). Replace WordArt with styled text.

traingamer
 
A little experiment I tried: I saved a Word document as a web page with absolutely nothing in it, no text at all, and the file was 1.54 KB. The same document, written as valid HTML in Notepad, was 151 bytes. Hmm, bloated code or what?

Glen
 