johnsmith98
Instructor
Posted to TekHints Perl forum.
I need a program that will really clean up Word HTML.
I want it to leave only the simple text tags like <p>, <b>, <html>, and <body> (correctly opened and closed).
I need a program that can search and replace using wildcards.
For example: find all instances of <span *> and replace them with a blank space.
Now THAT would save me a lot of time. And then I would be getting closer to cleaning up the Word HTML in a way that doesn't require Save as Text in Word. I hate doing it that way because it eliminates the img tags.
I do Save as Text in Word before Save for Web because then only one style is left and I can Find/Replace that easily.
Does anyone have a program like this, or can you write one for me? It seems to me that it would not be difficult for someone with programming experience. A Find and Replace that can use a wildcard is not a big demand, is it?
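A wildcard find and replace like the <span *> one described above can be sketched with a regular expression. This is only a minimal sketch in Python, not a finished tool: the function name strip_span_tags is made up for illustration, and it assumes that the closing </span> tags should go too and that no attribute value contains a ">" character.

```python
import re

# Matches an opening <span ...> tag with any attributes, or a closing </span>.
# [^>]* plays the role of the "*" wildcard: any run of characters up to the next ">".
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def strip_span_tags(html: str) -> str:
    """Replace every <span ...> and </span> tag with a single blank space."""
    return SPAN_TAG.sub(" ", html)


sample = '<p><span style="font-family:Arial">Hello</span> world</p>'
print(strip_span_tags(sample))
```

The text between the tags is left alone; only the tags themselves are replaced, so the <p> and img tags in the file survive.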
First Post
Really Clean up Word HTML
I am using Word 2000 and Dreamweaver 4.
How do I take a Word document that has JPGs in it and clean up all the code so that I am left with only the text, correct links to the images, and the most basic tags like <p> and <br>?
Even after Dreamweaver's Command / Clean Up Word HTML there is still so much crap code in there.
My most successful method is to work with two versions of the html files in Dreamweaver.
1. A version of the Word file that I did Save as Text on and then Save for Web (the pictures are gone). Then I use Find/Replace to clean up the code. Find/Replace works relatively quickly because converting to text leaves only one style.
2. A version of the Word file that I only did Save for Web on (with the pictures).
Then, with both windows open in Dreamweaver, I combine them. This is still very time-consuming.
There MUST be a better way.
I am taking Word documents (that have pictures in them) and doing Save for Web to create HTML. I then open the result in Dreamweaver and do Command / Clean Up Word HTML. This is pretty good but not good enough.
I wouldn't do this by choice. My boss gave me the Word files and said "Do it."
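The goal stated above (only the text, the image links, and the most basic tags) could be approached with a tag whitelist instead of many separate find/replace passes. The following is a rough sketch using Python's standard html.parser module. The whitelist, the class name WordHtmlCleaner, and the helper clean_word_html are all assumptions made for illustration. It does not repair badly nested tags, and character entities such as &nbsp; come out as literal characters.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: tag name -> attributes to keep. Adjust to taste.
ALLOWED = {
    "html": (), "head": (), "title": (), "body": (),
    "p": (), "br": (), "b": (), "i": (),
    "a": ("href",), "img": ("src", "alt"),
}
VOID = {"br", "img"}                # tags that take no closing tag
SKIP_CONTENT = {"style", "script"}  # drop these tags and everything inside


class WordHtmlCleaner(HTMLParser):
    """Re-emit only whitelisted tags and attributes, plus the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED:
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED[tag]
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

    # Comments (including Word's <!--[if ...]> blocks) are dropped because
    # the default handle_comment does nothing.


def clean_word_html(html: str) -> str:
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<p class=MsoNormal style="margin:0in"><span style="font-size:12.0pt">'
    'See <b>Figure 1</b><o:p></o:p></span></p>'
    '<p class=MsoNormal><img width=200 src="image001.jpg"></p>'
)
print(clean_word_html(sample))
# prints: <p>See <b>Figure 1</b></p><p><img src="image001.jpg"></p>
```

Because the img tag keeps its src attribute, the pictures stay linked, which would avoid the Save as Text step and the need to merge two versions of the file by hand.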
Tom