Reading each word in a paragraph using Excel

Status
Not open for further replies.

scootgp200

Technical User
Jul 10, 2009
8
GB
Hi,

Thanks to some great advice on a previous thread, I am now able to use Excel to open a Word doc, extract each paragraph, test for the applied style and write it into a line of HTML.

However, I realise now that I need to go deeper into the document, as Normal paragraphs will possibly contain some text formatted as bold, italic or as a hyperlink. In those cases I would want to write the appropriate HTML tags around those words and, in the case of a hyperlink, extract the URL to make a working hyperlink in the HTML code.

Here is the code I now have to extract a paragraph:
Code:
Select Case iParagraph.Style
    Case "Heading 1"
        ' heading 1 stuff
        Sheets("Sheet3").Range("a1000").End(xlUp).Offset(1, 0).Value = "<H1>" & iParagraph.Range.Text & "</H1>"
    Case "Heading 2"
        ' heading 2 stuff
    Case "Heading 3"
        ' heading 3 stuff
    Case "Normal"
        ' Normal stuff
    Case Else
        ' if it is something else?
End Select

Can anyone tell me the code to read the style of each word in the paragraph?

Many Thanks
Bryan
 
You can't work at the word level for this - consider a hyperlink like Tek Tips. Which word is the hyperlink? There are potentially all sorts of things that will cause you problems. You need to identify what Word calls 'Runs' - but that is a concept behind the scenes, not really made available to the UI or VBA. What you propose is a lot of code and you might find it a better approach to save the document as html and extract the tagged text - difficult to say without knowing what you are really trying to do.
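For the hyperlink part at least, the Word object model does offer something to work with: every Range has a Hyperlinks collection, so the links in a paragraph can be read without going word by word. A minimal sketch, assuming it runs from Excel with iParagraph being the Word paragraph from the code above. AnchorTag and ParagraphLinks are made-up names for illustration, not part of Word, and links to bookmarks inside the document (which use SubAddress rather than Address) are not handled.

```vba
' Hypothetical helper: builds an HTML anchor from a URL and its display text.
Function AnchorTag(ByVal url As String, ByVal linkText As String) As String
    AnchorTag = "<a href=""" & url & """>" & linkText & "</a>"
End Function

' Returns one anchor tag per hyperlink found in the paragraph, one per line.
' iParagraph is assumed to be a Word Paragraph object (late-bound here).
Function ParagraphLinks(ByVal iParagraph As Object) As String
    Dim lnk As Object
    Dim result As String
    For Each lnk In iParagraph.Range.Hyperlinks
        ' TextToDisplay is the whole linked text, however many words it spans.
        result = result & AnchorTag(lnk.Address, lnk.TextToDisplay) & vbCrLf
    Next lnk
    ParagraphLinks = result
End Function
```

This only lists the links; placing each tag at the right position inside the paragraph text is the part that still takes real work.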

Enjoy,
Tony

 
Good answer Tony.

Bryan, this is the reason why serious Word users avoid any and all manual formatting. We use - and as much as possible, ONLY use - Styles. This includes any character format within a paragraph.

There is some bolding in this paragraph.

Now suppose the whole paragraph used MyMainText style, and the "some bolding" used the MyBold character style.
Code:
Dim aWord As Range
Dim msg As String
' Collect the style name reported for each word in the first paragraph
For Each aWord In ActiveDocument.Paragraphs(1).Range.Words
   msg = msg & aWord.Style & vbCrLf
Next
msg would be:

MyMainText
MyMainText
MyBold
MyBold
MyMainText
MyMainText
MyMainText
MyMainText
MyMainText

See? You CAN get character level style information...but only if you have used character level styles.

Otherwise (you are going to hate to hear this) you have to test for EVERY single possible character format individually. Font, font size, bold, underline, italics...everything. In other words:

"Can anyone tell me the code to read the style of each word in the paragraph."

The above code can, but ONLY, repeat ONLY, if a style has actually been used. If any text was manually formatted - and I am willing to bet it is - there is no code that will do this, other than testing every possible formatting option, one by one. So I would like to reinforce Tony's comment:

Tony said:
What you propose is a lot of code and you might find it a better approach to save the document as html and extract the tagged text - difficult to say without knowing what you are really trying to do.
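To give an idea of what "testing one by one" looks like, here is a rough sketch that checks just two formats, bold and italic, character by character, and starts a new tagged run whenever either one changes. It assumes rng is a Word Range (for example a paragraph's Range). WrapRun and RangeToHtml are made-up names, and every other format (underline, font, size and so on) would need its own test added in the same way.

```vba
' Hypothetical helper: wraps text in <b>/<i> tags according to two flags.
Function WrapRun(ByVal txt As String, ByVal isBold As Boolean, ByVal isItalic As Boolean) As String
    If Len(txt) = 0 Then Exit Function
    If isItalic Then txt = "<i>" & txt & "</i>"
    If isBold Then txt = "<b>" & txt & "</b>"
    WrapRun = txt
End Function

' Walks a Word Range one character at a time and closes the current run
' whenever the bold or italic state changes. Only these two formats are tested.
Function RangeToHtml(ByVal rng As Object) As String
    Dim ch As Object
    Dim buf As String, result As String
    Dim curBold As Boolean, curItalic As Boolean
    Dim chBold As Boolean, chItalic As Boolean
    For Each ch In rng.Characters
        chBold = (ch.Font.Bold = True)
        chItalic = (ch.Font.Italic = True)
        If chBold <> curBold Or chItalic <> curItalic Then
            result = result & WrapRun(buf, curBold, curItalic)
            buf = ""
            curBold = chBold
            curItalic = chItalic
        End If
        buf = buf & ch.Text
    Next ch
    RangeToHtml = result & WrapRun(buf, curBold, curItalic)
End Function
```

Note that the text is not HTML-escaped and the trailing paragraph mark is still included, and that looping over Characters is slow on long documents. That is part of why saving as HTML may be the easier route.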

I am still wondering about using Excel to do this. I cannot for the life of me understand why.

"A little piece of heaven
without that awkward dying part."

advertisement for Reese's Peanut Butter Cups (a chocolate/peanut butter confection)

Gerry
 
Thanks for your input Tony and Gerry, it's been helpful. The content will be written by other users in Word (a tool they can all use), using a specific and limited set of formats. Using Excel lets me hold a site map with the appropriate meta values etc. From that structure I can build a complete web site, with folder structures and navigation include files, creating blank pages for later editing where no content doc has been provided and pulling in the content from the Word documents where one has.

I have taken on board the suggestion of saving the Word doc as HTML. Of course good old MS Word still wants to create masses of unwanted formatting, even with the filtered HTML option, which is a pain when you want the site controlled by CSS.
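For anyone following along, the save step might look something like the sketch below when driven from Excel. It assumes Word is installed and is used late-bound, so the wdFormatFilteredHTML constant is declared by hand. HtmlPathFor and SaveAsFilteredHtml are made-up names, and writing the .htm file next to the source document is just an assumption for the example.

```vba
' Hypothetical helper: swaps a document's extension (.doc, .docx) for .htm.
Function HtmlPathFor(ByVal docPath As String) As String
    Dim dotPos As Long
    dotPos = InStrRev(docPath, ".")
    ' Only treat the dot as an extension if it comes after the last backslash.
    If dotPos > InStrRev(docPath, "\") Then docPath = Left$(docPath, dotPos - 1)
    HtmlPathFor = docPath & ".htm"
End Function

' Opens a Word document from Excel and saves a filtered-HTML copy next to it.
Sub SaveAsFilteredHtml(ByVal docPath As String)
    Const wdFormatFilteredHTML As Long = 10   ' declared here because Word is late-bound
    Dim wordApp As Object, doc As Object
    Set wordApp = CreateObject("Word.Application")
    Set doc = wordApp.Documents.Open(FileName:=docPath, ReadOnly:=True)
    doc.SaveAs FileName:=HtmlPathFor(docPath), FileFormat:=wdFormatFilteredHTML
    doc.Close SaveChanges:=False
    wordApp.Quit
End Sub
```

When converting many files it would be quicker to create the Word application once and reuse it, rather than starting and quitting Word for each document.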

Thanks to this and your help on my previous thread, I now have a spreadsheet that builds a complete site from a site map. If a page's file name matches a .doc file name in a specific folder, it opens the Word file and saves it as filtered HTML. I can then open the HTML file and read each string into a cell. As each paragraph ends in </P>, </H1>, </H2> etc., I can concatenate wrapped paragraphs into one string.
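The joining of wrapped lines could be sketched roughly as follows. This is not my finished code, just an illustration: EndsBlock and ReadBlocks are made-up names, only </p> and </h1> to </h6> are treated as block endings, and everything before the first paragraph (the head and style section) ends up in the first item and would need discarding.

```vba
' True when a line ends with a closing block tag such as </p> or </h1>.
Function EndsBlock(ByVal textLine As String) As Boolean
    Dim s As String
    s = LCase$(Trim$(textLine))
    EndsBlock = (Right$(s, 4) = "</p>") Or (s Like "*</h[1-6]>")
End Function

' Reads an HTML file and returns one string per paragraph or heading,
' joining lines that Word wrapped in the middle of a block.
Function ReadBlocks(ByVal htmlPath As String) As Collection
    Dim blocks As New Collection
    Dim f As Integer, textLine As String, buf As String
    f = FreeFile
    Open htmlPath For Input As #f
    Do While Not EOF(f)
        Line Input #f, textLine
        textLine = Trim$(textLine)
        If Len(textLine) > 0 Then
            If Len(buf) > 0 Then buf = buf & " "
            buf = buf & textLine
            If EndsBlock(textLine) Then
                blocks.Add buf
                buf = ""
            End If
        End If
    Loop
    Close #f
    If Len(buf) > 0 Then blocks.Add buf   ' anything left after the last block
    Set ReadBlocks = blocks
End Function
```

Each item of the returned Collection can then be written to its own cell.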

As strict formatting will be used in the Word docs, I can remove the in-page style/formatting that MS Word creates by using the SUBSTITUTE function to replace it with "". The result is code ready for my CSS, and it now brings across bold, italic and working hyperlinks as required by the author.
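As an alternative to a chain of SUBSTITUTE calls, the same clean-up could be sketched in VBA with a regular expression. This is a different technique from the one I describe above, shown only for comparison. It assumes a Windows machine where the VBScript.RegExp object is available, StripWordAttributes is a made-up name, and the list of attributes removed (class, style, lang) is a guess at what needs to go.

```vba
' Removes class, style and lang attributes from a line of HTML,
' leaving other attributes such as href untouched.
Function StripWordAttributes(ByVal html As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    ' Attribute name, "=", then a double-quoted, single-quoted or bare value.
    re.Pattern = "\s+(class|style|lang)\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)"
    StripWordAttributes = re.Replace(html, "")
End Function
```

The pattern does not check that it is inside a tag, so body text that happens to contain something like class=x after a space would be altered too. With strictly formatted source documents that should be rare.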

I have a bit more work to do to perfect the main navigation as it will build a different menu depending on where the file sits within the site structure.

I have run a test creating 20 HTML files in folders and sub folders, with each page navigating to the others and all carrying general and page-specific metadata. The content of 4 Word documents was stripped out and placed in the 4 associated HTML files successfully. The whole run took less than 30 seconds.

A new site of 1000 pages would take no time to create compared to building it by hand.
Thanks
Bryan
 
Hi Bryan, when you have solid, working code, it would be nice if you would be willing to share it.

I must admit your task has me quite curious.

Gerry
 
Hi Gerry, I have almost all the elements built. In test it works well, but I need to pull it all together now and add in the additional HTML code that the site will need. It may not be the best code but it works. I will post when finished or if I get stuck again.
 