Word Selected Text to HTML on Clipboard?

Status
Not open for further replies.

lameid

Programmer
Jan 31, 2001
4,207
US
I am really unfamiliar with Word's object model.

Would there be a way to take selected text, convert it to HTML (really only need formatting tags for bold, underline, color etc.) and put it on the clipboard?

On the surface it seems like it may be easy if Word has a built in conversion to HTML. Otherwise where would I look for the individual formatting to convert to tags?

I am hoping somebody knows it well enough to get me going down the right path.
 
I am not following what you want to do.

Are you trying to underline or bold text or are you trying to copy it somewhere?

And yes, text can be converted to HTML, just don't remember how at the moment.

 
I want to take the formatted word text and put it on the clipboard as HTML so I can paste the HTML.
 
The most common tags I expect to need are bold, italics and underline... Come to think of it, break tags would be nice too for line breaks.

Do you have any thoughts on the nomenclature of converting the text to HTML... I'm happy to search for the individual pieces and do the leg work, I just don't know what things I need are called.

 
So if in Word I have...


Code:
Hello [b]World[/b]

I would want the corresponding HTML below on my clipboard...

Code:
Hello <b>World</b>
 
If you need to convert selected text in Word to HTML, you could use the MS HTML objects and libraries and create a small converter on a UserForm.

References:
- dhtmled.ocx, add DHTMLSafe to userform's toolbox,
- MSHTML.tlb (Microsoft HTML Object Library, not necessary, but helpful to get only body).

UserForm with:
- DHTMLSafe,
- TextBox, set to multiline, with vertical scrollbar, resize for easier reading,
- CommandButton.
Code:
Private Sub CommandButton1_Click()
With Me
    .TextBox1.Text = .DHTMLSafe1.DOM.body.innerHTML
    ' without reference to MSHTML:
    ' .TextBox1.Text = .DHTMLSafe1.documentHTML
End With
End Sub

Now copy text in Word, display the UserForm, paste the text onto the DHTMLSafe control and click the CommandButton. You should get HTML. It needs some polishing, as there are mso styles in the HTML text.



combo
 
>the corresponding HTML

The problem is that Word's corresponding HTML is sadly not what you have shown. As combo has alluded, rather than

Hello <b>World</b>

The HTML on the clipboard is more like:

<p class=MsoNormal>Hello <b style='mso-bidi-font-weight:normal'>World</b><o:p></o:p></p>

(not completely true, the HTML on the clipboard is substantially more than this, but out of all that HTML this represents the text selection copied)

which the DHTML solution (which we could avoid if we wanted, since it is pretty straightforward extracting the HTML straight from the clipboard) mangles further to:

<P style="MARGIN: 0in 0in 10pt" class=MsoNormal><FONT face=Calibri>Hello <B style="mso-bidi-font-weight: normal">World</B></FONT><?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></P>

So you'd actually need to do quite a lot of interpretation and conversion of the HTML to get your simplified version.
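As a rough illustration of that clean-up step, here is a minimal sketch that strips the clutter shown in the two snippets above using regular expressions (late-bound VBScript.RegExp, so no extra reference is needed). The function name SimplifyWordHtml is hypothetical, and the patterns only cover the markup seen above; regular expressions are not an HTML parser, so anything else Word emits (lists, tables, deeper nesting) would need more rules.

```vba
' Hypothetical helper: reduce a Word HTML fragment to bare <b>, <i>, <u>
' tags plus <br> at paragraph ends. Patterns are illustrative, not exhaustive.
Public Function SimplifyWordHtml(ByVal strHtml As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late bound: no reference needed
    re.Global = True
    re.IgnoreCase = True

    ' Office namespace declarations and empty <o:p></o:p> placeholders
    re.Pattern = "<\?xml:namespace[^>]*>"
    strHtml = re.Replace(strHtml, "")
    re.Pattern = "<o:p>\s*</o:p>"
    strHtml = re.Replace(strHtml, "")

    ' <b style='...'> becomes <b>; same for <i> and <u>
    re.Pattern = "<(b|i|u)\s[^>]*>"
    strHtml = re.Replace(strHtml, "<$1>")

    ' Drop <font ...> and <span ...> wrappers and their closing tags
    re.Pattern = "</?(font|span)(\s[^>]*)?>"
    strHtml = re.Replace(strHtml, "")

    ' Paragraph ends become <br>; paragraph start tags are removed
    re.Pattern = "</p>"
    strHtml = re.Replace(strHtml, "<br>")
    re.Pattern = "<p(\s[^>]*)?>"
    strHtml = re.Replace(strHtml, "")

    SimplifyWordHtml = strHtml
End Function
```

For the clipboard fragment quoted above, SimplifyWordHtml("<p class=MsoNormal>Hello <b style='mso-bidi-font-weight:normal'>World</b><o:p></o:p></p>") should return Hello <b>World</b><br>.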

Perhaps you could outline what it is you are really trying to do. Yes, I understand you want to get HTML as text in the clipboard - but why? What is the purpose of that?
 
I work on web surveys and implement them from a version in Word. Any text I paste into the application pastes the text and not the formatting, and it is a pain to manually add the formatting... there is a cheat toolbar much like on this site or in Word, but still, copy, paste AND format gets tedious.

Each survey question has to be implemented separately, kind of like a UserForm control, but that analogy is a little weak. And since these are web surveys, obviously the end product is HTML, so text formatting is HTML. So really I want to pick up the bold, italic and underline formats and put the text with the tags for those on the clipboard, and I'd also probably replace the CRLFs with br tags. Then when I assign that macro to a keystroke, I replace my use of Ctrl+C and skip the tedium of formatting the plain text with HTML in the target application. Or Sawtooth could implement better paste functionality (doubt it; I requested Ctrl+V instead of right-click, context-menu paste a few years ago...). I figure I can bash this into the Word side and save a few steps.

It seems like using any native Word HTML converter is more trouble than it is worth. This leads me to my other thought of detecting the format and inserting the tags via code. Unless there is a better method... I still would not know how to detect the formatting without searching for it.
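For the keystroke part, a macro can also be bound from code rather than through the Customize Keyboard dialog. A minimal sketch, assuming a macro named CopyAsSimpleHtml already exists in Normal.dotm (the name is hypothetical) and using Ctrl+Alt+Shift+C purely as an example key combination; KeyBindings.Add replaces whatever that combination was assigned to before in the chosen customization context.

```vba
' Sketch: assign a keyboard shortcut to a macro stored in Normal.dotm.
' "CopyAsSimpleHtml" is a hypothetical macro name; change it to your own Sub.
Public Sub BindCopyAsHtmlKey()
    ' Store the binding in the Normal template so it applies to all documents
    CustomizationContext = NormalTemplate

    ' Ctrl+Alt+Shift+C is an arbitrary example; any existing assignment
    ' for that combination in this context is overwritten
    KeyBindings.Add _
        KeyCategory:=wdKeyCategoryMacro, _
        Command:="CopyAsSimpleHtml", _
        KeyCode:=BuildKeyCode(wdKeyControl, wdKeyAlt, wdKeyShift, wdKeyC)
End Sub
```

Binding straight to Ctrl+C would also work, but it would take over Word's normal copy command in that template.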
 
>implement them from a version in Word

I appreciate that you maybe didn't get the choice - but Word is a terrible tool to be using as an HTML editor, and that's pretty much the root cause of your problems here ...
 
I think you are making this harder than it is. It should be as easy as doing text manipulation to insert several HTML tags into the text on the clipboard, not into the document itself. All I really need to know is the methodology to pull the formatting from the selection, which I suspect may be a little involved, but I would hope not.
 
> making this harder than it is

Your early assumption was that:

"seems like it may be easy if Word has a built in conversion to HTML"

Well, Word does have - but the HTML that comes out of Word is clumsy. That's not us making it hard, it's Word. Word drops a ton of proprietary stuff into the HTML source. It's why there are any number of free and paid for applications available to clean up Word HTML.

Sure, you can parse the text selected in Word yourself, for example by examining each character in Selection.Range.Characters and tracking whether the format has changed - but unless you are only interested in very simple HTML tags I'd suggest it'll be a pain.




 
You don't say what the target application is. If it supports formatting it seems odd to me that you cannot paste formatted text, but I haven't fully grasped all of the detail.

That said, if what you want is Word's formatting expressed as HTML, then taking what Word provides should do the trick - yes, it's not entirely straightforward and it's ridiculously overblown HTML, but does that actually matter for the job at hand?

Enjoy,
Tony

 
In my brief poking around I've seen the font object that hangs off Selection which I am guessing is technically a Range object. So then the piece I still don't have in my mind is how to select a range for a character in a selection.

Seems like I need about 120 lines of code assuming that looping is a basic loop and my assumptions are right.
 
>the piece I still don't have in my mind is how to select a range for a character in a selection

I've already provided that: Selection.Range.Characters
 
Characters, as I understand it, returns all the characters of the selection... clearly what I need to do is loop over each character and test its font properties for formatting to know where to insert the tags...

Or maybe what I need to do is create a new document as a scratch pad, copy the selection, paste it in there and do a find and replace for the tags I am interested in...

Seems like the former is easier to write because I can test all formats for each character at a time. But I am just guessing at specifics because I don't know Word.



 
The help files are really useful for this sort of thing. They specifically state:

Characters Object
"A collection of characters in a selection, range, or document. There is no Character object; instead, each item in the Characters collection is a Range object that represents one character."

And I was too explicit in accessing that object; we can go with Selection.Characters rather than Selection.Range.Characters (they get the same result).

So something like the following works fine for seeing if a character is bold or not:

Code:
Dim myChar As Range

For Each myChar In Selection.Characters
    Debug.Print myChar.Font.Bold
Next

 
Just a start, good for bold tags... Italic and underline should be easy from there... CRLF is probably just a <br> tag on the suffix tag variable.

Code:
Sub x()
    
    Dim rngChar As Range
    Dim fnt As Font
    
    Dim bolBold As Boolean
    Dim bolUnderLine As Boolean
    Dim bolItalic As Boolean
    Dim bolCRLF As Boolean
    Dim strHTML  As String
    Dim strTagSuffix As String
    
    
    For Each rngChar In Selection.Characters
        strTagSuffix = ""
        Set fnt = rngChar.Font
        If fnt.Bold Then
            If bolBold Then
                'Do nothing not a new bold character
            Else
                'New bold character, flag and insert HTML tag
                bolBold = True
                strHTML = strHTML & "<b>"
            End If
        Else
            If bolBold Then
                bolBold = False
                strTagSuffix = "</b>" & strTagSuffix
            End If
        End If
        
        'Repeat nested if blocks for each format like bold above... 
        
        strHTML = strHTML & rngChar.Text & strTagSuffix 'beginning tags are added before the character, 
                                                         'and the logged suffix / ending tags are logged in strTagSuffix to be added here
    Next rngChar
    
    Debug.Print strHTML 'Grab API call from MSDN to put strHTML on clipboard
    Set fnt = Nothing
    Set rngChar = Nothing
End Sub
 
Not as easy as you think, actually (hence my comments earlier). I'll let you figure out why your code won't always work for tracking bold (or other basic formatting) tags.
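A few things go wrong with the per-character version above: the closing tag is appended after the first non-bold character instead of before it, a tag still open at the end of the selection is never closed, and overlapping formats can produce improperly nested tags. One way round that is to compare each character's formatting with what is currently open, close everything before the first character whose formatting differs, and then reopen whatever is still needed. The sketch below does this for bold, italic and underline, turns paragraph marks and manual line breaks into <br>, and escapes &, < and >. The names SelectionToSimpleHtml and CloseTags are hypothetical, and closing and reopening every tag on any change is a deliberate simplification that favours valid nesting over the shortest possible HTML.

```vba
' Sketch: convert the current selection to simple HTML, character by character.
' Tags are closed BEFORE the first character whose formatting differs, so
' "Hello World!" with "World" bold gives "Hello <b>World</b>!".
Public Function SelectionToSimpleHtml() As String
    Dim rngChar As Range
    Dim openB As Boolean, openI As Boolean, openU As Boolean   ' tags currently open
    Dim wantB As Boolean, wantI As Boolean, wantU As Boolean   ' this character's format
    Dim strOut As String

    For Each rngChar In Selection.Characters
        wantB = (rngChar.Font.Bold = True)
        wantI = (rngChar.Font.Italic = True)
        wantU = (rngChar.Font.Underline <> wdUnderlineNone)

        ' Any change: close all open tags (innermost first), then reopen
        ' the ones this character needs, which keeps the nesting valid
        If wantB <> openB Or wantI <> openI Or wantU <> openU Then
            strOut = strOut & CloseTags(openB, openI, openU)
            If wantB Then strOut = strOut & "<b>"
            If wantI Then strOut = strOut & "<i>"
            If wantU Then strOut = strOut & "<u>"
            openB = wantB: openI = wantI: openU = wantU
        End If

        ' Emit the character itself, escaping HTML-special characters
        Select Case rngChar.Text
            Case vbCr, Chr(11)          ' paragraph mark, manual line break
                strOut = strOut & "<br>" & vbCrLf
            Case "&"
                strOut = strOut & "&amp;"
            Case "<"
                strOut = strOut & "&lt;"
            Case ">"
                strOut = strOut & "&gt;"
            Case Else
                strOut = strOut & rngChar.Text
        End Select
    Next rngChar

    ' Close anything still open at the end of the selection
    SelectionToSimpleHtml = strOut & CloseTags(openB, openI, openU)
End Function

' Closing tags in reverse of the opening order (<b><i><u>)
Private Function CloseTags(ByVal isB As Boolean, ByVal isI As Boolean, _
                           ByVal isU As Boolean) As String
    Dim s As String
    If isU Then s = s & "</u>"
    If isI Then s = s & "</i>"
    If isB Then s = s & "</b>"
    CloseTags = s
End Function
```

Looping over Characters is simple but slow on long selections; for a survey question or two it should be fine.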
 
A fairly simple, though crude, workaround would be to simply add the html tags to the text before copying it as text:
Code:
Sub Demo()
Application.ScreenUpdating = False
With Selection.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = True
    .MatchWildcards = True
    .Font.Bold = True
    .Replacement.Text = "<b>^&</b>"
    .Execute Replace:=wdReplaceAll
    .ClearFormatting
    .Font.Italic = True
    .Replacement.Text = "<i>^&</i>"
    .Execute Replace:=wdReplaceAll
    .ClearFormatting
    .Font.Underline = True
    .Replacement.Text = "<u>^&</u>"
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub

Cheers
Paul Edstein
[MS MVP - Word]
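Demo covers bold, italic and underline but not the break tags asked for earlier in the thread. A minimal sketch in the same style, assuming plain (non-wildcard) Find so that ^p (paragraph mark) and ^l (manual line break) can be used as search text; ^& in the replacement puts the found mark back after the inserted <br>. Like Demo, it edits the document itself, so it is meant to be run on the selection just before copying and then undone (or run on a scratch copy). The Sub name AddBreakTags is hypothetical.

```vba
' Sketch: insert "<br>" in front of every paragraph mark and manual line
' break in the current selection. Edits the document, like Demo above.
Sub AddBreakTags()
Application.ScreenUpdating = False
With Selection.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = False          ' ^p and ^l are not valid in wildcard mode
    .Text = "^p"                     ' paragraph mark
    .Replacement.Text = "<br>^&"     ' ^& = the text that was found
    .Execute Replace:=wdReplaceAll
    .Text = "^l"                     ' manual line break (Shift+Enter)
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
```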
 
I revisited this, and came up with:

Code:
Public Sub test()
    MsgBox Tag
End Sub

' Applies some simple HTML tags to text in current Selection
Private Function Tag() As String
    Dim myDoc As New Document

    Selection.Copy
    myDoc.Select
    Selection.Paste
    myDoc.Select

    Selection.Find.ClearFormatting
    Selection.Find.Font.Bold = True
    TagFormat "b"

    Selection.Find.ClearFormatting
    Selection.Find.Font.Italic = True
    TagFormat "i"

    Selection.Find.ClearFormatting
    Selection.Find.Font.Underline = True
    TagFormat "u"

    Tag = Selection.Text
    ' You could do a Selection.Copy here to get the tagged text on the clipboard
    myDoc.Close False
End Function

Private Sub TagFormat(strTag As String)
    With Selection.Find
        ' Find settings persist between calls (and from the Find dialog),
        ' so reset the ones that matter: empty search text, match on
        ' formatting only, no wildcards, no leftover replacement formatting
        .Text = ""
        .Format = True
        .MatchWildcards = False
        .Replacement.ClearFormatting
        .Replacement.Text = "<" & strTag & ">^&</" & strTag & ">"
        .Execute Replace:=wdReplaceAll
    End With
End Sub
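Whichever routine builds the tagged string, it still has to reach the clipboard as plain text (the step left as a comment in the earlier Debug.Print version). One common approach, sketched here, is the DataObject from the Microsoft Forms 2.0 Object Library; this assumes a reference to that library is set (inserting a UserForm into the project adds it automatically). The Sub name PutTextOnClipboard is hypothetical.

```vba
' Sketch: place a plain-text string on the Windows clipboard.
' Requires Tools > References > "Microsoft Forms 2.0 Object Library".
Public Sub PutTextOnClipboard(ByVal strText As String)
    Dim dob As MSForms.DataObject
    Set dob = New MSForms.DataObject
    dob.SetText strText      ' stage the text in the DataObject
    dob.PutInClipboard       ' copy it to the system clipboard
End Sub
```

For example, the test routine above could call PutTextOnClipboard Tag instead of MsgBox Tag, leaving the tagged text ready to paste into the survey tool. If that proves unreliable on a given machine, the Selection.Copy mentioned in the code comment is the Word-only alternative.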
 