Import PDF into Word Document

Status
Not open for further replies.

KristianDude

Technical User
Mar 18, 2016
65
US
Well, I am stumped so I am back with another one. I have manually imported a PDF successfully using the Insert -> Object function in Word, but I don't really want to go with this method because of the long wait involved, and the result just doesn't format very nicely. However, if I open a PDF, use the "take a snapshot" tool in any of my PDF reader programs and paste that into Word as an image, it looks great. Is there anything you guys can point me to that would let VBA do this process for me? Thank you for anything! :]
 
So, I mostly have it.. I just need help with what reference to use to open a file, take a snapshot and close the PDF file pretty please!! :]
 
I have made a bit of progress with being able to use this concept via Adobe Reader within the Internet Explorer window. Can anyone help me finesse this a bit please?? I use a time delay procedure hence the WaitSeconds(10) and when this file opens, I can't manually hit ctrl+c and ctrl+a. I have to left click on the document and then I can ctrl+c. Any help here would be GREATLY appreciated guys. Thanks!

Code:
Sub PDFcopy()
    Dim ie As Object
    Dim sPath As String
    Dim fso, fls
    
    sPath = "C:\Folder Path\"
    
    Set ie = CreateObject("InternetExplorer.Application")
    
    ie.Visible = True
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set fls = fso.GetFolder(sPath)

    ie.navigate "file://" & sPath & "M.pdf"

    'WaitSeconds is my own time delay procedure
    WaitSeconds (10)
    'SendKeys "^(a)", True

    WaitSeconds (5)

    'Keep sending Ctrl+C until it goes through without an error
    Err.Clear
    Do
        On Error Resume Next
        SendKeys "^(c)", True
    Loop While Err.Number <> 0 'And Len(ClipBoard_GetText) = 0

    
    ie.Quit
    
    Set ie = Nothing
    Set fls = Nothing
    Set fso = Nothing
End Sub
 
Well, I came across these commands and they don't seem to be triggering. I have stepped through the code with no success. Does anyone here have any experience using these commands? I can open the PDF via Reader in Internet Explorer but just don't seem to be able to pass these commands to the Explorer window. An error message pops up saying "The object invoked has disconnected from its clients".. any pointers would be amazing!! :]

Code:
With ie
    .ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DONTPROMPTUSER
    .ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT
End With
 
Given that the entire purpose of a PDF is to maintain a fixed layout no matter where/how it is displayed or printed, what do you mean by "it just doesn't format very nicely"?
>I came across these commands and they don't seem to be triggering.

Well, you seem to be missing some key code, where your constants are defined (e.g. OLECMDID_SELECTALL). This also suggests that you do not have Option Explicit set.

So, you might like to add to your code

Code:
Option Explicit
Const OLECMDID_SELECTALL = 17
Const OLECMDID_COPY = 12
Const OLECMDEXECOPT_DODEFAULT = 0
Const OLECMDEXECOPT_DONTPROMPTUSER = 2 'also used in the ExecWB call, so it needs defining too

Having said that, I'm not certain this is going to work quite the way you want it to. Neither your original PDFcopy procedure nor your OLECMDID_SELECTALL variant will copy/snapshot the PDF; they will simply try to copy the text in the PDF, if it is accessible (some PDFs are protected against this).
 
Well, I was able to use your recommendation and the code is operational now!.. It opens the PDF and saves it as a Word doc now. I am thinking this code can be used in a different way without creating a new document.. can anyone see this process happening above in a different order?? It would be:

1) use dialog box to select a PDF
2) open the pdf with a separate instance of Word (I believe this function is available only in 2013 or after?)
3) copy the content of the PDF as an image
4) paste in the active document
5) close the new instance of word

I will work on this, but you guys always seem to come up with better solutions than I. Thank you for your time and help!! :]
 
Hey everyone.. sorry to keep beating this dead horse, but what if I used this in my code??..

ActiveDocument.CommandBars.ExecuteMso ("ObjectSaveAsPicture")

I step through the code and it's not pasting, but if I run this command manually either by right clicking the mouse or adding it to the ribbon bar, it will actually paste the image!!.. I just don't seem to be able to get this to work out. Maybe I have the focus on the wrong instance of Word??.. getting super close, I can feel it! Anyone able to help connect the dots here?.. here is the code I have running!..

Code:
Sub convertToWord()
   Dim MyObj As Object, MySource As Object, file As Variant
   file = Dir("C:\Users\FilePathGoesHere\" & "examplefile.pdf") 'pdf path
   If file = "" Then Exit Sub 'stop if the PDF was not found (the Do While here had no matching Loop)
   ChangeFileOpenDirectory "C:\Users\FilePathGoesHere\"
          Documents.Open FileName:=file, ConfirmConversions:=True, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""
    
    ActiveDocument.SelectAllEditableRanges
    ActiveDocument.CommandBars.ExecuteMso ("ObjectSaveAsPicture")
    ActiveDocument.Close

'paste into original document (running this code instance)
selection.paste

End Sub
 
Can anyone figure out what type of shape/range/object is being created in the code below? It is meant to convert a SCANNED PDF file by opening it in Word 2013. If I select the resulting image with the mouse, I am then able to manually copy & paste it into the Word document that I am trying to get it into!.. I can't figure out how to select the image in VBA. That is all that I need and this code is done! Help please anyone!! :]

Code:
Sub convertToWord()
   Dim MyObj As Object, MySource As Object, file As Variant
   file = Dir("C:\Users\FilePathGoesHere\" & "examplefile.pdf") 'pdf path

   ChangeFileOpenDirectory "C:\Users\FilePathGoesHere\"
          Documents.Open FileName:=file, ConfirmConversions:=True, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""
    
    'THIS IS WHERE I NEED TO SELECT THE IMAGE
    'THIS IS WHERE I NEED TO COPY THE IMAGE
    ActiveDocument.Close

'paste in to original document I am working with (running this code)
selection.paste

End Sub
 
Well.. I have it working. I will incorporate a File Dialog box into this code so the end user can select which file(s) to import, but here is a working rough draft. If any of the more experienced members on this forum see anything I could be doing better here, I would love a little help. Thanks.

Code:
Sub convertToWord()
   Dim MyObj As Object, MySource As Object, file As Variant
   Dim lHwnd As Long
   Dim wordApp
   
   wordApp = ActiveDocument
   file = Dir("FolderPathHere" & "FileNameHere.pdf") 'pdf path
   
   ChangeFileOpenDirectory "FolderPathHere"
          Documents.Open FileName:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""
        
        'convert/select/copy/close PDF
        Selection.WholeStory 'Select whole document
        Selection.Expand wdParagraph 'Expands your selection to current paragraph
        Selection.Copy 'Copy your selection
        
        'close converted file
        ActiveDocument.Close (Word.WdSaveOptions.wdDoNotSaveChanges)
        
        'paste into active document
        Selection.EndKey wdStory 'Move to end of document
        Selection.PasteAndFormat wdPasteDefault 'Pastes in the content
        
    'clear clipboard
    ClearClipboard
    
    'save
    SaveDoc

End Sub
 
I'm guessing that you must be using Word 2013, since previous versions do not have the ability to directly open PDFs (previous versions treat PDFs as text documents, and import all the contents as text ...)
 
Strongm: The OP mentions Word 2013 in his second most-recent post.

KristianDude: The code you posted copies the entire contents of the file, not just images. For images in PDFs opened in Word 2013, you should be able to access them as InlineShape objects. Regardless, your code is quite inefficient. You should be able to do what you're now doing without selecting anything and without using copy & paste. Furthermore, apart from the 'file' and 'wordApp' variable declarations, the rest seem redundant since you never use them. SaveDoc is not a valid Word VBA command. As for 'wordApp', that's a particularly poor choice for a variable assigned to a document rather than an application.

Assuming you're trying to convert an entire PDF to Word, try:
Code:
Sub ConvertPDF2Word()
  Dim StrFile As String
  With Application.FileDialog(msoFileDialogOpen)
    .Filters.Clear
    .Filters.Add "PDF Files", "*.pdf"
    .AllowMultiSelect = False
    .Show
    If .SelectedItems.Count = 0 Then Exit Sub
    StrFile = .SelectedItems(1)
  End With
  Documents.Open FileName:=StrFile, AddToRecentFiles:=False
  'Strip the extension at the last dot, so ".PDF" in any letter case is handled
  ActiveDocument.SaveAs2 FileName:=Left(StrFile, InStrRev(StrFile, ".") - 1) & ".docx", _
    FileFormat:=wdFormatXMLDocument, AddToRecentFiles:=False
  ActiveDocument.Close False
End Sub
Assuming you're trying to import an entire PDF to the end of an existing Word document, try:
Code:
Sub ImportPDF2Word()
  Dim StrFile As String, Rng As Range, DocSrc As Document
  With Application.FileDialog(msoFileDialogOpen)
    .Filters.Clear
    .Filters.Add "PDF Files", "*.pdf"
    .AllowMultiSelect = False
    .Show
    If .SelectedItems.Count = 0 Then Exit Sub
    StrFile = .SelectedItems(1)
  End With
  Set Rng = ActiveDocument.Range.Characters.Last
  Rng.InsertAfter vbCr
  Rng.Collapse wdCollapseEnd
  Set DocSrc = Documents.Open(FileName:=StrFile, AddToRecentFiles:=False)
  With DocSrc
    Rng.FormattedText = .Range.FormattedText
    .Close False
  End With
  Set Rng = Nothing: Set DocSrc = Nothing
End Sub

Cheers
Paul Edstein
[MS MVP - Word]
 
Strongm & Paul,

Sorry.. been over-busy these past few weeks!! Thank you very much for your insight. Due to a lack of coding skills on my end and this just being a side hobby of mine, the way I was looking to make this work (until now that is) is to only import scanned PDF documents. The code I posted will copy and paste the entire page as an image. Importing/Pasting the PDF into my document as an image is ideal as I need to size it to a 7" width while keeping the aspect ratio. My request for the copy/paste in the other post was directly related to this PDF import, but it seemed to be a dead post, but here we are!! :D ... I will definitely be trying out this ImportPDF2Word function!! Thank you guys!!
 
>The code I posted will copy and paste the entire page as an image.
Sorry to have to pop your balloon, but that's not what your code does. It merely copies & pastes the entire PDF - images & text alike. Of course, if the PDF only contains scanned images that haven't been OCR'd, all you'll get is a set of images. Conversely, if your PDF has a mix of text and images, including PDFs that contain page images that have been OCR'd, you'll get both. In the latter case, the OCR'd text will still be sitting behind the pasted images.

Cheers
Paul Edstein
[MS MVP - Word]
 
Good catch.. yes this is intended for non-OCR scanned PDF documents only. I tested it with other PDF versions with layers and text and it is not good at all for that. For my purposes, I don't use OCR so this is actually a great fit for what I am in need of. Thank you gentlemen for all the help. I hope to get some time to run through your latest post Paul and will report back!

A note to anyone else that may be reading this post.. there is a ton more functionality when using references to say, the full paid version of Adobe Acrobat. I like to make my stuff usable with default and/or free software.. ideally, packaged entirely within the Word Document that I am working in at the time that I am running the code (ie: how I am using this code). This is just a workaround for other easy to use features that do require upgraded PDF software and/or 3rd party software. The hope is that one day, the office suite will make it easier somehow... but until then. :]


-Kristian
 
Paul,

This tidbit of code is working pretty great. Do you know what this import is? I combined it with the code from the other post (which is this one Link) and when I step through it sees "0" inlineshapes. I realize this was in reference to pasting an image and this is not exactly what we are doing here. Do I need to convert it to an inlineshape to be able to lock the aspect ratio and then resize it to a desired size?... Here's how I combined them if it helps better to see it?... thanks!

Code:
Sub ImportPDF2Word()
  Dim dlgOpen As FileDialog, StrFile As String
  Dim Rng As Range, DocSrc As Document
  With Application.FileDialog(msoFileDialogOpen)
    .Filters.Clear
    .Filters.Add "PDF Files", "*.pdf"
    .AllowMultiSelect = False
    .Show
    If .SelectedItems.Count = 0 Then Exit Sub
    StrFile = .SelectedItems(1)
  End With
  Set Rng = ActiveDocument.Range.Characters.Last
  Rng.InsertAfter vbCr
  Rng.Collapse wdCollapseEnd
  Set DocSrc = Documents.Open(FileName:=StrFile, AddToRecentFiles:=False)
  With DocSrc
    Rng.FormattedText = .Range.FormattedText
    .Close False
  End With
  With Selection
    .Start = .Start - 1 'move the start back one position to include the image
    If .InlineShapes.Count = 1 Then
      'resize the image
      With .InlineShapes(1)
        .LockAspectRatio = True
        .Width = InchesToPoints(3)
        '.Height = InchesToPoints(2)
      End With
    End If
  End With

  Set Rng = Nothing: Set DocSrc = Nothing
End Sub

 
Paul.. I should also followup on the comment regarding the "ClearClipboard" and "SaveDoc" items I posted in the code a few posts ago.. those are references to a public sub I use to do those very things. :]
 
I doubt your "ClearClipboard" and "SaveDoc" references are of much benefit here. "ClearClipboard", especially, is irrelevant since the code I posted never uses the clipboard. "SaveDoc" may do something useful, but I doubt it's doing much that ActiveDocument.Save wouldn't. You also wouldn't use:
Code:
  With Selection
    .Start = .Start - 1 'move the start back one position to include the image
    If .InlineShapes.Count = 1 Then
      'resize the image
      With .InlineShapes(1)
        .LockAspectRatio = True
        .Width = InchesToPoints(3)
        '.Height = InchesToPoints(2)
      End With
    End If
  End With
since nothing is being selected. Besides which, that code would only cope with a single scanned PDF page. Instead, you'd use code like:
Code:
Sub ImportPDF2Word()
  Dim StrFile As String, Rng As Range, DocSrc As Document, i As Long
  With Application.FileDialog(msoFileDialogOpen)
    .Filters.Clear
    .Filters.Add "PDF Files", "*.pdf"
    .AllowMultiSelect = False
    .Show
    If .SelectedItems.Count = 0 Then Exit Sub
    StrFile = .SelectedItems(1)
  End With
  Set Rng = ActiveDocument.Range.Characters.Last
  Rng.InsertAfter vbCr
  Rng.Collapse wdCollapseEnd
  Set DocSrc = Documents.Open(FileName:=StrFile, AddToRecentFiles:=False)
  With DocSrc
    Rng.FormattedText = .Range.FormattedText
    .Close False
  End With
  With Rng
    For i = 1 To .InlineShapes.Count
      With .InlineShapes(i)
        .LockAspectRatio = True
        .Width = InchesToPoints(3)
        '.Height = InchesToPoints(2)
      End With
    Next
  End With
  Set Rng = Nothing: Set DocSrc = Nothing
End Sub
Indeed, if I were doing this, I'd probably modify the code to automatically get the page dimensions and fit the PDF images to that, so they always fill, or at least scale to, the page size.

Cheers
Paul Edstein
[MS MVP - Word]
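
The page-fitting idea mentioned above could look something like the sketch below. It is not Paul's code: the sub name FitInlineShapesToPage and the variable names are made up for illustration. It assumes the imported pages really are InlineShape objects and that the whole document shares one page setup (if sections differ, the PageSetup properties may not return usable values). It also ignores any gutter setting. It works out one scale factor per picture and sets both dimensions explicitly, so the result does not rely on how LockAspectRatio behaves when only one dimension is changed from VBA.

```vba
Sub FitInlineShapesToPage()
  'Hypothetical helper: scales every inline picture in the active document
  'to the largest size that fits inside the page margins.
  Dim sngMaxW As Single, sngMaxH As Single, sngScale As Single
  Dim sngNewW As Single, sngNewH As Single, i As Long
  With ActiveDocument
    'Usable area = page size minus margins (all values are in points)
    With .PageSetup
      sngMaxW = .PageWidth - .LeftMargin - .RightMargin
      sngMaxH = .PageHeight - .TopMargin - .BottomMargin
    End With
    For i = 1 To .InlineShapes.Count
      With .InlineShapes(i)
        If .Width > 0 And .Height > 0 Then
          'Scale to the full usable width, unless that makes the picture too tall
          sngScale = sngMaxW / .Width
          If .Height * sngScale > sngMaxH Then sngScale = sngMaxH / .Height
          'Work out both target sizes before changing anything
          sngNewW = .Width * sngScale
          sngNewH = .Height * sngScale
          .LockAspectRatio = msoTrue
          .Width = sngNewW
          .Height = sngNewH
        End If
      End With
    Next
  End With
End Sub
```

For a fixed width instead of the page width (the 7" mentioned earlier in the thread, for example), sngMaxW could be set to InchesToPoints(7).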
 
Thanks Paul. I stepped through it and it is still not recognizing the inserted image.. might you know what this object is so that I can use a simple resize like the example in this latest code... or how I can find it out? When I double click on it, it acts like a picture as the Format picture tab pops up on the ribbon.

Also, just curious of your opinion on these subs that I run. I use them in multiple functions which is why I ended up just putting them out there on their own and referring to them.

Code:
Public Sub SaveDoc()
'Saves ALL open documents (Documents.Save), no prompt, original format
    Documents.Save NoPrompt:=True, _
     OriginalFormat:=wdOriginalDocumentFormat
End Sub

The ClearClipboard is a pretty robust function created or referred to by Strongm. I find it a must when working with any clipboard events... but as this topic doesn't currently have a clipboard function in it, it really is not necessary for this one! :]
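
On the question of how to find out what kind of object the imported page is: one way is to list what the document actually contains in the Immediate window (Ctrl+G in the VBA editor). The sketch below is only a diagnostic, and the sub name ListPictureObjects is made up. It is an assumption, not something confirmed in this thread, that the picture may be arriving as a floating Shape rather than an InlineShape, which would be one explanation for InlineShapes.Count returning 0.

```vba
Sub ListPictureObjects()
  'Hypothetical diagnostic: prints every inline and floating shape
  'in the active document to the Immediate window.
  Dim i As Long
  With ActiveDocument
    Debug.Print "InlineShapes: " & .InlineShapes.Count
    For i = 1 To .InlineShapes.Count
      'Type is a WdInlineShapeType value (3 = wdInlineShapePicture)
      Debug.Print "  InlineShape " & i & ", Type = " & .InlineShapes(i).Type
    Next
    Debug.Print "Shapes: " & .Shapes.Count
    For i = 1 To .Shapes.Count
      'Type is an MsoShapeType value (13 = msoPicture)
      Debug.Print "  Shape " & i & ", Type = " & .Shapes(i).Type & _
        ", Name = " & .Shapes(i).Name
    Next
  End With
End Sub
```

If the picture turns up under Shapes, it could be resized through ActiveDocument.Shapes(i), which also has LockAspectRatio, Width and Height, or converted first with its ConvertToInlineShape method.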

 