Reading PDF files

Status
Not open for further replies.

KKit

Technical User
Jun 13, 2007
28
CA
I have an application for a client, and one of the functions is that it interacts with Outlook to save attachments.

The customer has asked me if I can scan through each attachment and automatically extract information for the database dependent on certain keywords.

All of their attachments in question are .pdf files.

Can anyone point me to code, or to instructions, for scanning a .pdf file for certain keywords and extracting information from that file?

Thanks so much.
 
I think you're going to struggle with this one.
AFAIK, there is no way in straight VBA to do what you want. There are some ActiveX controls that will let you open and read PDF files; perhaps one of those can help. I've also found this:
Code:
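Public Function ReadAcrobatDocument(strFileName As String) As String

' Needs a project reference to the Acrobat type library for the CAcro* types.
Dim AcroApp As CAcroApp
Dim AcroAVDoc As CAcroAVDoc
Dim AcroPDDoc As CAcroPDDoc
Dim AcroPage As CAcroPDPage
Dim AcroHiliteList As CAcroHiliteList
Dim AcroTextSelect As CAcroPDTextSelect
Dim Content As String
Dim i As Long, j As Long

Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")

' Open takes a file path and a window title; give up if the file won't open
If AcroAVDoc.Open(strFileName, "") <> True Then
    AcroApp.Exit
    Exit Function
End If

Set AcroAVDoc = AcroApp.GetActiveDoc
Set AcroPDDoc = AcroAVDoc.GetPDDoc

For i = 0 To AcroPDDoc.GetNumPages - 1

    Set AcroPage = AcroPDDoc.AcquirePage(i)
    Set AcroHiliteList = CreateObject("AcroExch.HiliteList")

    ' Highlight from offset 0 for a length of 9000;
    ' on failure, stop and return what has been read so far
    If AcroHiliteList.Add(0, 9000) <> True Then
        Exit For
    End If

    Set AcroTextSelect = AcroPage.CreatePageHilite(AcroHiliteList)

    ' Append each piece of selected text (skip pages with no selection)
    If Not AcroTextSelect Is Nothing Then
        For j = 0 To AcroTextSelect.GetNumText - 1
            Content = Content & AcroTextSelect.GetText(j)
        Next j
    End If

Next i

ReadAcrobatDocument = Content

' Close the document without saving, then shut Acrobat down
AcroAVDoc.Close True
AcroApp.Exit

Set AcroTextSelect = Nothing
Set AcroHiliteList = Nothing
Set AcroPage = Nothing
Set AcroPDDoc = Nothing
Set AcroAVDoc = Nothing
Set AcroApp = Nothing

End Function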

That gets you somewhere near, but I think it requires you to have Acrobat on the machine.
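
Once you have the text back as a string, the keyword part is ordinary VBA string handling. Here is a rough sketch, assuming the value you want sits on the same line as the keyword. ExtractAfterKeyword is a made-up helper name, and I haven't checked how Acrobat hands back spacing and line breaks, so treat it as a starting point only.

```vb
' Hypothetical helper: returns the text that follows the first occurrence of
' strKeyword in strText, up to the end of that line, or "" if not found.
Public Function ExtractAfterKeyword(strText As String, strKeyword As String) As String

    Dim lngStart As Long
    Dim lngCr As Long
    Dim lngLf As Long
    Dim lngEnd As Long

    ' Case-insensitive search for the keyword
    lngStart = InStr(1, strText, strKeyword, vbTextCompare)
    If lngStart = 0 Then Exit Function

    ' Move to the first character after the keyword
    lngStart = lngStart + Len(strKeyword)

    ' Find the nearest line break (CR or LF) after the keyword
    lngCr = InStr(lngStart, strText, vbCr)
    lngLf = InStr(lngStart, strText, vbLf)

    If lngCr = 0 Then
        lngEnd = lngLf
    ElseIf lngLf = 0 Then
        lngEnd = lngCr
    ElseIf lngCr < lngLf Then
        lngEnd = lngCr
    Else
        lngEnd = lngLf
    End If

    ' No line break found: read to the end of the text
    If lngEnd = 0 Then lngEnd = Len(strText) + 1

    ExtractAfterKeyword = Trim$(Mid$(strText, lngStart, lngEnd - lngStart))

End Function
```

You would call it on the output of the function above, for example (the path and keyword are placeholders):

```vb
Dim strText As String
strText = ReadAcrobatDocument("C:\Attachments\example.pdf")
Debug.Print ExtractAfterKeyword(strText, "Invoice Number:")
```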

As a final thought, you could create your own add-in using Visual Studio and the iTextSharp project, which will let you manipulate your PDF document.

Hope that makes some sense.

Ben

----------------------------------------------
Ben O'Hara
David W. Fenton said:
We could be confused in exactly the same way, but confusion might be like Nulls, and not comparable.
 
Thanks. That is a great place for me to start. I will play with this one today, and see how far I get.

Thanks much!

 