
extract pdf file by contract name using vba

Status
Not open for further replies.

abenitez77

IS-IT--Management
Oct 18, 2007
147
US
I have about 1,000 PDF files and each file has about 50 pages. I want to split/extract the pages of each file into their own files (each should be 1-3 pages). Each PDF page contains a Contract Name. I want a new file to be printed every time a new contract name is found. It is usually 1 contract per page, but some contracts may have up to 3 pages (could be more, but that is what I have found so far). How can I do this? I have Adobe Acrobat and MS Office 2010. I'm very familiar with VBA but I am open to doing it with another language/technology. Any help is appreciated.
 
I'm not running 2010 but if I remember correctly, from Office 2010 onwards, you can save PDFs from Word. Just check whether you can open a PDF in Word. If you can, it is a matter of using Word VBA to do the searching. I haven't done anything like that before but I'm quite sure it is possible.
 
My go-to would be to dig into the Adobe Pro forum and see if I could figure it out. But by Acrobat, I think you mean Reader, so you might not have it? Another thought, if you can print to PDF, would be to try to figure out doing the find and print within Reader. I have no idea if it is possible. You may find other programs or libraries out there.
 
You can access all the functionality Acrobat exposes through what it calls IAC (Interapplication Communication) by adding Acrobat as a reference in your VBA project. Your pseudocode would then look something like this:

'create an adobe AVDoc document object
'open the pdf file in the AVDoc
'acquire the PDDoc object of the AVDoc
'acquire the first PDPage object in the PDDoc
'process start:
'create another PDDoc
'read the contract name
'do until contract name changes or out of pages:
' add the current page to the second PDDoc
' get the next page if there is one
' read the contract name
'loop
'save / print the second PDDoc as your individual contract file
'repeat from process start until out of pages in the contract file
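
The "do until contract name changes" loop above can be sketched on its own, separate from any PDF library. This is only a minimal sketch in Python (the OP said other languages are fine), and it assumes you have already read one contract name per page into a list. The function name group_pages_by_contract and the sample names are made up for illustration.

```python
from itertools import groupby


def group_pages_by_contract(names):
    """Group consecutive pages that share a contract name.

    names[i] is the contract name read from page i (0-based).
    Returns a list of (name, first_page, last_page) tuples.
    """
    groups = []
    page = 0
    for name, run in groupby(names):
        count = len(list(run))  # number of consecutive pages with this name
        groups.append((name, page, page + count - 1))
        page += count
    return groups


# Hypothetical names read from a six-page file
names = ["Acme", "Acme", "Baker", "Cole", "Cole", "Cole"]
for name, first, last in group_pages_by_contract(names):
    print(name, first, last)
```

Each (name, first_page, last_page) range would then be handed to whatever does the actual page copying and saving.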

Getting documentation of the functions you'll need will be easiest through Adobe. Which functions you use will probably depend on how uniform your page layout is. If your contract name is always in the same physical position, for example, you can grab your text by area rather than looping through individual words to find what you want.
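
If the layout turns out not to be uniform, another option is to search each page's extracted text for a label. The sketch below is in Python and assumes two things that are not in the original post: that you already have the text of each page as a string, and that the name follows a literal "Contract Name:" label on the same line. Pages without the label are treated as continuation pages of the previous contract.

```python
import re

# Assumed label; the real documents may word this differently.
CONTRACT_RE = re.compile(r"Contract Name:\s*(.+)")


def contract_names(page_texts):
    """Return one contract name per page.

    A page with no label inherits the name from the page before it,
    which covers contracts that run over several pages. Pages before
    the first label get None.
    """
    names = []
    current = None
    for text in page_texts:
        match = CONTRACT_RE.search(text)
        if match:
            current = match.group(1).strip()
        names.append(current)
    return names


# Hypothetical page texts for a three-page file
pages = [
    "Contract Name: Acme Supply\nTerms and conditions",
    "continued terms",
    "Contract Name:  Baker LLC  \nTerms and conditions",
]
print(contract_names(pages))
```

The resulting list has one entry per page, so consecutive equal entries mark the pages that belong in the same output file.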

TMTOWDI - it's not just for Perl any more
 
This assumes the OP has the full Acrobat application, rather than just Reader.
 
"I have adobe acrobat and ms office 2010."

OP explicitly says they have Adobe Acrobat. I don't feel like that's so much an assumption as me having read the post and trusted OP to know what they have installed. YMMV.

 
My point is that many people say Acrobat when they mean Reader, so I was alerting the OP that if they actually meant Reader, full Acrobat solutions will not work.
 