Extract pages from a PDF that have certain “text or numbers”?

Status
Not open for further replies.

bhphoto

IS-IT--Management
Jul 30, 2003
57
CH
Hi there,

How are you today?

Is there a way for Acrobat or any other PDF tool to extract only the pages that contain a certain “text or number”?

Here is the background: we asked one of our vendors to give us all of our bills from last year. They answered that the bills are archived in PDF format, saved by month, but they won't give us the files because all of their clients are combined in one file and they won't hand over our competitors' information. They did give us one option:

If we give them an idea of how to extract only our bills from these PDFs, they will do it. So I am wondering whether Acrobat or any other PDF tool can look for certain text, such as a customer ID, and extract only the pages that contain it into a new file.

I would very much appreciate your help; any ideas are welcome.

Thanks in advance,
Joel Braver

 
This can be done, but only in a roundabout manner. My approach was with JavaScript. I made a function, and then a menu item to execute the function. It's a hack, since there is no API into the search results. Also, my code actually divides the document into discrete sections based on the search string, rather than doing a "targeted" extraction. But hopefully you get the idea and can derive something useful.

Code:
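// Globals holding the page range and file name of the section being extracted.
var nStart;
var nEnd;
var nFileName;

// Add a "SectionK" item to the File menu; it runs K() when a document is open.
app.addMenuItem(
  { cName:   "SectionK",
    cParent: "File",
    cExec:   "K();",
    cEnable: "event.rc = (event.target != null);",
    nPos: 0
  });

function K() {
  var i = 0;
  var p = 0;
  var ret = 0;
  var lastPage = (this.numPages - 1);
  var rootFile = this.path.split(".pdf")[0];
  var searchHits = [];   // page number of each recorded search hit

  // Start the search in the active document; put your own search string here.
  search.query("Your Search here", "ActiveDoc");

  app.alert("Wait for search to complete, then click OK.");

  // Intended to wait until the viewer has left page 0 for the first hit.
  // Caveat: this never ends if the document is still showing page 0.
  while (ret == 0) {
    ret = this.pageNum;
  }

  // Step through the hits with "FindAgain", recording the page of each one.
  // The loop stops as soon as the page number no longer changes.
  while (true) {
    p = this.pageNum;
    searchHits[i] = p;
    app.execMenuItem("FindAgain");
    if (this.pageNum == p) { break; }
    i++;
  }

  // Work backwards from the last hit: each section runs from a hit page
  // to the end of what is left, and is removed once it has been saved.
  while (i > -1) {
    nStart = searchHits[i];
    nEnd   = lastPage;
    nFileName = rootFile + "_" + i + ".pdf";

    this.extractPages(nStart, nEnd, nFileName);
    this.deletePages(nStart, nEnd);

    lastPage = (nStart - 1);
    i--;
  }

  // Close the now-truncated original without saving it.
  this.closeDoc(true);
}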
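
For a more "targeted" extraction of only the pages that mention a customer ID, something along these lines might work instead of driving the search window. This is only a rough sketch, not tested against a real bill archive. It assumes the ID shows up as a single word to Acrobat's word finder (so an ID containing spaces or punctuation may not match), and the function names findPagesWithWord, toRanges and extractMatchingPages are made up for illustration. The Doc calls it relies on are getPageNumWords, getPageNthWord and extractPages.

```javascript
// Return the 0-based numbers of all pages that contain `word`.
// `doc` must provide numPages, getPageNumWords(nPage) and
// getPageNthWord(nPage, nWord, bStrip), as Acrobat's Doc object does.
function findPagesWithWord(doc, word) {
  var pages = [];
  for (var p = 0; p < doc.numPages; p++) {
    var n = doc.getPageNumWords(p);
    for (var w = 0; w < n; w++) {
      // bStrip = true asks for the word without punctuation and whitespace.
      if (doc.getPageNthWord(p, w, true) === word) {
        pages.push(p);
        break; // one hit is enough for this page
      }
    }
  }
  return pages;
}

// Collapse ascending page numbers into inclusive [start, end] runs,
// e.g. pages 1, 2, 4 become [1, 2] and [4, 4].
function toRanges(pages) {
  var ranges = [];
  for (var k = 0; k < pages.length; k++) {
    var last = ranges[ranges.length - 1];
    if (last && pages[k] === last[1] + 1) {
      last[1] = pages[k]; // extend the current run
    } else {
      ranges.push([pages[k], pages[k]]); // start a new run
    }
  }
  return ranges;
}

// Save each run of matching pages as rootFile_0.pdf, rootFile_1.pdf, ...
// (hypothetical naming, mirroring the script above). Returns the runs.
function extractMatchingPages(doc, word, rootFile) {
  var ranges = toRanges(findPagesWithWord(doc, word));
  for (var r = 0; r < ranges.length; r++) {
    doc.extractPages({
      nStart: ranges[r][0],
      nEnd: ranges[r][1],
      cPath: rootFile + "_" + r + ".pdf"
    });
  }
  return ranges;
}
```

From the JavaScript console it could be called as extractMatchingPages(this, "100234", this.path.split(".pdf")[0]); where "100234" is a placeholder customer ID. Unlike the menu script, this leaves the original document untouched. Be aware that some Acrobat versions may only allow extractPages to write to a path from a privileged context such as the console or a batch sequence.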



Thomas D. Greer
Providing PostScript & PDF
Training, Development & Consulting
 
Wow, that was fast!

First of all, thanks a lot for your reply and for the code. For now I am still looking for a tool that is more end-user friendly and more automated, and I hope there is one out there (if I find one I will let you guys know). If I don't find one, I will definitely use this.

Thanks again.

 
Hey guys,

I found several programs that can do this, but the best one I found was made just for this purpose. It is called “PDF Split File”; the makers are


Thanks again, everybody.

Have a wonderful day

Joel Braver
 