Please help with my script to count pages in a PDF file

Status
Not open for further replies.

Onenguyen

Technical User
Oct 12, 2010
19
US
Hi All!

I have a script that counts the pages in a PDF. I come across a lot of jobs where there are numerous subfolders within a folder, and I want the script to give me counts of all the PDF files, including those in the subfolders. Right now I have to place the script in each subfolder, run it, and combine the text files to get the total number of pages for the whole folder. Is there any way I can modify the script to go into the subfolders and give me a complete page count of all the PDFs?

Also, is there a way to modify the script to give me a complete page count of all the PDFs?

Thanks for your help

Script:

'File: pdfpagecount.vbs
' Purpose: count pages in pdf file in folder
Const OPEN_FILE_FOR_READING = 1

Set gFso = WScript.CreateObject("Scripting.FileSystemObject")
Set gShell = WScript.CreateObject ("WSCript.shell")
Set gNetwork = Wscript.CreateObject("WScript.Network")

directory="."
set base=gFso.getFolder(directory)
call listPDFFile(base)

Function ReadAllTextFile(filespec)
Const ForReading = 1, ForWriting = 2
Dim f
Set f = gFso.OpenTextFile(filespec, ForReading)
ReadAllTextFile = f.ReadAll
f.Close ' release the file handle once the text is read
End Function

function countPage(sString)
Dim regEx, Match, Matches, counter, sPattern
sPattern = "/Type\s*/Page[^s]" ' capture PDF page count
counter = 0

Set regEx = New RegExp ' Create a regular expression.
regEx.Pattern = sPattern ' Set pattern (matches /Type /Page but not /Type /Pages).
regEx.IgnoreCase = True ' Set case insensitivity.
regEx.Global = True ' Set global applicability.
set Matches = regEx.Execute(sString) ' Execute search.
For Each Match in Matches ' Iterate Matches collection.
counter = counter + 1
Next
if counter = 0 then
counter = 1
end if
countPage = counter
End Function

sub listPDFFile(grp)
Set pf = gFso.CreateTextFile("pagecount.csv", True)
for each file in grp.files
if (".pdf" = lcase(right(file,4))) then
larray = ReadAllTextFile(file)
pages = countPage(larray)
pf.WriteLine(pages)
end if
next
pf.Close
end sub
 
I would take another look at the regular expression. I tested it on various pdf files I have and on some it reported the correct number of pages, but on others it did not.

Are you generating the pdf files or are they coming in from external sources? If you are generating all of them then the format should be pretty consistent. If the files are coming from various sources, different pdf authoring packages may format things a little differently. The differences may require a change to the regex pattern.
 
Throw in some WScript.Echo calls: have the script echo the name of each PDF file it is looking at and the page count. You could even write this to a log file. It may point you to which PDFs are causing problems.

I Hear, I Forget
I See, I Remember
I Do, I Understand

Ronald McDonald
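
The logging suggestion above can be combined with the subfolder walk the original question asks for. What follows is only a rough sketch, not a tested fix: the names WalkFolder, CountPages and pagecount_log.csv are made up for illustration, and it keeps the same regex approach, so it inherits the same accuracy problems on PDFs that store their page objects in compressed streams. Unlike the original script, it reports 0 rather than 1 when nothing matches, so the problem files stand out in the log.

```vbscript
' Sketch only: walk a folder tree, log each PDF with its regex-based
' page count, and write a grand total at the end.
' WalkFolder, CountPages and pagecount_log.csv are hypothetical names.
Option Explicit

Dim gFso, gLog, gTotal
Set gFso = CreateObject("Scripting.FileSystemObject")
Set gLog = gFso.CreateTextFile("pagecount_log.csv", True)
gTotal = 0

WalkFolder gFso.GetFolder(".")
gLog.WriteLine "TOTAL," & gTotal
gLog.Close
WScript.Echo "Total pages: " & gTotal

' Count /Type /Page entries in one file (same pattern as the original script).
Function CountPages(path)
    Dim f, text, regEx
    CountPages = 0
    Set f = gFso.OpenTextFile(path, 1)      ' 1 = ForReading
    If Not f.AtEndOfStream Then             ' ReadAll raises an error on an empty file
        text = f.ReadAll
        Set regEx = New RegExp
        regEx.Pattern = "/Type\s*/Page[^s]"
        regEx.IgnoreCase = True
        regEx.Global = True
        CountPages = regEx.Execute(text).Count
    End If
    f.Close
End Function

' Process the PDFs in this folder, then recurse into each subfolder.
Sub WalkFolder(folder)
    Dim file, subFolder, pages
    For Each file In folder.Files
        If LCase(gFso.GetExtensionName(file.Path)) = "pdf" Then
            pages = CountPages(file.Path)
            WScript.Echo file.Path & " -> " & pages
            gLog.WriteLine """" & file.Path & """," & pages
            gTotal = gTotal + pages
        End If
    Next
    For Each subFolder In folder.SubFolders
        WalkFolder subFolder
    Next
End Sub
```

Run it with cscript.exe rather than wscript.exe; under wscript.exe every WScript.Echo opens a message box.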
 
Blah. I'm just going to download a program to do this. Thanks for all your help guys!
 
[0] That's the right move.

[1] Reading the file with FSO and then matching with a regex is bound to cause problems in general; it cannot work for every conceivable PDF.

[2] You can download the free PDF Toolkit, pdftk (I have 1.4). Use it on the command line together with system32\find.exe:
[tt]
C:\path>pdftk.exe c:\xyz\yourpdf.pdf dump_data | find.exe "NumberOfPages"
[/tt]
[3] The output would look like this:

[tt]NumberOfPages: 123[/tt]

[4] Use WshShell's Exec method to run the command and parse the "123" out of its stdout. That would be the correct page count.

[5] The above is a freeware solution. There may well be a free ActiveX component that integrates even more seamlessly with VBScript; I have not looked hard.
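
Steps [2] to [4] might look roughly like the sketch below. It assumes pdftk.exe is on the PATH and that dump_data prints a "NumberOfPages: N" line as shown in [3]. The function name PdftkPageCount and the -1 failure value are made up for illustration. The regex here reads pdftk's text output instead of piping through find.exe, so it is not parsing the PDF itself.

```vbscript
' Sketch only: ask pdftk for the page count and read it from stdout.
' PdftkPageCount is a hypothetical helper name; pdftk.exe is assumed
' to be on the PATH.
Option Explicit

Dim gShell
Set gShell = CreateObject("WScript.Shell")

WScript.Echo "Pages: " & PdftkPageCount("c:\xyz\yourpdf.pdf")

Function PdftkPageCount(pdfPath)
    Dim proc, output, regEx, matches
    PdftkPageCount = -1                     ' -1 means no NumberOfPages line was found
    Set proc = gShell.Exec("pdftk.exe """ & pdfPath & """ dump_data")
    output = proc.StdOut.ReadAll            ' waits until pdftk closes its stdout
    Set regEx = New RegExp
    regEx.Pattern = "NumberOfPages:\s*(\d+)"
    Set matches = regEx.Execute(output)
    If matches.Count > 0 Then
        PdftkPageCount = CLng(matches(0).SubMatches(0))
    End If
End Function
```

Calling such a function for each PDF found while looping over a folder's Files and SubFolders collections, and adding up the results, would give the folder-wide total the original question asks for.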
 