Parse a PDF file to determine the page count

Status
Not open for further replies.

ilektronik

Programmer
Oct 13, 2003
7
US
Hello all -

I am a Java developer who recently was plunged by management into the world of .NET. The last six months were spent coding C# projects (I have become a fan of C# since). I am now on a VB.NET project and have zero VB experience of any kind. Here is the question:

I have a directory with multiple PDF files in it. There is a web application that allows users to FTP these PDF files from point A to point B. Simple stuff. Before a user sends these files anywhere, I have a display screen that lists the PDFs in that directory row by row, with some information on each one (file location, modified date, etc.). One of the pieces of information that I need to display is the number of pages in the PDF.

I tried opening a random PDF in Notepad and found that there is a line containing "/Count" that gives the number of pages in that PDF. The question is: what is the best way to simply read that page count and store it in a local variable? Thanks in advance!!
 
I'd use the System.IO.BinaryReader class from the framework. Just open your PDF and read through it until you find what you want.

C# and VB.NET differ only in syntax in this respect; the functionality lives in the BinaryReader class.
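A minimal sketch of that "read through it until you find what you want" approach, written in Python purely for illustration (the same logic carries over to a BinaryReader loop in VB.NET or C#). The function names are made up for this example. It assumes the page tree is stored uncompressed; it takes the largest /Count it sees, on the assumption that the root of the page tree carries the largest value. Outline dictionaries also use /Count, so treat the result as a best guess rather than a guarantee.

```python
import re
from typing import Optional

# PDF names are case-sensitive, so this looks for "/Count", not "/COUNT".
COUNT_RE = re.compile(rb"/Count\s+(\d+)")


def page_count_by_scan(data: bytes) -> Optional[int]:
    """Return the largest /Count value found in the raw bytes, or None.

    Intermediate page-tree nodes have their own /Count, so the largest
    value is assumed (not guaranteed) to belong to the root node.
    """
    counts = [int(m.group(1)) for m in COUNT_RE.finditer(data)]
    return max(counts) if counts else None


def page_count_of_file(path: str) -> Optional[int]:
    """Read the whole file as bytes and scan it (hypothetical helper)."""
    with open(path, "rb") as f:
        return page_count_by_scan(f.read())
```

If the file keeps its objects in compressed object streams (common in newer PDFs), no /Count will be visible in the raw bytes and the function returns None.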

If you're really ambitious, you could read up on the PDF spec, and then traverse the XREF section(s) of the PDF to find the specific byte offset of the relevant object, and then jump right to it.

It works like this. Most PDFs contain a catalog object that looks like this:

Code:
20 0 obj<</Pages 16 0 R/Type/Catalog/Metadata 17 0 R>>
endobj

This tells you that the /Pages dictionary (the root of the page tree, which holds the /Count entry) is defined in object 16 0.
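As a rough sketch of that step (again in Python for illustration only; `find_pages_ref` is a hypothetical name), the catalog object can be located by its /Type /Catalog entry and the /Pages reference pulled out of it. This assumes the catalog is stored uncompressed and that no binary stream happens to contain the text "endobj". A stricter reader would follow the trailer's /Root entry instead of searching.

```python
import re
from typing import Optional, Tuple

# "N G obj ... endobj" blocks; DOTALL so the body may span lines.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)
CATALOG_RE = re.compile(rb"/Type\s*/Catalog")
# An indirect reference such as "/Pages 16 0 R".
PAGES_REF_RE = re.compile(rb"/Pages\s+(\d+)\s+(\d+)\s+R")


def find_pages_ref(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (object number, generation) of the page-tree root, or None."""
    for obj in OBJ_RE.finditer(data):
        body = obj.group(3)
        if CATALOG_RE.search(body):
            ref = PAGES_REF_RE.search(body)
            if ref:
                return int(ref.group(1)), int(ref.group(2))
    return None
```

Fed the catalog object shown above, this returns the pair (16, 0).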

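Find the xref table and look up the entry for object 16 (entries are numbered from 0, so in a table that starts at object 0 it is the 17th entry). The long 10-digit number is the byte offset of that object from the start of the file. You could skip that number of bytes with the .ReadBytes(int) method (or seek the underlying stream straight to that offset), then use .ReadChars(int) to pull in the /Count entry.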
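The xref lookup could be sketched like this, in Python for illustration (`xref_offset_of` and `count_at` are hypothetical names). It assumes a classic plain-text xref table and only reads the table that the last startxref points to; it does not follow /Prev links from incremental updates and gives up on cross-reference streams.

```python
import re
from typing import Optional


def xref_offset_of(data: bytes, obj_num: int) -> Optional[int]:
    """Byte offset of obj_num according to the last classic xref table."""
    start = data.rfind(b"startxref")
    if start < 0:
        return None
    m = re.match(rb"startxref\s+(\d+)", data[start:])
    if not m:
        return None
    pos = int(m.group(1))
    if not data.startswith(b"xref", pos):
        return None  # likely a cross-reference stream; not handled here
    lines = data[pos:].splitlines()
    i = 1  # skip the "xref" keyword line
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2 or not all(p.isdigit() for p in header):
            break  # reached "trailer" or something unexpected
        first, count = int(header[0]), int(header[1])
        if first <= obj_num < first + count:
            idx = i + 1 + (obj_num - first)
            if idx >= len(lines):
                return None
            # Entry: 10-digit offset, 5-digit generation, 'n' (in use) or 'f' (free)
            entry = lines[idx].split()
            if len(entry) == 3 and entry[2] == b"n":
                return int(entry[0])
            return None
        i += 1 + count  # jump over this subsection's entries
    return None


def count_at(data: bytes, offset: int) -> Optional[int]:
    """Read /Count from the object that starts at the given byte offset."""
    end = data.find(b"endobj", offset)
    m = re.search(rb"/Count\s+(\d+)", data[offset:end if end >= 0 else None])
    return int(m.group(1)) if m else None
```

In .NET terms, the jump corresponds to seeking the reader's BaseStream to the offset and then reading a few hundred bytes from there, rather than reading and discarding everything before it.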

Thomas D. Greer
 
Thanks so much that is exactly what I was looking for!
 