How do I get a PDF file's page size and page count programmatically?

rsevero · Jul 31, 2003

I need to get the page size of a PDF file programmatically, with Acrobat or Ghostscript. How can I get it?

And its page count?

I am running FreeBSD.

TIA,

Rodrigo Severo

tgreer · Jul 31, 2003

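PDFs are organized as a hierarchy of objects. To get the number of pages, you want the root of the page tree, which the document catalog dictionary points to through its /Pages entry.

Just open the file in a good text editor, and search for the /Pages entry that is followed by an object reference (not the /Type/Pages name). That will give you the number of the object, for example "19 0" in "/Pages 19 0 R".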

That tells you to search for "19 0 obj", which is an object definition followed by a dictionary. That dictionary will contain the /Count entry, which is your page count.
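A minimal Python sketch of that lookup, assuming the objects are stored as uncompressed text. The function name `page_count` is my own, and the sample bytes pair the page tree object quoted later in this post with a hypothetical catalog object (40 0 obj) that is not taken from the real file.

```python
import re

def page_count(data: bytes) -> int:
    """Follow the catalog's /Pages reference and return that object's /Count."""
    # "/Pages 19 0 R" is an indirect reference. Requiring digits after the
    # name keeps the pattern from matching the "/Type/Pages" name instead.
    ref = re.search(rb"/Pages\s+(\d+)\s+(\d+)\s+R", data)
    if ref is None:
        raise ValueError("no /Pages reference found")
    obj_num, gen_num = ref.group(1), ref.group(2)

    # Find the definition, e.g. "19 0 obj ... endobj". The lookbehind stops
    # "119 0 obj" from being mistaken for "19 0 obj".
    pattern = rb"(?<!\d)" + obj_num + rb"\s+" + gen_num + rb"\s+obj(.*?)endobj"
    body = re.search(pattern, data, re.DOTALL)
    if body is None:
        raise ValueError("referenced object not found")

    count = re.search(rb"/Count\s+(\d+)", body.group(1))
    if count is None:
        raise ValueError("no /Count entry in the page tree object")
    return int(count.group(1))

# Hypothetical catalog object plus the page tree object from this thread.
sample = (
    b"40 0 obj<</Type/Catalog/Pages 19 0 R>>\nendobj\n"
    b"19 0 obj<</Count 6/Kids[29 0 R 1 0 R 4 0 R 7 0 R 10 0 R 13 0 R]/Type/Pages>>\nendobj\n"
)
print(page_count(sample))  # 6
```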

You can also search for the /MediaBox and /CropBox entries, which are followed by arrays. Each array contains 4 numbers: the coordinates of the lower-left and upper-right corners, in PostScript points (72 points per inch), of the Media box (paper size) and the Crop box (the size the page is trimmed down to).
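As a worked example of the unit conversion, here is a small Python sketch. The helper name `box_size_inches` is made up for illustration, and it assumes the array holds four plain numbers.

```python
POINTS_PER_INCH = 72  # PostScript points per inch

def box_size_inches(box):
    """Convert [llx, lly, urx, ury] in points to (width, height) in inches."""
    llx, lly, urx, ury = box
    return ((urx - llx) / POINTS_PER_INCH, (ury - lly) / POINTS_PER_INCH)

# A box of [0 0 612 792] is US Letter paper.
print(box_size_inches([0, 0, 612, 792]))  # (8.5, 11.0)
```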

So you can determine page count and page dimensions with basic file I/O.

Here are some snippets from a 6-page PDF on 8.5x11 inch paper:

Code:

29 0 obj<</Contents 38 0 R/Type/Page/Parent 19 0 R/Rotate 0/MediaBox[0 0 612 792]/CropBox[0 0 612 792]/Resources 30 0 R>>
endobj

Code:

19 0 obj<</Count 6/Kids[29 0 R 1 0 R 4 0 R 7 0 R 10 0 R 13 0 R]/Type/Pages>>
endobj
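Pulling the box arrays out of page objects like the one above could look roughly like this in Python. The function name `page_boxes` is hypothetical, and the sketch only handles arrays of plain numbers written directly in the page dictionary; a box inherited from a parent /Pages node would not be found this way.

```python
import re

def page_boxes(data: bytes, key: str = "MediaBox"):
    """Return every /MediaBox (or /CropBox) array found, as lists of floats."""
    pattern = rb"/" + key.encode("ascii") + rb"\s*\[([^\]]*)\]"
    boxes = []
    for match in re.finditer(pattern, data):
        numbers = match.group(1).decode("ascii").split()
        boxes.append([float(n) for n in numbers])
    return boxes

# The page object quoted in this thread.
page_obj = (
    b"29 0 obj<</Contents 38 0 R/Type/Page/Parent 19 0 R/Rotate 0"
    b"/MediaBox[0 0 612 792]/CropBox[0 0 612 792]/Resources 30 0 R>>\nendobj\n"
)
print(page_boxes(page_obj))             # [[0.0, 0.0, 612.0, 792.0]]
print(page_boxes(page_obj, "CropBox"))  # [[0.0, 0.0, 612.0, 792.0]]
```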

Thomas D. Greer

http://www.tgreer.com

rsevero · Jul 31, 2003

First of all, thanks for your answer.

As I need an automated process, I might try a regex to find the info, but I wonder if there is a more appropriate way to get it, i.e. some code (possibly PostScript) that would extract this info from the PDF file.

As far as I know, there are binary PostScript files. Aren't there binary PDF files too? If so, I believe the regex solution wouldn't work for the binary ones.

Rodrigo Severo

tgreer · Jul 31, 2003

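PostScript code won't help you here, unless you want to use PostScript for file I/O.

PDFs can contain binary streams, yes, that's true. But that's irrelevant to what you're after: in a file like the one above, the dictionaries holding /Count and /MediaBox sit outside those streams as plain text.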
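To illustrate, a Python sketch that reads the file in binary mode and searches the raw bytes, so stream data cannot cause decoding errors. The helper name `read_pdf_bytes` and the fragment below are made up for the demonstration. This only holds while the dictionaries are stored uncompressed; files that pack objects into compressed object streams would need a real parser.

```python
import os
import re
import tempfile

def read_pdf_bytes(path):
    """Read the whole file as bytes; no text decoding is attempted."""
    with open(path, "rb") as f:
        return f.read()

# Hypothetical fragment: a stream holding non-text bytes, then a page tree node.
fragment = (
    b"38 0 obj<</Length 4>>stream\n\xff\x00\x9c\x80\nendstream\nendobj\n"
    b"19 0 obj<</Count 6/Type/Pages>>\nendobj\n"
)

# Write the fragment to a temporary file to stand in for a real PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(fragment)
data = read_pdf_bytes(tmp.name)
os.remove(tmp.name)

# A bytes pattern matches regardless of the binary data earlier in the file.
match = re.search(rb"/Count\s+(\d+)", data)
print(int(match.group(1)))  # 6
```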

I know of several C and .NET kits for creating PDFs programmatically, i.e. through API calls, but none for extracting information.

You can check here:

http://www.pdf-tools.com/en/index.html

I haven't used their tools.

For direct PDF manipulation and data extraction, I use C# or VB to read the data out as I've suggested.

Thomas D. Greer

http://www.tgreer.com
