This assumes you already know how to open and read a binary file and search its contents for text.
1) Look for the page tree object: search the file for /Pages.
2) Search backwards from there for obj<< and capture everything up to the next >>.
3) Extract the tokens. Each one begins with / and ends at a space, a [ or the start of another token.
4) If the tokens are exactly /Count, /Type, /Pages and /Kids, with nothing else, you have found the page tree object. Otherwise, search for the next /Pages and repeat.
5) The number after /Count is the number of pages.
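Here is a minimal Python sketch of steps 1 to 5, assuming the page tree object is stored uncompressed and written with obj<< run together exactly as in the examples here. The function names (count_pages, count_pages_in_file) are made up for illustration. The name pattern treats any whitespace or PDF delimiter as the end of a token, which is slightly broader than the space/[/next-token rule. It keeps trying later /Pages hits because the first one can be a reference from the document catalog (e.g. /Pages 2 0 R).

```python
import re

# Hypothetical helpers sketching steps 1-5; assumes the page tree object is
# stored uncompressed and written as "obj<<...>>" like the examples here.
NAME = re.compile(rb"/[^\s/\[\]<>()]+")   # a name ends at whitespace or a delimiter
COUNT = re.compile(rb"/Count\s+(\d+)")
ROOT_TOKENS = {b"/Count", b"/Type", b"/Pages", b"/Kids"}


def count_pages(data):
    """Return the page count from the page tree object, or None if not found."""
    pos = data.find(b"/Pages")                       # step 1
    while pos != -1:
        start = data.rfind(b"obj<<", 0, pos)         # step 2: search backwards
        end = data.find(b">>", pos)                  # ... and capture up to >>
        if start != -1 and end != -1:
            body = data[start + len(b"obj<<"):end]
            if set(NAME.findall(body)) == ROOT_TOKENS:   # steps 3 and 4
                match = COUNT.search(body)
                if match:
                    return int(match.group(1))       # step 5
        pos = data.find(b"/Pages", pos + 1)          # not it: try the next hit
    return None


def count_pages_in_file(path):
    with open(path, "rb") as f:
        return count_pages(f.read())


sample = (b"1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj\n"
          b"2 0 obj<</Count 251/Type/Pages/Kids[3 0 R]>>endobj\n")
print(count_pages(sample))  # 251
```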
For instance, if you get
obj<</Count 251/Type/Pages/Kids[...]>>
there are 251 pages.
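If you would rather not use regular expressions, step 3 can also be done with a plain character scan. This Python sketch applies the rule literally (a token starts at / and stops at whitespace, [ or the next /) to the dictionary body from the example above, with obj<< and >> already removed and the Kids array left as a placeholder. name_tokens and number_after_count are illustrative names, not from any library.

```python
def name_tokens(text: str) -> list[str]:
    """Collect tokens that begin with '/' and end at whitespace, '[' or the next '/'.

    Numbers, arrays and references are skipped; only the names are returned.
    """
    tokens = []
    i = 0
    while i < len(text):
        if text[i] == "/":
            j = i + 1
            while j < len(text) and not text[j].isspace() and text[j] not in "[/":
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            i += 1
    return tokens


def number_after_count(text: str) -> int:
    """Read the integer that follows /Count (step 5)."""
    rest = text[text.index("/Count") + len("/Count"):].lstrip()
    digits = ""
    for ch in rest:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits)


body = "/Count 251/Type/Pages/Kids[...]"
print(name_tokens(body))         # ['/Count', '/Type', '/Pages', '/Kids']
print(number_after_count(body))  # 251
```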
The tokens may be in any order. You could get
obj<<
/Type /Pages
/Kids [...]
/Count 20
>>
endobj
in a PDF generated by non-Adobe software. If /Parent appears among the tokens, it is not the page tree object but an intermediate node of the page tree. There will be lots of these, but they normally occur after the page tree object.
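A small Python sketch of the check for the reordered, multi-line layout, assuming the text between obj<< and >> has already been captured. classify and the sample bodies are made up for illustration. Comparing sets of names is what makes the order and line breaks irrelevant, and an intermediate node fails the check because of its extra /Parent token.

```python
import re

NAME = re.compile(r"/[^\s/\[\]<>()]+")   # a name ends at whitespace or a delimiter
ROOT_TOKENS = {"/Count", "/Type", "/Pages", "/Kids"}


def classify(body: str) -> str:
    """Classify a captured dictionary body by its set of name tokens."""
    tokens = set(NAME.findall(body))
    if tokens == ROOT_TOKENS:
        return "root"        # the page tree object, whatever the token order
    if "/Parent" in tokens:
        return "not root"    # an intermediate node: it has a parent
    return "other"           # e.g. the catalog's /Pages reference


reordered = """
/Type /Pages
/Kids [3 0 R 4 0 R]
/Count 20
"""
intermediate = "/Type /Pages /Parent 2 0 R /Kids [5 0 R] /Count 12"

print(classify(reordered))     # root
print(classify(intermediate))  # not root
```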
You can get the PDF specification from
Note that there are 8 different versions, but for finding the number of pages any one will do. Version 1.6 (for Acrobat 7) is about 9 MB and version 1.7 (for Acrobat 8) is about 32 MB. I haven't looked at 1.7 yet; the extra size is probably a lot of pictures.