Information on PDF Index Format?

rfedyk · Dec 5, 2002

Indexing a set of PDF files creates a .PDX file, but that file is only the starting point in the chain of index information. The real data is held in the .DDD and .DID files that are saved in the "parts" subdirectory.

I have a project which would work much better if I could read those files directly. Does anyone have any information on the format of those files?

Thanks
Roger

Murgle · Dec 9, 2002

Did a quick google, and this popped up - don't know if it helps.

The .ddd files contain token data (usernames, filenames, object handles, etc. - data that does not need stemming during search). The .did file contains stream data (data that needs stemming during search).

http://www.e-quip.com.au/docushare/dscgi/ds.py/Get/File-1129/Docushare_Technical_Notes.doc
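Since nothing in this thread documents the byte layout, a first step might be simply locating the files and looking at their raw contents. Below is a minimal Python sketch of that, not a parser. It assumes the "parts" folder sits somewhere beneath the folder that holds the .PDX file, and the function names (find_index_parts, hex_preview) are made up for illustration.

```python
from pathlib import Path


def find_index_parts(pdx_path):
    """Collect .ddd/.did files in any 'parts' folder beneath the .pdx's folder.

    Assumption: the 'parts' subdirectory lives somewhere under the directory
    that contains the .pdx file. Adjust the search root if your layout differs.
    """
    root = Path(pdx_path).resolve().parent
    found = []
    for path in sorted(root.rglob("*")):
        if (path.is_file()
                and path.suffix.lower() in {".ddd", ".did"}
                and path.parent.name.lower() == "parts"):
            found.append(path)
    return found


def hex_preview(path, length=64):
    """Return a hex/ASCII dump of the first `length` bytes of a file."""
    with open(path, "rb") as handle:
        chunk = handle.read(length)
    lines = []
    for offset in range(0, len(chunk), 16):
        row = chunk[offset:offset + 16]
        hex_part = " ".join(f"{byte:02x}" for byte in row)
        # Show printable ASCII as-is, everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08x}  {hex_part:<47}  {text_part}")
    return "\n".join(lines)
```

Calling find_index_parts on the path to your .PDX file and printing hex_preview for each result should at least show whether the files start with a recognizable header or readable strings, which is useful before trying to work out the actual structure.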

Ahhhhh, I see you have a machine that goes Bing!
