Just wondering if there is a PHP method for uploading a document (say a Word doc or a PDF), saving it somehow (MySQL), and then making the contents of the document searchable?
upload - easy
store in the db - easy
search - hard! Remember that all docs have their own formats (assuming OpenDocument etc. aren't an option). PDF is very different from Word, just as Word is different from Excel. You might be able to incorporate something like dtsearch (
Personally I prefer to store only data in the database. An entire document is not data.
I would use [tt]catdoc[/tt] to extract the plain text content from the .doc. Then either store that plain text in the database, or build a word table and word-file link table with weights.
Of course, the .doc would be stored in the file system and the database would contain its path.
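A rough sketch of the catdoc approach above, in case it helps. The function names and the weighting (a plain occurrence count per word) are my own assumptions for illustration, and it presumes catdoc is installed and on the path.

```php
<?php
// Sketch only: function names and the weighting scheme are illustrative assumptions.

// Extract plain text from a .doc by shelling out to catdoc (must be installed).
function extract_doc_text(string $path): string
{
    $output = shell_exec('catdoc ' . escapeshellarg($path));
    return is_string($output) ? $output : '';
}

// Turn plain text into word => weight pairs, where weight = occurrence count.
function build_word_weights(string $text, int $minLength = 3): array
{
    // Lowercase, then split on anything that is not an ASCII letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $weights = [];
    foreach ($words as $word) {
        if (strlen($word) < $minLength) {
            continue; // skip very short words such as "a" or "to"
        }
        $weights[$word] = ($weights[$word] ?? 0) + 1;
    }
    arsort($weights); // highest weight first
    return $weights;
}
```

Each word => weight pair would then become one row in the word-file link table, next to the id of the row that holds the file's path. The split is ASCII-only, so accented text would need a different pattern.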
Thanks.
I understand your responses and I know there is no single solution really.
Ideally, this would be a data capture form where I stored data in the database.
It's just a pipe dream at the moment, so I'm not looking to implement anything right now.
I have just (about 4 days ago) started using a PDF text extraction solution (PHP) because I needed to extract certain text (rather than search). I am also working on implementing an OCR solution for imaged PDFs for the same reason. If you have a Linux install with root access then both solutions are very easy to implement.
I will be expanding the solution to cater for mTiff and docx too.
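For anyone wanting a starting point, here is a minimal sketch of how this kind of extraction could be wired up from PHP on Linux. It assumes the pdftotext (poppler-utils) and tesseract command-line tools, which are my own picks and not necessarily what is being used above; the function names are made up for the example.

```php
<?php
// Sketch only: pdftotext and tesseract are assumed tools, not necessarily the poster's.

// Build the shell command that prints a PDF's text layer to stdout.
function pdf_text_command(string $pdfPath): string
{
    // The trailing "-" tells pdftotext to write to stdout instead of a file.
    return 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
}

// Build the shell command that OCRs a single image and prints the text to stdout.
function ocr_command(string $imagePath): string
{
    // "stdout" as the output base makes tesseract print the recognised text.
    return 'tesseract ' . escapeshellarg($imagePath) . ' stdout';
}

// Run a command and return its output, or an empty string on failure.
function run_extraction(string $command): string
{
    $output = shell_exec($command);
    return is_string($output) ? $output : '';
}
```

Usage would be along the lines of `$text = run_extraction(pdf_text_command('/path/to/file.pdf'));`. An imaged PDF has no text layer, so its pages would need converting to images before the OCR step.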