Looking for a way to publish documents

Status
Not open for further replies.

mufka

ISP
Dec 18, 2000
587
US
I'm looking for ideas on a system that will let me take documents (scanned as TIF or JPG) and post them to a web site in a searchable format. The document quality is low, so I'd need a powerful OCR engine. Maybe something that converts the images to PDF to make them searchable but uses the images for online viewing.
 
Is there any chance you can have the documents scanned at a higher resolution? With low-res documents, I hate to tell you this, but to make them at all usable you're going to be retyping the entire contents. For example, most OCR software, even the most "powerful", will have trouble telling S from 5 and 1 from l if the document is low resolution.

Yes, I think you're going to have to go with PDF. The only other solution I can think of would be to keep parallel unformatted text and scan files, which sounds like a maintenance headache to me. Acrobat has some built-in indexing capabilities, but I suspect your needs exceed them. What about a single-site Google search once you've got your PDFs uploaded?
 
The documents are at 600 dpi. Using FineReader, I get acceptable text conversion. I think the following is what I need:

I have a file named 123.txt and one named 123.pdf. 123.txt has the plain text version of the file. 123.pdf has a nice viewable image of the file. Now I need something that will let me publish the files, search 123.txt, and output 123.pdf. I'm thinking MySQL or something. Is it out there?
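
For what it's worth, the 123.txt / 123.pdf pairing can be sketched without any database at all. The snippet below is only an illustration in Python, not a finished product: the function names `build_index` and `search` are made up for this example, and it assumes each .txt file sits in the same folder as a .pdf with the same base name.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_index(folder):
    """Map each lowercase word to the PDF names whose .txt twin contains it."""
    index = defaultdict(set)
    for txt in Path(folder).glob("*.txt"):
        pdf = txt.with_suffix(".pdf")
        if not pdf.exists():
            continue  # no viewable twin, so there is nothing to link to
        text = txt.read_text(encoding="utf-8", errors="ignore").lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(pdf.name)
    return index


def search(index, query):
    """Return the sorted PDF names whose text contains every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

Calling `search(build_index("scans"), "water pump")` would return the PDF names whose text twins contain both words, where "scans" is a placeholder folder name. An in-memory index like this is rebuilt on every run, so it only suits a small collection.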
 
I vaguely remember an article teaching you how to create some sort of help desk/bug tracker app that used a MySQL table with a full-text index on a text column. A search returned a file location pointer, which was used to create links so people could view or download solutions from the web server.


--== Anything can go wrong. It's just a matter of how far wrong it will go till people think it's right. ==--
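
A rough sketch of that table design, for anyone who wants to try it. This is not the code from the article: it uses Python with SQLite's FTS5 extension as a stand-in for MySQL so it runs without a server, and it assumes your Python's SQLite build includes FTS5. The table name `docs`, the helper names, and the `/files/` URL prefix are all invented for illustration. In MySQL the same idea would use a FULLTEXT index on the text column and a `MATCH ... AGAINST` query.

```python
import sqlite3


def open_catalog(db_path=":memory:"):
    """Open the catalog; body is full-text indexed, pdf_path is only stored."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs "
        "USING fts5(body, pdf_path UNINDEXED)"
    )
    return conn


def add_document(conn, body, pdf_path):
    """Store the OCR text together with a pointer to the viewable PDF."""
    conn.execute(
        "INSERT INTO docs (body, pdf_path) VALUES (?, ?)", (body, pdf_path)
    )
    conn.commit()


def find_links(conn, query, base_url="/files/"):
    """Search the text column and return links to the matching PDFs."""
    rows = conn.execute(
        "SELECT pdf_path FROM docs WHERE docs MATCH ? ORDER BY rank",
        (query,),
    )
    return [base_url + path for (path,) in rows]
```

The point of the design is that the search only ever touches the text column, while the user only ever sees the PDF link, so the OCR text and the scanned image never have to live in the same file.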
 
I used to work for a scanner manufacturer and started thinking 100 dpi when you said "low res." Believe me, the software in the human eye and brain has trouble reading 100 dpi!

 