Use Office 2003 automation to OCR an image to text easily

Status
Not open for further replies.
Sep 17, 2001
673
US
I had to figure out how to OCR an image cheaply. Since we already had Office 2003, I could use the OCR engine that comes with it. This code converts a base64 string to a file and OCRs it to text. It works with TIFF, JPG, and some other formats.


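DEFINE CLASS imagetools AS Custom

    * Decode a base64-encoded image, OCR it with MODI, and return the text.
    PROCEDURE img2txt
        LPARAMETERS lcBase64String AS String
        LOCAL lcImage, lcText, miDoc, miLayout

        * STRCONV(..., 14) decodes base64 back to binary.
        lcImage = STRCONV(lcBase64String, 14)

        * Remove any temp file left over from an earlier run.
        IF FILE("tmpimg.jpg", 1)
            ERASE tmpimg.jpg
        ENDIF
        STRTOFILE(lcImage, "tmpimg.jpg", 0)

        * Needs Microsoft Office Document Imaging with its OCR component installed.
        * CREATEOBJECT() is the usual call for a COM automation server.
        miDoc = CREATEOBJECT("MODI.Document")
        miDoc.Create("tmpimg.jpg")
        miDoc.Images(0).OCR()
        miLayout = miDoc.Images(0).Layout
        lcText = miLayout.Text
        miDoc.Close()

        * Drop the COM references before deleting the temp file.
        miLayout = .NULL.
        RELEASE miDoc
        IF FILE("tmpimg.jpg", 1)
            ERASE tmpimg.jpg
        ENDIF

        RETURN lcText
    ENDPROC

ENDDEFINE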

Regards,

Rob
 
This looks interesting. Are you reading a directory of incoming faxes and storing them in a database?

Mike Lewis has some interesting TWAIN acquire routines on his website that will work with a camera or scanner.
 
We are using FoxPro to convert the image to base64 and then store it in a MySQL field. Then we extract it back out via FoxPro. We use STRCONV() to convert back and forth. While we have a website, we are not using the image we are OCR'ing on the website. I will take a look at Mike's routines. I would prefer to have some simple native FoxPro code to OCR the image, but the simplicity and effectiveness of this code seem to be the answer for now. I looked at Leadtools and others, but they all want $600-$2000. From what I see, Microsoft's product is just as good as the Leadtools engine, which is $2000. Other, cheaper OCR products just don't OCR very accurately.
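
For anyone following along, the base64 round trip comes down to two STRCONV() calls. This is only a minimal sketch: the file name scan.jpg and the variable names are placeholders, not part of our actual setup, and the MySQL step is left out.

```foxpro
* Round trip: image file -> base64 text -> binary again.
* "scan.jpg" is a hypothetical file name used for illustration.
LOCAL lcBinary, lcBase64, lcDecoded

lcBinary  = FILETOSTR("scan.jpg")     && read the raw image bytes
lcBase64  = STRCONV(lcBinary, 13)     && 13 = encode to base64
lcDecoded = STRCONV(lcBase64, 14)     && 14 = decode from base64

* The decoded string should match the original bytes exactly.
? lcDecoded == lcBinary
```

The base64 string is plain text, so it can go into an ordinary text column and be passed straight to img2txt later.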

Regards,

Rob
 
Rob,

I was intrigued by your post and thought I would try it out.

I scanned in a simple document, saving it as a tiff.

When I ran your code, I got an error:
Not enough storage is available to complete this operation

The tiff file comes out at 60KB. I copied your code into a prg file, commenting out the DEFINE & PROCEDURE lines.

Can you suggest what's wrong?

Thanks,

Stewart
 
This code uses automation of the Office 2003/2007 document imaging tool, which is under Start/Programs/Microsoft Office/Microsoft Office Tools/Microsoft Document Imaging. If you don't find this, the code won't work. It is an option you can install from Microsoft Office 2003/2007; I don't know about other versions. Anyway, try opening your TIFF manually in the Office tool to see if you get an error.
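
A quick way to check from FoxPro whether the automation server is there is to try creating it and trap the failure. This is just a sketch, assuming VFP 8 or later for TRY/CATCH; the variable names are made up for illustration. It only tests that MODI.Document can be created, not that the OCR component itself is installed.

```foxpro
* Check whether the MODI automation server can be created.
* Variable names are illustrative only.
LOCAL loDoc, loErr, llHaveModi

llHaveModi = .T.
TRY
    loDoc = CREATEOBJECT("MODI.Document")
CATCH TO loErr
    * Creation failed, most likely because the server is not registered.
    llHaveModi = .F.
    ? "MODI.Document could not be created: " + loErr.Message
ENDTRY

IF llHaveModi
    ? "MODI automation server is available."
    loDoc = .NULL.    && release the COM reference
ENDIF
```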

Regards,

Rob
 
If you want, email me the TIFF file by adding my Tek-Tips user name to (at symbol) yahoo.com. I won't type it out literally, to avoid spam.

Regards,

Rob
 
Aha - the OCR recognition feature wasn't installed!

Hopefully once I get IT to do that, it'll work.

Thanks,

Stewart
 