OCR to get data in VB.

Status
Not open for further replies.

rssql

Programmer
Jul 22, 2002
US
I'm trying to find a way to use a JPG or other image and OCR code in VB.NET.
I want to get data from the image which will have a part number in a specific location.
I want this part number placed in a VB variable so I can use it.

Any ideas?
 
A search on Yahoo, Google, Ask, Answers, or Bing should turn up several good to excellent OCR engines that work with WIA (Windows Image Acquisition; see Microsoft's documentation for details).



Good Luck

 
They are out there. The company I got laid off from last summer used a couple of them (I don't remember the names, though). They aren't exactly cheap, and you'll go through a lot of trial and error to get good, reliable data.
 

Hi - I did the same thing - recognising invoice numbers on PDFs and then storing them away.

I got good results using Tesseract. It's open source (it started as an HP research project, was released as open source, and is now looked after by Google). Search for "Tesseract OCR".


I used it as a stand-alone app: I passed a trimmed TIF to it and read the resulting text file. I believe there is an integrated .NET version of it.
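
The stand-alone approach described above can be sketched roughly as follows. This is only an illustration in Python, not the code used in the project: it assumes the `tesseract` executable is on the PATH and uses the basic `tesseract <image> <output base>` form of the command line, which writes the recognised text to `<output base>.txt`. The function names and file names are made up for the example.

```python
import subprocess
from pathlib import Path


def build_command(image_path, output_base):
    """Return the argument list for: tesseract <image> <output_base>."""
    # Tesseract appends ".txt" to the output base on its own.
    return ["tesseract", str(image_path), str(output_base)]


def read_result(output_base):
    """Read the text file Tesseract wrote and strip surrounding whitespace."""
    return Path(f"{output_base}.txt").read_text(encoding="utf-8").strip()


def ocr_image(image_path, output_base):
    """Run Tesseract as an external process and return the recognised text."""
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_command(image_path, output_base), check=True)
    return read_result(output_base)
```

A call such as `ocr_image("invoice_number.tif", "result")` would then return whatever text Tesseract found in the trimmed image. The same idea carries over to VB.NET by starting the executable as an external process and reading the text file afterwards.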

Some words of wisdom. By default, Tesseract only recognises basic text in a basic font. You can train it to recognise other fonts.

For me - I did some experimenting and then changed our default invoice to print the invoice number in a basic font. I then trimmed a section of the PDF and saved it as a TIF to present to Tesseract.
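
A rough sketch of the trimming step, in Python, might look like the following. It makes several assumptions that are not from the project itself: the PDF page has already been rendered to an ordinary image file by some other tool, the number sits in a fixed region given as fractions of the page size, and the third-party Pillow library is installed for the actual crop. The function names are hypothetical.

```python
def crop_box(page_width, page_height, left, top, right, bottom):
    """Convert fractional page coordinates (0.0 to 1.0) to a pixel box."""
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError("region must lie inside the page and have positive area")
    return (
        round(left * page_width),
        round(top * page_height),
        round(right * page_width),
        round(bottom * page_height),
    )


def crop_to_tiff(source_path, target_path, box):
    """Crop the page image to `box` and save the result as a TIFF file."""
    # Imported here so crop_box stays usable without Pillow installed.
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(source_path) as page:
        page.crop(box).save(target_path, format="TIFF")
```

For example, `crop_box(1000, 2000, 0.5, 0.1, 0.9, 0.2)` gives `(500, 200, 900, 400)`, which can be passed straight to `crop_to_tiff`. Keeping the region as fractions means the same settings still work if the page is rendered at a different resolution.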

With a successful result - I stored the PDF away. With an unsuccessful result - e-mail the administrator the PDF and tell him to look at it manually.
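
The store-or-escalate step above could be sketched like this in Python. How a "successful" result was actually judged isn't stated, so the check here is an assumption: the OCR text must match a made-up invoice number format (`INV-` plus six digits). The `store` and `notify_admin` callbacks are placeholders for filing the PDF and sending the e-mail.

```python
import re

# Hypothetical invoice number format: "INV-" followed by six digits.
INVOICE_PATTERN = re.compile(r"INV-\d{6}")


def extract_invoice_number(ocr_text):
    """Return the invoice number if the OCR text is exactly one, else None."""
    candidate = ocr_text.strip()
    return candidate if INVOICE_PATTERN.fullmatch(candidate) else None


def route_invoice(ocr_text, store, notify_admin):
    """Store the PDF under its number on success, otherwise ask for a manual check."""
    number = extract_invoice_number(ocr_text)
    if number is None:
        # OCR output did not look like an invoice number: hand it to a person.
        notify_admin(ocr_text)
        return "manual_review"
    store(number)
    return "stored"
```

Requiring the whole string to match, rather than searching for the pattern anywhere in it, means a misread character sends the invoice to manual review instead of filing it under the wrong name.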

We do 20,000 printed invoices a month. In 18 months - not one has been e-mailed - and so far - when we reference the stored invoices - they are all named correctly.


Also - did you know that an OCR tool (Microsoft Office Document Imaging, MODI) came with Office 2003 and 2007? And did you know that as long as you own the relevant version of Office on the PC you run your code on, it's license legal?

It works very well - fully automated - it aligns the image and auto-adjusts it for the best scan results. You can use it from within .NET code. I got demo code from the Code Project web site and went from there.

I got the results I needed for this project using Tesseract - but the MS Office OCR tool was very interesting...


Hope my rambling helps... :)


Regards

Paul
 