PDF to text converter for batch folder processing

KarenLloyd

Programmer
Nov 23, 2005
140
GB
Hello my Tek-Tips Gurus. It has been a while...

I could use your insight once again please.

I need to find a PDF to text converter that can be run on batches of files for the purpose of importing supplier invoices.

The objective is to automate import of text (invoice lines & values) from PDF invoices, where suppliers do not have an e-invoicing system. I will match points in the format to find a job number or PO No that links into the VFP data.

Of course, there will be differences according to supplier (headers / logos, reference section formats, etc.) to handle and/or ignore, but by identifying key points in each document type, I hope to map each invoice format by source in order to pull in the data I need. (Starting with maybe 5-10 high-volume suppliers)...

Do you have any experience / recommendations for PDF to Text converter software please?
Bearing in mind the following criteria:

- This cannot be an online / manual upload process
- There will be several hundred invoices to process each month
- It needs to be an executable process, shell (maybe) or command line (!Run) process
- I am still using VFP6 (limited resources meant I never managed to get VFP9 in time!)
- The client will want three or four licenses (though a free evaluation period would be grand for me!)

The software solution doesn't need to be free, or even cheap - it just needs to do the job well.

Any suggestions please?

Thank you, in advance.
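
For illustration only, the batch criteria above (no manual upload, several hundred files a month, run from a command line) could be met by a small wrapper that walks a folder and calls a command-line converter once per PDF. The sketch below is an assumption, not something recommended in this thread: it uses the `pdftotext` tool from Xpdf/Poppler with its `-layout` option, which only works on PDFs that contain real text (not scanned images). The function names `build_command` and `convert_folder` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, out_dir):
    """Return the converter command line for one PDF.

    Assumption: `pdftotext` (Xpdf/Poppler) is installed and on PATH.
    `-layout` asks it to preserve the physical layout of the page.
    """
    txt_path = Path(out_dir) / (Path(pdf_path).stem + ".txt")
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_folder(in_dir, out_dir, run=subprocess.run):
    """Convert every .pdf in in_dir to a .txt in out_dir.

    Returns the list of PDFs whose conversion reported a non-zero exit code,
    so they can be set aside for manual entry.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf_path in sorted(in_dir.glob("*.pdf")):
        result = run(build_command(pdf_path, out_dir))
        if result.returncode != 0:
            failed.append(pdf_path)
    return failed
```

Calling `convert_folder("incoming", "converted")` would leave one text file per invoice for the VFP side to import. The `run` parameter exists only so the loop can be exercised without the converter installed.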
 
This will be a bit negative. Sorry in advance.
Suppliers will give you a full time job keeping up with changes to their system.

Forgive me, do not waste your time.
 
Hiya Griff! I know it's a big ask.

I have a customer hoping to save time on their staff typing invoices into the system & matching them to the jobs.

I'm hoping some firms will be able to provide spreadsheets/csv reports to support the billing; those I can deal with.

But it doesn't hurt to ask.

Glad to see you're still [Smile]ing

Thanks for replying
 
I can do this in python. It would be a .exe. Essentially, it would work like this:

A supplier saves their version of the invoice as a .pdf.
They launch the .exe, it reads the PDF file and parses out the sections.
The sections will likely be wrong on the first scan, so the PDF owner interacts with the application to fine-tune it and confirm data points.
The fine-tuning is saved against that PDF format name.
Then, from that point forward, that PDF format name will parse out the same info into a .csv, a .xlsx or a sqlite3 database.
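
As a minimal sketch of the last two steps (not C.'s actual program), the saved fine-tuning could be as simple as a named set of regular expressions that is applied to the extracted text of every invoice in that format. Everything here is hypothetical: the profile name `acme_invoice_v1`, the field names, the patterns, and the sample text are made up for illustration, and real supplier layouts would need their own patterns.

```python
import csv
import io
import re

# Hypothetical saved profile for one supplier layout:
# field name -> regex with exactly one capture group.
PROFILES = {
    "acme_invoice_v1": {
        "invoice_no": r"Invoice\s+No[.:]?\s*(\S+)",
        "po_no": r"PO\s+No[.:]?\s*(\S+)",
        "total": r"Total\s+Due[.:]?\s*([\d,]+\.\d{2})",
    },
}


def parse_invoice(text, profile):
    """Apply each saved pattern to the text; unmatched fields become None."""
    row = {}
    for field, pattern in profile.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        row[field] = match.group(1) if match else None
    return row


def rows_to_csv(rows, fieldnames):
    """Render parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up text standing in for the output of a PDF-to-text step.
sample = "ACME Ltd\nInvoice No: INV-1001\nPO No: J4521\nTotal Due: 1,250.00\n"
profile = PROFILES["acme_invoice_v1"]
row = parse_invoice(sample, profile)
print(rows_to_csv([row], list(profile)))
```

Leaving unmatched fields as `None` rather than raising an error makes it easy to spot invoices where a supplier has changed their layout and the profile needs re-tuning.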

Let me know if you're interested and I'll get a sample .exe ready this weekend.

C.
 
I actually have a similar request on my to-do list. But I haven't tackled it yet because it's still a low priority.
Every month I need to convert a PDF file into a plain text file, which I can store in a memo field.
At the moment I just open the PDF in Chrome and copy and paste the text into my memo field.
 
Hi, KarenLloyd

I discussed your question with a friend (Chinese). He is sure it can be solved in VFP6. However, I don't know how to put the two of you in touch, because the forum doesn't allow posting e-mail addresses. Do you have a good suggestion?
 
xinjie, you can click on Karen Lloyd's user name, then click "start conversation" and give her the contact e-mail address there, because that conversation is private between you and her only. Karen will see a red notification dot on the envelope icon in the top right.
 
Hi, Chris Miller
Thank you!
 
Karen,

Another possibility might be Adobe Acrobat Pro DC (64-bit). Using the Scan & OCR feature, it will convert a PDF to several formats, including plain text. You would of course need to rearrange the text to fit your form. My sample test showed it to be pretty accurate.

Not sure how much, if any, you could do within VFP.

Steve
 
