Get text from pdf

Status
Not open for further replies.

AlastairP

Technical User
Feb 8, 2011
286
AU
I seem to remember doing this before, but I can't remember how I did it, and I can't find the relevant code.

I would like to extract specific text from purchase orders in PDF format to create a register.
That way my users can just drag the PDF into the control and extract the relevant info.

So after FILETOSTR(), there must be another step, perhaps to convert the string to a more readable format?

I have done this before to get proof of concept, but abandoned the idea because of all the different formats and layouts of the pdf files.
Now we have a specific PDF we create in our accounting program, so the layout is controlled.

 
It's a bit of a kludge, but it works.
The downside is that it may have to be altered for specific versions of Adobe Reader.

Code:
LPARAMETERS lcFile

LOCAL oShell, lcWindowTitle, lnSec, llOK

* Open the file for viewing
oShell = CREATEOBJECT("WScript.Shell")
_CLIPTEXT = ""

* The exact Acrobat window title should go here, so check it manually first!
lcWindowTitle = JUSTFNAME(lcFile) + " - Adobe Reader"
* Quote the path so that folder or file names containing spaces still open
oShell.Run('"' + FULLPATH(lcFile) + '"')

* Wait up to 60 seconds for the Reader window to appear
lnSec = SECONDS()
DO WHILE NOT oShell.AppActivate(lcWindowTitle) AND SECONDS() - lnSec < 60
	WAIT WINDOW "Opening " + lcFile TIMEOUT 1
ENDDO
llOK = .F.

IF oShell.AppActivate(lcWindowTitle)
	* Drive the Reader menus. First confirm that these key combinations
	* select and copy the text when typed manually in your Reader version.
	oShell.SendKeys("%El") && Alt+E, L: select all
	oShell.SendKeys("%Ec") && Alt+E, C: copy selection
	* Wait up to 60 seconds for the selection to reach the clipboard
	lnSec = SECONDS()
	DO WHILE LEN(_CLIPTEXT) = 0 AND SECONDS() - lnSec < 60
		WAIT WINDOW "Copying..." TIMEOUT 2
	ENDDO
	IF LEN(_CLIPTEXT) > 0
		oShell.SendKeys("%Fx") && Alt+F, X: exit Adobe Reader
		llOK = .T.
	ENDIF
ENDIF

oShell = .NULL.

IF llOK
	* Do what you need to do with _CLIPTEXT

ENDIF

RETURN

Credit to Yuri Rubinov
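
For the "do what you need to do with _CLIPTEXT" step, a minimal sketch of pulling labelled values out of the copied text might look like the one below. The labels "PO Number:" and "Date:", the sample text, and the GetField helper are all hypothetical and are not part of the code above, so adjust them to whatever your accounting program actually prints. It also assumes each value sits on the same line as its label, which may not hold for every layout.

```foxpro
* Hypothetical sample of what _CLIPTEXT might hold after the copy step
LOCAL lcText, lcPONumber, lcPODate
lcText = "Purchase Order" + CHR(13) + CHR(10) + ;
	"PO Number: 10452" + CHR(13) + CHR(10) + ;
	"Date: 08/02/2011" + CHR(13) + CHR(10)

lcPONumber = GetField(lcText, "PO Number:")
lcPODate = GetField(lcText, "Date:")
? lcPONumber
? lcPODate

* Return the trimmed text that follows tcLabel on the same line,
* or an empty string when the label is not found (assumed helper).
FUNCTION GetField(tcText, tcLabel)
	LOCAL laLines[1], lnCount, lnI, lnPos
	* ALINES() splits the text into one array element per line
	lnCount = ALINES(laLines, tcText)
	FOR lnI = 1 TO lnCount
		* ATC() is a case-insensitive search for the label
		lnPos = ATC(tcLabel, laLines[lnI])
		IF lnPos > 0
			RETURN ALLTRIM(SUBSTR(laLines[lnI], lnPos + LEN(tcLabel)))
		ENDIF
	ENDFOR
	RETURN ""
ENDFUNC
```

In the real routine you would pass _CLIPTEXT instead of the sample string and write the returned values to your register table. Because the PDF layout is controlled, a simple label lookup like this may be enough, but check the copied text by eye first to see how Reader orders the lines.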
 