
Extracting HTML data into FoxPro Form

Status
Not open for further replies.

MSiddeek

Programmer
Apr 14, 2019
Hi experts,

I used the code below in a form command button to display a web page's content and then copy that content into an edit box. When the URL is … this works. But for some other web sites it says "member BODY does not evaluate to an object" (especially when the URL is a PDF file).

Please help me to solve this problem.

*****************************
SET TALK OFF

LOCAL oInet

mc=ALLTRIM(thisform.url.Value)
lcURL = [&mc]

oInet=CREATEOBJECT("InternetExplorer.Application")
oInet.Navigate([&lcURL])

DO WHILE oInet.busy
wait "busy" window nowait timeout 1
ENDDO


thisform.edit2.value=oinet.document.body.InnerText
release oInet

**************************************************
 
Hi

You could test whether the document has a body of type object before you try to read its InnerText.

Code:
if type("oinet.document.body") = "O"
  thisform.edit2.value=oinet.document.body.InnerText
else
  thisform.edit2.value="Unreadable"
endif

Why are you doing all that macro substitution?

Would this not work? oInet.Navigate(alltrim(thisform.url.value))

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.
 
Hi Griff,

I tried your code, but I am still getting the error: for … it works, but for … it throws the error message "member BODY does not...".
For online PDF files it says "Unreadable". Is there any way I can read the contents of online PDF files? (My main objective is to read a list of online PDF files and copy the text into the edit box.)

I used the macro because the form refused to navigate to the URL typed in the text box.

Regards MSiddeek.
 
specially when the url is a pdf file

Well, if the document is a PDF, it won't have a <body> tag, hence the error.

If your aim is to extract the text from a PDF, you will need to find some other way of doing it. Internet Explorer can display PDF text, but it doesn't know what the text contains.

Mike




__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Just dabbling with IntelliSense, I can see the following:
[tt]
oInet=CREATEOBJECT("InternetExplorer.Application")
oInet.Navigate(" < some PDF file >")
oPDF = oInet.Document[/tt]

oPDF has PEMs (properties, events and methods) that will allow you to navigate the pages of the PDF, print it, change things like the zoom factor and the number of pages to view, etc. But nothing that will let you get at the contents of the PDF.

Mike

 
You are waiting for busy to become false, but you need to wait for readystate to become 4 (READYSTATE_COMPLETE). Besides, your cascades of macro substitutions make no sense, although they introduce no problem.

Let's look at it:
Code:
SET TALK OFF

LOCAL loInet, lcURL

lcURL = "www.yahoo.com"  && ALLTRIM(thisform.url.Value)

oInet=CREATEOBJECT("InternetExplorer.Application")
oInet.Navigate(lcURL)

DO WHILE loInet.readystate<>4
? loInet.busy, loInet.readystate
ENDDO
? Left(loInet.document.body.innerText,160)+"..."

Works for me, and busy gets .T. after readystate becomes 4, so that alone also doesn't explain what you experience. I wouldn't guarantee checking busy is sufficient. The readystate speaks of the document you load and is the more relevant status.

If that doesn't work for you for some sites, are they perhaps blocked for you?

Bye, Olaf.

Olaf Doschke Software Engineering
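Putting Griff's TYPE() check and Olaf's readystate advice together, a minimal sketch might look like this. It assumes IE automation (InternetExplorer.Application) is available on the machine; the name GetPageText and its 30-second default timeout are hypothetical choices for illustration. It returns an empty string when the page has no HTML body (for example a PDF) or never finishes loading, and it ignores the fact that SECONDS() restarts at midnight.

```foxpro
* Hypothetical helper: return the body text of a web page, or ""
* when there is no HTML body (e.g. a PDF) or loading times out.
FUNCTION GetPageText(tcUrl, tnTimeout)
    LOCAL loInet, lnStart, lcText
    lcText = ""
    IF EMPTY(tnTimeout)
        tnTimeout = 30                && assumed default, in seconds
    ENDIF
    loInet = CREATEOBJECT("InternetExplorer.Application")
    loInet.Navigate(ALLTRIM(tcUrl))   && no macro substitution needed
    lnStart = SECONDS()
    * Wait for READYSTATE_COMPLETE (4), but not forever
    DO WHILE loInet.ReadyState <> 4 AND SECONDS() - lnStart < tnTimeout
        DOEVENTS
    ENDDO
    * Only read innerText when the document really has a body object
    IF loInet.ReadyState = 4 AND TYPE("loInet.Document.Body") = "O"
        lcText = NVL(loInet.Document.Body.innerText, "")
    ENDIF
    loInet.Quit()
    RETURN lcText
ENDFUNC
```

From the command button this could be used as thisform.edit2.Value = GetPageText(thisform.url.Value). Quitting IE each time keeps invisible iexplore.exe instances from piling up.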
 
Just to step back a bit.

Am I right in saying that there are two separate problems here:

1. How to prevent the "not an object" error; and

2. How to extract the contents of a PDF.

I can't reproduce the first problem. Olaf has given you some advice that might be useful.

As far as the contents of the PDF are concerned, you won't be able to display its contents even when you have solved the first problem, for the reasons I have stated. Instead, you could try this:

1. Drop a Microsoft Web Browser OLE control onto your form. Name it, say, oBrowser.

2. At the point at which you want to display the PDF: [tt]oBrowser.Navigate2(" < url of your pdf > ")[/tt]

The PDF should now appear within the control.

Whether this works or not will depend on a setting within Internet Explorer that determines whether the browser itself displays PDFs or whether it opens them in the default PDF viewer. I haven't used IE for years, so I can't say where that setting is; you will have to dig around.

NOTE: The above remark re Internet Explorer will apply even if you are using a different browser, or even if you have a version of Windows in which IE has been replaced by Edge. The Microsoft Web Browser control is a wrapper for IE, which is always present, even in Windows 10.
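If you would rather create the Web Browser control in code than drop it on the form at design time, something like the following sketch should do it. It relies on the third parameter of AddObject(), which names the OLE class; Shell.Explorer.2 is the usual ProgID of the Microsoft Web Browser control. MakeBrowserForm, the form size and the control name oBrowser are hypothetical. As above, this only displays the PDF; it does not give you its text.

```foxpro
* Hypothetical helper: build a form hosting the Web Browser control
* and navigate it to tcUrl (for example the URL of a PDF).
FUNCTION MakeBrowserForm(tcUrl)
    LOCAL loForm
    loForm = CREATEOBJECT("Form")
    loForm.Width = 800
    loForm.Height = 600
    * Third AddObject() parameter = OLE class (ProgID) of the control
    loForm.AddObject("oBrowser", "OleControl", "Shell.Explorer.2")
    WITH loForm.oBrowser
        .Width = loForm.Width
        .Height = loForm.Height
        .Visible = .T.
        .Navigate2(tcUrl)
    ENDWITH
    RETURN loForm
ENDFUNC
```

Calling loForm = MakeBrowserForm(lcUrl) followed by loForm.Show(1) would display the page modally.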

Mike

 
Mike / Olaf

As Mike mentioned in his last post, my main problem is copying the content from the PDFs. I am in the process of dumping about 102,000 records from online PDF result files stored on a web site. Each PDF is one record, so I need to open all those files and copy their contents into a table using a FoxPro form.

Yes Mike, I dropped the Microsoft Web Browser OLE control onto a form and navigated to the PDF, and it now opens within the form control.
But my problem is getting the content from the web control into a text box or edit box.

MSiddeek
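For a bulk import of this size, one option outside the browser is to download each file and convert it with a command-line PDF-to-text tool. This is a different technique from the browser approach discussed in the thread, sketched under several assumptions: the free pdftotext utility (from Xpdf or Poppler) is installed at the hypothetical path in PDFTOTEXT_EXE, the URLs can be downloaded directly, and the Windows URLDownloadToFile API and WScript.Shell are available. GetPdfText and BuildPdfToTextCmd are hypothetical names. Scanned, image-only PDFs have no text layer and would come back empty, and depending on the tool's output encoding you may need STRCONV() on the result.

```foxpro
* Assumed install location of pdftotext (hypothetical path)
#DEFINE PDFTOTEXT_EXE "C:\Tools\pdftotext.exe"

* Hypothetical helper: download one PDF and return its plain text,
* or "" if the download or the conversion fails.
FUNCTION GetPdfText(tcUrl)
    LOCAL lcPdf, lcTxt, lcText, loShell
    lcText = ""
    lcPdf = ADDBS(SYS(2023)) + SYS(2015) + ".pdf"   && unique temp file
    lcTxt = FORCEEXT(lcPdf, "txt")
    DECLARE INTEGER URLDownloadToFile IN urlmon.dll ;
        INTEGER pCaller, STRING szURL, STRING szFileName, ;
        INTEGER dwReserved, INTEGER lpfnCB
    IF URLDownloadToFile(0, ALLTRIM(tcUrl), lcPdf, 0, 0) = 0   && 0 = S_OK
        loShell = CREATEOBJECT("WScript.Shell")
        * 0 = hidden window, .T. = wait until pdftotext has finished
        loShell.Run(BuildPdfToTextCmd(lcPdf, lcTxt), 0, .T.)
        IF FILE(lcTxt)
            lcText = FILETOSTR(lcTxt)
            ERASE (lcTxt)
        ENDIF
    ENDIF
    IF FILE(lcPdf)
        ERASE (lcPdf)
    ENDIF
    RETURN lcText
ENDFUNC

* Build the command line: -layout keeps the page layout, and both
* paths are quoted in case they contain spaces.
FUNCTION BuildPdfToTextCmd(tcPdf, tcTxt)
    RETURN ["] + PDFTOTEXT_EXE + [" -layout "] + tcPdf + [" "] + tcTxt + ["]
ENDFUNC
```

Because nothing here depends on a form, the same function could run in a SCAN loop over the table of URLs and store each result in a memo field, instead of copying and pasting by hand.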
 
Olaf,

this works better:

Code:
SET TALK OFF

LOCAL loInet, lcURL

lcURL = "www.yahoo.com"  && ALLTRIM(thisform.url.Value)

loInet=CREATEOBJECT("InternetExplorer.Application")
loInet.Navigate(lcURL)

DO WHILE loInet.readystate<>4
? loInet.busy, loInet.readystate
ENDDO
CLEAR
? Left(loInet.document.body.innerText,160)+"..."

Regards,
Koen
 
Correct, I forgot to put the l prefix everywhere.

MSiddeek, Mike is right about PDFs: when a browser displays a PDF, its content is not part of the HTML DOM. You can forget the idea of getting the PDF content from the DOM.

Bye, Olaf.



 
my problem is getting the content from the web control to a text / edit box.

Why do you want to do that? Is it because you want the user to be able to edit the text? And then save it back to the PDF? If so, you would do better to look for a dedicated PDF-editing tool.

But if you simply want to display the text, the web browser control already does that for you. It is different from an edit box in as much as it also displays all the original formatting, which an edit box does not. It also lets you follow hyperlinks, which might or might not be desirable. But the main thing is that, when displaying a PDF, the web browser control is to all intents and purposes read-only.

Mike

 
Olaf

Yes, these PDF test results belong to one of my clients, for whom I am developing a new application with a SQL back end. He needs his old data incorporated into the new application. The main problem is that my client has access to the PDF files but not to the cloud database: the previous developer, who has the password for the cloud database, cannot be traced at all.

Mike

Presently I have a table containing the web URLs of these PDF files, i.e. … What I did was develop a small project and create a form to fetch the records from the navigated PDF files.
Below I have attached the form I created.

I treat each CR (line break) in the edit box as one field. Pressing the Get Record button fills the respective fields; the Save button writes the fields into the table, advances to the next record and places its URL in the URL field; the Get PDF button then loads that PDF file from the web, and I copy and paste the web content into the edit box manually.

MSiddeek.
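The Get Record and Save steps described above, where each CR in the edit box marks one field, can be sketched with ALINES(), which splits text into an array at each line break. SplitFields and SaveFields are hypothetical helpers; the sketch assumes VFP 9 (for the ALINES() flags), that the lines arrive in the same order as the table's fields, and that those fields are character fields.

```foxpro
* Hypothetical helper: one array element per line of tcText.
* ALINES() flags 1 + 4: trim each line and drop empty lines.
FUNCTION SplitFields(tcText, taFields)
    RETURN ALINES(taFields, tcText, 1 + 4)
ENDFUNC

* Hypothetical helper: write line 1 to field 1, line 2 to field 2 and
* so on, in the current record of tcAlias (character fields assumed).
PROCEDURE SaveFields(tcAlias, taFields, tnCount)
    LOCAL lnI
    FOR lnI = 1 TO MIN(tnCount, FCOUNT(tcAlias))
        REPLACE (FIELD(lnI, tcAlias)) WITH taFields[lnI] IN (tcAlias)
    ENDFOR
ENDPROC
```

From the Save button this might be called as lnCount = SplitFields(thisform.edit2.Value, @laFields) followed by SaveFields("mytable", @laFields, lnCount), where "mytable" is a placeholder alias; the @ passes the array by reference so ALINES() can fill it.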
 