
Help with IE, not selecting the page properly


TinyNinja

Programmer
Oct 1, 2018
Hello!
This is the first time I haven't been able to get a web page selected. I am trying to capture the whole page and then filter through it. I open up IE, go to the website, let it load, then try to select all and copy the whole page and throw the info into a temp table, but nothing gets populated. If I click on the IE page after the code has run and then re-run oie.ExecWB(17,1) and oie.ExecWB(12,1), it selects everything. I have also tried opening a random website, and there the ExecWB calls work just fine. I have done some searching but am at a loss. I need this to be able to run by itself.

Does anyone know how to get the page selected without having to manually click the site in IE?

Code:
Select a
Create Table (Sys(2023)+[\detailimport]) (Import C(250))  && name expression needs parentheses

oie = Createobject([internetexplorer.application])
oie.Visible = .T.

lnav=[https://www.pricecharting.com/game/pokemon-scarlet-&-violet/lightning-energy-257]
oie.Navigate(lnav)
_Cliptext=[]
WaitForIE()
oie.document.getElementById([game_search_box]).focus()
oie.document.getElementById([game_search_box]).value = "Spidops ex #19 Pokemon Scarlet & Violet"
oie.document.forms(0).submit()
WaitForIE()
*!*	INKEY(3)
oie.ExecWB(17,1)  && 17 = OLECMDID_SELECTALL: select the whole page
oie.ExecWB(12,1)  && 12 = OLECMDID_COPY: copy the selection to the clipboard

Select a
Zap
Erase Tempz.txt
Strtofile(_Cliptext,[Tempz.txt],0)
Append From Tempz.txt Sdf
Set Deleted On
Delete For Len(Alltrim(Import))=0
Select detailimport
brow

PROCEDURE WaitForIE
  DO WHILE oIE.Busy OR oIE.ReadyState <> 4   && 4 = READYSTATE_COMPLETE
    DOEVENTS
  ENDDO
ENDPROC
 
Not sure about this, but shouldn't the second argument to ExecWB be 0 rather than 1? In other words, instead of this:

Code:
oie.ExecWB(17,1)
oie.ExecWB(12,1)

do this:

Code:
oie.ExecWB(17,0)
oie.ExecWB(12,0)

At least, that's what I've always done.
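
For what it's worth, the two ExecWB arguments have names in the OLE command enumerations: the first is the OLECMDID (17 = select all, 12 = copy), and the second is the OLECMDEXECOPT execution option (0 = do default, 1 = prompt the user, 2 = don't prompt). A small sketch with #DEFINEs, purely for readability:

Code:
#DEFINE OLECMDID_COPY                12
#DEFINE OLECMDID_SELECTALL           17
#DEFINE OLECMDEXECOPT_DODEFAULT       0

* Select the whole page, then copy the selection to the clipboard
oie.ExecWB(OLECMDID_SELECTALL, OLECMDEXECOPT_DODEFAULT)
oie.ExecWB(OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT)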

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I just tried that and no dice. I have normally done it with the 1 option.
Both 0 and 1 have selected the web page for me, but I had to manually click on the IE window first and then re-run the ExecWB calls, which then worked.
 
When you are at the page you want, why not do this?

Code:
oBody = oie.document.body
Erase Tempz.txt
Strtofile(oBody.innerText,[Tempz.txt],0)
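
That reads the rendered text straight off the DOM, so it sidesteps the clipboard, and with it the focus problem, entirely.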

Chriss
 
Besides, there are a lot of better tools for web crawling.

And you find a lot of misinformation about the legality of it, in both directions of what's allowed and what isn't. For example, it's not true that just because something is available without a user account, i.e. anybody can see the images, data, etc., it's public domain. And by scraping a lot of pages you make yourself visible in the site's access logs. There are many ways to react to scrapers if a site doesn't want them: block an IP once it visits a lot of pages in a short time, or, the most clever way I think, give them content that's wrong. So don't be so sure that just because Google crawls the whole web, and other websites offer price comparisons, including the site you want to crawl, it's by definition legal to get at their data this way.

Chriss
 
Chris, that solved my problem! Thank you!
I was unaware of that option under the document.
Now I seem to be having a problem with the "Append From Tempz.txt Sdf" line: nothing gets thrown into the table. I'm not sure why, since the text file gets populated with everything. Jeez, SMH...

That's a fair point, Chris. I just want to track prices so I don't need to do it manually, and I thought some coding could speed up the process.
 
TinyNinja said:
the text file gets populated with everything

Well, there you have the problem: the whole page text doesn't only have the lines with the rows of data you actually want. Of course it does not just append and split itself properly into the data you want; you still have to work on this harder before you have text that APPEND can put into a DBF.
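
To illustrate (a rough sketch only; the "$" test is just an assumption about what marks a price row on that page), you could split the page text into lines and keep only the ones that look like data, instead of appending everything:

Code:
* Split the page text into an array of lines, keep only likely data rows
LOCAL laLines[1], lnCount, lnI
lnCount = ALINES(laLines, oie.document.body.innerText)
FOR lnI = 1 TO lnCount
	IF "$" $ laLines[lnI]   && assumed marker for a price row
		INSERT INTO detailimport (Import) VALUES (LEFT(laLines[lnI], 250))
	ENDIF
ENDFOR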

Chriss
 
If I run the code one line at a time, then it will append into the table without a problem. If the code runs quickly/normally, then it doesn't append into the DBF. I'm not sure why this is the case, since whenever I have done this in the past it worked just fine.
 
oBody.innerText is the whole page, including all the navigation button text, etc. If what you want is only a price list, find the top-level element, like a <table> tag, and get its innerText.
This has nothing to do with speed; you always wait for the page to load, no matter whether it's done one line at a time or all in one go.
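
For example (untested against that page, so take it as a sketch):

Code:
* Take just the first table's text instead of the whole page
oTable = oie.document.getElementsByTagName("table").item(0)
Strtofile(oTable.innerText, [Tempz.txt], 0)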

Chriss
 
Chris, I appreciate all the help you are giving, just no need to be rude here. If you are having a rough day, then I get it and totally understand.

That makes sense that innerText takes everything. I found where the prices I need are in the elements of the website, so I will work at grabbing that info out. I was just pointing out the speed concern since it was weird that it would work one way and not the other.
 
Where was I rude? I was only pointing out what to do and what I think makes no difference whatsoever.

I don't know what code you ended up with overall, so I can't share your experience, sorry. But as your code always waits for the page to finish loading, what could be the difference between running it line by line and not? If there is a difference, I doubt it's about that alone, unless you skipped waiting for the result page to load fully, which you do semi-automatically if you run the code line by line, as you're not as fast then.

Chriss
 
When I originally read this,

Chris Miller said:
Of course it does not just append and split itself properly into the data you want
This has nothing to do with speed
I took it in a rude tone. I was mistaken, sorry, my bad.

I find it weird that others aren't having the same problems as me. With the code, I'm going to try to figure out how to get the element IDs out, since I found them on the inspection page, but I see no price info in the element. I'm also thinking of just opening the text file, deleting all the way down to the parts I need, and then trying to append the file.

Random thought, would these Set commands cause any issues by chance? I place these at the top of my code.
Code:
Set Talk Off
Set Safety Off
Set Exclusive Off
Set Exact On
Set Escape On
Set Optimize On
Set Deleted On
Set View Off
 
Your settings don't matter, but just think about what it means to append text to a DBF.

The major assumption is that the text is in CSV format or is tab-delimited. When you select all the text of a web page, there's a lot of clutter before and after the actual rows of data. And that's what you did: the ExecWB calls were doing a Select All and Copy, weren't they? So you're just lucky if this works out with a simple append, no matter what exactly changed with the timings. It's just a lucky coincidence if all the clutter doesn't hinder you from getting at the actual data.

This is just a well-meant observation of what is generally wrong about your approach. It will break one day or another: when the website adds something to the clutter that's of no interest to you, it easily brings everything out of whack. I don't know what you got previously, but I do assume you have to throw out a lot of records after the append to get at the core data and remove the clutter. Why not remove it first?

In your original code you have an interesting line that you may reuse for the purpose of finding the element of interest, containing the actual data, and getting rid of all the clutter:

Code:
oie.document.getElementbyId([game_search_box])
That finds an HTML element by its ID "game_search_box". Maybe you can find an element whose innerText is where the data is, and that will be what you can append more easily.
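
Something along these lines, where the id is purely hypothetical; you'd look up the real one in the page source:

Code:
* Hypothetical id - replace it with whatever element actually wraps the data
oData = oie.document.getElementById([price_data])
Strtofile(oData.innerText, [Tempz.txt], 0)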

Chriss
 
That's a good point about appending txt to a DBF. I hadn't thought of it like that before. Your observation is correct about having to fix the code when the website changes. In my old job, I would have to update maybe once a month due to the website changing its look. I never had enough time to search for the elements that I needed and code it that way instead. I should have just sat down and fixed that old code to search and find what I needed through the website elements. Ehhh, you live and learn, right?

That interesting line you speak of I found on a forum, and I liked that it worked nicely. I used it to throw the values into the search box rather than into the URL address.

Chris Miller said:
oBody = oie.document.body

Your line gave me the idea to create this to quickly find what I needed and assign it to variables.
Code:
lcBody=oie.document.body.Innertext
lcShort=SUBSTR(lcbody , AT([AAAA],lcbody) , AT([BBBB],lcbody) - AT([AAAA],lcbody)-4)
lcName=ALLTRIM(SUBSTR(lcShort, AT([ ], lcShort, 4 )+1 , AT([ZZZZ], lcShort) - AT([ ], lcShort, 4 )-4))
 
Try this routine to get all the tables from your site:

Code:
*----------------------------------------
* get html tables from url
*----------------------------------------
local owbf,tc,ntable,crows,ccells,coln,ccn,xmltable,tname,table,tablerow,cell

curl = "[URL unfurl="true"]https://www.pricecharting.com/game/pokemon-scarlet-&-violet/lightning-energy-257"[/URL]

owbf = Createobject("webbrowser")

With owbf.wbc
	.silent = .T.
	.Navigate(m.curl)
	Do While .readystate # 4 Or .busy
		DoEvents
	Enddo
	tc=.Document.getelementsbytagname('table')
Endwith


ntable  = 0

For Each Table In tc

	Set Textmerge To Memvar Xmltable Noshow
	Set Textmerge Delimiters To "{{","}}"
	Set Textmerge On

\ <{{"table"}}>

	crows=Table.getelementsbytagname('tr')

	For Each tablerow In crows
	\<row>
		ccells=tablerow.getelementsbytagname('td')
		coln = 0
		For Each cell In ccells
			coln=coln+1
			ccn=Transform(coln,'@l 99')
			\<{{"col"+m.ccn}}>{{cell.InnerText}}</{{"col"+m.ccn}}>
		Endfor
	\</row>
	Endfor

\ </{{"table"}}>

	Set Textmerge Off
	Set Textmerge To
  set textmerge delimiters to

	ntable = m.ntable+1

	Xmltable = Strtran( m.xmltable,'&','&amp;')

	tname = 'Table_'+Transform(m.ntable)
	Xmltocursor( m.xmltable,m.tname,4)

	Browse Normal Font 'consolas',14 nowait

Endfor

sys(1500,"_mwi_cascade","_msm_windo")


********************************************

Define Class webbrowser As Form
	Add Object wbc As se2
Enddefine

Define Class se2 As OleControl
	OleClass ='shell.explorer.2'
Enddefine
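
In case it isn't obvious at first glance how this works: SET TEXTMERGE TO MEMVAR collects the merged output in the variable Xmltable, the \ lines emit one <row> element per tr and one <colNN> element per td, and XMLTOCURSOR() then turns each table's XML into its own cursor (Table_1, Table_2, ...).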

Marco Plaza
@nfoxProject
 
TinyNinja,

Code:
lcShort=SUBSTR(lcbody , AT([AAAA],lcbody) , AT([BBBB],lcbody) - AT([AAAA],lcbody)-4)

To get the string between AAAA and BBBB there is a function, STREXTRACT:
Code:
lcShort=STREXTRACT(lcbody , [AAAA], [BBBB],1,0)

mplaza, well, okay, kind of. What about the headers? I know they would spoil the ability to use XMLTOCURSOR, and they are likely not valid field names.

TinyNinja, for this specific site I'd rather look for everything inside tbody elements:
Code:
tc=.Document.getelementsbytagname('tbody')

And instead of mplaza's code, you could then also iterate over all the tc elements and get their innerText:
Code:
tc=.Document.getelementsbytagname('tbody')
For Each tbody in tc
   * make whatever with tbody.innerText
Endfor

What's best to extract will depend on the site; if you know AAAA and BBBB for a site, that could also work out quite nicely. STREXTRACT also allows you to keep AAAA and BBBB within the extracted portion, to extract from AAAA to the end of the text, or to treat the end delimiter as optional. See the help on STREXTRACT.
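
A quick sketch of those variants, assuming lcBody holds the page text (the flags are additive):

Code:
* Flag 2 = include the delimiters in the result
lcWithDelims = STREXTRACT(lcBody, [AAAA], [BBBB], 1, 2)
* Flag 4 = end delimiter optional: extract to the end of the text if BBBB isn't found
lcToEnd = STREXTRACT(lcBody, [AAAA], [BBBB], 1, 4)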

Another approach would be taking in all tr HTML elements, no matter from which table or tbody, and then sorting out which are the interesting table rows (tr), or whatever HTML tag could be a row of data. There's no general recipe that works for every site's HTML or text. And, by the way, you could also simply download the HTML with an API function like URLDownloadToFile, or use XMLHTTPRequest and its XPath query abilities.
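
For instance, a minimal sketch of the URLDownloadToFile route (error handling omitted):

Code:
* Fetch the raw HTML without automating IE at all
DECLARE INTEGER URLDownloadToFile IN urlmon.dll ;
	INTEGER pCaller, STRING szURL, STRING szFileName, ;
	INTEGER dwReserved, INTEGER lpfnCB
IF URLDownloadToFile(0, "https://www.pricecharting.com/game/pokemon-scarlet-&-violet/lightning-energy-257", ;
		"page.html", 0, 0) = 0   && 0 = S_OK
	lcHtml = Filetostr("page.html")
ENDIF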

Chriss
 
mplaza, that is awesome code!!! It grabs the prices and throws them into a table. I'll just need to play around with how to loop through it so I can throw the data into my records.

Chris, I'm ashamed to say I never used STREXTRACT in the past, and I wish I had known of it sooner.

Chris, I will try out your option too and try iterating through the innerText to see which works out the easiest.

The API is nice, but I really don't want to pay for it since this isn't a super serious thing I'm trying to make.
 