parse(?) some info from a html page

Status
Not open for further replies.

RodP

Programmer
Jan 9, 2001
109
GB
Hi Everyone,

Can you help / advise as I'm not too sure what search terms / terminology to use when trying to find a program / some script to help me.

I have a website:
in which I want to grab (parse?) the data in the table and save / append it to a csv file.

I found a few programs, but they seem to use Perl, and unfortunately I don't have Perl and can't install it on this PC. Do you think there is something in VBScript out there which could grab this little table and add it to a simple csv file?

I'd want to cycle through a number of pages (all the same format) whereby just the product number would change in the URL. I've been looking into JavaScript and had a go using the Sarissa modules, but I've never used them before and am not sure how to go about building a filter.

Hope you can help / set me off in the right direction.

Many thanks

RodP
 
I think it might depend on how you have coded the pages. Do some searches on the term "InnerHTML" for examples of how to grab the contents of a web page.

I hope you find this post helpful.

Regards,

Mark

 
You can probably use the InternetExplorer.Application object. Like markdmac said, though, you will need to look at the code in your pages and see if there is something in common, preferably unique tag IDs you can use to get the values, or a pattern of some sort.

--------------------------------------------------------------------------------
dm4ever
My philosophy: K.I.S.S - Keep It Simple Stupid
 
>I have a website ... in which I want to grab (parse?) the data in the table and save / append it to a csv file.
You've a website and you want to grab the data of your website... well!

I can show you a general sketch of how to do it for one page of the design you cited. The data will need further work to extract only the bare price "xxp" (pence). That is possible if you know the design principle of the page, but the design will not always remain the same, so you have to check it every time you need data.
[tt]
'Your input parameters
sfile = "d:\xyz\out.txt" 'your csv file name
surl = "

'details of web pages' design
set odic=createobject("scripting.dictionary")
with odic
.add "Price1",""
.add "Offer1",""
.add "Price2",""
.add "Offer2",""
.add "Price3",""
.add "Offer3",""
end with
akeys=odic.keys


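set fso=createobject("scripting.filesystemobject")
'createtextfile(name, overwrite, unicode): the 2 is taken as true (overwrite), true = write Unicode
set ofile=fso.createtextfile(sfile,2,true)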

set oie=createobject("internetexplorer.application")

[blue]'for more surl, you loop the functional block below over the collection
with oie
.navigate surl
do while .readystate<>4 : wscript.sleep 50 : loop
'.visible=true 'uncomment in testing stage
end with

for each skey in akeys
odic(skey)=oie.document.getElementById(skey).innerHTML
next

aitems=odic.items
ofile.writeline join(aitems,",")

'functional block to be looped upon ends here[/blue]

ofile.close
set ofile=nothing
set fso=nothing

oie.quit
set oie=nothing

wscript.echo "done"
[/tt]
Whether you can make use of the above depends on your own further effort. If you do not understand every technical consideration reflected in the script, we cannot hold your hand and step through it.
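As a rough illustration of the "further work" on the price mentioned above, here is a minimal sketch. It assumes the innerHTML of a price element looks something like <B>89p</B> or contains a pound amount such as £1.25; the real markup is not shown in this thread, so the pattern, the function name ExtractPrice and the sample strings are all hypothetical.

```vbscript
'Sketch only: pull a bare price such as 89p or £1.25 out of an innerHTML string.
'ExtractPrice and the sample inputs are hypothetical; adjust the pattern to the real page.
function ExtractPrice(shtml)
    dim ore, omatches
    set ore = new regexp
    ore.ignorecase = true
    ore.global = false
    'either pounds with optional pence, or pence only;
    'chrw(163) is the pound sign, written this way to avoid file-encoding trouble
    ore.pattern = chrw(163) & "\d+(\.\d{1,2})?|\d+p"
    set omatches = ore.execute(shtml)
    if omatches.count > 0 then
        ExtractPrice = omatches(0).value
    else
        ExtractPrice = ""    'no price found; the caller decides what to do
    end if
end function

wscript.echo ExtractPrice("<B>89p</B>")                               'echoes 89p
wscript.echo ExtractPrice("Now <B>" & chrw(163) & "1.25</B> each")    'echoes the pound sign followed by 1.25
```

In the script above, this could be applied to each odic(skey) value before the join, if the page turns out to match these assumptions.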
 
Hi Tsuji,

Thanks very much for the help and the code, that's really helped me. This is the code I have so far. I've put a loop in it and now need to add some error checking, as sometimes the web addresses used bring up an unavailable / error page. This is a little sporadic, so I wanted the page to be retried 5 times before moving on to the next page. I used 'goto try_again' in the code, but this throws an 'Expected statement' error. As I'm using a plain text file to create this script, I am finding it a little hard to debug. Can you (or anyone) see what I'm doing wrong (or perhaps you can't use GoTo in VBScript)?

Many thanks in advance

RodP

Code:
'VB script to capture prices of products from mysupermarket.co.uk website and create a csv file

'Your input parameters
sfile = "C:\data\mysupermarket1_files\out.txt"    'your csv file name
'surl = "http://www.mysupermarket.co.uk/Shopping/ProductDetails.aspx?Product=11498&Store=1"

'surl1 = "C:\data\mysupermarket"
'surl1 = "http://www.mysupermarket.co.uk/Shopping/ProductDetails.aspx?Product="
surl1 = "C:\data\mysupermarket1_files\error"

surlstart = 1
surlend = 2

'surl2 = ".htm"
'surl2 = "&Store=1"
surl2 = ".htm"

'details of web pages' design
set odic=createobject("scripting.dictionary")
with odic

    .add "Title",""
    .add "LblProductName",""
    .add "Shop1",""
    .add "Price1",""
    .add "Offer1",""
    .add "Shop2",""
    .add "Price2",""
    .add "Offer2",""
    .add "Shop3",""
    .add "Price3",""
    .add "Offer3",""
end with
akeys=odic.keys

set fso=createobject("scripting.filesystemobject")
set ofile=fso.createtextfile(sfile,2,true)

set oie=createobject("internetexplorer.application")

'for more surl, you loop the functional block below over the collection


for surlcounter = surlstart to surlend
	vtry_again = 0
	surl = surl1 & surlcounter & surl2

try_again:

	with oie
	    .navigate surl
	    do while .readystate<>4 : wscript.sleep 1000 : loop
	    .visible=true    'uncomment in testing stage
	end with

'check for error page and retry loading page 5 times

set varTitle = oie.document.all.tag("TITLE")

	if VarTitle.innerHTML = "mySupermarket - Error Page" And vtry_again < 5 then
	msgbox "error"
	wscript.sleep 1000
	vtry_again = vtry_again + 1
	goto try_again
	end if

if VarTitle.innerHTML = "mySupermarket - Error Page" And vtry_again = 5 then
	ofile.writeline surlcounter & "," & "Error fetching this product: " & surl
	goto next_surl
end if

	for each skey in akeys

	odic(skey)=oie.document.getElementById(skey).innerHTML

	'msgbox odic(skey)

	'tidy up odic(skey) string

	if instr(odic(skey),"<") > 0 then
		odic(skey) = replace(odic(skey),"<B>","")
		odic(skey) = replace(odic(skey),"</B>","")
	'msgbox odic(skey)
	end if

	next

aitems=odic.items
ofile.writeline surlcounter & "," & join(aitems,",")

next_surl:

next

'functional block to be looped upon ends here

ofile.close
set ofile=nothing
set fso=nothing

oie.quit
set oie=nothing

wscript.echo "done"
 
[tt]'VB script to capture prices of products from mysupermarket.co.uk website and create a csv file

'Your input parameters
sfile = "C:\data\mysupermarket1_files\out.txt" 'your csv file name
'surl = "
'surl1 = "C:\data\mysupermarket"
'surl1 = "
surl1 = "C:\data\mysupermarket1_files\error"

surlstart = 1
surlend = 2

'surl2 = ".htm"
'surl2 = "&Store=1"
surl2 = ".htm"

'details of web pages' design
set odic=createobject("scripting.dictionary")
with odic

.add "Title",""
.add "LblProductName",""
.add "Shop1",""
.add "Price1",""
.add "Offer1",""
.add "Shop2",""
.add "Price2",""
.add "Offer2",""
.add "Shop3",""
.add "Price3",""
.add "Offer3",""
end with
akeys=odic.keys

set fso=createobject("scripting.filesystemobject")
set ofile=fso.createtextfile(sfile,2,true)

set oie=createobject("internetexplorer.application")

'for more surl, you loop the functional block below over the collection


for surlcounter = surlstart to surlend
vtry_again = 0
surl = surl1 & surlcounter & surl2

[red]'[/red]try_again:
[blue]bup=false
do while (not bup) and vtry_again<5 [/blue]

with oie
.navigate surl
do while .readystate<>4 : wscript.sleep 1000 : loop
.visible=true 'uncomment in testing stage
end with

'check for error page and retry loading page 5 times

set varTitle = oie.document.all.tag("TITLE")

[red]'[/red]if VarTitle.innerHTML = "mySupermarket - Error Page" And vtry_again < 5 then
[blue]'This checking is a bit fragile.[/blue]
[blue]if VarTitle.innerHTML = "mySupermarket - Error Page" then[/blue]
msgbox "error"
wscript.sleep 1000
vtry_again = vtry_again + 1
[red]'[/red]goto try_again
[blue]else
bup=true
exit do[/blue]
end if
[blue]loop[/blue]

[red]'[/red]if VarTitle.innerHTML = "mySupermarket - Error Page" And vtry_again = 5 then
[blue]if not bup then[/blue]
ofile.writeline surlcounter & "," & "Error fetching this product: " & surl
[red]'[/red]goto next_surl
[red]'[/red]end if
[blue]else[/blue]

for each skey in akeys

odic(skey)=oie.document.getElementById(skey).innerHTML

'msgbox odic(skey)

'tidy up odic(skey) string

if instr(odic(skey),"<") > 0 then
odic(skey) = replace(odic(skey),"<B>","")
odic(skey) = replace(odic(skey),"</B>","")
'msgbox odic(skey)
end if

next

aitems=odic.items
ofile.writeline surlcounter & "," & join(aitems,",")

[red]'[/red]next_surl:
[blue]end if[/blue]

next

'functional block to be looped upon ends here

ofile.close
set ofile=nothing
set fso=nothing

oie.quit
set oie=nothing

wscript.echo "done"
[/tt]
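One thing the plain join(aitems,",") does not handle is a value that itself contains a comma or a double quote (a product name, for instance), which would shift the columns in the csv file. A minimal sketch of a quoting helper follows; CsvField is a hypothetical name, not part of the script above, and the sample strings are made up.

```vbscript
'Sketch only: quote a value for csv output. CsvField is a hypothetical helper.
function CsvField(svalue)
    dim s
    s = cstr(svalue)
    'a field containing a comma, a quote or a line break is wrapped in quotes,
    'and any embedded quotes are doubled
    if instr(s, ",") > 0 or instr(s, """") > 0 or instr(s, vbcr) > 0 or instr(s, vblf) > 0 then
        s = """" & replace(s, """", """""") & """"
    end if
    CsvField = s
end function

wscript.echo CsvField("Milk, semi-skimmed 2L")   'echoes the text wrapped in double quotes
wscript.echo CsvField("89p")                     'echoes 89p unchanged
```

It could be called as odic(skey) = CsvField(odic(skey)) at the end of the for each loop, just before the values are joined.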
 
Hi tsuji,

Thanks for the alternative method - never really used while and loop before. I assume then that goto doesn't really work in vbscript?

I've amended the code, but there's one other difference between running the code in Excel and in VBScript.

Code:
set varTitle = oie.document.all.tag("TITLE")

An error occurs at this line, saying that the object doesn't support this property or method: 'oie.document.all.tag'. It works fine in Excel. I tried looking around for examples of grabbing the title in VBScript but have failed so far.

Is there a better way to get the title?

Many thanks and here's a star for all your help so far. :)

RodP
 
[1]
>set varTitle = oie.document.all.tag("TITLE")
[tt]set varTitle = oie.document.all.tag[highlight]s[/highlight]("TITLE")[/tt]
[2]
>if VarTitle.innerHTML = "mySupermarket - Error Page" then
[tt]if VarTitle[blue](0)[/blue].innerHTML = "mySupermarket - Error Page" then[/tt]
[3] The title text of the error page is very much localized; hence, this check is not applicable anywhere except on the English version.
[4] If one worries about whether a title tag exists at all when an error occurs, one has to check that the collection's length is not zero before proceeding further... but that may stray too far off.
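Points [1], [2] and [4] can be put together roughly as follows. This is a sketch only: it assumes odoc is the document of an InternetExplorer.Application object with a page already loaded, and IsErrorPage is a hypothetical helper name.

```vbscript
'Sketch only: returns true when the loaded page's title matches the given error title.
'IsErrorPage is a hypothetical helper; odoc is expected to be oie.document.
function IsErrorPage(odoc, serrtitle)
    dim otitles
    IsErrorPage = false
    set otitles = odoc.all.tags("TITLE")
    'guard against pages that have no title element at all
    if otitles.length > 0 then
        if trim(otitles(0).innerHTML) = serrtitle then IsErrorPage = true
    end if
end function

'usage inside the retry loop (not run here, it needs a live oie object):
'if IsErrorPage(oie.document, "mySupermarket - Error Page") then ...
```

Reading oie.document.title is another way to get the title text as a plain string, without going through the collection.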
 
Hi Tsuji

Thanks very much for this. In a word, doh!

I also noticed that I needed to get rid of the line

Code:
.add "Title",""

One more thing: I've made the IE object invisible, but whilst the script is running, whatever window I'm working in loses focus. I have to keep alt+tabbing back into it.

Can you suggest a way round this?

Many thanks

RodP
 
Once the functionality is working, the purpose of the script is just to produce the csv file. All the interactive parts (.visible, msgbox etc.) are incidental. If you comment them all out, shouldn't the focus problem be eliminated? (Besides, losing and/or stealing window focus is also OS dependent.) If for some reason focus is still lost because of the script, you may try putting it in a vbs file (say makecsv.vbs) and launching it through wshshell.run with a hidden window and the batch switch.
[tt]
createobject("wscript.shell").run "wscript.exe //nologo //b makecsv.vbs",0,true
[/tt]
where 0 means a hidden window and true means synchronous processing (control is withheld until the script is complete).
 
Fantastic idea,

Thanks again for all your help

RodP
 