Tek-Tips Forums

Internet Explorer - Automation/Scripting 2

Status
Not open for further replies.

nvhuser

Programmer
Apr 10, 2015

Hello everyone,

Yesterday I ran into a difficulty, and I would appreciate some good advice so that I never face this problem again. Here is the description:

I was working on my client's computer, using a webpage he had logged into (i.e., I was using his account). My task was easy but tedious: I needed to export the images available on this webpage in Internet Explorer and save them properly in a given directory.
After three hours I was already tired of clicking on images and saving them into this directory one by one.

So I am here to ask whether it is possible to perform the same task with a script, or to automate this tedious process in some other way. Note that I am not allowed to install any programs on his computer (Perl or anything else); whatever I use must already be available on a standard Windows machine (or in Internet Explorer).

I thought of Windows PowerShell, but I am not sure it is the most suitable tool.
Maybe HTML is the best way to do it. Please let me know your opinion.

Thank you in advance



 
File -> Save as ... -> Complete webpage -> (browse then OK)
(I think that menu progression is correct, as it's been a while since I used M$ Windows)

All the images should be in the folder (....._files), where ..... is the title element text of the page you just saved, or the file name you provided.



Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 

Thank you for your reply Chris!

I agree with you, but if I want to save the images under a given name, that is probably not the best method; the same goes for grabbing information from a table that is always in the same position.
Which language do you think is the most suitable for this? I have been researching a bit, and it seems I could do it with JavaScript.

Regards
 
There isn't one you can use while viewing the URL in a browser: JavaScript is not allowed to run against the page unless it is served from the same domain as the document.

If you were on Linux you could copy and paste the URL into wget, and there is also Wget for Windows.

You could use it to scrape the entire site or just the one URI. Failing that, you would have to write an 'image crawler' that retrieves the URLs from the src attributes, then requests each URI and saves the returned content according to the MIME type in the response.
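The 'image crawler' idea above might be sketched as follows. This is only a sketch under the thread's own caveats: Python is not preinstalled on a normal Windows machine (so it would not satisfy the no-install constraint), and a page behind a login would also need the session cookies passed along, which this ignores.

```python
# Sketch of an image crawler: pull the src attribute out of every <img>
# tag on a page, then fetch each URL and save the bytes to a directory.
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag encountered."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)

def extract_img_srcs(html):
    """Return the list of <img> src values in document order."""
    parser = ImgSrcParser()
    parser.feed(html)
    return parser.srcs

def crawl_images(page_url, out_dir):
    """Download every image referenced by page_url into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    for src in extract_img_srcs(html):
        img_url = urljoin(page_url, src)        # resolve relative paths
        data = urlopen(img_url).read()
        name = os.path.basename(src) or "image"
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
```

Calling `crawl_images("http://example.com/page.html", "images")` would then fetch the page and drop its images into an `images` folder; a real run against an authenticated site would need more plumbing.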

Or you could look for a browser 'plugin' for whatever browser is going to be used, though, depending on the site, 'stealing' just the images might not be looked on too kindly by the site owner.

Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 

Hi Chris!

Thank you for your tip! Wget seems to be very powerful :)
Still, I am wondering whether any language already installed on a normal Windows machine can perform the same task in an easy way. I am looking for something that either interacts with Internet Explorer (maybe Java or HTML) or with Windows itself (maybe C or PowerShell).

I once used an AutoHotkey script (which I had to install) to perform the simple task of clicking, every two minutes, on a button that was always in the same position.

So, if I need to do this in Internet Explorer without installing special software (like AHK or Wget), how could I do it? Which language would be the most suitable?
 
if I need to do this in Internet Explorer without installing special software (like AHK or Wget), how could I do it?

You can't: browser security policies (the same-origin policy) specifically prevent scripts from a different source from accessing another document's object model.
And even if you could get the site operator to add some JavaScript to the document for you, JavaScript running in an HTTP context is not allowed to access the client machine's storage to save the files.

Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 
OK, I see... Sadly, it seems that next time I will need to spend another three hours exporting images :( Thank you for your support anyway!
 
This is an interesting discussion but how does this topic fit within copyright law?

Wouldn't it be easier to work with the publisher to get their content the way you want it... or are you going through all this because you want to do something with the content that the publisher did not license you to do?
 

No, actually I want to extract the information that the publisher is providing.

Well, the main question is: does the publisher allow exporting everything with one click or not? I can get exactly the same information by doing the same repetitive task for a couple of hours, so I am wondering whether this is intentional... They simply don't have an "export all" button. Is this laziness from the developers, or do they want you to stay logged on, doing "select image -> export" for each image? Maybe they earn money from the hours you spend using the platform, but I don't think so.

It is just a question of time: I am not extracting any information that I could not extract by spending hours working in the platform. That is why I would like a tool to do it automatically.





 
Have you spoken to the publisher? They may have a Web Service / REST API that you can use, or at the very least they could raise it as a feature request.
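If the publisher does expose a REST API, a short script could drive it instead of clicking through the UI. Everything below is hypothetical: the base URL, endpoint path, and parameter names are invented for illustration, since the thread never identifies the actual service or its API.

```python
# Hypothetical sketch only: the endpoint, parameters, and token scheme
# are invented; a real publisher API (if one exists) will differ.
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.example.com/v1"   # placeholder, not a real service

def build_export_request(image_id, token):
    """Build an authenticated GET request to export one image."""
    query = urlencode({"id": image_id, "format": "png"})
    url = f"{API_BASE}/images/export?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

A script would build one such request per image and pass it to `urllib.request.urlopen`, saving each response body to disk, which is exactly the kind of one-shot export the thread is after.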

Greg Griffiths
Livelink Certified Developer & ECM Global Star Champion 2005 & 2006
 
We're still speaking vaguely so here is a general answer.

The stuff you find on the internet is not inherently public domain.

If what you are doing falls within the legal definition of fair use, then there is no copyright issue. But the sketchy description makes me suspect you and/or the client is skirting around the intent of the publisher.

Internet publishers offer content in a specific format, such as in a web page layout with ads. The presentation of ads is how the publisher can afford to offer a service to you. If you go about accessing or copying content without the ads, you are swiping content in a manner the publisher did not intend...like going to a screening of Star Wars and using the Periscope app to share the movie with random strangers on the internet.

Perhaps if you post a link to the site you are scraping, we can point you to the "terms of use" or "terms of service" document on it.
 
I was working on the computer of my client using a webpage that he logged in (i.e., I was using his account).

You're already indicating that you are skirting the publisher's intent if you are sharing an account. I'm assuming this is a paid account (or otherwise secured account) or else you would create your own account with the publisher. Your client has probably already violated their terms of service by sharing their account info.
 
Thank you, guys. I spoke with the company and they added a feature that lets me export everything with only a couple of clicks. Thank you for your support!
 