Tek-Tips Forums

Internet Explorer - Automation/Scripting 2

Status
Not open for further replies.

nvhuser

Programmer
Apr 10, 2015

Hello everyone,

Yesterday I ran into a difficulty, and I would appreciate some good advice so that I never face this problem again. Here is the description:

I was working on my client's computer, using a webpage he had logged into (i.e., I was using his account). My task was easy but tedious: I needed to export the images available on this webpage in Internet Explorer and save them properly in a given directory.
After three hours I was already tired of clicking on images and saving them into this directory one by one.

So I am here to ask whether it is possible to perform the same task with a script, or to automate this tedious process in some other way. Note that I am not allowed to install any programs on his computer (Perl or anything else); whatever I use must already be available on a standard Windows machine (or in Internet Explorer).

I thought of Windows PowerShell, but I am not sure it is the most suitable tool.
Maybe HTML is the best way to do it. Please let me know your opinion.

Thank you in advance



 
File -> Save as ... -> Complete webpage -> (browse then OK)
(I think that menu progression is correct, as it's been a while since I used M$ Windows)

All the images should be in the folder (....._files), where ..... is the title element text of the page you just saved, or the file name you provided.



Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 

Thank you for your reply Chris!

I agree with you, but if I want to save the images under a given name, that is probably not the best method; the same goes for grabbing information from a table that is always in the same position.
Which language do you think is the most suitable for this? I have been researching a bit, and it seems I could do it with JavaScript.

Regards
 
There isn't one you can use while viewing the URL in a browser: JavaScript is not allowed to run against the page unless it is served from the same domain as the document.

If you were on Linux you could copy and paste the URL into wget, and there is also Wget for Windows.

You could use it to scrape the entire site or just the one URI. Failing that, you would have to write an 'image crawler' that retrieves the URLs from the src attributes, then requests each URI and saves the returned content according to the MIME type in the response.
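The 'image crawler' idea above might be sketched as follows. This is only a sketch under the thread's own caveats: Python is not preinstalled on a normal Windows machine (so it would not satisfy the no-install constraint), and a page behind a login would also need the session cookies passed along, which this ignores.

```python
# Sketch of an image crawler: pull the src attribute out of every <img>
# tag on a page, then fetch each URL and save the bytes to a directory.
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag encountered."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)

def extract_img_srcs(html):
    """Return the list of <img> src values in document order."""
    parser = ImgSrcParser()
    parser.feed(html)
    return parser.srcs

def crawl_images(page_url, out_dir):
    """Download every image referenced by page_url into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    for src in extract_img_srcs(html):
        img_url = urljoin(page_url, src)        # resolve relative paths
        data = urlopen(img_url).read()
        name = os.path.basename(src) or "image"
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
```

Calling `crawl_images("http://example.com/page.html", "images")` would then fetch the page and drop its images into an `images` folder; a real run against an authenticated site would need more plumbing.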

Or you could look for a browser 'plugin' for whatever browser is going to be used, though, depending on the site, 'stealing' just the images might not be looked on too kindly by the site owner.

Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 

Hi Chris!

Thank you for your tip! Wget seems to be very powerful :)
Still, I am wondering whether any language already installed on a normal Windows machine can perform the same task in an easy way. I am looking for something that either interacts with Internet Explorer (maybe Java or HTML) or with Windows itself (maybe C or PowerShell).

I once used an AutoHotkey script (which I had to install) to perform the simple task of clicking, every two minutes, on a button that was always in the same position.

So, if I need to do this in Internet Explorer without installing special software (like AHK or Wget), how could I do it? Which language would be the most suitable?
 
if I need to do this in Internet Explorer without installing special software (like AHK or Wget), how could I do it?

You can't: browser security policies (the same-origin policy) specifically prevent scripts from a different source from accessing another document's object model.
And even if you could get the site operator to add some JavaScript to the document for you, JavaScript running in an HTTP context is not allowed to access the client machine's storage to save the files.

Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 
OK, I see... Sadly, it seems that next time I will need to spend another three hours exporting images :( Thank you for your support anyway!
 
This is an interesting discussion but how does this topic fit within copyright law?

Wouldn't it be easier to work with the publisher to get their content the way you want it... or are you going through all this because you want to do something with the content that the publisher did not license you to do?
 

No, actually I want to extract the information that the publisher is providing.

Well, the main question is: does the publisher allow exporting everything with one click or not? I can get exactly the same information by doing the same repetitive task for a couple of hours, so I am wondering whether this is intentional... They simply don't have an "export all" button. Is this laziness from the developers, or do they want you to stay logged on, doing "select image -> export" for each image? Maybe they earn money from the hours you spend using the platform, but I don't think so.

It is just a question of time: I am not extracting any information that I could not extract by spending hours working in the platform. That is why I would like a tool to do it automatically.





 
Have you spoken to the publisher? They may have a Web Service / REST API that you can use, or at the very least they could raise it as a feature request.
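If the publisher does expose a REST API, a short script could drive it instead of clicking through the UI. Everything below is hypothetical: the base URL, endpoint path, and parameter names are invented for illustration, since the thread never identifies the actual service or its API.

```python
# Hypothetical sketch only: the endpoint, parameters, and token scheme
# are invented; a real publisher API (if one exists) will differ.
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.example.com/v1"   # placeholder, not a real service

def build_export_request(image_id, token):
    """Build an authenticated GET request to export one image."""
    query = urlencode({"id": image_id, "format": "png"})
    url = f"{API_BASE}/images/export?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

A script would build one such request per image and pass it to `urllib.request.urlopen`, saving each response body to disk, which is exactly the kind of one-shot export the thread is after.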

Greg Griffiths
Livelink Certified Developer & ECM Global Star Champion 2005 & 2006
 
We're still speaking vaguely so here is a general answer.

The stuff you find on the internet is not inherently public domain.

If what you are doing falls within the legal definition of fair use, then there is no copyright issue. But the sketchy description makes me suspect you and/or the client is skirting around the intent of the publisher.

Internet publishers offer content in a specific format, such as in a web page layout with ads. The presentation of ads is how the publisher can afford to offer a service to you. If you go about accessing or copying content without the ads, you are swiping content in a manner the publisher did not intend...like going to a screening of Star Wars and using the Periscope app to share the movie with random strangers on the internet.

Perhaps if you post a link to the site you are scraping, we can point you to the "terms of use" or "terms of service" document on it.
 
I was working on the computer of my client using a webpage that he logged in (i.e., I was using his account).

You're already indicating that you are skirting the publisher's intent if you are sharing an account. I'm assuming this is a paid account (or otherwise secured account) or else you would create your own account with the publisher. Your client has probably already violated their terms of service by sharing their account info.
 
Thank you, guys. I spoke with the company and they added a feature that lets me export everything with only a couple of clicks. Thank you for your support!
 