Python saving images from a webpage

Status
Not open for further replies.

tfhwargt3

Programmer
Sep 19, 2006
61
0
0
US
This is my first post so hopefully it will go ok...

So I want to download an image to disk from a remote webpage by scraping
the page using Python with help from the BeautifulSoup package.
So I made a simple web page containing a <p> and an <img> tag, seen below:

Code:
<html>
<head> 
<title> Python Pictures </title>
</head>
<body>
    <p id="first_p" align="center">
        Some text...
        <img src="Picture-1.jpg"/>
        <b> Some more text </b>
    </p>
</body>
</html>

Then I wrote a script that kind of parses the page. It is pretty much
hard-coded because I am still learning, but I am stumped on what to do once
I get to the <img> tag. How do I extract that image and successfully
download it to my disk?

Here is my code and output.

Code:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www-static.cc.gatech.edu/~iceman3/Python/")
soup = BeautifulSoup(page)
head = soup.contents[0]
print "head"
print head
print
print 'head.contents[3].contents[1]'
print head.contents[3].contents[1]
print
print 'head.contents[3].contents[1].contents[1]'
print head.contents[3].contents[1].contents[1]

OUTPUT:

Code:
head
<html>
<head>
<title> Python Pictures </title>
</head>
<body>
<p id="first_p" align="center">
        Some text...
    
        <img src="Picture-1.jpg" />
<b> Some more text </b>
</p>
</body>
</html>

head.contents[3].contents[1]
<p id="first_p" align="center">
        Some text...
    
        <img src="Picture-1.jpg" />
<b> Some more text </b>
</p>

head.contents[3].contents[1].contents[1]
<img src="Picture-1.jpg"/>

So as you can see, I get to the <img> tag finally! What do I do? I have
been googling this thing all day and have had no luck. I know python can do
it... I must be searching wrong or just overlooking something.
 
You have to go get the image with urlopen, just like your browser does.

You may want to look at the built-in htmlparser module to do this parsing for you. You would simply subclass the parser and define an action for "start_img()" in which you go get the image file and save it off somewhere.
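
To make the parser suggestion concrete, here is a minimal sketch. It is written for Python 3 rather than the Python 2 used elsewhere in this thread, and it assumes the standard library's html.parser.HTMLParser, which hands every opening tag to handle_starttag() instead of calling a per-tag start_img() method. The names ImgCollector and find_image_urls are made up for illustration. It only collects absolute image URLs; fetching each one with urlopen is a separate step.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect the absolute URL of every <img> tag fed to the parser."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "Picture-1.jpg" against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def find_image_urls(html_text, base_url):
    """Hypothetical helper: parse html_text and return the image URLs found."""
    parser = ImgCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.image_urls


# The page URL and markup are taken from the question above.
PAGE_URL = "http://www-static.cc.gatech.edu/~iceman3/Python/"
SAMPLE_HTML = '<p id="first_p">Some text... <img src="Picture-1.jpg"/> <b>Some more text</b></p>'
print(find_image_urls(SAMPLE_HTML, PAGE_URL))
```

Resolving the src against the page URL matters because the tag only holds "Picture-1.jpg", which urlopen cannot fetch on its own.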
 
(tested on Windows)
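Code:
import urllib2
# descr_img_url: full URL of the image; new_img_file_name: local path to write to
img_bin_data = urllib2.urlopen(descr_img_url).read()
new_img_file = open(new_img_file_name, 'wb')  # 'wb' = binary mode, needed on Windows
new_img_file.write(img_bin_data)
new_img_file.close()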
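
For anyone on Python 3, where urllib2 no longer exists, roughly the same steps might look like the sketch below. It assumes urllib.request.urlopen from the standard library, and the function name save_image is invented for this example. It streams the response to disk instead of reading it all into memory first, which is a design choice and not something the snippet above requires.

```python
import shutil
import urllib.request


def save_image(image_url, dest_path):
    """Stream the bytes at image_url into dest_path and return dest_path."""
    # Binary mode ('wb') so the image bytes are written unchanged.
    with urllib.request.urlopen(image_url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return dest_path
```

With the page from the question, a call would be something like save_image("http://www-static.cc.gatech.edu/~iceman3/Python/Picture-1.jpg", "Picture-1.jpg"), assuming the image sits next to the page as its src attribute suggests.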
 