Python saving images from a webpage

tfhwargt3 · Sep 19, 2006

This is my first post so hopefully it will go ok...

So I want to download an image to disk from a remote webpage by scraping
the page using Python with help from the BeautifulSoup package
(http://www.crummy.com/software/BeautifulSoup/documentation.html).

So I made a simple web page containing a <p> and an <img> tag, shown below:

Code:

<html>
<head> 
<title> Python Pictures </title>
</head>
<body>
    <p id="first_p" align="center">
        Some text...
        <img src="Picture-1.jpg"/>
        <b> Some more text </b>
    </p>
</body>
</html>

Then I wrote a script that kind of parses the document. It is pretty much
hard-coded because I am still learning, but I am stumped on what to do once
I get to the <img> tag. How do I extract that image and successfully
download it to my disk?

Here is my code and output.

Code:

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www-static.cc.gatech.edu/~iceman3/Python/")
soup = BeautifulSoup(page)
head = soup.contents[0]
print "head"
print head
print
print 'head.contents[3].contents[1]'
print head.contents[3].contents[1]
print
print 'head.contents[3].contents[1].contents[1]'
print head.contents[3].contents[1].contents[1]

OUTPUT:

Code:

head
<html>
<head>
<title> Python Pictures </title>
</head>
<body>
<p id="first_p" align="center">
        Some text...
    
        <img src="Picture-1.jpg" />
<b> Some more text </b>
</p>
</body>
</html>

head.contents[3].contents[1]
<p id="first_p" align="center">
        Some text...
    
        <img src="Picture-1.jpg" />
<b> Some more text </b>
</p>

head.contents[3].contents[1].contents[1]
<img src="Picture-1.jpg"/>

So as you can see, I finally get to the <img> tag! What do I do now? I have
been googling this all day and have had no luck. I know Python can do
it... I must be searching wrong or just overlooking something.

ericbrunson · Oct 6, 2006

You have to go get the image with urlopen, just like your browser does.

You may want to look at the built-in htmlparser module to do this parsing for you. You would simply subclass the parser and define an action for "start_img()" in which you go get the image file and save it off somewhere.
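
The parser-subclass idea above might look roughly like the sketch below. It is written for Python 3, which is an assumption (the thread itself uses Python 2): there the standard-library parser lives in html.parser and calls handle_starttag() for every tag, rather than a per-tag method such as start_img(). The class name ImgCollector is made up for illustration. The sketch only collects image URLs; it uses urljoin to turn a relative src like "Picture-1.jpg" into an absolute URL that urlopen could fetch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collects an absolute URL for every <img> tag fed to the parser."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page the HTML came from
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve a relative src against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


# Fragment of the test page from the first post, inlined so this runs offline.
html = '<p id="first_p">Some text... <img src="Picture-1.jpg"/> <b>Some more text</b></p>'
collector = ImgCollector("http://www-static.cc.gatech.edu/~iceman3/Python/")
collector.feed(html)
print(collector.image_urls)
```

Each URL in collector.image_urls can then be fetched with urlopen and written to disk.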

dvska · Feb 7, 2007

(tested on Windows)
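import urllib2

# descr_img_url is the absolute URL of the image; new_img_file_name is the local path to write
img_bin_data = urllib2.urlopen(descr_img_url).read()
new_img_file = open(new_img_file_name, 'wb')  # binary mode, so Windows does not translate newlines
new_img_file.write(img_bin_data)
new_img_file.close()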
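
For anyone on Python 3 (an assumption; the snippet above is Python 2), the same fetch-and-write steps could be wrapped in a small helper along these lines. The function name save_image, its target_dir parameter, and the "image.bin" fallback name are hypothetical choices for this sketch, and it assumes the URL passed in is already absolute.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def save_image(image_url, target_dir="."):
    """Download image_url, write it into target_dir, and return the local path."""
    # Use the last path segment of the URL as the file name,
    # falling back to a placeholder when the URL path ends in "/".
    file_name = os.path.basename(urlparse(image_url).path) or "image.bin"
    local_path = os.path.join(target_dir, file_name)

    # Read the raw bytes of the response.
    with urlopen(image_url) as response:
        data = response.read()

    # "wb" writes the bytes unchanged, with no newline translation.
    with open(local_path, "wb") as out_file:
        out_file.write(data)
    return local_path
```

Calling save_image("http://www-static.cc.gatech.edu/~iceman3/Python/Picture-1.jpg") would be expected to leave Picture-1.jpg in the current directory, provided the page is still reachable.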
