This is my first post so hopefully it will go ok...
I want to download an image to disk from a remote web page by scraping
the page with Python and the BeautifulSoup package.
I made a simple web page containing a <p> and an <img> tag, shown below:
Code:
<html>
<head>
<title> Python Pictures </title>
</head>
<body>
<p id="first_p" align="center">
Some text...
<img src="Picture-1.jpg"/>
<b> Some more text </b>
</p>
</body>
</html>
Then I wrote a script that sort of parses the document. It is pretty much
hard-coded because I am still learning, but I am stumped on what to do once
I get to the <img> tag. How do I extract that image and successfully
download it to my disk?
Here is my code and output.
Code:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www-static.cc.gatech.edu/~iceman3/Python/")
soup = BeautifulSoup(page)
head = soup.contents[0]
print "head"
print head
print
print 'head.contents[3].contents[1]'
print head.contents[3].contents[1]
print
print 'head.contents[3].contents[1].contents[1]'
print head.contents[3].contents[1].contents[1]
OUTPUT:
Code:
head
<html>
<head>
<title> Python Pictures </title>
</head>
<body>
<p id="first_p" align="center">
Some text...
<img src="Picture-1.jpg" />
<b> Some more text </b>
</p>
</body>
</html>
head.contents[3].contents[1]
<p id="first_p" align="center">
Some text...
<img src="Picture-1.jpg" />
<b> Some more text </b>
</p>
head.contents[3].contents[1].contents[1]
<img src="Picture-1.jpg"/>
So as you can see, I finally get to the <img> tag! What do I do next? I have
been googling this all day and have had no luck. I know Python can do
it... I must be searching wrong or just overlooking something.
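For what it is worth, here is a rough sketch of the step I think I am missing: read the src attribute of the <img> tag, resolve it against the page URL, and write the fetched bytes to a file. This is only a sketch under assumptions. It uses the Python 3 standard library (html.parser and urllib.request) rather than the BeautifulSoup and urllib2 setup in my script, and the names ImgSrcCollector, find_image_urls and download are made up for illustration.

```python
import os
import shutil
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> also end up here by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_urls(html, base_url):
    """Return absolute URLs for all <img> tags found in the html string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    # A relative src like "Picture-1.jpg" is resolved against the page URL.
    return [urljoin(base_url, src) for src in collector.sources]


def download(url, directory="."):
    """Fetch url and save it in directory under the file name from its path."""
    filename = os.path.basename(urlparse(url).path) or "image"
    target = os.path.join(directory, filename)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copy the raw bytes to disk
    return target


# Possible usage (needs network access, so it is left commented out):
# base = "http://www-static.cc.gatech.edu/~iceman3/Python/"
# html = urllib.request.urlopen(base).read().decode("utf-8", "replace")
# for image_url in find_image_urls(html, base):
#     print(download(image_url))
```

The key point seems to be that the <img> tag only holds a relative path, so it has to be joined with the page URL before it can be fetched, and the file has to be opened in binary mode ("wb") when it is written.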