My objective: to step through a range of IDs, pull down the HTML for each one, and convert it to plain text.
Below is the actual link:
Code:
http://www.albme.org/index.cfm?fuseaction=app.LicenseeDetails2&ID=86699
An example range: 86650 to 87000
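Because both ends of the example range should presumably be included, the IDs can come from `range(start, end + 1)`, with each one placed into the same query string as the link above. A minimal Python 3 sketch; `BASE_URL` and `licensee_urls` are illustrative names, and it assumes only the `ID` parameter changes from page to page:

```python
from urllib.parse import urlencode

# Illustrative constant: the page path from the link above, without its query string.
BASE_URL = "http://www.albme.org/index.cfm"


def licensee_urls(start, end):
    """Yield one detail-page URL per ID, from start to end inclusive."""
    for licensee_id in range(start, end + 1):  # end + 1 so the last ID is included
        query = urlencode({"fuseaction": "app.LicenseeDetails2", "ID": licensee_id})
        yield f"{BASE_URL}?{query}"


urls = list(licensee_urls(86650, 87000))
print(len(urls))  # 351
print(urls[0])    # http://www.albme.org/index.cfm?fuseaction=app.LicenseeDetails2&ID=86650
```

`urlencode` takes care of escaping, so the same helper keeps working if other query parameters are added later.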
Below is the code I use to pull down the data for a single ID:
Code:
import sys, urllib
from StringIO import StringIO
import html2text

if __name__ == '__main__':
    url = 'http://www.albme.org/index.cfm?fuseaction=app.LicenseeDetails2&ID=86699'
    encoding = 'utf-8'

    # Download the raw HTML for one licensee and decode it
    f = urllib.urlopen(url)
    try: s = f.read()
    finally: f.close()
    ustr = s.decode(encoding)

    # Capture html2text's output by temporarily redirecting stdout
    b = StringIO()
    old = sys.stdout
    try:
        sys.stdout = b
        html2text.wrapwrite(html2text.html2text(ustr, url))
    finally: sys.stdout = old
    text = b.getvalue()
    b.close()
    print text
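The script above is Python 2 (`urllib.urlopen`, `print text`). Wrapping the same steps in a function means a loop only has to call it once per URL. Since `html2text.html2text()` already returns the converted text as a string, the `StringIO`/`sys.stdout` swap should not be needed just to capture it. A Python 3 sketch; `page_as_text` and its `opener`/`convert` hooks are hypothetical, added so the function can be tried without network access or html2text installed:

```python
import io
import urllib.request


def page_as_text(url, encoding="utf-8", opener=urllib.request.urlopen, convert=None):
    """Download one page and return its plain-text rendering.

    `opener` and `convert` are hypothetical hooks for this sketch so the
    function can run offline; by default it uses urlopen and html2text.
    """
    if convert is None:
        import html2text  # third-party package used by the original script
        convert = html2text.html2text  # returns the converted text as a str
    with opener(url) as response:  # closes the response, like the try/finally did
        raw = response.read()
    return convert(raw.decode(encoding), url)


# Offline usage example: a fake opener and converter stand in for the site and html2text.
sample = page_as_text(
    "http://www.albme.org/index.cfm?fuseaction=app.LicenseeDetails2&ID=86699",
    opener=lambda url: io.BytesIO(b"<p>Licensee details</p>"),
    convert=lambda html, baseurl: html.replace("<p>", "").replace("</p>", ""),
)
print(sample)  # Licensee details
```

With html2text installed, calling `page_as_text(url)` on the link above should return roughly the text the original script prints.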
I think I need to supply a range and increment the ID (x+1) on each pass, pulling down the data each time a new URL is reached.
Any ideas would be helpful.
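The x+1 idea maps directly onto a `for` loop over `range()`: each pass builds the URL for the next ID, downloads and converts that page, and moves on. A minimal Python 3 sketch, assuming that a bad ID raises an error (rather than aborting the run, it is skipped) and that a short pause between requests is acceptable; `scrape_range`, `fetch_text`, `URL_TEMPLATE`, and `delay` are hypothetical names:

```python
import time
import urllib.error
import urllib.request

URL_TEMPLATE = "http://www.albme.org/index.cfm?fuseaction=app.LicenseeDetails2&ID={}"


def fetch_text(url, encoding="utf-8"):
    """Default fetcher: download a page and convert it with html2text (third-party)."""
    import html2text
    with urllib.request.urlopen(url) as response:
        return html2text.html2text(response.read().decode(encoding), url)


def scrape_range(start, end, fetch=fetch_text, delay=1.0):
    """Return {ID: text} for every ID from start to end inclusive (the x+1 loop)."""
    results = {}
    for licensee_id in range(start, end + 1):
        url = URL_TEMPLATE.format(licensee_id)
        try:
            results[licensee_id] = fetch(url)
        except (urllib.error.URLError, UnicodeDecodeError) as exc:
            # HTTPError is a subclass of URLError, so 404s etc. land here too.
            print(f"ID {licensee_id} skipped: {exc}")
        time.sleep(delay)  # assumption: pause between requests to go easy on the server
    return results


# Offline usage example with a fake fetcher; no network access needed.
def fake_fetch(url):
    if url.endswith("ID=86652"):
        raise urllib.error.URLError("not found")
    return "page for " + url.rsplit("=", 1)[1]


pages = scrape_range(86650, 86654, fetch=fake_fetch, delay=0)
print(sorted(pages))  # [86650, 86651, 86653, 86654]
```

For the real run, `scrape_range(86650, 87000)` would return a dictionary keyed by ID; each page's text could then be written to its own file or appended to a single file, depending on what is needed downstream.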