Scrape HTML Description and save to a DB

Status
Not open for further replies.

humbleprogrammer

Programmer
Oct 30, 2002
315
US
Hi,

I would like to grab a description off a page and save it into a database. I know how to save it into a DB once I acquire the info, but I am not sure how to scrape info from pages. Is it possible to scrape the HTML for certain info? The good news is that the description I need always starts with "Description:", so I would be able to search a page and grab the text starting at that point and ending at another point. Is this possible?

Thanks in advance!
 
Yes, :)
You can get the page quite easily using the XMLHTTP object. Once you receive the page, the response can be stored in a variable and treated as one (long) string. Using a combination of InStr and Mid, you should be able to pull the required data out of it.
Code:
<%
Response.Buffer = True
Dim objXMLHTTP, URL

' Create an XMLHTTP object:
Set objXMLHTTP = Server.CreateObject("Microsoft.XMLHTTP")

' Set the URL of the page
URL = "http://www.yoursite.com/pagename.asp"

' Open the connection to the remote server (False = synchronous).
objXMLHTTP.Open "GET", URL, False

' Send the request; the call returns once the response has arrived:
objXMLHTTP.Send

' Declare a variable to hold the data
Dim thePage

' Assign the response to your variable
thePage = objXMLHTTP.ResponseText

' A few more variables for the Mid function.
Dim startLoc   ' Position for the beginning of the Mid
Dim midLen     ' Length of string to get
Dim startWord  ' Word that precedes the section you want
Dim endWord    ' Word that follows the section you want
Dim content    ' Content from between the start and end words

startWord = "Description:"
endWord = "</p>"

' InStr requires an explicit start position when a compare mode is passed
startLoc = InStr(1, thePage, startWord, 1) + Len(startWord)
' Look for endWord (not startWord) after the start marker to get the length
midLen = InStr(startLoc, thePage, endWord, 1) - startLoc
content = Mid(thePage, startLoc, midLen)

Response.Write "The content from " & Server.HTMLEncode(startWord) & " to " & Server.HTMLEncode(endWord) & " is:<br>" & content
%>

That should give you a pretty big head start. Hope it helps :)
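
One thing the snippet above does not do is check whether InStr actually found the markers. If either one is missing, InStr returns 0 and the Mid call will either fail or return the wrong text. A minimal sketch of a reusable helper that guards against this is below. The function name ExtractBetween and the choice to return an empty string when a marker is missing are illustrative assumptions, not part of the original code.

```vbscript
' Hypothetical helper: returns the text between startWord and endWord
' in source, or an empty string if either marker is not found.
Function ExtractBetween(source, startWord, endWord)
    Dim startLoc, endLoc
    ExtractBetween = ""

    ' vbTextCompare (1) makes the search case-insensitive
    startLoc = InStr(1, source, startWord, vbTextCompare)
    If startLoc = 0 Then Exit Function

    ' Skip past the start marker itself
    startLoc = startLoc + Len(startWord)

    ' Look for the end marker only after the start marker
    endLoc = InStr(startLoc, source, endWord, vbTextCompare)
    If endLoc = 0 Then Exit Function

    ' Trim removes the spaces left around the extracted text
    ExtractBetween = Trim(Mid(source, startLoc, endLoc - startLoc))
End Function
```

With thePage holding the response text, the call would look like content = ExtractBetween(thePage, "Description:", "</p>"). An empty result then signals that the page did not contain the expected markers, so nothing should be written to the database.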

-Tarwn --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
For my next trick I will pull a hat out of a rabbit (if you think thats bad you should see how the pigeon feels...) :p
 
Thanks Tarwn! This is what I needed. I haven't used the XMLHTTP object yet, so this is something new and exciting.
 
Technically you're supposed to use it to retrieve XML being generated in a foreign location, but hey, leave it to programmers to find 30 new uses for every tool :)
(I'm still trying to get the back of my hammer to fit a phillips head screw.)
 