Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

complicated excel macro question

Status
Not open for further replies.

lucasm

IS-IT--Management
Feb 13, 2002
114
US
I've got a spreadsheet with about 1600 hyperlinks, all to pages with information on the corresponding item for each hyperlink. On each of these web pages, there is an address field and a phone number field, address is inside "<div class='address'" tags, and phone # is inside "<td class=ttldata>" tags. I'd like to programmatically extract this information from each of these pages and put them in corresponding columns alongside the url for each item in the spreadsheet. Is there some way to make a macro that will launch each url, grab this info from the html source - maybe using XPath? - and then put it back into the spreadsheet?
 
there is the Internet explorer application COM object which may be of some help? or those HTTP ones
 
You could use regular expressions to split common tags into an array and read the value. Something similar to....

Code:
Function ExecuteWebRequest(url As String) As String

    Dim oXHTTP As Object
    
    Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
    oXHTTP.Open "GET", url, False
    oXHTTP.send
    ExecuteWebRequest = oXHTTP.responseText
    Set oXHTTP = Nothing

End Function

sub test()
Dim trimmedResult As String

URL = "[URL unfurl="true"]http://www.xxxxxxxxxreplace[/URL] with each URLxxxxxxxxxxx"
htmlOutput = ExecuteWebRequest(URL)
resultArray = Split(htmlOutput, "<div")


    Dim regex As Object, regexMatches As Object
     
    Set regex = CreateObject("vbscript.regexp")
    regex.MultiLine = True
    regex.Global = True


    For i = LBound(resultArray) + 1 To UBound(resultArray) 

    trimmedResult = Trim(resultArray(i))
     regex.Pattern = ">[\s\S]+?<" 'example regex for the url (change to suit or see [URL unfurl="true"]http://www.regexlib.com/CheatSheet.aspx)[/URL]

     Set URLMatches = regex.Execute(trimmedResult)

Next i
End sub
 
For your macro, set a reference to "Microsoft HTMLObject Library".
The following (untested) framework should get you started:
Code:
Dim objMSHTML As HTMLDocument, src As HTMLDocument
Dim div As HTMLDivElement, tbl As HTMLTableCell, j as Long

Set objMSHTML = New MSHTML.HTMLDocument
Set src = objMSHTML.createDocumentFromUrl([green][url][/green], vbNullString)

   For j = 0 To src.getElementsByTagName("div").length - 1
                Set div = src.getElementsByTagName("div")(j)
                If div.className = "address" Then
'Do your stuff
   Next j

   For j = 0 To src.getElementsByTagName("td").length - 1
                Set tbl= src.getElementsByTagName("td")(j)
                If tbl.className = "ttldata" Then
'Do your stuff
   Next j
;-)

Instead of parsing through the Tag collection like this, you can of course try using an XPath statement.

Cheers,
MiS

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
thanks guys - I'm looking forward to tinkering with this stuff soon
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top