Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Using VBA to extract data from HTML

Status
Not open for further replies.

ksicard

IS-IT--Management
Dec 30, 2008
2
US
I am looking for the best way to extract values from HTML via excel. Specifically I would like to get a value for a specific class and id in the HTML.
 
Open the HTML file ("open" command) and find unique pieces of text either side of that which you want. This will enable you to extract the required info and use it as you wish within Excel. "InStr" and "Mid" will be useful commands.

However, the easiest way in my opinion would be to use a Office Add-in or even just a DLL written in C++, Delphi, VB etc as parsing is easy in these languages - there are plenty of free components and code examples available. The result is likely to be more robust and faster. How much of an issue this is depends on the size of the HTML and the details of what you are trying to find and the exact structure of the HTML.

I'm puzzled as to what you mean by "class" in HTML. Do you mean a CSS class?



Paul Rigby
VBA - C# - Delphi
 
Have a look at the "Microsoft HTML Object" reference.
With it, you can parse through HTMLs similarly as you would through an XML.

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
Couple of ideas.
Code:
Sub ClassNameTest()
  Dim objIE As Object
  Dim objIETag As Object, objIEElement As Object
  
  Set objIE = CreateObject("InternetExplorer.Application")
  objIE.Visible = True
  objIE.Navigate "[URL unfurl="true"]http://finance.yahoo.com"[/URL]
  
  While objIE.Busy: Wend
  While objIE.readyState <> 4: Wend
  
  'Discover the class without an id
  For Each objIETag In objIE.document.all
    Debug.Print objIETag.tagName, "id: "; objIETag.ID, "className: "; objIETag.className
  Next objIETag
  
  'I've never used this before but it works in Win XP, Office 2k, IE 7
  Set objIEElement = objIE.document.getElementById("quoteContainer")
  For Each objIETag In objIEElement.all
    Debug.Print objIETag.tagName, "id: "; objIETag.ID, "className: "; objIETag.className
  Next objIETag
  
  Set objIEElement = Nothing
  Set objIETag = Nothing
  objIE.Quit
  Set objIE = Nothing
End Sub

Hope this helps,
CMP

[small]For the best results do what I'm thinking, not what I'm saying.[/small]
(GMT-07:00) Mountain Time (US & Canada)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top