Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Extract data from HTML table 1

Status
Not open for further replies.

ineedhelplease

Technical User
Dec 18, 2002
18
US
Thanks to Tek-Tips I am able to logon to a secured website and navigate to specific URLs. I now have a problem. I need to know how to extract data from tables once I navigate to each URL. I tried the webquery approach, but the table number is not always the same. What I need to do is look at the value of each cell of a table to determine if what I am looking for is in the table, and if it is, extract the entire table.

What is the best way to do this?

Thanks for everyones help.
 
ineedhelplease,
The base structure of the routines should look familiar to you, it's a variation on the FAQ and should locate the actual stock values table in the spawned web page and create a local file with that data.
Code:
[navy]Function[/navy] GetWebTable() [navy]As Boolean[/navy]
[navy]On Error GoTo[/navy] GetWebTable_Error
[navy]Dim[/navy] objIE [navy]As Object[/navy]
[navy]Dim[/navy] varTables, varTable
[navy]Dim[/navy] varRows, varRow
[navy]Dim[/navy] varCells, varCell
[navy]Dim[/navy] strBuffer [navy]As String[/navy]

[green]'Set the return value[/green]
GetWebTable = [navy]True[/navy]

[green]'Spawn Internet Explorer[/green]
[navy]Set[/navy] objIE = CreateObject("InternetExplorer.Application")

[navy]DoEvents[/navy]

[green]'Remove all the controls since we Don't want the user to'monkey with it[/green]
[navy]With[/navy] objIE
  .AddressBar = [navy]False[/navy]
  .StatusBar = [navy]False[/navy]
  .MenuBar = [navy]False[/navy]
  .Toolbar = 0
  .Visible = [navy]True[/navy]
  .Navigate "[b][URL unfurl="true"]http://finance.yahoo.com/q/hp?s=ALU[/URL][/b]"
[navy]End With[/navy]

[green]'let IE do it's thing and Settle before we touch it[/green]
[navy]While[/navy] objIE.Busy
  [green]'Do Nothing[/green]
[navy]Wend
While[/navy] objIE.Document.ReadyState <> "complete"
  [green]'Again Do Nothing[/green]
[navy]Wend[/navy]

[navy]Set[/navy] varTables = objIE.Document.All.tags("TABLE")
[navy]For Each[/navy] varTable [navy]In[/navy] varTables
  [navy]Set[/navy] varRows = varTable.Rows
  [navy]For Each[/navy] varRow [navy]In[/navy] varRows
    [navy]Set[/navy] varCells = varRow.Cells
    [navy]If[/navy] varCells.Length > 2 [navy]Then[/navy]
      [navy]For Each[/navy] varCell [navy]In[/navy] varCells
        [b][green]'Test the current cell for a value[/green]
        [navy]If[/navy] varCell.InnerText = "Open" [navy]Then[/navy]
          [green]'The value was found so output the table and exit[/green]
          WriteHTMLTableToFile varTable.InnerHTML
          [navy]GoTo[/navy] Cleanup
        [navy]End If[/navy][/b]
      [navy]Next[/navy] varCell
    [navy]End If[/navy]
  [navy]Next[/navy] varRow
[navy]Next[/navy] varTable

Cleanup:
objIE.Quit
[navy]Set[/navy] varCell = [navy]Nothing[/navy]
[navy]Set[/navy] varCells = [navy]Nothing[/navy]
[navy]Set[/navy] varRow = [navy]Nothing[/navy]
[navy]Set[/navy] varRows = [navy]Nothing[/navy]
[navy]Set[/navy] varTable = [navy]Nothing[/navy]
[navy]Set[/navy] varTables = [navy]Nothing[/navy]
[navy]Set[/navy] objIE = [navy]Nothing[/navy]
[navy]Exit Function[/navy]

GetWebTable_Error:
[navy]Select Case[/navy] Err.Number
  [navy]Case[/navy] 0

  [navy]Case Else[/navy]
    [navy]Debug.Print[/navy] Err.Number, Err.Description
    GetWebTable = [navy]False[/navy]
    [navy]Stop[/navy]
[navy]End Select[/navy]
[navy]End Function[/navy]

[navy]Private Sub[/navy] WriteHTMLTableToFile(outerHTML [navy]As[/navy] String)
[navy]On Error GoTo[/navy] WriteHTMLTableToFile_Error
[green]'The following is a temporary directory used the cache the web page[/green]
[navy]Const[/navy] cWebTempDirectory [navy]As String[/navy] = "C:\"
[navy]Dim[/navy] intFile [navy]As Integer[/navy]

[green]'Get a file number and open the file we will dump the webpage Into[/green]
intFile = FreeFile
[navy]Open[/navy] cWebTempDirectory & "\WebOutput.htm" [navy]For Output As[/navy] #intFile
[green]'This will write the data To file using the HTML passed In[/green]
[green]'Notice the opening and closing page tags[/green]
[navy]Print[/navy] #intFile, "<HTML><BODY><TABLE>"
[navy]Print[/navy] #intFile, outerHTML
[navy]Print[/navy] #intFile, "</TABLE></BODY></HTML>"

Cleanup:
[navy]Close[/navy] #intFile
[navy]Exit Sub[/navy]

WriteHTMLTableToFile_Error:
[navy]Select Case[/navy] Err.Number
  [navy]Case[/navy] 76
    [green]'temp directiory Does not exist so create it[/green]
    VBA.MkDir cWebTempDirectory
    [navy]Resume[/navy]
  [navy]Case Else[/navy]
    Debug.Print Err.Number, Err.Description
    [navy]Stop[/navy]
[navy]End Select[/navy]
[navy]End Sub[/navy]

Hope this helps,
CMP

[small]For the best results do what I'm thinking, not what I'm saying.[/small]
(GMT-07:00) Mountain Time (US & Canada)
 
Hi Caution MP,

Fantastic bit of code! This is what I've been looking for, for ages. However, is there a way to write the code into a csv file rather than an hmtl file as I want to cycle thorugh a number of identical webpages and append the info

The webpage in question is:


Where by I need to simply change the product number.

I also want to pull out the product name (ie. TITLE), the little table showing the costs and cut off the table just before the 'out of stock' bit. Is this possible? Hope you or someone can help / give me some pointers.

Many thanks in advance

RodP
 
ineedhelplease,
This was built and tested in Excel 2000/IE7. The output is a csv file (C:\WebData.csv) that has a bunch of blank columns that should be easy enough to 'filter' out if you are importing into an Access database.

There are three routines:
[ol][li][tt]GetWebData[/tt]: Variant of the original post, feed it a URL and it will navigate to the page and output the data in the table called [tt]ProductDetailsTable[/tt] as a CSV string by calling the two other routines.*[/li]
[li][tt]TableToCSV[/tt]: Recursive function that will pull the [tt]outerText[/tt] from all cells (and imbeded tables) in the table you specify.[/li]
[li][tt]WriteToFile[/tt]: Simply appends the supplied string to the specified file.[/li][/ol]

[small]*You could speed this up by feeding a list of URLs to this function to eliminate the overhead of opening/closing IE for each web address you want to check.[/small]
Code:
[navy]Const[/navy] cDelimiter = """" & "," & """"

[b][navy]Function[/navy] GetWebData(URL [navy]As String[/navy]) [navy]As Boolean[/navy][/b]
[navy]On Error GoTo[/navy] GetWebData_Error
[navy]Dim[/navy] objIE [navy]As Object[/navy]
[navy]Dim[/navy] varTables, varTable
[navy]Dim[/navy] strBuffer [navy]As String[/navy]
GetWebData = [navy]True[/navy]
[navy]Set[/navy] objIE = CreateObject("InternetExplorer.Application")
[navy]DoEvents[/navy]
[navy]With[/navy] objIE
  .AddressBar = [navy]False[/navy]
  .StatusBar = [navy]False[/navy]
  .MenuBar = [navy]False[/navy]
  .Toolbar = 0
  .Visible = [navy]True[/navy]
  .Navigate URL
[navy]End With[/navy]
[navy]While[/navy] objIE.Busy
[navy]Wend[/navy]
[navy]While[/navy] objIE.Document.ReadyState <> "complete"
[navy]Wend[/navy]
[navy]Set[/navy] varTables = objIE.Document.All.tags("TABLE")
[navy]For Each[/navy] varTable [navy]In[/navy] varTables
  [navy]If[/navy] [u]varTable.ClassName = "ProductDetailsTable"[/u] [navy]Then[/navy]
    [b]WriteToFile[/b] "[i]C:\WebData.csv[/i]", Chr(34) & [b]TableToCSV([/b]varTable[b])[/b] & Chr(34)
  [navy]End If[/navy]
[navy]Next[/navy] varTable
Cleanup:
objIE.Quit
[navy]Set[/navy] varTable = [navy]Nothing[/navy]
[navy]Set[/navy] varTables = [navy]Nothing[/navy]
[navy]Set[/navy] objIE = [navy]Nothing[/navy]
[navy]Exit Function[/navy]
GetWebData_Error:
[navy]Select Case[/navy] Err.Number
  [navy]Case[/navy] 0
    [green]'Do Nothing[/green]
  [navy]Case Else[/navy]
    Debug.Print Err.Number, Err.Description
    GetWebData = [navy]False[/navy]
    [navy]Stop[/navy]
[navy]End Select[/navy]
[b][navy]End Function[/navy][/b]

[b][navy]Function[/navy] TableToCSV(WebTable [navy]As[/navy] Variant) [navy]As String[/navy][/b]
[navy]Dim[/navy] varTables [navy]As Variant[/navy], varTable [navy]As Variant[/navy]
[navy]Dim[/navy] varRow [navy]As Variant[/navy], varCell [navy]As Variant[/navy]
[navy]Dim[/navy] strOut [navy]As String[/navy]
[navy]For Each[/navy] varRow [navy]In[/navy] WebTable.Rows
  [navy]For Each[/navy] varCell [navy]In[/navy] varRow.Cells
    [navy]Set[/navy] varTables = varCell.All.tags("TABLE")
    [navy]If[/navy] varTables.Length > 0 [navy]Then[/navy]
      [navy]For Each[/navy] varTable [navy]In[/navy] varTables
        strOut = strOut & cDelimiter & TableToCSV(varTable)
      [navy]Next[/navy] varTable
    [navy]Else[/navy]
      strOut = strOut & cDelimiter & varCell.outer[navy]Text[/navy]
    [navy]End If[/navy]
  [navy]Next[/navy] varCell
[navy]Next[/navy] varRow
TableToCSV = strOut
[navy]Set[/navy] varCell = [navy]Nothing[/navy]
[navy]Set[/navy] varRow = [navy]Nothing[/navy]
[navy]Set[/navy] varTable = [navy]Nothing[/navy]
[navy]Set[/navy] varTables = [navy]Nothing[/navy]
[b][navy]End Function[/navy][/b]

[b][navy]Sub[/navy] WriteToFile(Filename [navy]As String[/navy], Data [navy]As String[/navy])[/b]
[navy]Dim[/navy] intFile [navy]As Integer[/navy]
intFile = FreeFile
[navy]Open[/navy] Filename [navy]For Append As[/navy] #intFile
[navy]Print[/navy] #intFile, Data
[navy]Close[/navy] #intFile
[b][navy]End Sub[/navy][/b]

As usual I leave the real error handling up to you...

Hope this helps,
CMP
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top