Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Extract word table cell data from multiple docs to a single excel sheet

Status
Not open for further replies.

Codman

Technical User
Nov 25, 2003
44
GB
I have many .doc & .docx files, each with a header table (Table 1) in the same format. I need to pull Cell positions C1R1, C2R3 and C2R4 on all documents to a single sheet within an Excel file. Although I can find script that allows me to locate the Excel file in the same folder and pull all of table 1, I can't seem to modify it to just extract the cells as shown above.

By the way I have approximately 15 folder each having circa 100-300 word documents each!

the code I used was:-

Option Explicit

Sub test()

Dim oWord As Word.Application
Dim oDoc As Word.Document
Dim oCell As Word.Cell
Dim sPath As String
Dim sFile As String
Dim r As Long
Dim c As Long
Dim Cnt As Long

Application.ScreenUpdating = False

Set oWord = CreateObject("Word.Application")

sPath = "C:\Users\Domenic\Desktop\" 'change the path accordingly

If Right(sPath, 1) <> "\" Then sPath = sPath & "\"

sFile = Dir(sPath & "*.doc")

r = 2 'starting row
c = 1 'starting column
Cnt = 0
Do While Len(sFile) > 0
Cnt = Cnt + 1
Set oDoc = oWord.Documents.Open(sPath & sFile)
For Each oCell In oDoc.Tables(1).Range.Cells
Cells(r, c).Value = Replace(oCell.Range.Text, Chr(13) & Chr(7), "")
c = c + 1
Next oCell
oDoc.Close savechanges:=False
r = r + 1
c = 1
sFile = Dir
Loop

Application.ScreenUpdating = True

If Cnt = 0 Then
MsgBox "No Word documents were found...", vbExclamation
End If

End Sub
 
Try the following Excel macro:
Code:
Sub GetWordTableData()
'Note: this code requires a reference to the Word object model.
'See under the VBE's Tools|References.
Application.ScreenUpdating = False
Dim wdApp As New Word.Application, wdDoc As Word.Document
Dim strFolder As String, strFile As String, WkSht As Worksheet, r As Long
strFolder = GetFolder
If strFolder = "" Then Exit Sub
Set WkSht = ActiveSheet
r = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
'Disable any auto macros in the documents being processed
wdApp.WordBasic.DisableAutoMacros
strFile = Dir(strFolder & "\*.doc", vbNormal)
While strFile <> ""
  r = r + 1
  Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
  With wdDoc
    With .Tables(1)
      WkSht.Cells(r, 1) = Split(.Cell(1, 1).Range.Text, vbCr)(0)
      WkSht.Cells(r, 2) = Split(.Cell(3, 2).Range.Text, vbCr)(0)
      WkSht.Cells(r, 3) = Split(.Cell(4, 2).Range.Text, vbCr)(0)
    End With
    .Close SaveChanges:=False
  End With
  strFile = Dir()
Wend
wdApp.Quit
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
Application.ScreenUpdating = True
End Sub
 
Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
End Function

Cheers
Paul Edstein
[MS MVP - Word]
 
Hi Paul,

This code crashed when tying to read the table. I looked into this to find two issues; the first was no data in the cell, the second was a difference in some of the template layout I wasn't aware of.

In the case of no data, this stalls the whole process, so if on document 3 of 100 I'd have to either remove that document or enter data. This would be very time consuming I think? Can you advise any solution to that scenario?

For the second issue I have attached a file showing that the table has merges etc and in some cases an additional row at the top.

Can you think how this can be handled, or else I'd have to resort to "Old School" !
 
 http://files.engineering.com/getfile.aspx?folder=c4f4a7fa-c1c6-4a93-b38d-079d1c19d7d5&file=cell_arrangement.PNG
What your pic does not disclose is that you apparently have two cells that contain a heading (Customer Address & Project Ref: -) but are the two cells indeed two cells or are the headings in a separate cell from the intended data?

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
Whether a table cell in a particular Word document contains any data will not cause the macro to crash. What will cause it to crash is if the table or cell doesn't exist. You really should ensure the documents you want to extract the data from all have the same table structure - at least insofar as the cells you want to extract the data from are concerned.

You can't expect a macro coded for one table structure to work with documents containing an entirely different table structure. If you can't edit the documents to ensure consistency, you'll need to sort them into separate folders for each time and run different versions of the macro on each.

Cheers
Paul Edstein
[MS MVP - Word]
 
Hi Skip,

The data for the customer name and address is text in single cell and that's okay as I only want this cell plus the email and telephone number. The res t of the data is required for my task. Hence I need complete data from three separate cells.
 
Hi Paul,

You're correct, I can't expect a macro to pull data fro tables that differ, hence my reference to "Old School" (manual data retrieval). It was worth asking the question though as the impossible is sometime a stretch that can be achieved ! :)
 
Based on the picture you attached, I created simple table in Word and entered the data that was on the picture. (I also added a simple Customer Address as "124 Main Street" as well.) Then I saved the document as Text Only (txt) file.

This is what I've got (I've colored [blue]data[/blue] for easy read):
[tt]Customer Address[blue]
124 Main Street[/blue]Our Ref.[blue]TBA[/blue]Your Ref.E-Mail [blue]enquiry[/blue]Page:[blue]1 of 4[/blue]Date:[blue]22nd of January 2010[/blue]Project Ref: -[blue]
ANY[/blue] Tel. [blue]+441234564567 [/blue]E-Mail: [blue]anyone@anyone.com [/blue]
[/tt]
So if you can get this simple text out of any/all of your Word docs, harvesting the data would be an easy task - as long as headers are constant.


---- Andy

There is a great need for a sarcasm font.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top