Hi....
I am a third year software design student on work placement. I have been given a task to parse 100's of word documents with the following requirements:
1. Create a Multi list box of chapters, populated by a config file
2. Parse the word document based on selection - (extract whole chapter)
3. Export chapter to an excel spread sheet
I have a number of obstacles here...1. I'm very new to vba (2 weeks) 2. chapter names appear regularly throughout the document, I have to differentiate by heading style. 3. The documents are 20,000+ words long so what I've done thus far is extremely slow. I am working out of excel vba.
I have posted what I've done so far below. This allows me to select the multilist box, and search for the selected items. It is successful in it's task, however, I need to select all text and tables within that chapter and copy it over to an excel spreadsheet. I can copy to the worksheets within the workbook I'm working out of. So here's my request, I have managed within 2-3 weeks to make some progress, however, from here I seem to be drawing a blank and I've a progress meeting next week and I'm stumped.
So could someone please show me how to parse the content from the chapters..I would be so so so so grateful. I really don't know how to set ranges either....
Oh, there are already bookmarks within the doc and there are hyperlinks, the hyperlinks are in the index, so if you hold ctrl and click it brings you directly to the chapter
Any help would be really really appreciated!!
'====================================================================
' POPULATING LIST BOX WITH DATA IN
' CONFIG WORKSHEET
'=====================================================================
Private Sub UserForm_Initialize()
ListBox1.ListFillRange = "Config!A1:A45"
End Sub
'======================================================================
' PROCESSING LISTBOX SELECTION
'======================================================================
Public Sub Parse_Click()
'======================================================================
' DECLARING VARIABLES
'======================================================================
Dim i As Long
Dim C As New Collection
Dim Path As String
With ListBox1
For i = 0 To .ListCount - 1
'Add all selected items to a collection
If .Selected(i) Then C.Add .List(i)
Next
End With
'Nothing selected, nothing to do
If C.Count = 0 Then Exit Sub
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select Folder to Process and Click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then Exit Sub
Path = .SelectedItems(1)
If Right(Path, 1) <> "\" Then Path = Path + "\"
'Remove any "
Path = Replace(Path, """", "")
End With
If Dir$(Path & "*.doc") = "" Then
MsgBox "No files found"
Exit Sub
End If
On Error GoTo Errorhandler
ParseDoc Path, C
Exit Sub
Errorhandler:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub
'======================================================================
' PARSING WORD DOC FOR
' SELECTED ITEMS
'======================================================================
Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
Dim objExcel As Object 'Excel.Application
Dim ExcelBook As Object 'Excel.Workbook
Dim WasOpen As Boolean
Dim oDoc As Document
Dim oPara As Paragraph
Dim strFilename As String
Dim Item
Dim Rng As Range
Dim objWord As Word.Application
Set objWord = New Word.Application
objWord.Visible = True
'Setting Location of Excel Spread for Parsed Details
Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
'Set objWord = New Word.Application
On Error Resume Next
WasOpen = True
Set objExcel = GetObject(, "Excel.Application")
If objExcel Is Nothing Then
Set objExcel = CreateObject("Excel.Application")
If objExcel Is Nothing Then _
Err.Raise 1000, "ParseDoc", "Excel is not accessible"
objExcel.Visible = True
WasOpen = False
End If
Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
If ExcelBook Is Nothing Then
If WasOpen Then objExcel.Quit
Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
End If
On Error GoTo 0
WordBasic.DisableAutoMacros 1
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
For Each oPara In oDoc.Paragraphs
For Each Item In Items
If InStr(1, oPara.Range, Item) > 0 Then
If InStr(1, oPara.Style, "H2") > 0 Then
oPara.Range.Select
MsgBox "You have found the string!"
GoTo CloseDoc
End If
End If
Next
Next
CloseDoc:
oDoc.Close wdDoNotSaveChanges
strFilename = Dir$()
Wend
WordBasic.DisableAutoMacros 0
objWord.Quit
'ExcelBook.Close
'If WasOpen Then objExcel.Quit
End Sub
I am a third year software design student on work placement. I have been given a task to parse 100's of word documents with the following requirements:
1. Create a Multi list box of chapters, populated by a config file
2. Parse the word document based on selection - (extract whole chapter)
3. Export chapter to an excel spread sheet
I have a number of obstacles here...1. I'm very new to vba (2 weeks) 2. chapter names appear regularly throughout the document, I have to differentiate by heading style. 3. The documents are 20,000+ words long so what I've done thus far is extremely slow. I am working out of excel vba.
I have posted what I've done so far below. This allows me to select the multilist box, and search for the selected items. It is successful in it's task, however, I need to select all text and tables within that chapter and copy it over to an excel spreadsheet. I can copy to the worksheets within the workbook I'm working out of. So here's my request, I have managed within 2-3 weeks to make some progress, however, from here I seem to be drawing a blank and I've a progress meeting next week and I'm stumped.
So could someone please show me how to parse the content from the chapters..I would be so so so so grateful. I really don't know how to set ranges either....
Oh, there are already bookmarks within the doc and there are hyperlinks, the hyperlinks are in the index, so if you hold ctrl and click it brings you directly to the chapter
Any help would be really really appreciated!!
'====================================================================
' POPULATING LIST BOX WITH DATA IN
' CONFIG WORKSHEET
'=====================================================================
Private Sub UserForm_Initialize()
ListBox1.ListFillRange = "Config!A1:A45"
End Sub
'======================================================================
' PROCESSING LISTBOX SELECTION
'======================================================================
Public Sub Parse_Click()
'======================================================================
' DECLARING VARIABLES
'======================================================================
Dim i As Long
Dim C As New Collection
Dim Path As String
With ListBox1
For i = 0 To .ListCount - 1
'Add all selected items to a collection
If .Selected(i) Then C.Add .List(i)
Next
End With
'Nothing selected, nothing to do
If C.Count = 0 Then Exit Sub
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select Folder to Process and Click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then Exit Sub
Path = .SelectedItems(1)
If Right(Path, 1) <> "\" Then Path = Path + "\"
'Remove any "
Path = Replace(Path, """", "")
End With
If Dir$(Path & "*.doc") = "" Then
MsgBox "No files found"
Exit Sub
End If
On Error GoTo Errorhandler
ParseDoc Path, C
Exit Sub
Errorhandler:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub
'======================================================================
' PARSING WORD DOC FOR
' SELECTED ITEMS
'======================================================================
Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
Dim objExcel As Object 'Excel.Application
Dim ExcelBook As Object 'Excel.Workbook
Dim WasOpen As Boolean
Dim oDoc As Document
Dim oPara As Paragraph
Dim strFilename As String
Dim Item
Dim Rng As Range
Dim objWord As Word.Application
Set objWord = New Word.Application
objWord.Visible = True
'Setting Location of Excel Spread for Parsed Details
Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
'Set objWord = New Word.Application
On Error Resume Next
WasOpen = True
Set objExcel = GetObject(, "Excel.Application")
If objExcel Is Nothing Then
Set objExcel = CreateObject("Excel.Application")
If objExcel Is Nothing Then _
Err.Raise 1000, "ParseDoc", "Excel is not accessible"
objExcel.Visible = True
WasOpen = False
End If
Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
If ExcelBook Is Nothing Then
If WasOpen Then objExcel.Quit
Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
End If
On Error GoTo 0
WordBasic.DisableAutoMacros 1
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
For Each oPara In oDoc.Paragraphs
For Each Item In Items
If InStr(1, oPara.Range, Item) > 0 Then
If InStr(1, oPara.Style, "H2") > 0 Then
oPara.Range.Select
MsgBox "You have found the string!"
GoTo CloseDoc
End If
End If
Next
Next
CloseDoc:
oDoc.Close wdDoNotSaveChanges
strFilename = Dir$()
Wend
WordBasic.DisableAutoMacros 0
objWord.Quit
'ExcelBook.Close
'If WasOpen Then objExcel.Quit
End Sub