Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing Chapters in a Word Doc and Export to Excel

Status
Not open for further replies.

DJDOO

Technical User
Mar 15, 2012
2
0
0
IE
Hi....

I am a third year software design student on work placement. I have been given a task to parse 100's of word documents with the following requirements:

1. Create a Multi list box of chapters, populated by a config file
2. Parse the word document based on selection - (extract whole chapter)
3. Export chapter to an excel spread sheet

I have a number of obstacles here...1. I'm very new to vba (2 weeks) 2. chapter names appear regularly throughout the document, I have to differentiate by heading style. 3. The documents are 20,000+ words long so what I've done thus far is extremely slow. I am working out of excel vba.

I have posted what I've done so far below. This allows me to select the multilist box, and search for the selected items. It is successful in it's task, however, I need to select all text and tables within that chapter and copy it over to an excel spreadsheet. I can copy to the worksheets within the workbook I'm working out of. So here's my request, I have managed within 2-3 weeks to make some progress, however, from here I seem to be drawing a blank and I've a progress meeting next week and I'm stumped.

So could someone please show me how to parse the content from the chapters..I would be so so so so grateful. I really don't know how to set ranges either....

Oh, there are already bookmarks within the doc and there are hyperlinks, the hyperlinks are in the index, so if you hold ctrl and click it brings you directly to the chapter

Any help would be really really appreciated!!

'====================================================================
' POPULATING LIST BOX WITH DATA IN
' CONFIG WORKSHEET
'=====================================================================
Private Sub UserForm_Initialize()

ListBox1.ListFillRange = "Config!A1:A45"

End Sub

'======================================================================
' PROCESSING LISTBOX SELECTION
'======================================================================

Public Sub Parse_Click()

'======================================================================
' DECLARING VARIABLES
'======================================================================

Dim i As Long
Dim C As New Collection
Dim Path As String

With ListBox1
For i = 0 To .ListCount - 1
'Add all selected items to a collection
If .Selected(i) Then C.Add .List(i)
Next
End With

'Nothing selected, nothing to do
If C.Count = 0 Then Exit Sub

With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select Folder to Process and Click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then Exit Sub

Path = .SelectedItems(1)
If Right(Path, 1) <> "\" Then Path = Path + "\"
'Remove any "
Path = Replace(Path, """", "")
End With

If Dir$(Path & "*.doc") = "" Then
MsgBox "No files found"
Exit Sub
End If

On Error GoTo Errorhandler
ParseDoc Path, C
Exit Sub

Errorhandler:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub

'======================================================================
' PARSING WORD DOC FOR
' SELECTED ITEMS
'======================================================================

Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
Dim objExcel As Object 'Excel.Application
Dim ExcelBook As Object 'Excel.Workbook
Dim WasOpen As Boolean
Dim oDoc As Document
Dim oPara As Paragraph
Dim strFilename As String
Dim Item
Dim Rng As Range
Dim objWord As Word.Application
Set objWord = New Word.Application
objWord.Visible = True

'Setting Location of Excel Spread for Parsed Details
Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"

'Set objWord = New Word.Application
On Error Resume Next
WasOpen = True
Set objExcel = GetObject(, "Excel.Application")
If objExcel Is Nothing Then
Set objExcel = CreateObject("Excel.Application")
If objExcel Is Nothing Then _
Err.Raise 1000, "ParseDoc", "Excel is not accessible"
objExcel.Visible = True
WasOpen = False
End If

Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
If ExcelBook Is Nothing Then
If WasOpen Then objExcel.Quit
Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
End If
On Error GoTo 0

WordBasic.DisableAutoMacros 1
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)

For Each oPara In oDoc.Paragraphs
For Each Item In Items
If InStr(1, oPara.Range, Item) > 0 Then
If InStr(1, oPara.Style, "H2") > 0 Then
oPara.Range.Select
MsgBox "You have found the string!"
GoTo CloseDoc
End If
End If
Next
Next

CloseDoc:
oDoc.Close wdDoNotSaveChanges
strFilename = Dir$()

Wend
WordBasic.DisableAutoMacros 0
objWord.Quit
'ExcelBook.Close
'If WasOpen Then objExcel.Quit
End Sub
 

hi,

Welcome to Tek-Tips. You have done remarkably well already, after a brief scan of your work. Thank you for posting your code and providing the information that you have already.

I would ask you to elaborate on your direct question, "So could someone please show me how to parse [highlight]the content[/highlight] from the chapters."

[highlight]What are you referring to?[/highlight]



Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Hi Skip,

It's great to be this far, albeit not without help!

Sorry for not being clear in my post. Each chapter contains both paragraphs and tables. So at the moment, my code is finding the chapter heading selected, what I want/need to do is, essentially select all text from that point up to the next chapter heading copy it, then paste it to a worksheet if that makes sense? So I basically am unsure (very unsure) as how to set the range from the required chapter heading to the next chapter heading & select it. If I can get that, then I'm sure I can get the copy and paste.

selection.copy or something along those lines...


I don't know if I'm explaining myself correctly, can you see my logic?

If you could help me out skip I would be so grateful..
 


Okay. I would loop thru Sections in the document, as it appears that each Section is a Chapter.

So using a Section you might be able to 1) parse the Chapter Heading from the left() of the Section.Range.Text property and 2) parse the remainder from the right().

Are Chapter Heading a separate paragraph, or ONLY defined as a format?

Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 


Here is some test code I played around with. It does not account for the trailing paragraph character...
Code:
Sub parseIt()
    Dim wd As Word.Application, wdDoc As Word.Document, wdSec As Word.Section, wdSty As Word.Style
    Dim sFile As String, i As Integer, sText As String
'[highlight]your doc path here[/highlight] [b]
    sFile = "C:\Documents and Settings\ii36250\My Documents\test Chapters.docx"[/b]
    
    Set wd = New Word.Application
    
    Set wdDoc = wd.Documents.Open(sFile)
    
    For Each wdSec In wdDoc.Sections
        sText = wdSec.Range.Text
        Set wdSty = wdSec.Range.Characters(1).Style
        
        For i = 2 To Len(sText)
            If wdSty <> wdSec.Range.Characters(i).Style Then Exit For
        Next
        Debug.Print wdSec.Index
        Debug.Print Left(sText, i - 1)
        Debug.Print Right(sText, Len(sText) - i)
        Debug.Print
    Next
    
    wdDoc.Close
    wd.Quit
    
    Set wdDoc = Nothing
    Set wd = Nothing
End Sub

Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
I would loop thru Sections in the document, as it appears that each Section is a Chapter.
I would certainly agree with the first part, but why do you think each Section is a Chapter?

 
Gerry,

Based on the premise stated by the OP...
2. Parse the word document [highlight]based on selection - (extract whole chapter)[/highlight]


Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Uh Skip? That is selection...not SECTION.
Skip said:
I would loop thru Sections in the document, as it appears that each Section is a Chapter.
Does it now?

 
Also, this is cross-posted to VBAExpress.

 
Given that you're apparently working with Heading 3 as the Chapter headings, you could use code like the following:
Code:
Sub ExtractH2Text()
Application.ScreenUpdating = False
Dim FRng As Range, CRng As Range
Set FRng = Activedocument.Range
' To process just a selected range, use the next line instead
'Set FRng = Selection.Range
With FRng
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = ""
    .Style = "Heading 3"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
    .Execute
  End With
  Do While .Find.Found
    Set CRng = .Paragraphs(1).Range.Duplicate
    With CRng
      Do
       Select Case .Paragraphs.Last.Next.Style
        ' The Styles listed here tell Word when to stop extending the range.
        Case "Heading 1", "Heading 2", "Heading 3"
          Exit Do
        Case Else
          If .Paragraphs.Last.Next.Range.Start > FRng.End Then Exit Do
          .MoveEnd wdParagraph, 1
        End Select
      Loop
      ' Insert here the code to export the data to Excel. For example,
      ' if you want to preserve formatting, you might use copy & paste;
      ' otherwise, you might populate a string variable with the range
      ' text, then insert that into Excel.
    End With
    .Find.Execute
  Loop
End With
Set FRng = Nothing: Set CRng = Nothing
Application.ScreenUpdating = True
End Sub

Cheers
Paul Edstein
[MS MVP - Word]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top