Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

[MS Word] Words Statistic

Status
Not open for further replies.

heizer

Programmer
Jan 19, 2008
4
PL
Hello!
I want to create words statistic. Someone on a polish forum help me with this, but partially. In this moment "my" code output the following results: words, frequency of occurence, and page number.
Code:
Option Explicit
Option Compare Text
Sub Test()
'based on: [URL unfurl="true"]http://www.vbaexpress.com/kb/getarticle.php?kb_id=727[/URL]
'based on: [URL unfurl="true"]http://www.authorsden.com/categories/article_top.asp?catid=20&id=39307[/URL]
    Dim lngRes                    As Long
    lngRes = WordCountAndPages(ThisDocument)
    Application.ScreenRefresh
    MsgBox "There were " & CStr(lngRes) & " different words ", vbOKOnly, "Finished"
End Sub
Function WordCountAndPages(SourceDoc As Word.Document, _
                           Optional ByVal sExcludes = "[the][a][of][is][to][for][by][be][and][are]", _
                           Optional ByVal lmaxwords As Long = 50000) As Long

    ' "[ale][ani][aby][do][od][czy][za][ze][przed][po]" 'excludes for example for polish
    On Error GoTo WordCountAndPages_Error

    Dim NewDoc                    As Word.Document
    Dim TmpRange                  As Word.Range
    Dim aWord                     As Object
    '----------------------------------------
    Dim tmpName                   As String
    Dim strSingleWord             As String

    Dim lngCurrentPage            As Long
    Dim lngPageCount              As Long
    Dim lngWordNum                As Long                  'Number of unique words
    Dim lngttlwds                 As Long                  'Total words in the document
    Dim j                         As Long
    Dim k                         As Long
    Dim w                         As Long
    Dim bTmpFound                 As Boolean               'Temporary flag

    ReDim arrWordList(1 To 1) As String                    'Array to hold unique words
    ReDim arrWordCount(1 To 1) As Long                     'Frequency counter for unique words
    ReDim arrPageW(1 To 1) As String                       'Pages unique words

    lngCurrentPage = 1   'we started to count from a first page to next pages

    With SourceDoc
        If ActiveDocument.FullName <> .FullName Then SourceDoc.Activate    ' because below selection
        Set TmpRange = .Range
        ' document's page count (maybe must refresh)
        lngPageCount = .Content.ComputeStatistics(wdStatisticPages) 'we counted number of pages
    End With
    '--------------
    ' The item in the Words collection includes both the word and the spaces after the word
    ' The Count property for this collection in a document returns the number of items in the main story only.
    ' Also, the Count property includes punctuation and paragraph marks in the total.
    lngttlwds = TmpRange.Words.Count                       '  SourceDoc.Words.Count
    '---------------------
    System.Cursor = wdCursorWait

    Do Until lngCurrentPage > lngPageCount
        If lngCurrentPage = lngPageCount Then
            TmpRange.End = SourceDoc.Range.End             'last page (there won't be a next page)
        Else
            'Find the beginning of the next page
            'Must use the Selection object. The Range.Goto method will not work on a page
            Selection.GoTo wdGoToPage, wdGoToAbsolute, lngCurrentPage + 1  'page, next page
            'Set the end of the range to the point between the pages
            TmpRange.End = Selection.Start
        End If
        '------------------------------------
        For Each aWord In TmpRange.Words
            ' 160 is non breaking space

                strSingleWord = LCase(Replace(Trim(aWord.Text), Chr(160), "")) 'lower character to upper character
                'maybe remove lower case form code, like this: (Replace(Trim(aWord.Text), Chr(160), ""))
                Select Case True
                Case Len(strSingleWord) = 1  'number of characters in single word
                ' if =1 then ignore "a" "&" if = 2 ignore for example: "an" "of" "is" "to" "by" "be"
                Case strSingleWord < "a" Or strSingleWord > "z"    'maybe remove this line form code
                Case InStr(1, sExcludes, "[" & strSingleWord & "]", vbTextCompare)
                Case Else
                    bTmpFound = False
                    For j = 1 To lngWordNum
                        If StrComp(arrWordList(j), strSingleWord, vbTextCompare) = 0 Then
                            arrWordCount(j) = arrWordCount(j) + 1
                            If (arrPageW(j) & "," Like "*," & CStr(lngCurrentPage) & ",*") = False Then
                            arrPageW(j) = arrPageW(j) & "," & CStr(lngCurrentPage)
                            End If
                            bTmpFound = True
                            Exit For
                        End If
                    Next j
                    If Not bTmpFound Then
                        lngWordNum = lngWordNum + 1
                        ReDim Preserve arrWordList(1 To lngWordNum)
                        ReDim Preserve arrWordCount(1 To lngWordNum)
                        ReDim Preserve arrPageW(1 To lngWordNum)
                        arrWordList(lngWordNum) = strSingleWord
                        arrWordCount(lngWordNum) = 1
                        arrPageW(lngWordNum) = arrPageW(lngWordNum) & "," & CStr(lngCurrentPage)
                    End If
                    If lngWordNum > lmaxwords - 1 Then
                        MsgBox "Too many words.", vbOKOnly
                        Exit For
                    End If
            End Select
            lngttlwds = lngttlwds - 1
            StatusBar = "Remaining: " & lngttlwds & ", Unique: " & lngWordNum
        Next aWord
        '------------------------------------
        lngCurrentPage = lngCurrentPage + 1      'move to the next page
        TmpRange.Collapse wdCollapseEnd          'go to the next page
    Loop
    '------------------------------------
    If lngWordNum > 0 Then
        tmpName = SourceDoc.AttachedTemplate.FullName    'output results
        Set NewDoc = Application.Documents.Add(Template:=tmpName, NewTemplate:=False)

        Selection.ParagraphFormat.TabStops.ClearAll
        Application.ScreenUpdating = False
        With Selection
            For j = 1 To lngWordNum
                .TypeText Text:=arrWordList(j) & vbTab & CStr(arrWordCount(j)) & vbTab & Mid(arrPageW(j), 2) & vbNewLine
            Next j
        End With

        Set TmpRange = NewDoc.Range
        TmpRange.ConvertToTable Separator:=wdSeparateByTabs
        With NewDoc.Tables(1)
            .Sort ExcludeHeader:=False, _
                  FieldNumber:="Kolumna 1", SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _
                  FieldNumber2:="", _
                  FieldNumber3:="", _
                  CaseSensitive:=False, LanguageID:=wdPolish, IgnoreDiacritics:=False
            .Borders(wdBorderTop).LineStyle = wdLineStyleNone
            .Borders(wdBorderLeft).LineStyle = wdLineStyleNone
            .Borders(wdBorderBottom).LineStyle = wdLineStyleNone
            .Borders(wdBorderRight).LineStyle = wdLineStyleNone
            .Borders(wdBorderHorizontal).LineStyle = wdLineStyleNone
            .Borders(wdBorderVertical).LineStyle = wdLineStyleNone
            .Borders(wdBorderDiagonalDown).LineStyle = wdLineStyleNone
            .Borders(wdBorderDiagonalUp).LineStyle = wdLineStyleNone
        End With
    End If
    WordCountAndPages = lngWordNum
    '---------------------
WordCountAndPages_Exit:
    On Error Resume Next
    Set aWord = Nothing
    Set TmpRange = Nothing
    Set NewDoc = Nothing
    System.Cursor = wdCursorNormal
    Application.ScreenUpdating = True

    Exit Function

WordCountAndPages_Error:
    MsgBox "B??d : ( " & Err.Number & " ) " & Err.Description & vbCrLf & _
           "Procedura  : " & "WordCountAndPages", vbExclamation
    Resume WordCountAndPages_Exit
End Function
I also need another column with page numbers, where the page number looks like that:
1-3, 5, 6-8. Now the page numbers looks like this: 1,2,3,5,6,7,8. When the
page numbers are consecutive (sequential), beetwen a page numbers I need to
add a hyphen (dashes). Solution for this problem, I found at below page,
but for an awk language:
and also:

In another column I also need a number of page numbers. For example, when a
some word appear on pages 1,3,5,7, in output results in another column will
be 4.
And the last case: my code found all words in document. I need also search
for the keywords, for example this code do it:

I have also a some problems with some lines of my code:
strSingleWord < "a" Or strSingleWord > "z" 'fist character of word must
be a letter, this line ignore a special word, like "3w" "4you"
"3ware" "3love" "2day" , and so on...

I have also problem with some words, like: "@SCHL" "m&m", and so
on..., something line of code ignore this words.

Anyone help me with this?

You can also visit my thread at polish forum:

Thanks for your attention!
Regards
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top