Word Search is Progressively Slow

Status
Not open for further replies.

hilbertl3

Technical User
Sep 22, 2004
77
US
I am executing a VBA macro in a Microsoft Word document that has 1600 pages. The macro looks in the document for a specific string. If the string is found, it prints the page the string is on as a PDF, then it executes the search again in a loop. When I run this macro on docs that are 200 pages long, the PDFs are generated quite rapidly, within 5 minutes. However, when I run the same macro on a large document, the PDFs are initially generated rapidly, but as the macro progresses down through the document they are generated more and more slowly. The code goes from a high of 9 PDFs per minute to 1 PDF every 10 minutes. Each PDF is going to be the same size, and the string I'm searching for occurs every 2 pages. Can anyone shed some light on this? Here's the code...



Public Sub nm22(filetype As String, fsearch As String, fileRealName As String)

    Dim cp As Integer
    Dim var1 As Variant, var2 As Variant
    Dim int1 As Integer
    Dim fname As String, mstring As String
    Dim lnumber As String
    Dim kn As Integer

    ' Excel is started only to use its FileSearch object
    Dim exapp As Object
    Set exapp = New Excel.Application

    On Error GoTo handle1

    'mstring = "k:\common\"
    mstring = "C:\welcome\"
    With exapp.Application.FileSearch
        .LookIn = mstring
        .SearchSubFolders = False
        .FileName = fileRealName
        .Execute

        ReDim var2(.FoundFiles.Count - 1)
        int1 = 0

        For Each var1 In .FoundFiles
            'fname = Mid(var1, 22, Len(var1) - 25)
            fname = Mid(var1, 12, Len(var1) - 15)
            Documents.Open FileName:=var1, ConfirmConversions:= _
                False, ReadOnly:=False, AddToRecentFiles:=False, PasswordDocument:="", _
                PasswordTemplate:="", Revert:=False, WritePasswordDocument:="", _
                WritePasswordTemplate:="", Format:=wdOpenFormatAuto
            Word.Application.Visible = True
            ActivePrinter = "Adobe PDF"
            Selection.EndKey Unit:=wdStory
            Selection.HomeKey Unit:=wdStory

            Do While Selection.Find.Execute(findtext:=fsearch, Forward:=True, Wrap:=wdFindStop) = True
                Selection.Collapse direction:=wdCollapseEnd

                ' Extend to the end of the line and read the number that follows the search string
                Selection.MoveEnd Unit:=wdLine, Count:=1
                If filetype = "GB" Or filetype = "LP" Then
                    lnumber = Trim(Mid((Selection.Text), 1, Len(Selection.Text) - 1))
                ElseIf filetype = "WL" Then
                    lnumber = Trim(Mid((Selection.Text), 2, Len(Selection.Text) - 2))
                ElseIf filetype = "DV" Then
                    lnumber = Trim(Mid((Selection.Text), 3, Len(Selection.Text) - 3))
                End If

                ' Print the section containing the match, then rename the resulting PDF
                Selection.MoveDown Unit:=wdLine, Count:=1
                cp = Selection.Information(wdActiveEndSectionNumber)
                Application.PrintOut FileName:="", Range:=wdPrintRangeOfPages, Item:= _
                    wdPrintDocumentContent, Copies:=1, Pages:="s" & cp & "-" & "s" & (cp), PageType:= _
                    wdPrintAllPages, Collate:=True, Background:=False, PrintToFile:=False
                Name "c:\wiredb\system\" & fname & ".pdf" As "C:\Welcome\" & filetype & "\" & filetype & "_" & lnumber & ".pdf"
            Loop
            ActiveDocument.Close
        Next
    End With

    exapp.Quit
    MsgBox "Complete"
    Exit Sub

handle1:
    ' The original listing referenced this label without defining it
    MsgBox "Error " & Err.Number & ": " & Err.Description
    exapp.Quit
End Sub
 
Hi hilbert,

I suspect the performance fall-off is caused by Word re-paginating the document every time you print. There's no way to stop that.

I'm guessing too that Word only needs to re-paginate up to the page being printed. If so, when you're printing pages from the front of the document, Word takes less time to repaginate than when you get further into the document. This would explain Word getting progressively slower the further you get into the document.

If, as you say, you're printing the page the found string appears on, you should be able to reduce your 'Application.PrintOut' line to:
Application.PrintOut Filename:="", Range:=wdPrintCurrentPage, Pages:="", Background:=True
These measures, especially the last, should speed things up a bit.

Cheers

[MS MVP - Word]
 
It could be a few reasons. Some comments.

Why do you have the Selection moving to the end of the document (just after it is opened), then moving back to the start? It is already at the start.
Code:
Selection.EndKey Unit:=wdStory
Selection.HomeKey Unit:=wdStory

Why are you doing this as an Excel filesearch? It seems you are in Word - so why is this being done in Excel?

Code:
If filetype = "GB" Or filetype = "LP" Then
lnumber = Trim(Mid((Selection.Text), 1, Len(Selection.Text) - 1))
ElseIf filetype = "WL" Then
lnumber = Trim(Mid((Selection.Text), 2, Len(Selection.Text) - 2))
ElseIf filetype = "DV" Then
lnumber = Trim(Mid((Selection.Text), 3, Len(Selection.Text) - 3))
End If
I would do the above as a Select Case, rather than If...Then.
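As a rough sketch of that suggestion (not the poster's own code), the three branches could be folded into a small helper. The function name ExtractNumber and its parameter names are hypothetical; the offsets 1, 2 and 3 are taken from the original If...Then block.

```vba
' Hypothetical helper: returns the number that follows the search string.
' filetype decides how many leading characters to skip and how many
' characters to drop from the length, exactly as in the If...Then version.
Function ExtractNumber(filetype As String, lineText As String) As String
    Dim n As Long

    Select Case filetype
        Case "GB", "LP"
            n = 1
        Case "WL"
            n = 2
        Case "DV"
            n = 3
        Case Else
            Exit Function   ' unknown type: returns an empty string
    End Select

    ' Guard against lines too short for Mid to handle
    If Len(lineText) < n Then Exit Function

    ExtractNumber = Trim(Mid(lineText, n, Len(lineText) - n))
End Function
```

It would be called as lnumber = ExtractNumber(filetype, Selection.Text). One difference from the original: an unrecognised filetype yields an empty string here, whereas the If...Then version leaves lnumber at its previous value.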

I am not quite following the movement of the Selection. If the purpose is to find a string, then print the page the string is on - then why are you moving the Selection around? You are moving the Selection End one line. What is the purpose of this? Your print instruction seems to deal with sections - not just pages.

Moving the Selection should be avoided, if possible, and certainly should be avoided when there is such a large amount of processing to do.

I am assuming that you really do need an individual PDF for each string found. However, if you could do one PDF per document, then it would be more efficient to store up the relevant pages, and then print those at the end.
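A minimal sketch of that idea, assuming one PDF per document is acceptable. It searches with a Range rather than the Selection, collects the section number of each match, and prints once at the end. The procedure name and variable names are made up for illustration, and the "s3,s7" section syntax for Pages is the same one the original macro uses for a single section.

```vba
' Sketch only: collect matching sections, then issue a single print job.
Sub PrintMatchingSectionsOnce(fsearch As String)
    Dim rng As Range
    Dim pageList As String
    Dim secNum As Long, lastSec As Long

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = fsearch
        .Forward = True
        .Wrap = wdFindStop
    End With

    Do While rng.Find.Execute
        secNum = rng.Information(wdActiveEndSectionNumber)
        ' Record each section only once, even if it contains several matches
        If secNum <> lastSec Then
            If Len(pageList) > 0 Then pageList = pageList & ","
            pageList = pageList & "s" & secNum
            lastSec = secNum
        End If
        ' Continue searching after the current match
        rng.Collapse Direction:=wdCollapseEnd
    Loop

    If Len(pageList) > 0 Then
        ActiveDocument.PrintOut Range:=wdPrintRangeOfPages, _
            Pages:=pageList, Background:=False
    End If
End Sub
```

For a document with hundreds of matches the Pages string may become too long for Word to accept, so the list would probably have to be printed in several batches.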

Gerry
 
Thanks all for your replies.

Macropod,
Based on what you said about repagination, I could take a copy of the Word document, print out the PDF and then delete the section that I printed. If Word re-paginates from the top of the document, then it shouldn't get progressively slower. I'll try this.

fumei,

I'm using the Excel filesearch function to find all the Word documents in a given directory, then I'm opening each Word document and searching through it using the Word find function. I search through the Word doc, locate my search string, and then print out the page/section that contains the search string. In my initial explanation I tried to simplify what I was doing a little, just in order to be clear. The Word document contains hundreds of mail merge letters. They don't always occupy a single page; that's why I print the active section. That part of the code seems to be working well. I moved the selection to the end of the document and then back to the beginning just to force Word to read through the whole document in hopes of getting better performance.... didn't work.
I move the selection to the end of a line to capture a value in the variable cp. I use that variable cp when I rename the pdf file that the code creates.
 
"I moved the selection to the end of the document and then back to the beginning just to force Word to read through the whole document in hopes of getting better performance...."
Well, that is true, Word WILL read through the whole document. This does not make better performance. It will do the opposite! As macropod mentions, this will cause Word to do a repagination. Try taking that out.

I still do not understand the instance of Excel. You say you are using
"the excel filesearch function to find all the word documents in a given directory"
Again...why? You can do this with Word. You do not need to use Excel to find the files. So...why are you using Excel, when you are in Word?
"I move the selection to the end of a line to capture a value in the variable cp. I use that variable cp when I rename the pdf file that the code creates."
I don't see this.

You execute a Selection Find search.
You collapse the selection.
You move the Selection End one line.
You move the Selection End down one line.

Then you grab a number - the Selection ActiveEndSectionNumber.

I fail to see how this functions efficiently. Will the ActiveEndSectionNumber change with the extending of the Selection? Even if it does, you have no checking to see if it does. If it does NOT, then the ActiveEndSectionNumber will be the same, in which case the movement (extension) of the Selection does not really do anything helpful. I can see that the cp variable picks up a number, but I can't see why the Selection instruction makes any difference.

Gerry
 
macropod,

The approach I proposed didn't work. It's still bogging down.

Hilbertl
 
It looks like the problem had to do with the cache size allocated for Word in the registry. I implemented the changes suggested in thread68-408109 and performance, though not super fast, shows definite improvement.

hilbertl
 
Alas, this still didn't work. In a 2083 page document, it still goes from 20 pdfs per minute to 1 every 5 minutes until Word crashes.
 
1. Just for my curiosity, could you answer the question as to why you are using the Excel file search?

2. Did you try removing the end of doc/start of doc movement?



Gerry
 
I have a process now that is working. I open the document and then process the first 15 sections, converting each section to its own PDF. Then I close the document, reopen it, move to the last section I processed + 1, then do another 15. In this way I work my way through the entire document, which can have in excess of 4000 pages. Before I used this approach, Word would process the first 15 or so sections very rapidly, but after that it processed each section more and more slowly until the application crashed. Closing and reopening the document seems to reset something in Word and allows me to take advantage of the top-end performance without getting bogged down. If the section I want to start processing is in the middle of a document with 4000+ pages, I can move to that section, process it and continue to process subsequent sections with the same performance I got starting at the beginning of the document, as long as I don't go beyond 15 sections or so. I had to pray over this one. Thank goodness it works.
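The macro itself was not posted, so the following is only a sketch of the close-and-reopen batching described above. The procedure name, the docPath parameter and the BATCH_SIZE constant are assumptions for illustration; it prints every section in turn and leaves out the search and the PDF renaming.

```vba
' Sketch only: print sections in batches, reopening the document between batches.
Sub PrintSectionsInBatches(docPath As String)
    Const BATCH_SIZE As Long = 15   ' "15 sections or so", per the description
    Dim doc As Document
    Dim total As Long, firstSec As Long, lastSec As Long, i As Long

    Set doc = Documents.Open(FileName:=docPath, AddToRecentFiles:=False)
    total = doc.Sections.Count
    firstSec = 1

    Do While firstSec <= total
        lastSec = firstSec + BATCH_SIZE - 1
        If lastSec > total Then lastSec = total

        For i = firstSec To lastSec
            ' One print job (one PDF) per section
            doc.PrintOut Range:=wdPrintRangeOfPages, Pages:="s" & i, Background:=False
            ' The Name ... As ... rename of the generated PDF would go here
        Next i

        ' Closing and reopening is what appeared to restore the speed
        doc.Close SaveChanges:=wdDoNotSaveChanges
        firstSec = lastSec + 1
        If firstSec <= total Then
            Set doc = Documents.Open(FileName:=docPath, AddToRecentFiles:=False)
        End If
    Loop
End Sub
```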
 
fumei,

I didn't realize that the Word application had a filesearch method, I just knew about Excel. I use the filesearch method to identify files that follow a certain naming convention in a particular directory-- to find all Word files that begin with "Welcome" that are in the My Documents folder, for example. Though I could have used Word's filesearch and avoided the overhead of referencing an extra application, I didn't think this was the core problem. The performance breakdown I encountered occurred as I moved progressively through the document from one section to another. It was quite dramatic, from a high of 9 pdfs per minute to a low of 1 every ten minutes followed by an application crash.
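For what it's worth, the same kind of lookup can also be sketched with the built-in VBA Dir function, which needs neither Excel nor FileSearch. The function name FindDocs and its parameters are hypothetical; the folder and pattern in the usage note are only examples based on the paths mentioned in this thread.

```vba
' Sketch only: return the full paths of all files in a folder that match a pattern.
Function FindDocs(folder As String, pattern As String) As Collection
    Dim found As New Collection
    Dim f As String

    f = Dir(folder & pattern)       ' first match, or "" if there is none
    Do While Len(f) > 0
        found.Add folder & f
        f = Dir()                   ' next match
    Loop

    Set FindDocs = found
End Function
```

A call such as FindDocs("C:\welcome\", "Welcome*.doc") returns a Collection that can be walked with For Each, in place of the .FoundFiles loop. The folder argument needs its trailing backslash.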

As to point two, I tried taking this line out but it didn't make a difference in performance.

hilbertl
 
I am wondering, since closing and reopening the document seems to work, if Word makes temp files in the PDF creation process. If so - and I have no idea if it does - that could certainly clog things up.

Gerry
 
Hey all,
Why not build an array of the pages to be processed? Then you can use the PrintOut method with the Pages option to print a PDF (bringing the print out of the loop). If you really need a PDF per item, I think you should copy these sections to new temp files and print those. That should boost performance.
Hope it helps...
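A rough sketch of the temp-file variant, untested against the documents discussed here. The procedure name and parameters are made up for illustration. It copies one section's formatted text into a new, hidden document and prints that, so the print job only has to paginate a couple of pages.

```vba
' Sketch only: print one section of src by way of a small temporary document.
Sub PrintSectionViaTempDoc(src As Document, secNum As Long)
    Dim tmp As Document
    Dim r As Range

    Set r = src.Sections(secNum).Range
    ' Leave out the final character (the section break or last paragraph mark)
    ' so the temp document does not gain an extra, empty section
    r.MoveEnd Unit:=wdCharacter, Count:=-1

    Set tmp = Documents.Add(Visible:=False)
    tmp.Content.FormattedText = r.FormattedText

    tmp.PrintOut Background:=False
    tmp.Close SaveChanges:=wdDoNotSaveChanges
End Sub
```

Because the section break is left behind, page setup such as margins, headers and footers is not carried over, so the output may not match the original layout exactly.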
 
This has sort of been suggested.
"I am assuming that you really do need individual PDF for each string found. However, if you could do one PDF per document, then it would be more efficient to store up the relevant pages, and then at the end print those."
Essentially I was thinking along the same lines - building an array of the required pages, then printing them.

Using temp files would possibly help. That is hard to say. Word can be funny that way.

Gerry
 
A new development...
I am trying to improve on my workaround and, after taking a second pass at this problem, I've noticed that the procedure only bogs down on certain Word documents. For example, I have a Word document that has 8000+ pages with 2 pages per section, and each section is printed to its own PDF with no degradation. However, I have another Word document that has 2300 pages with 2 pages per section, and in that document I see the performance degradation. I am trying to determine what the difference between the two documents is.

hilbertl
 