Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

import word text into access97 1

Status
Not open for further replies.

Zorro1265

Technical User
Nov 14, 2000
181
US
I have a bunch of msword doc files I would like to read the text from them into an access memo field. I would also like to use the name of the document as a second field in the same table. For example, 1234.doc would read the text into my memo field hopefully keeping the formatting (dont need lines and font type just general format) and the second field would be populated with the number 1234. I would like to be able to point to a folder full of files and let the program run through the whole folder. Is this possible?
 
If by "just general format" you mean keeping just the paragraph marks (Carriage Return/Line Feed characters) and page breaks (Form Feed characters), the following will do it fairly well.

Note: Some Word features, such as line numbers and bullets, won't show up in the saved text. If you need such formatting, you may have to modify this procedure to save the file as plain text, then read the content of the plain text file.
Code:
Public Sub LoadAllDocuments(Folder As String)
    Dim strDocFile As String
    Dim strDocName As String
    Dim strDocText As String
    Dim wrd As Word.Application
    Dim wdoc As Word.Document
    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    
    DoCmd.Hourglass True
    SysCmd acSysCmdSetStatus, "Loading Word documents"
    Set wrd = New Word.Application
    Set db = CurrentDb()
    Set rst = db.OpenRecordset("Documents")
    strDocFile = Dir$(Folder & "\*.doc")
    Do While Len(strDocFile) <> 0
        strDocName = Left$(strDocFile, InStrRev(strDocFile, &quot;.&quot;) - 1)
        Set wdoc = wrd.Documents.Open(FileName:=Folder & &quot;\&quot; & strDocFile, _
            ReadOnly:=True, AddToRecentFiles:=False, _
            Format:=wdOpenFormatAuto, Visible:=False)
        strDocText = wdoc.Content.FormattedText
        rst.AddNew
        rst.Fields(&quot;DocName&quot;) = strDocName
        rst.Fields(&quot;DocText&quot;) = strDocText
        rst.Update
        strDocFile = Dir$()
    Loop
    wrd.Quit
    Set wrd = Nothing
ErrorExit:
    SysCmd acSysCmdClearStatus
    DoCmd.Hourglass False
    Exit Sub
ErrorHandler:
    MsgBox &quot;Error &quot; & Err.Number & vbCrLf & Err.Description, vbExclamation
    Resume ErrorExit
End Sub
Rick Sprague
 
Wow! I wont even pretend I know what that all means. Just so I dont screw it up. I have a table with this code what should I name the memo field and the field for the doc name? Lastly what do I run this code from? Just make a button with this as the click event? I really appreciate your input on this!!
 
When I try to do anything with this I get a message &quot;sub or function not defined&quot; and the code that is highlighted is &quot;InStrRev&quot;. Thanks for your help Rick.
 
The code uses the table name &quot;Documents&quot; and field names &quot;DocName&quot; and &quot;DocText&quot;. Each of these occurs once in the code. Feel free to change them if you like.

You won't quite be able to run this from a button directly, because you need to pass it the path name of the folder to be scanned. (Since you've posted this in a VBA forum, I assumed you were familiar with writing and using VBA code.) I suggest you create a form with a text box into which you can key a folder path, and add a command button with a click event procedure that executes this:
Call LoadAllDocuments(textboxname.Value)

The InStrRev function became available in VB 6. I guess you're not at that level. Add this to your code module to make it work:
Code:
Public Function InStrRev(StringCheck As String, StringMatch As String, _
                         Optional Start As Long = -1, _
                         Optional Compare As Integer = vbBinaryCompare) _
                         As Long
' Purpose: Finds the last occurrence of a substring within a string
' Accepts: 1. String to be searched
'          2. Substring whose occurrence is to be found
'          3. Optional rightmost position to be searched (default is
'             -1, meaning to search the entire string)
'          4. Optional text comparison method (default: binary compare)
' Returns: Position of the substring within the string (1=start of string)
' Notes:   Returns 0 if the string to be search is zero-length.
'          Returns Start if the string to be found is zero-length.
'          Returns 0 if Start is past the end of the search string.
'          Returns 0 if substring is not found.
    Dim i As Long, j As Long, k As Long, s As String

    If (Start < -1) Or (Start = 0) Then Error 5
    j = Len(StringCheck)
    k = Len(StringMatch)
    i = Start
    If i = -1 Then i = j
    If j = 0 Then InStrRev = 0: Exit Function
    If k = 0 Then InStrRev = i: Exit Function
    If i > j Then InStrRev = 0: Exit Function
    s = Left$(StringCheck, i)
    i = 0
    Do
        j = InStr(i + 1, s, StringMatch, Compare)
        If j = 0 Then Exit Do
        i = j
    Loop
    InStrRev = i
End Function
Rick Sprague
 
Thanks Rick!

I am 2/3 of the way through my fixt VBA book, and I knew that VBA was the way to go just not how to do it. Sorry if I should have posted this somewhere else. Keep your fingers crossed hopefully I wont have to pester you anymore.



 
I have everything in place but I get an error named argument not found the code that is highlighted is &quot;Visible:=&quot;. If I remove that part of the code it does input my doc file. The memo field shows the line feeds and character returns but doesnt actually execute them. Did I do something wrong? Its almost there and this is going to be really great, Im sorry to be a pain.
 
I'm guessing you're using a Word version prior to Word 2000, and that's why the Visible named argument is unknown. It doesn't matter--if leaving it out works for you, then leave it out.

I don't know what characters are showing up in the memo field in place of the CR and LF. I'm pretty sure that if it were actually CRLF, you'd get line breaks within the field. So it must be that Word uses some other characters internally. It wouldn't be hard to substitute CRLFs, if we can figure out what's there in their place.

See if you can copy what should be a CRLF to the clipboard. Then open the Immediate window (Ctrl-G, called the Debug window in A97) and type the following but don't enter it:
?Asc(&quot;&quot;)
Put the cursor between the quotes and then paste from the clipboard, then press Enter. Let me know the number that's printed. If there is more than one character in the string, delete the first character and press Enter again. Let me know each number that gets printed. If this were really a CRLF, we'd expect to see first 13, then 10 printed. Rick Sprague
 
When I do this I get the number 32 printed, even when I delete instances of more than one character it still says 32. You are correct my company only allows Office97 currently so thats what all this is in.

Thanks
 
Sorry I've been too busy to get back to you sooner.

32 is a space character. What I intended was for you to look at the document data and find where a line break should occur, then copy/paste that into the Asc(&quot;&quot;) function. I was expecting that you'd see something strange there; in my testing, I saw black, thin rectangles.

Is that what you did? Does it, in fact, appear to have spaces where a line break should be? If so, I guess that could be a difference between Word 97 and Word 2000. Unfortunately, I don't know how you could fix this, except to resort to making Word save the file as text and then import the text into your database field. It would be kind of kludgy, and I'd rather nto do it, but it may be the only way. Rick Sprague
 
In the imported field I got the rectangles and when I pasted one of these into the debug it gave me 32. If I could install word2000 on a system and use that to do the convert do you think that would make it work? I may be able to copy all the data to a cd and take that to a machine that has office 2000 on it.
 
I guess I wasn't paying close attention when I tested this with Word 2000. There's something odd here that I don't understand. Like you, I see the thin black rectangles. When I cut and paste these into the Debug Window, they perform a carriage return/line feed function. However, they don't perform that function in the table datasheet, nor in a text box on the form.

I am testing with Word 2000, so no, it won't work that way.

The only thing I can think to do is to use the Word object model to save each document as a temporary text file, and then load the text file into your table field with Open/Line Input/Close statements. That's going to take a bit more work than I have time for at the moment.

Let me know what you think. I'll check back tomorrow evening. Rick Sprague
 
With this project I am relying on you, I will be more than happy to try whatever you think I should do. Is the process you are talking about a manual operation on each file? I have lots and lots of them :(
 
Good news! I figured out that Word was only saving a CR character for each paragraph mark. All we have to do is insert an LF character after each CR, and we're good. It took only a small modification to my original routine:
Code:
Public Sub LoadAllDocuments(Folder As String)
    Dim strDocFile As String
    Dim strDocName As String
    Dim strDocText As String
    Dim wrd As Word.Application
    Dim wdoc As Word.Document
    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim i As Integer
    
    DoCmd.Hourglass True
    SysCmd acSysCmdSetStatus, &quot;Loading Word documents&quot;
    Set wrd = New Word.Application
    Set db = CurrentDb()
    Set rst = db.OpenRecordset(&quot;Documents&quot;)
    strDocFile = Dir$(Folder & &quot;\*.doc&quot;)
    Do While Len(strDocFile) <> 0
        strDocName = Left$(strDocFile, InStrRev(strDocFile, &quot;.&quot;) - 1)
        Set wdoc = wrd.Documents.Open(FileName:=Folder & &quot;\&quot; & strDocFile, _
            ReadOnly:=True, AddToRecentFiles:=False, _
            Format:=wdOpenFormatAuto, Visible:=False)
        strDocText = wdoc.Content.FormattedText
        rst.AddNew
        rst.Fields(&quot;DocName&quot;) = strDocName
        i = 0
        Do
            i = InStr(i + 1, strDocText, vbCr)
            If i = 0 Then Exit Do
            strDocText = Left$(strDocText, i) & vbLf & Mid$(strDocText, i + 1)
        Loop
        rst.Fields(&quot;DocText&quot;) = strDocText
        rst.Update
        strDocFile = Dir$()
    Loop
    wrd.Quit
    Set wrd = Nothing
ErrorExit:
    SysCmd acSysCmdClearStatus
    DoCmd.Hourglass False
    Exit Sub
ErrorHandler:
    MsgBox &quot;Error &quot; & Err.Number & vbCrLf & Err.Description, vbExclamation
    Resume ErrorExit
End Sub
I hope that works for you! Let me know. Rick Sprague
 
AWESOME!!!!!!!!!!!

It worked, I have a couple of stray little boxes but nothing we cant live with!! You are great, thank you so much for all this help!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top