Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Test for correct file format and replace lost extension 1

Status
Not open for further replies.

gedwarren

IS-IT--Management
Jan 27, 2012
10
GB
I have written a VBA script that finds documents missing their file extensions (sloppy users) over a shared drive tree and writes the filename and path into a table.

I now have a list of some 900 documents out of a total 120,000 on the shared drive. Most of those will be Word or pdf format. I want to test each file to find its format and then add the appropriate extension. How can I test for the file format? Is there an alternative to looping through the files and trying to background open each file until it works? I suspect the machine running the code would run out of memory before it got to the end.
 
I use either of the two methods described here


You can probably do your particular task real easy just using the return value of Shell:

Function FixMyCrummyFileName(byVal MyCrummyFileName as string) as string

On Error Goto Handler


myNiceFileName=myCrummyFileName+".pdf"

If Shell(myNiceFileName, vbNormalFocus) <> 0 Then
' Program executed sucessfully
FixMyCrummyFileName="PDF"
Exit Function
else
myNiceFileName=myCrummyFileName+".doc"

If Shell(myNiceFileName, vbNormalFocus) <> 0 Then
' Program executed sucessfully

FixMyCrummyFileName="DOC"
Exit Function

Exit Function
else
'ad infinitum until you run out of file types you are testing
endif


End If

'set return value


End Funtion



The problem is, you have actually openned the file on the client when you've performed the test, so now you need to kill the app. You are going to need to use some code that looks like this:


You would need to use the return value
of FixMyCrummyFileName to determine the executable app you need to close and then pass the executable to functions you develop from code on the web page from the link above.

Here is the problem you are going to run into: Your VBA code is running on one thread at one speed, and your file opening and process killing is running on threads independent from your VBA application, which means you can run into timing issues where things just start bumping into each other, so with 900 files you might have to experiment with various methods to get your code to sleep or to wait for some event to complete.
 
Thanks for your reply. This is what I've got so far just to test the function returns the right extension. I created a .doc file and saved it as "testDOC". When I run FixExts() I get an empty string returned - any ideas what I'm doing wrong?


Private Sub FixExts()

Dim MyCrummyFileName As String
Dim FileExt As String

MyCrummyFileName = "C:\Users\User\Documents\ReplaceExt\testDOC"

FileExt = FixMyCrummyFileName(MyCrummyFileName)

Debug.Print "It's a " & FileExt & " file."

End Sub


Function FixMyCrummyFileName(ByVal MyCrummyFileName As String) As String

On Error GoTo Handler

Dim MyNiceFileName As String

MyNiceFileName = MyCrummyFileName & ".doc"

If Shell(MyNiceFileName, vbNormalFocus) <> 0 Then
' Program executed sucessfully
FixMyCrummyFileName = "DOC"
Exit Function

Else: MyNiceFileName = MyCrummyFileName & ".pdf"
If Shell(MyNiceFileName, vbNormalFocus) <> 0 Then
' Program executed sucessfully
FixMyCrummyFileName = "PDF"
Exit Function
Else
'ad infinitum until you run out of file types you are testing
End If


End If

'set return value

Handler:

End Function
 
Your asking it to execute on a file, Shell is used to run .exe files. Also you changed the variable to add the extension not the actual file see code below

Code:
Public Function FixMyCrummyFileName(ByVal MyCrummyFileName As String) As String

    On Error GoTo Handler
    Dim strFileName As String
    Dim MyNiceFileName As String

    strFileName = MyCrummyFileName

    Name strFileName As strFileName & ".doc"
    strFileName = strFileName & ".doc"

    If Shell("C:\Program Files\Microsoft Office\OFFICE11\Winword.exe " & strFileName, vbNormalFocus) <> 0 Then
        ' Program executed sucessfully
        FixMyCrummyFileName = "DOC"
        Exit Function
    Else
        strFileName = MyCrummyFileName
        Name strFileName As strFileName & ".pdf"
        strFileName = strFileName & ".pdf"
        If Shell("C:\Program Files\Adobe\Reader 9.0\Reader\AcroRd32.exe " & strFileName, vbNormalFocus) <> 0 Then
            ' Program executed sucessfully
            FixMyCrummyFileName = "PDF"
            Exit Function
        Else
            'ad infinitum until you run out of file types you are testing
        End If
    End If
    'set return value

Handler:

End Function

HTH << MaZeWorX >> "I have not failed I have only found ten thousand ways that don't work" <<Edison>>
 
The code below now successfully opens the file in Word (if it's a .doc) and then closes Word and returns the file type. I've still got a bit of work ahead of me:

1. Do the same for PDF files
2. Rename the file with the correct extension
3. Set up a loop to work through a table of prath & file name
4. Handle timing issues, as vbajock mentioned above
5. Add error handling

When I get it fixed I'll post it here. Thanks a lot for your help.

*****************************

Option Compare Database
Option Explicit

Private Sub FixExts()

Dim MyCrummyFileName As String
Dim FileExt As String

MyCrummyFileName = "C:\Users\Ged\Documents\ReplaceExt\testDOC"

FileExt = FixMyCrummyFileName(MyCrummyFileName)

Debug.Print "It's a " & FileExt & " file."

End Sub

Public Function FixMyCrummyFileName(ByVal MyCrummyFileName As String) As String

On Error GoTo Handler
Dim strFileName As String
Dim MyNiceFileName As String

strFileName = MyCrummyFileName & ".doc"

If Shell("C:\Program Files\Microsoft Office\Office14\Winword.exe " & strFileName & " /q", vbNormalFocus) <> 0 Then
' Program executed sucessfully
FixMyCrummyFileName = "DOC"
CloseAPP_B ("winword.exe")

Exit Function
Else
strFileName = MyCrummyFileName & ".pdf"
If Shell("C:\Program Files\Adobe\Reader 9.0\Reader\AcroRd32.exe " & strFileName, vbNormalFocus) <> 0 Then
' Program executed sucessfully
FixMyCrummyFileName = "PDF"
Exit Function
Else
'ad infinitum until you run out of file types you are testing
End If
End If
'set return value

Handler:

End Function


Private Function CloseAPP_B(AppNameOfExe As String)
'No frills killer
Dim oProcList As Object
Dim oWMI As Object
Dim oProc As Object

' step 1: create WMI object instance:
Set oWMI = GetObject("winmgmts:")
If IsNull(oWMI) = False Then
' step 2: create object collection of Win32 processes:
Set oProcList = oWMI.InstancesOf("win32_process")
' step 3: iterate through the enumerated collection:
For Each oProc In oProcList
' option to close a process:
If UCase(oProc.Name) = UCase(AppNameOfExe) Then
oProc.Terminate (0)
End If 'IsNull(oWMI) = False
Next
End If
Handler:

End Function
 
You could try using the Magic Numbers rather than brute forcing them.

Here are a couple of functions that will quickly have a go at checking the file type.

Code:
Public Function IsWordDoc(sFileName As String)
    Dim iFileNumber As Integer
    Dim sBuffer As String
    Dim lCharNumber As Long
    Dim sCharacter As String * 1
    iFileNumber = FreeFile
    Open sFileName For Binary Access Read Shared As #iFileNumber
    sBuffer = Space$(515)
    Get #iFileNumber, , sBuffer
    Close #iFileNumber

    IsWordDoc = (Asc(Mid(sBuffer, 513, 1)) = 236 And _
                 Asc(Mid(sBuffer, 514, 1)) = 165 And _
                 Asc(Mid(sBuffer, 515, 1)) = 193)

End Function

Public Function IsPDF(sFileName As String)
    Dim iFileNumber As Integer
    Dim sBuffer As String
    Dim lCharNumber As Long
    Dim sCharacter As String * 1
    iFileNumber = FreeFile
    Open sFileName For Binary Access Read Shared As #iFileNumber
    sBuffer = Space$(4)
    Get #iFileNumber, , sBuffer
    Close #iFileNumber

    IsPDF = (sBuffer = "%PDF")
End Function

hth

Ben

----------------------------------------------
Ben O'Hara
David W. Fenton said:
We could be confused in exactly the same way, but confusion might be like Nulls, and not comparable.
 
Thanks Ben. I've not come across Magic Numbers before. It looks like it could be a more efficient method - I wasn't sure how the code I posted above would cope with large numbers of files. I'll give it a go this evening.

Cheers,
Ged.
 
It should work really quickly. At most, the function is reading 515 bytes into memory, then forgetting about the rest of the file.
Magic numbers aren't 100% accurate, but it's almost certainly good enough for what you're trying to do here. The worst that can happen is you get a phone call from a user saying a file won't open.

B.

----------------------------------------------
Ben O'Hara
David W. Fenton said:
We could be confused in exactly the same way, but confusion might be like Nulls, and not comparable.
 
This works a treat with a Word doc, thanks. I'll post the whole thing once it's done.

What tags do I use in this site to mark up text as code?

Cheers,
Ged.


Private Sub FixExts()

Dim sFileName As String
Dim sExt As String

sFileName = "H:\ReplaceLostExts\testDOC"

If IsWordDoc(sFileName) = True Then
sExt = "DOC"
End If

Debug.Print "It's a " & sExt & " file."

End Sub
 
Glad you got it working.
If you click the "Process TGML" link at the bottom of box you enter new messages in, it will tell you the supported syntax.

To mark a piece of text as code, surround it with [&#91;]code] and [&#91;]/code].

hth

Ben

----------------------------------------------
Ben O'Hara
David W. Fenton said:
We could be confused in exactly the same way, but confusion might be like Nulls, and not comparable.
 
I've discovered that I also need to fix files missing DOT, RTF, TXT, HTM, JPEG, JPG, DOCX, DOCM, XLS, XLSX, XLSM, PPT and PPTX extensions. I'm struggling to see how I can how I can create functions to test for those from the info provided at Is there some kind of rule for working it out? If there is, I don't see it.

Thanks for any help with this.

Ged.
 
So I figured out the function for JPEG. Still trying to puzzle out the others.

Code:
Public Function IsJPG(sName As String)
    Dim iFileNumber As Integer
    Dim sBuffer As String
    Dim lCharNumber As Long
    Dim sCharacter As String * 1
    iFileNumber = FreeFile
    Open sName For Binary Access Read Shared As #iFileNumber
    sBuffer = Space$(4)
    Get #iFileNumber, , sBuffer
    Close #iFileNumber

    IsJPG = (sBuffer = "ÿØÿà")
    Debug.Print sBuffer
End Function
 
Oh, and it seems JPEG and JPG are interchangable.
 
The one for RTF turned out to be simply:

Code:
Public Function IsRTF(sName As String)
    Dim iFileNumber As Integer
    Dim sBuffer As String
    Dim lCharNumber As Long
    Dim sCharacter As String * 1
    iFileNumber = FreeFile
    Open sName For Binary Access Read Shared As #iFileNumber
    sBuffer = Space$(6)
    Get #iFileNumber, , sBuffer
    Close #iFileNumber

    IsRTF = (sBuffer = "{\rtf1")
End Function
 
I'm still trying to figure out how to identify DOT, DOCX, XLS and XLSX files, but having no luck.

Can anyone help?

Cheers,
Ged.
 
AFAIK There is no binary difference between a DOT and a DOC file, the difference is just in the file extension, so I would put them all as docs.
DOCX,XLSX,etc all have the same signature as they are all essentially zip files, so you need to check the first 8 bytes are 50 4B 03 04 14 00 06 00, then to determine the actual type you'd have to unzip it and read the XML document inside.

Ben

----------------------------------------------
Ben O'Hara
David W. Fenton said:
We could be confused in exactly the same way, but confusion might be like Nulls, and not comparable.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top