identifying file types

Status
Not open for further replies.

thedougster

Programmer
Jan 22, 2009
56
US
Within a C# program, is there any way to tell if a file is a text file or not? I mean a real way, not just basing your conclusion on a file name extension.
 
Hi,

Try this function...

Code:
static bool IsTextFile(string fileName)
  {
    byte[] file;
    using (System.IO.FileStream stream = new System.IO.FileStream(fileName, System.IO.FileMode.Open, System.IO.FileAccess.Read))
    {
      file = new byte[stream.Length];
      stream.Read(file, 0, file.Length);
    }

    if (file.Length > 3 && ((file[0] == 0x00 && file[1] == 0x00 && file[2] == 0xFE && file[2] == 0xFF /*UCS-4*/)))
      return true;
    else if (file.Length > 2 && ((file[0] == 0xEF && file[1] == 0xBB && file[2] == 0xBF /*UTF-8*/)))
      return true;
    else if (file.Length > 1 && ((file[0] == 0xFF && file[1] == 0xFE /*Unicode*/)))
      return true;
    else if (file.Length > 1 && (file[0] == 0xFE && file[1] == 0xFF /*Unicode Big Endian*/))
      return true;
    else
    {
      for (int i = 0; i < file.Length; i++)
        if (file[i] >= 0x80) // any byte of 0x80 or above is outside 7-bit ASCII
          return false;

      return true;
    }
  }

Ryan
 
Thanks for your help, RyanEK. Just one question: the last 2 terms in your first "if" statement (for the UCS-4 text file format) are:

Code:
file[2] == 0xFE && file[2] == 0xFF

Did you mean that the latter term should be:

Code:
file[3] == 0xFF

That is, the index should be 3?

Thanks again.
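
For reference, the UCS-4 (UTF-32) big-endian byte-order mark is the four bytes 00 00 FE FF, so the fourth comparison does need index 3. Below is a minimal sketch of the BOM checks with that index in place. The class name BomSniffer, the method name DetectBom and the returned labels are made up for illustration, and the UTF-32 little-endian check is an addition that the posted function does not have.

```csharp
static class BomSniffer
{
    // Returns a label for the byte-order mark at the start of 'head',
    // or null if no known mark matches.
    // Four-byte marks are tested before two-byte marks because the
    // UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
    public static string DetectBom(byte[] head)
    {
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return "UTF-32 BE";
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return "UTF-32 LE";
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE";
        return null;
    }
}
```

A file without a BOM can still be text, so a null result here only means "no mark found", not "binary".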
 
Just a few comments on the posted code. Be aware that these two lines read the entire file into memory:
Code:
file = new byte[stream.Length];
stream.Read(file, 0, file.Length);
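Rather than loading everything into memory, you can read just the first few bytes (four covers the longest byte-order mark tested), since that is all the BOM checks need. Only the fallback byte scan has to look further into the file.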
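
One way to read only the head of the file might look like the sketch below. FileHead and ReadHead are hypothetical names, not part of the posted code. The loop is there because Stream.Read is allowed to return fewer bytes than requested.

```csharp
using System.IO;

static class FileHead
{
    // Reads up to 'count' bytes from the current position of the stream.
    // Stream.Read may return fewer bytes than asked for, so keep reading
    // until the buffer is full or the stream ends (Read returns 0).
    public static byte[] ReadHead(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0)
                break;
            total += read;
        }
        // Trim the buffer when the stream held fewer than 'count' bytes.
        if (total < count)
            System.Array.Resize(ref buffer, total);
        return buffer;
    }

    // Convenience overload that opens the file read-only.
    public static byte[] ReadHead(string fileName, int count)
    {
        using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
            return ReadHead(stream, count);
    }
}
```

Calling FileHead.ReadHead(fileName, 4) would then give the BOM checks everything they need without allocating a buffer the size of the file.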

If the file is non-unicode, it "may be" a 7-bit ASCII text file. Typically, an ASCII text file contains only printable characters (char 32-126), tabs (char 9), and newline chars (char 13 and/or 10). Char 127 is the DEL control character, not a printable one.
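
A rough sketch of that ASCII test, reading the file in chunks instead of all at once. The names AsciiCheck and LooksLikeAsciiText and the 4096-byte buffer size are arbitrary choices for illustration. This is only a heuristic: a binary file that happens to contain nothing but these byte values would still pass.

```csharp
using System.IO;

static class AsciiCheck
{
    // Returns true if every byte in the stream is printable ASCII (32-126),
    // tab (9), line feed (10) or carriage return (13).
    // An empty stream returns true because no disallowed byte was seen.
    public static bool LooksLikeAsciiText(Stream stream)
    {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                bool allowed = (b >= 32 && b <= 126) || b == 9 || b == 10 || b == 13;
                if (!allowed)
                    return false;
            }
        }
        return true;
    }
}
```

Because it stops at the first disallowed byte, most binary files should be rejected after reading only the first chunk.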

my 2cents
 