How to read a file with an unknown character set?

Status
Not open for further replies.

Fedo

Technical User
Feb 6, 2001
9
US
I have a problem on which I have already spent a lot of time, and I cannot find a solution, although I feel intuitively that there must be a very easy one. I am completely stuck.
Here is the problem:
I have a file about which I know nothing (not even its character set), and I want to read the whole file into a memory buffer.
I have already tried fread, fgets (fgetws), getline, streams, etc. The problem is that all those functions and techniques stop reading the file as soon as they find a 0x00 byte, which they seem to treat as EOF.
Does anybody have an idea what I am doing wrong?
 
Greetinx!

If you use Builder, you may want to try the following functions: FileOpen, FileRead, FileWrite, FileClose.
Happy programming!))
 
You may not be doing anything wrong. The file may be trashed, or another program may have truncated it without updating the file system tables (FAT, FAT32, NTFS, whatever you are using), for example after an improper shutdown.
James P. Cottingham

I am the Unknown lead by the Unknowing.
I have done so much with so little
for so long that I am now qualified
to do anything with nothing.
 
I should probably make this clearer. I can read any text file that uses an 8-bit character set into memory. However, I cannot read files that use other character sets, or Microsoft Word and Excel files: reading stops as soon as a 0x00 character is found, which is probably treated as EOF.
Here are 2 different approaches I use:

1.> using streams
AnsiString MainForm::ReadFile(char *MyFile)
{
    ifstream in(MyFile); // Open for reading
    ostrstream out;
    out << in.rdbuf(); // Copy file
    return (AnsiString(out.str()));
}

2.> using fread
AnsiString MainForm::ReadFile(char *MyFile)
{
    FILE *stream;
    AnsiString Buf = "";
    if ((stream = fopen(MyFile, "r")) == NULL)
        return ("");
    else {
        fseek(stream, 0, SEEK_END);
        long FileSize = ftell(stream);
        fseek(stream, 0, SEEK_SET);
        //Allocate a buffer.
        char *Buffer = new char[FileSize + 1];
        //Read from the file
        unsigned int ItemsRead = fread(Buffer, 1, FileSize, stream);
        Buffer[ItemsRead] = 0; //Close with a 0 char.
        //Load the contents of the buffer to the result-string.
        Buf = AnsiString(Buffer);
        //Delete buffer and close the file.
        delete[] Buffer;
        fclose(stream);
        return (Buf);
    }
}

In my test example I try to write the memory buffer to a file:
void __fastcall MainForm::BtnClick(TObject *Sender)
{
    FILE *stream;
    AnsiString Out;

    if ((stream = fopen("Output.zzz", "w+")) == NULL)
        ShowMessage("Error opening the file");
    else
    {
        Out = ReadFile("My.doc");
        fprintf(stream, Out.c_str());
        fclose(stream);
    }
}

As I mentioned, neither of these approaches reads a document that does not use an 8-bit character set, or a Word document. I have even tried reading the file byte by byte (using fgetc).
I am really stuck. Can anybody help me?
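
For the byte-by-byte attempt, two details may be worth checking: whether the file is opened in binary mode, and whether the return value of fgetc is kept in an int. The following is only a minimal sketch in plain standard C++ (no VCL); the helper name ReadAllBytes and its std::string output parameter are assumptions made for illustration and do not come from the code above. It is meant to show that fgetc itself returns a 0x00 byte like any other byte.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: reads every byte of `path` into `out`.
// Returns false if the file cannot be opened.
bool ReadAllBytes(const char *path, std::string &out)
{
    assert(path != NULL);
    std::FILE *f = std::fopen(path, "rb");   // "rb": binary mode, no text translation
    if (f == NULL)
        return false;
    out.clear();
    int c;                                   // int, not char, so EOF stays distinct from byte 0xFF
    while ((c = std::fgetc(f)) != EOF)
        out.push_back(static_cast<char>(c)); // a 0x00 byte is stored like any other byte
    std::fclose(f);
    return true;
}
```

After a successful call, out.size() is the number of bytes read and out.data() points at them. Passing out.c_str() to anything that expects a C string would again stop at the first 0x00.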
 
You probably need to use Unicode or "wide strings." See wchar_t in the standard library. You may also want to look at mbtowc, wctomb, mbstowcs, and wcstombs. These convert multibyte characters to wide characters and back, and multibyte strings to wide strings and back.
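
As a rough illustration of the string conversions mentioned above, here is a minimal sketch that round-trips a string through mbstowcs and wcstombs. The wrapper names ToWide and ToNarrow are made up for this example, and the result depends on the current C locale (the default "C" locale is assumed here). Note that both functions work on NUL-terminated strings, so they also stop at the first 0x00.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical wrapper: multibyte string -> wide string (current C locale).
// Returns an empty string if the input contains an invalid sequence.
std::wstring ToWide(const char *narrow)
{
    assert(narrow != NULL);
    std::size_t cap = std::strlen(narrow) + 1;          // one wchar_t per input byte is always enough
    std::vector<wchar_t> buf(cap);
    std::size_t n = std::mbstowcs(&buf[0], narrow, cap);
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();
    return std::wstring(&buf[0], n);
}

// Hypothetical wrapper: wide string -> multibyte string (current C locale).
// Returns an empty string if a character cannot be converted.
std::string ToNarrow(const wchar_t *wide)
{
    assert(wide != NULL);
    std::size_t cap = std::wcslen(wide) * MB_CUR_MAX + 1; // worst case bytes per wide character
    std::vector<char> buf(cap);
    std::size_t n = std::wcstombs(&buf[0], wide, cap);
    if (n == static_cast<std::size_t>(-1))
        return std::string();
    return std::string(&buf[0], n);
}
```

For example, ToNarrow(ToWide("abc").c_str()) gives back "abc" in the default locale.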

James P. Cottingham

 
O.K.
I've finally got it. The mistake I made was not in reading the file but in writing the memory buffer I had read to another file. For writing I used fprintf, which writes a string (char *) to a file. The problem with this function is that writing stops as soon as 0x00 appears in the string (0x00 is treated as the end of the string). Therefore you have to use either streams or a function such as write, where you can specify how many bytes from the buffer should be written to the file.
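
The difference between a string-based write and a length-based write can be sketched as follows. This is only an illustration in standard C++ using fwrite; the helper names WriteAllBytes and LengthAsCString are assumptions and are not part of my project.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: writes exactly `size` bytes from `data` to `path`,
// including any 0x00 bytes. Returns false on any error.
bool WriteAllBytes(const char *path, const char *data, std::size_t size)
{
    assert(path != NULL && data != NULL);
    std::FILE *f = std::fopen(path, "wb");            // binary mode, no text translation
    if (f == NULL)
        return false;
    bool ok = (std::fwrite(data, 1, size, f) == size); // length is given explicitly
    if (std::fclose(f) != 0)
        ok = false;
    return ok;
}

// Hypothetical helper: how many bytes a char*-based call (fprintf with "%s",
// fputs, ...) would see, because it stops at the first 0x00.
// `data` must contain at least one 0x00 byte.
std::size_t LengthAsCString(const char *data)
{
    return std::strlen(data);
}
```

For a 5-byte buffer holding 'A', 'B', 0x00, 'C', 'D', LengthAsCString reports 2, while WriteAllBytes with size 5 writes all five bytes.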
Here is my preferred solution to the problem. In this example I use streams and the strstream library.

1.> using streams
void MainForm::ReadFile(char *MyFile, ostrstream *OutDoc)
{
    // Open for reading
    ifstream in(MyFile, ios_base::binary);
    // Copy file into the String stream
    *OutDoc << in.rdbuf();
}

In my test example I write the memory buffer to a file:
void __fastcall MainForm::BtnClick(TObject *Sender)
{
    ostrstream OutDoc;

    // Open the output file in binary mode
    ofstream out("Output.zzz", ios_base::binary);
    ReadFile("MyDoc", &OutDoc);
    // Copy the String stream into the file
    out << OutDoc.rdbuf();
}
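
Since strstream is deprecated in standard C++, the same idea can also be sketched with std::string as the buffer. This is a minimal sketch and not the code from my project: the function names ReadFileBinary and WriteFileBinary are assumptions, and it uses only the standard library (no AnsiString, no VCL).

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads the whole file into `out`, byte for byte.
bool ReadFileBinary(const std::string &path, std::string &out)
{
    assert(!path.empty());
    std::ifstream in(path.c_str(), std::ios::binary);   // binary: no text translation
    if (!in)
        return false;
    // istreambuf_iterator copies raw bytes, so 0x00 is not special
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// Hypothetical helper: writes all of `data`, using its size, not a terminator.
bool WriteFileBinary(const std::string &path, const std::string &data)
{
    assert(!path.empty());
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out)
        return false;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return out.good();
}
```

Copying a file is then ReadFileBinary(source, buffer) followed by WriteFileBinary(target, buffer). The std::string keeps its own length, so embedded 0x00 bytes survive as long as the data is never passed through a char*-only interface.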

Good luck.
 