
The default save as encoding changed

Status
Not open for further replies.

greathope123

Programmer
Nov 1, 2011
84
GB
What I did:

Using Notepad, I opened a text (.log) file, deleted about half of it, and clicked File -> Save As. I noticed the default Encoding was ANSI. I then filled in a new file name and clicked Save.

After that, I opened the new file and clicked File -> Save As, and this time the default Encoding was UTF-8.

Can you explain why?

Thanks a lot in advance.
 
Without going into the technical details, it is because Notepad has to guess the actual format of the text when it opens the file. In most Unicode cases this guess is pretty easy to get right, because there is something called a BOM (byte order mark) which tells you exactly what the encoding is. However, the BOM isn't always included in the file (and in the case of UTF-8 it is almost always left out). When there is no BOM, Notepad applies a bunch of heuristic analysis to figure out what format the file is in (by examining the first 256 bytes).
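To make the BOM idea concrete, here is a minimal Python sketch of "check for a BOM first, then guess". It is not Notepad's actual algorithm: the function names `sniff_bom` and `guess_encoding` are made up for illustration, the strict UTF-8 trial decode is a stand-in for whatever heuristics Notepad really runs, and `cp1252` is only an assumed ANSI code page (Western European Windows).

```python
import codecs

# BOM signatures. The UTF-32 LE mark starts with the UTF-16 LE mark,
# so the longer signatures are checked first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical guess: trust a BOM if present, otherwise try strict UTF-8,
    otherwise fall back to an assumed ANSI code page."""
    from_bom = sniff_bom(data)
    if from_bom is not None:
        return from_bom
    try:
        data.decode("utf-8")  # strict decode raises on invalid byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

With a BOM the answer is certain; without one, the result is only a guess that depends on which bytes happen to be in the data being examined.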

And sometimes an ANSI file will be seen as a UTF-8 file (and vice versa). That won't matter if your file is actually pure ASCII, because ASCII characters are stored as the same bytes in both encodings, but it might if it is not.
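A small Python illustration of why the wrong guess is harmless for ASCII but not for anything else. The sample strings are made up, and `cp1252` is assumed here as the ANSI code page; on a differently configured Windows system the ANSI code page would be something else.

```python
# Pure ASCII text: the bytes decode to the same string either way.
ascii_bytes = "error: disk full".encode("ascii")
same = ascii_bytes.decode("cp1252") == ascii_bytes.decode("utf-8")

# Non-ASCII text saved as UTF-8 but read back as ANSI (cp1252 assumed):
# each byte of the two-byte sequence is shown as its own character.
utf8_bytes = "naïve".encode("utf-8")
misread = utf8_bytes.decode("cp1252")

print(same)     # True
print(misread)  # naÃ¯ve
```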
 
Thank you for your response.

There are some characters in another language that are not displayed properly in the text file.
After I deleted about half of the lines in the text file, saved it, and reopened it, those characters were displayed correctly.
I want to know the reason so that I can find a simple way to make those characters display correctly.

I have just tested this: Notepad still does not detect the encoding I need, even after I inserted a few lines of those characters at the start of the file.
 
Unfortunately you can't change Notepad's detection algorithm, and you can't (easily) change the default format that Notepad uses when creating a new document (it will always be ANSI).

Notepad has limitations in relation to extended and Unicode character sets that it tries hard to work around using detection algorithms written back in the days of NT 3.5 and hardly changed since then.

You might want to consider a text editor such as Notepad+ instead.
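If switching editors is not an option, one possible workaround follows from the BOM point made earlier in the thread: a file that starts with a BOM should not need guessing. The Python sketch below rewrites a file as UTF-8 with a BOM. The function name `rewrite_with_bom` is hypothetical, and it assumes you already know the file's real encoding (UTF-8 by default here); it cannot work that out for you.

```python
def rewrite_with_bom(src_path, dst_path, source_encoding="utf-8-sig"):
    """Copy a text file, writing the copy as UTF-8 with a leading BOM.

    source_encoding must be the file's real encoding (an assumption the
    caller has to supply). The default "utf-8-sig" reads UTF-8 with or
    without an existing BOM, so the copy never ends up with two BOMs.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    # Python's "utf-8-sig" codec writes the BOM bytes EF BB BF first.
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)
```

For example, `rewrite_with_bom("app.log", "app_bom.log")` (file names made up) would produce a copy whose first three bytes are the UTF-8 BOM.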
 