
char - '0' == int?? 2

Status: Not open for further replies.

AtomicChip (Programmer) — May 15, 2001
Hey all,

Just looking for an explanation of how and why this works:

Code:
const char* foo = "123";
int n = foo[ 0 ] - '0'; // resulting in the int value 1

Now, I understand that C++ will do automatic type conversions and such, like in the following example:

Code:
const int n = 3;
const float m = 1.1;
const int l = n*m; // n*m is 3.3, truncated to 3

...but why does subtracting the ASCII value '0' from an ASCII char result in a proper int value??

Thanks in advance..

-----------------------------------------------
"The night sky over the planet Krikkit is the least interesting sight in the entire universe."
-Hitch Hiker's Guide To The Galaxy
 
If you look at an ASCII table, you'll see that '0' = 0x30 (or 48 in decimal). Thankfully when they invented the ASCII table, they grouped numbers and letters together in the proper order; so '1' is right after '0' and therefore has a value of 0x31 (or 49 in decimal).

'1' - '0' is the same as 49 - 48, which equals 1.
 
Code:
const char* foo = "123";
int n = foo[ 0 ] - '0'; // resulting in the int value 1
In char arithmetic, both operands are first converted to int (the so-called integral promotion). Through the foo pointer you get the first element of "123", which is '1' — see cpjust's explanation above. The initializer is therefore evaluated as int - int, and the resulting int value initializes n.
Be careful: in C++ the basic execution character set is not necessarily ASCII.
Code:
int n = '0';
In that case, the value of the char '0' depends on the basic character set of the C++ implementation. Moreover, plain char may be signed or unsigned (it's implementation-specific).
 
Thanks guys.. Makes sense.

-----------------------------------------------
"The night sky over the planet Krikkit is the least interesting sight in the entire universe."
-Hitch Hiker's Guide To The Galaxy
 
In that case char value '0' depends on basic character set of the C++ implementation

Can you name any "character set of the C++ implementation" other than ASCII where '0' has a different numerical value? Yeez... some people really see simple things in a hard way.

And for the answer to the first post — there is no such thing as a character type for a computer. Everything it understands is just numbers, and characters are just a way we interpret one number or another. What we see as the text 'AAA' is, to the computer, just 3 bytes with the value 65 (ASCII encoding, no terminating byte). Once you start looking at characters this way, you'll start understanding how some things really work (like tolower() being just a range check and a simple ch+32 operation).

------------------
When you do it, do it right.
 
Not all of us are gifted with detailed knowledge of what character sets will be implemented next year. ArkM does have a point. It never does any harm to understand the assumptions on which your work is based.
 
EBCDIC on IBM mainframes, for example.
 
like tolower() being just a range check and a simple ch+32 operation
This is true for ASCII, but not for things like EBCDIC. It would be even worse if you were using something like UTF-16...

So to be clear, when you want to convert characters to upper or lower case, never try to optimize your code by adding or subtracting 32 from the character value -- always use the library functions like toupper(). Otherwise, if you decide to port your program to another platform, you might be in big trouble.
 
Come on, don't be silly, guys. EBCDIC is a mainframe encoding — something maybe 0.01% of the world's programmers care about. Not sure what you use to code for those machines, but that's definitely not the C++ we're talking about here.
UTF-16 is a Unicode encoding, and for programming languages Unicode will always be ASCII-compatible, meaning the basic set (the first 128 characters) is ALWAYS in the same positions as ASCII.
And I'm not telling people to add 32 to make their letters lowercase, because some <noob> reading the code will spend 2 weeks trying to understand what that hardcoded +32 means. What I am saying is that people should try looking deeper into how programs actually work, instead of coming back later talking about "integral promotion" stuff...

------------------
When you do it, do it right.
 
> (like tolower() being just a range check and a simple ch+32 operation).
What about locales?
As well as being more readable, it will also do the right thing (to paraphrase your sig) when you're working in a foreign locale.

> Be careful: in C++ basic execution character set may be not only ASCII.
True, but this quote from draft C99 is pertinent (the C++ standards are equivalent on this point).
draft C99 said:
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
So you can always assume that no matter what the underlying character set is, at least '0' to '9' will be consecutive to permit the [tt]ch - '0'[/tt] type of calculation. For everything else though, the appropriate API should be used.

--
 
And finally, to answer the initial question:

but why does subtracting a '0' ascii value from an ascii char result in a proper int value??

Because char in C/C++ is simply an 8-bit-wide int.

 
But see cpjust's post, the 2nd in this thread. The important thing isn't the 8-bitness of char; it's that char is a number, and the ASCII character set happens to list 0-9 consecutively in the correct order. It would work equally well on a hypothetical 19-bit processor if the designers had the sense to keep the digits in the right order.
 
I'd say it's not the 8-bitness but the int-ness of char — char is one of the integer types, like short and long.
 