Understanding Character Data Types: ASCII Encoding, Size, and Signed vs Unsigned

Overview of Character Data Type

Characters in computers are represented using bits, typically 8 bits (1 byte) per character. The primary encoding scheme discussed is ASCII, which uses 7 bits to represent 128 characters (0-127). Extended ASCII utilizes all 8 bits to represent 256 characters (0-255), including additional symbols and characters needed for non-English languages.

Character Declaration and Usage

Characters are stored in variables declared with the char data type.
The value must be enclosed in single quotes (e.g., 'A').
Only one character can be stored per variable due to the 1-byte size.
Integer values can also be assigned to char variables; when printed with %c format specifier, the integer is interpreted as its ASCII character equivalent (e.g., 65 corresponds to 'A'). See also Understanding Data Representation in C Programming for more on how data is represented internally.

Size and Range of Character Variables

Size: 1 byte (8 bits)
Unsigned char range: 0 to 255
Signed char range: -128 to +127 (using 2's complement representation)
ASCII traditionally uses 7 bits; Extended ASCII makes use of the full 8 bits. For an in-depth explanation of integer size and range concepts that closely relate to characters, refer to Understanding Integer Data Type: Size, Range, and Number Systems Explained.

Signed vs Unsigned Characters Explained

Signed characters use one bit as a sign bit, allowing negative values, which correspond to values in the extended ASCII range.
The same bit pattern can be read as a negative signed value or a positive unsigned value (e.g., 10000000 is -128 as a signed char and 128 as an unsigned char).
Negative values in characters do not provide extra functionality but reflect binary representation constraints.
The most significant bit's place value is negative in signed representation. Further insights on signed and unsigned types and the overflow issues can be explored in Understanding Integer Range Overflow in Signed and Unsigned Types.

Two's Complement Representation

Negative values are represented in two's complement form.
Examples:
- -128 is represented by setting the most significant bit (MSB) to 1 and all others to 0.
- Binary representations of signed negative values correspond to specific unsigned positive integers.

Practical Code Insights

Using %c with values assigned to char variables prints the corresponding character.
Signed and unsigned chars can print the same characters for different integer values due to binary equivalence.
Understanding this helps avoid confusion with character and integer representations.

Summary

Character size is fixed at 1 byte.
ASCII uses 7 bits, Extended ASCII uses 8 bits.
Signed char ranges from -128 to 127, unsigned char from 0 to 255.
Negative character values correspond to positive values in binary representation; they don't add extra power.
Proper use of single quotes and format specifiers is essential when working with character variables.

This foundational knowledge enables programmers to manage character data accurately and understand underlying encoding mechanisms in software development.

Today we will start our discussion of the second fundamental data type, the character.

Here is the outline of this lecture. We will go through a brief overview of the character data type, the size of characters, the range of characters, and the difference between signed and unsigned characters. Let's start with the overview.

If you remember the first lesson, I told you how characters are represented in a computer. Recall the example of HELLO! I also told you that each character is represented with 8 bits of information. A computer understands only 0 and 1, so characters must be represented in 0-and-1 form as well. We don't need to worry about that, because internally everything is represented as bits. Several encoding schemes are available for encoding characters, and one of the most common is the ASCII encoding scheme.

This is an ASCII table, which represents the ASCII encoding scheme. Here you can see that some characters are non-printable and some are printable. The non-printable ones are the control characters, and the printable ones are the characters you can print on the screen. ASCII uses 7 bits to encode characters, so 128 characters are available in total. As you can see here, the codes run from 0 to 127, that is, 128 characters in the ASCII table. But the smallest unit we have is 1 byte, and 1 byte is equal to 8 bits. ASCII requires just 7 bits to represent a character, so the most significant bit, the eighth bit, is set to 0.

Let's see how we define and declare a character variable. Here you can see that I have declared a variable of character data type and assigned it a character. The variable can have any name you choose, but if it is of character data type, it can hold one character at a time. Note the single quotes here. This is important: remember to use single quotes and not double quotes. If you use double quotes, you might get unexpected results. A character variable can hold only one character at a time. If you try to give it a whole string, it won't be able to hold it, because its size is 1 byte and it cannot hold more than one character at a time.

It is also not necessary to give these variables only characters. You can assign integer values to them as well. For example, I have given this variable the value 65. This value acts like a character when we print it: when we print the contents of this variable, we get a character instead of an integer. That depends entirely on the format specifier you use. Here is the code:

Here we can see that I have provided %c as the format specifier. If you put %d instead of %c, it will print the decimal value, but in this case it will print a character. Let's see which character is associated with this decimal value. The character associated with the decimal value 65 is A, which is why A is printed. If you look at the ASCII table, you can see that A is associated with the decimal value 65. Providing 65 is the same as providing the character 'A'. Now let us understand why this happens. After all, everything is in the form of bits. Whether you write the character 'A' or the value 65, both are one and the same thing, because their binary representations are the same.

The only difference between a character and an integer is that a character holds only 1 byte of information, whereas an integer holds either 2 bytes or 4 bytes, depending on the platform. Both can store an integer. Both can print integers. But it is better to use each for what it is meant for.

Let's see the size and range of a character. The size of a char variable is 1 byte. The range is 0 to 255 for unsigned characters and -128 to +127 for signed characters. The signed range comes from the 2's complement representation. The unsigned range follows because we have 8 bits available, so the maximum value we can represent is 255. In the traditional ASCII character encoding, only 7 bits are used to encode the characters, yet a character occupies at least 8 bits.

Therefore, the 8th bit is wasted. There is another encoding scheme, called Extended ASCII, that makes use of the 8th bit, or, you could say, the MSB. The range is used fully in this encoding scheme. As you can see here, the range is from 0 to 255 instead of 0 to 127. Note: apart from the English characters, we also have to represent characters of other languages for non-English speakers, such as Russian, German, and Chinese. Other schemes are available for them. Our concern is the traditional ASCII character encoding scheme, which covers English characters, digits, and most of the special symbols that we use in day-to-day life. Most of the time, that is sufficient, so we won't have to bother much about the other schemes.

Let's move to the next topic: the difference between signed and unsigned characters. I told you the signed and unsigned ranges for characters.

But it is not easy to digest the fact that characters have both a signed and an unsigned range. The unsigned range is OK, but why a signed range? For integers, a signed range makes sense, because in reality we represent not only unsigned integers but signed integers as well. But what are negative values doing in characters? Do they buy us any additional power? We don't need negative character values at all, but since internally everything is in the form of bits, nothing stops us from assigning negative values to character variables. The question is what happens when we do. To understand this, let's consider the Extended ASCII table once again. 0 to 127 is the same for both the signed and the unsigned range. The difference lies in -128 to -1 in the signed range and 128 to 255 in the unsigned range.

Let's write down the 2's complement representation of -128 in binary. There is one important point to note. Here we can see that the place value of the most significant bit is -(2 raised to the power 7). This is not the case when we are representing an unsigned value. To represent -128, we have to set this bit to 1 and reset all the other bits, because its place value is -(2 raised to the power 7), which is -128. By setting this one bit, we represent -128. This is the 2's complement representation. Always remember that in the signed representation, the place value of the most significant bit is negative.

In the unsigned case, this is quite easy because the place value here is positive: if we set this bit to 1 and reset all the other bits, we represent +128. Let's try to represent -127. By setting the most significant bit to 1 and the least significant bit to 1, we get -128 + 1, which is -127. On the other hand, if I want to represent +129, that is also easy: 2 raised to the power 7 is 128, and 2 raised to the power 0 is 1. Adding these two values gives +129. As you can observe, these two bit patterns are equal, just as in the previous case. -128 and +128 have the same binary representation. Similarly, -127 and +129 have the same binary representation. If I want to represent -126, this would be the binary representation, and for +130 this would be the binary representation. Both are equal. Now consider -1. The most significant bit contributes -128, and the rest of the bits, added together, contribute +127. -128 + 127 is equal to -1. Therefore, we need to set all bits to 1. If you want to represent +255, you likewise have to set all bits to 1. The only difference, as you can see, is the place value of the most significant bit. That is why we get two different values for the same binary representation.

OK, let's implement the code to understand this. If we print -1 and if we print +255, both should be one and the same thing. Let's see what character is printed for this value. This would be the character. If you look at the Extended ASCII table, the character associated with this value is this one. Let's change the code a little. -128 and +128 are one and the same thing, so they must print the same character. Let's see whether they do or not. Yes, they print the same character. Therefore, it is verified that +128 and -128 are the same.

Let's see what happens when we change this to +129 and execute it. This would be the character. And let's see whether -127 and +129 are the same or not. Yes, they are the same. Therefore, everything we have studied so far is verified. So the final conclusion is that negative values won't buy you any additional power in the case of character variables. Always remember that each negative value is equivalent to some positive value in the Extended ASCII character set, because after all, everything is binary. One more thing: the idea of range-exceeding conditions for characters is similar to that for integers, which we studied in the previous lecture.

Therefore, it is not worth going over every point once again. If you want, you can refer to the previous lecture and relate the concepts accordingly.

Let's summarize what we have studied so far. The size of a character is 1 byte. The signed character range is from -128 to +127. The unsigned character range is from 0 to 255. Negative values won't buy you any additional power. In the traditional ASCII table, each character requires 7 bits. In the Extended ASCII table, each character uses all 8 bits. OK friends, this is it for now.

See you in the next lecture.

This summary and transcript were automatically generated using AI with the Free YouTube Transcript Summary Tool by LunaNotes.

Related Summaries

Understanding Integer Data Type: Size, Range, and Number Systems Explained

This summary explores the integer data type, its memory allocation, and how computers represent integer ranges using decimal and binary number systems. It also covers calculating integer range for different byte sizes, including the use of two's complement for signed integers.

Understanding Data Representation in C Programming

Explore how data representation works in computers, focusing on integers and binary systems in C programming.

Comprehensive Guide to Integer Data Types and Modifiers in C Programming

This article explores integer data type modifiers in C, including short, long, signed, and unsigned. Learn about memory size differences, value ranges, and how to use symbolic constants and printf specifiers to work effectively with these data types.

Understanding Advanced printf Usage and Integer Behaviors in C Programming

This comprehensive summary explores key concepts in C programming, including nested printf functions, string width specifiers, character variable overflow, integer declarations, and nuances of signed versus unsigned integer arithmetic. Learn how printf returns values, how formatting affects output, and how integer operations behave in different contexts.