In addition to processing numeric values, you can process characters in Java. The character data type, char, represents a single character. A character literal is enclosed in single quotation marks. Consider the following code:
char letter = 'A';
char numChar = '4';
The first statement assigns character A to the char variable letter. The second statement assigns digit character 4 to the char variable numChar.
A string literal must be enclosed in double quotation marks (" "). A character literal is a single character enclosed in single quotation marks (' '). Therefore, "A" is a string, but 'A' is a character.
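The difference between 'A' and "A" can be checked directly. The following is a minimal sketch; the class name CharVsString is only an illustrative choice. It compares a char literal with a one-character String literal.

```java
public class CharVsString {
    public static void main(String[] args) {
        char letter = 'A';   // a char literal: exactly one character
        String text = "A";   // a String literal: a sequence of characters (here, just one)

        System.out.println(text.length());            // 1
        System.out.println(text.charAt(0) == letter); // true: the String's first char is 'A'
        System.out.println(text.equals("" + letter)); // true: concatenation turns the char into a String
    }
}
```

A char and a String that contain the same symbol are still different types, so you compare them only after extracting a char from the String (charAt) or converting the char to a String.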
Unicode and ASCII code
Computers use binary numbers internally. A character is stored in a computer as a sequence of 0s and 1s. Mapping a character to its binary representation is called encoding. There are different ways to encode a character. How characters are encoded is defined by an encoding scheme.
Java supports Unicode, an encoding scheme established by the Unicode Consortium to support the interchange, processing, and display of written texts in the world’s diverse languages. Unicode was originally designed as a 16-bit character encoding. The primitive data type char was intended to take advantage of this design by providing a simple data type that could hold any character. However, it turned out that the 65,536 characters possible in a 16-bit encoding are not sufficient to represent all the characters in the world. The Unicode standard therefore has been extended to allow up to 1,112,064 characters. Those characters that go beyond the original 16-bit limit are called supplementary characters. Java supports supplementary characters. Processing and representing supplementary characters is beyond the scope of this book. For simplicity, this book considers only the original 16-bit Unicode characters. These characters can be stored in a char type variable.
A 16-bit Unicode value occupies two bytes and is written as \u followed by four hexadecimal digits, ranging from \u0000 to \uFFFF.
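The 16-bit range of char can be confirmed from the constants in the Character class. The sketch below assumes only the standard java.lang.Character fields MIN_VALUE, MAX_VALUE, and SIZE; the class name CharRange is illustrative.

```java
public class CharRange {
    public static void main(String[] args) {
        // Character.MIN_VALUE and Character.MAX_VALUE are the smallest and largest char values
        System.out.println((int) Character.MIN_VALUE);               // 0, that is, U+0000
        System.out.println((int) Character.MAX_VALUE);               // 65535, that is, U+FFFF
        System.out.println(Integer.toHexString(Character.MAX_VALUE)); // ffff
        System.out.println(Character.SIZE);                          // 16 bits per char
    }
}
```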
Most computers use ASCII (American Standard Code for Information Interchange), a 7-bit encoding scheme (each code is usually stored in an 8-bit byte) for representing all uppercase and lowercase letters, digits, punctuation marks, and control characters. Unicode includes ASCII, with \u0000 to \u007F corresponding to the 128 ASCII characters. The table below shows the ASCII codes for some commonly used characters.
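You can also let Java print these codes. The following sketch (the class name AsciiCodes is only illustrative) prints the decimal and hexadecimal code of the first and last character in the digit, uppercase, and lowercase ranges.

```java
public class AsciiCodes {
    public static void main(String[] args) {
        // First and last character of each range: digits, uppercase letters, lowercase letters
        char[] samples = {'0', '9', 'A', 'Z', 'a', 'z'};
        for (char c : samples) {
            // Casting to int exposes the character's code
            System.out.printf("'%c' -> %d (hex %04X)%n", c, (int) c, (int) c);
        }
    }
}
```

Running it should show '0' through '9' as 48 to 57, 'A' through 'Z' as 65 to 90, and 'a' through 'z' as 97 to 122.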
You can use ASCII characters such as 'X', '1', and '$' in a Java program as well as Unicode escape sequences. For example, the following statements are equivalent:
char letter = 'A';
char letter = '\u0041'; // Character A's Unicode is 0041
Both statements assign character A to the char variable letter.
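Because the two forms denote the same char value, they compare as equal. The sketch below (class name UnicodeLiterals is illustrative) checks this and also uses an escape for a character outside ASCII, the Greek letter alpha (\u03b1). It prints the code rather than the glyph, because how alpha is displayed depends on the console's encoding.

```java
public class UnicodeLiterals {
    public static void main(String[] args) {
        char letter1 = 'A';
        char letter2 = '\u0041';   // the same character written as a Unicode escape
        System.out.println(letter1 == letter2); // true

        char alpha = '\u03b1';     // Greek small letter alpha
        System.out.println((int) alpha);        // 945, which is hex 3B1
    }
}
```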
The increment and decrement operators can also be used on char variables to get the next or preceding Unicode character. For example, the following statements display character b.
char ch = 'a';
System.out.println(++ch);
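Incrementing a char in a loop walks through consecutive Unicode characters. The sketch below uses a hypothetical helper, range, to build a string of the characters from first to last. It also shows the decrement operator.

```java
public class CharIncrement {
    // Builds a String of consecutive characters from first to last, inclusive,
    // using ++ on a char variable. Assumes last < Character.MAX_VALUE; otherwise
    // ch would wrap around to 0 and the loop would never end.
    static String range(char first, char last) {
        StringBuilder sb = new StringBuilder();
        for (char ch = first; ch <= last; ch++) {
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(range('a', 'e')); // abcde
        char ch = 'b';
        ch--;                                // move to the preceding character
        System.out.println(ch);              // a
    }
}
```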
Escape Sequences for Special Characters
Suppose you want to print a message with quotation marks in the output. Can you write a statement like this?
System.out.println("He said "Java is fun"");
No, this statement has a compile error. The compiler thinks the second quotation character is the end of the string and does not know what to do with the rest of the characters. To overcome this problem, Java uses a special notation to represent special characters, as shown below.
This special notation, called an escape sequence, consists of a backslash (\) followed by a character or a combination of digits. For example, \t is an escape sequence for the Tab character, and an escape sequence such as \u03b1 represents a Unicode character. The symbols in an escape sequence are interpreted as a whole rather than individually, so an escape sequence is treated as a single character.
So, now you can print the quoted message using the following statement:
System.out.println("He said \"Java is fun\"");
The output is
He said "Java is fun"
Note that the symbols \ and " together represent one character. The backslash \ is called an escape character. It is a special character, so to display it you have to use the escape sequence \\. For example, the following code
System.out.println("\\t is a tab character");
displays
\t is a tab character
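Each escape sequence is one char, so it has a single numeric code. The sketch below (class name EscapeSequences is illustrative) prints the code of each common escape sequence instead of the character itself, because characters such as backspace and carriage return would otherwise disturb the output.

```java
public class EscapeSequences {
    public static void main(String[] args) {
        // Each escape sequence below is a single char; print its numeric code
        char[] escapes = {'\b', '\t', '\n', '\f', '\r', '\\', '\"', '\''};
        String[] names = {"backspace", "tab", "linefeed", "formfeed",
                          "carriage return", "backslash", "double quote", "single quote"};
        for (int i = 0; i < escapes.length; i++) {
            System.out.printf("%-16s %d%n", names[i], (int) escapes[i]);
        }
        // Inside a String, an escape sequence still counts as one character
        System.out.println("a\tb".length()); // 3
    }
}
```

The codes printed are 8, 9, 10, 12, 13, 92, 34, and 39, in that order.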
Casting between char and Numeric Types
A char can be cast into any numeric type, and vice versa. When an integer is cast into a char, only its lower 16 bits of data are used; the other part is ignored. For example:
char ch = (char)0XAB0041; // The lower 16 bits hex code 0041 is
// assigned to ch
System.out.println(ch); // ch is character A
When a floating-point value is cast into a char, the floating-point value is first cast into an int, which is then cast into a char.
char ch = (char)65.25; // Decimal 65 is assigned to ch
System.out.println(ch); // ch is character A
When a char is cast into a numeric type, the character’s Unicode is cast into the specified numeric type.
int i = (int)'A'; // The Unicode of character A is assigned to i
System.out.println(i); // i is 65
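A few edge cases follow the same rules. The sketch below (class name CharCasting is illustrative) casts a negative int, a floating-point value with a large fractional part, and an int that needs more than 16 bits.

```java
public class CharCasting {
    public static void main(String[] args) {
        // Only the lower 16 bits are kept: -1 is 0xFFFFFFFF, whose lower 16 bits are 0xFFFF
        char c1 = (char) -1;
        System.out.println((int) c1);  // 65535

        // The floating-point value is first truncated to the int 65, then cast to char
        char c2 = (char) 65.99;
        System.out.println(c2);        // A

        // 0x10041 needs 17 bits; dropping the upper bits leaves 0x0041
        char c3 = (char) 0x10041;
        System.out.println(c3);        // A
    }
}
```

Note that 65.99 becomes 'A', not 'B': the conversion truncates the fractional part instead of rounding.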
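Implicit casting can be used if the result of a casting fits into the target variable; when the target is narrower than char, such as byte, the char value must also be a constant, such as a character literal. Otherwise, explicit casting must be used. For example, since the Unicode of 'a' is 97, which is within the range of a byte, these implicit castings are fine: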
byte b = 'a';
int i = 'a';
But the following casting is incorrect, because the Unicode \uFFF4 cannot fit into a byte:
byte b = '\uFFF4';
To force this assignment, use explicit casting, as follows:
byte b = (byte)'\uFFF4';
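An explicit cast to byte keeps only the lower 8 bits, and a byte is signed, so the result can be negative. The sketch below (class name ByteFromChar is illustrative) shows the value that the forced assignment actually produces.

```java
public class ByteFromChar {
    public static void main(String[] args) {
        byte b1 = 'a';               // OK: 'a' is the constant 97, which fits in a byte
        byte b2 = (byte) '\uFFF4';   // explicit cast keeps only the lower 8 bits (0xF4)
        System.out.println(b1);      // 97
        System.out.println(b2);      // -12: 0xF4 is 244, which is -12 as a signed byte
    }
}
```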
Any integer constant between 0 and FFFF in hexadecimal can be assigned to a char variable implicitly. Any number not in this range, or any integer value that is not a constant, must be cast into a char explicitly.
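The rule applies to integer constants. The sketch below (class name IntToChar is illustrative) shows assignments that compile without a cast, one that needs a cast, and, as comments, two that would not compile.

```java
public class IntToChar {
    public static void main(String[] args) {
        char c1 = 97;                 // OK: the int constant 97 is within 0..0xFFFF
        char c2 = 0xFFFF;             // OK: the largest value a char can hold
        char c3 = (char) 70000;       // 70000 is out of range, so an explicit cast is required
        System.out.println(c1);       // a
        System.out.println((int) c2); // 65535
        System.out.println((int) c3); // 4464, that is, 70000 - 65536 (the lower 16 bits)

        // char c4 = -1;              // would not compile: -1 is outside 0..0xFFFF
        // int n = 97; char c5 = n;   // would not compile: n is a variable, not a constant
    }
}
```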
All numeric operators can be applied to char operands. A char operand is automatically cast into a number if the other operand is a number or a character. If the other operand is a string, the character is concatenated with the string. For example, the following statements
int i = '2' + '3'; // (int)'2' is 50 and (int)'3' is 51
System.out.println("i is " + i); // i is 101
int j = 2 + 'a'; // (int)'a' is 97
System.out.println("j is " + j); // j is 99
System.out.println(j + " is the Unicode for character "
+ (char)j); // 99 is the Unicode for character c
System.out.println("Chapter " + '2');
display
i is 101
j is 99
99 is the Unicode for character c
Chapter 2
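A common use of char arithmetic is converting a digit character to its numeric value by subtracting '0'. The sketch below uses a hypothetical helper, digitValue, which relies on '0' through '9' having consecutive codes. It also shows that the order of evaluation decides whether + adds or concatenates.

```java
public class CharArithmetic {
    // Converts a digit character such as '7' to its numeric value 7,
    // relying on '0'..'9' having consecutive codes (no check for non-digits)
    static int digitValue(char ch) {
        return ch - '0';
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7'));       // 7
        System.out.println('a' + 1);               // 98: char + int is an int
        System.out.println((char) ('a' + 1));      // b: cast the sum back to char
        System.out.println('2' + '3' + " apples"); // 101 apples: '2' + '3' is added first
        System.out.println("apples " + '2' + '3'); // apples 23: concatenation from the left
    }
}
```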
Comparing and Testing Characters
Two characters can be compared using the relational operators just like comparing two numbers. This is done by comparing the Unicodes of the two characters. For example,
'a' < 'b' is true because the Unicode for 'a' (97) is less than the Unicode for 'b' (98).
'a' < 'A' is false because the Unicode for 'a' (97) is greater than the Unicode for 'A' (65).
'1' < '8' is true because the Unicode for '1' (49) is less than the Unicode for '8' (56).
Often in a program, you need to test whether a character is a number, a letter, an uppercase letter, or a lowercase letter. In the ASCII character set, the Unicodes for lowercase letters are consecutive integers starting from the Unicode for 'a', then for 'b', 'c', . . ., and 'z'. The same is true for the uppercase letters and for numeric characters. This property can be used to write the code to test characters. For example, the following code tests whether a character ch is an uppercase letter, a lowercase letter, or a digit character.
if (ch >= 'A' && ch <= 'Z')
  System.out.println(ch + " is an uppercase letter");
else if (ch >= 'a' && ch <= 'z')
  System.out.println(ch + " is a lowercase letter");
else if (ch >= '0' && ch <= '9')
  System.out.println(ch + " is a numeric character");
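The same tests can be wrapped in a reusable method. The sketch below uses a hypothetical method, classify, which returns a description instead of printing one. The final "other" branch is an addition for characters that match none of the three ranges.

```java
public class CharClassifier {
    // Classifies ch using the consecutive code ranges of ASCII letters and digits
    static String classify(char ch) {
        if (ch >= 'A' && ch <= 'Z')
            return "uppercase letter";
        else if (ch >= 'a' && ch <= 'z')
            return "lowercase letter";
        else if (ch >= '0' && ch <= '9')
            return "numeric character";
        else
            return "other";
    }

    public static void main(String[] args) {
        for (char ch : new char[] {'Q', 'q', '5', '$'}) {
            System.out.println(ch + ": " + classify(ch));
        }
    }
}
```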
For convenience, Java provides the following methods in the Character class for testing and converting characters:
For example,
System.out.println("isDigit('a') is " + Character.isDigit('a'));
System.out.println("isLetter('a') is " + Character.isLetter('a'));
System.out.println("isLowerCase('a') is "
+ Character.isLowerCase('a'));
System.out.println("isUpperCase('a') is "
+ Character.isUpperCase('a'));
System.out.println("toLowerCase('T') is "
+ Character.toLowerCase('T'));
System.out.println("toUpperCase('q') is "
+ Character.toUpperCase('q'));
displays
isDigit('a') is false
isLetter('a') is true
isLowerCase('a') is true
isUpperCase('a') is false
toLowerCase('T') is t
toUpperCase('q') is Q
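One advantage of the Character methods over hand-written range tests is that they also recognize letters outside ASCII. The sketch below (class name CharacterMethodsUnicode is illustrative) applies both approaches to the Greek letter alpha (\u03b1). It prints codes rather than glyphs, because how alpha is displayed depends on the console's encoding.

```java
public class CharacterMethodsUnicode {
    public static void main(String[] args) {
        char alpha = '\u03b1';   // Greek small letter alpha
        // The ASCII range test misses letters outside 'a'..'z'
        System.out.println(alpha >= 'a' && alpha <= 'z');       // false
        // The Character methods use Unicode character properties instead
        System.out.println(Character.isLetter(alpha));          // true
        System.out.println(Character.isLowerCase(alpha));       // true
        System.out.println((int) Character.toUpperCase(alpha)); // 913, capital alpha (U+0391)
    }
}
```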