Unicode

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems. Developed in conjunction with the Universal Character Set standard and published in book form as The Unicode Standard, the latest version of Unicode consists of a repertoire of more than 109,000 characters covering 93 scripts, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data files, and a number of related items, such as rules for normalization, decomposition, collation, rendering, and bidirectional display order.

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16, and the now-obsolete UCS-2. UTF-8 uses one byte for each ASCII character, which has the same code value in both UTF-8 and ASCII, and up to four bytes for other characters. UCS-2 uses two bytes for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2: it keeps the two-byte form for the characters UCS-2 can represent and uses four bytes (a pair of two-byte units) for each of the additional characters.
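
The byte counts described above can be checked with a short Python sketch. The helper name encoded_lengths and the sample characters are illustrative choices, not part of any standard. The "utf-16-be" codec is used so that no byte order mark is added to the count.

```python
def encoded_lengths(ch):
    """Return the number of bytes one character occupies in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # "utf-16-be" fixes the byte order, so no byte order mark is prepended
        "utf-16": len(ch.encode("utf-16-be")),
    }

# Sample characters written as escapes so the source file stays pure ASCII:
# an ASCII letter, a Latin-1 letter, the euro sign, and a character
# beyond U+FFFF (musical G clef), which UCS-2 cannot represent.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# An ASCII character has identical bytes in ASCII and UTF-8.
print("A".encode("ascii") == "A".encode("utf-8"))
```

Under these assumptions, the ASCII letter takes one byte in UTF-8 and two in UTF-16, while the character beyond U+FFFF takes four bytes in both encodings.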
