Showing posts from December, 2018

ASCII and Unicode Character Set - UTF-8, UTF-16 & UTF-32 with example using Python

We have all heard that computers store everything in binary. Numbers are easy to convert to binary, but how are letters handled? The ASCII (American Standard Code for Information Interchange) character set represents English characters as numbers. Standard ASCII is a 7-bit set, which means it can represent 128 possible characters. Most computer systems use ASCII, or an ASCII-compatible encoding, as the text format for storing files. There are also variations that use all 8 bits of a byte (e.g. Extended ASCII, ISO 8859-1, also known as Latin-1). Unicode was introduced to support characters that can't be represented in ASCII, such as Greek, Arabic, and emoji. The Unicode standard is a character set designed to support the worldwide interchange, processing and display of the written texts of the diverse languages and technical disciplines of the modern world. In Unicode, every character is identified by its code point. The current U
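As a minimal sketch of the ideas above (7-bit ASCII values and Unicode code points), the following Python snippet uses only the built-in `ord`, `format`, and `str.encode`. The helper name `code_point` and the sample characters are illustrative choices, not part of the original post.

```python
# ASCII: each character maps to a number in the range 0-127, which fits in 7 bits.
for ch in "Az":
    value = ord(ch)                          # numeric value of the character
    print(ch, value, format(value, "07b"))   # e.g. A 65 1000001


def code_point(ch):
    """Return the Unicode code point of a single character in U+XXXX notation.

    Hypothetical helper written for this example.
    """
    return f"U+{ord(ch):04X}"


# Characters outside ASCII still have a code point in Unicode.
samples = ["A", "é", "α", "ع", "😀"]
for ch in samples:
    in_ascii = ord(ch) < 128                 # True only for the 128 ASCII characters
    print(ch, code_point(ch), "ASCII" if in_ascii else "not ASCII")

# An 8-bit extension such as ISO 8859-1 (Latin-1) can store "é" in a single byte,
# but it has no slot for Greek letters, so encoding "α" fails.
print("é".encode("latin-1"))
try:
    "α".encode("latin-1")
except UnicodeEncodeError:
    print("α cannot be represented in Latin-1")
```

Running it shows that `A` has the same numeric value (65) in ASCII and in Unicode, while the Greek, Arabic, and emoji samples have code points above 127 and so fall outside ASCII.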