Google: 60 Percent of the Web's Content is Now in Unicode

Unicode's mission is to enable people around the world to use computers in their own language by creating a standard for encoding the characters of all the world's writing systems. Judging from the latest data from Google, Unicode is clearly on its way to fulfilling this mission. According to Google, about 60% of the web's content is now encoded in Unicode.

Since 2006 alone, Unicode's usage has grown 800%. As Google notes, if it had also counted ASCII, which is basically a subset of most other encodings (Unicode included), Unicode's share would have been closer to 80%.
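To see why pure-ASCII pages can reasonably be counted toward Unicode's share, here is a minimal Python sketch. It checks that an ASCII-only string produces identical bytes under several encodings. The helper name `ascii_compatible` and the list of codecs are illustrative choices for this example, not anything taken from Google's analysis.

```python
# Illustrative sketch: ASCII-only text yields the same bytes in many encodings.
# The function name and codec list are assumptions made for this example.

def ascii_compatible(text, codecs):
    """Return {codec: True/False}, True when the codec's bytes match ASCII's."""
    ascii_bytes = text.encode("ascii")  # raises UnicodeEncodeError if text is not pure ASCII
    return {codec: text.encode(codec) == ascii_bytes for codec in codecs}


if __name__ == "__main__":
    sample = "Hello, web!"
    results = ascii_compatible(sample, ["utf-8", "latin-1", "cp1252", "koi8_r", "gb2312"])
    for codec, same in results.items():
        print(codec, "matches ASCII bytes:", same)
```

Because the bytes are identical, a page containing only ASCII characters is already valid UTF-8, whatever encoding it declares.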

Today's Unicode standard contains close to 110,000 characters, among them 75,000 Chinese ideographs, the characters used for Arabic and Russian, and hundreds of emoji symbols.

Google itself uses Unicode as the internal format for all the text in Google Search. Indeed, whenever it encounters text in any other encoding, the first thing it does is convert it to Unicode.
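As a rough illustration of what such a conversion step looks like, the Python sketch below decodes bytes from a legacy encoding into a Unicode string and re-encodes them as UTF-8. This is not Google's actual pipeline. The function names `to_unicode` and `to_utf8` are hypothetical, and the sketch assumes the source encoding is already known; detecting the encoding of an arbitrary page is a separate and harder problem.

```python
# Minimal sketch of legacy-encoding-to-Unicode conversion.
# Assumption: the caller already knows the source encoding.
# to_unicode and to_utf8 are hypothetical helper names for this example.

def to_unicode(raw, source_encoding):
    """Decode raw bytes from a known legacy encoding into a Unicode string."""
    return raw.decode(source_encoding)


def to_utf8(raw, source_encoding):
    """Convert raw bytes from a known legacy encoding into UTF-8 bytes."""
    return to_unicode(raw, source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Russian text as it might be stored on an older page using KOI8-R.
    legacy_bytes = "Привет".encode("koi8_r")
    print(to_unicode(legacy_bytes, "koi8_r"))  # the same text as a Unicode string
    print(to_utf8(legacy_bytes, "koi8_r"))     # the same text as UTF-8 bytes
```

Once everything is in one representation, later processing only has to handle a single character set.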