The very short and simple guide to Unicode & UTF-8
12:59 Friday 21 October 2011 | Posted by: TomWC
Technical translation, Digital Marketing translations
Welcome to our blog post.
Today’s blog is a short reply to a question I was recently asked by a potential client, a small web development company looking to dip its toes into web localisation for one of its clients. They had skills in web design and development, and knew that the Unicode standard would come into play somewhere, but had yet to create a website for an international audience.
Their question went along these lines: “How can we make sure the translated text will display correctly when inserted into the design we create?”
In my reply, as well as explaining how to achieve this, I gave them a number of references from the web that provide a thorough explanation of Unicode and the best practices for web localisation. These resources are excellent and extremely detailed (I heartily recommend you have a look at them; links are at the bottom of this post), but they do tend to be quite wordy.
So, in an effort to distil that knowledge to a more manageable size, I want to briefly outline the recommended steps to ensure translated content (including non-Roman scripts such as Greek, Arabic and Japanese) will display correctly.
First off, for those who are not sure, here is our definition of the Unicode standard. Unicode is a universal standard for encoding characters (letters, digits, punctuation and so on) that allows text in any script (Roman, as used for English, or non-Roman, such as Arabic or Cyrillic) to be viewed correctly (i.e. as the person who wrote the text wanted it to appear) on multiple platforms (e.g. different web browsers and operating systems).
The encoding bit works by assigning a unique number to every character, digits and punctuation included. Before Unicode there were hundreds of different encoding systems, none of which could represent all of the characters that may be needed on various systems. They also conflicted with one another: two encodings could use the same number for two different characters, or different numbers for the same character. The premise of Unicode is that each character's number is the same ‘no matter what the platform, no matter what the program, no matter what the language’.
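To make the "one number per character" idea concrete, here is a minimal Python sketch (Python is our choice for illustration; it is not part of the original reply to the client). The helper names `describe` and `decode_legacy` are hypothetical. The first shows a character's Unicode number and its UTF-8 bytes; the second shows how three pre-Unicode encodings read the very same byte as three different characters.

```python
def describe(ch):
    """Return the Unicode number (code point) and UTF-8 bytes, as hex, of one character."""
    return f"U+{ord(ch):04X}", ch.encode("utf-8").hex()


def decode_legacy(byte_value, encodings=("latin-1", "iso8859_7", "cp1251")):
    """Decode a single byte with several legacy (pre-Unicode) encodings."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in encodings}


if __name__ == "__main__":
    # One Unicode number per character, whatever the script.
    for ch in ["A", "é", "Ω", "ع", "日"]:
        print(ch, *describe(ch))

    # The same byte means a Western European, a Greek or a Cyrillic letter,
    # depending on which legacy encoding the reader assumes.
    print(decode_legacy(0xE9))
```

The byte 0xE9 comes out as "é" in Latin-1, "ι" in the Greek ISO 8859-7 and "й" in the Cyrillic Windows-1251, which is exactly the kind of conflict Unicode was designed to remove.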
The “how-to” bit of making use of Unicode, specifically UTF-8, the Unicode encoding most widely used on the web, is simple. Firstly, you will need to ensure the file (your HTML etc.) is saved as UTF-8. Both graphical HTML editors (such as Adobe Dreamweaver) and plain text editors (such as Notepad) will allow you to choose this encoding when saving. In many editors it is already the default, but it is worth checking before you start editing.
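If the files are generated or checked by a script rather than saved by hand, the same step can be sketched in Python. This is only an illustration under our own assumptions: `save_as_utf8` and `is_valid_utf8` are hypothetical helper names, and the Greek sample text and file names are made up for the demo.

```python
import tempfile
from pathlib import Path


def save_as_utf8(path, text):
    """Write text to disk, explicitly encoded as UTF-8."""
    Path(path).write_text(text, encoding="utf-8")


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        good = Path(folder) / "page.html"
        save_as_utf8(good, "<p>Καλημέρα</p>")
        print(good.name, is_valid_utf8(good))

        # The same text saved with a legacy Greek encoding is not valid UTF-8.
        legacy = Path(folder) / "legacy.html"
        legacy.write_bytes("<p>Καλημέρα</p>".encode("iso8859_7"))
        print(legacy.name, is_valid_utf8(legacy))
```

A check like this only confirms that the bytes are valid UTF-8. A file containing nothing but plain English letters will pass whatever encoding the editor was set to, so it is still worth confirming the editor setting itself.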
You will also need to declare the encoding in your page. Again, this is relatively simple.
The example below shows how this can be done in a standard HTML page.
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
This piece of code needs to go between the <head></head> tags within the HTML, as early as possible so that the browser sees it before any non-ASCII text.
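For a site with many pages, it can help to check the declaration automatically. The sketch below uses Python's standard `html.parser` module; the class `CharsetFinder` and the function `declared_charset` are hypothetical names of our own. As an assumption beyond what this post covers, it also accepts the shorter HTML5 form `<meta charset="UTF-8">`. It does not check that the tag actually sits inside <head>.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Remembers the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        # Short HTML5 form: <meta charset="UTF-8">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return
        # Long form: <meta http-equiv="content-type" content="...; charset=UTF-8">
        http_equiv = (attributes.get("http-equiv") or "").lower()
        content = (attributes.get("content") or "").lower()
        if http_equiv == "content-type" and "charset=" in content:
            value = content.split("charset=", 1)[1]
            self.charset = value.split(";")[0].strip()


def declared_charset(html_text):
    """Return the declared charset in lower case, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<meta http-equiv="content-type" content="text/html; charset=UTF-8">'
        "</head><body></body></html>"
    )
    print(declared_charset(page))
```

A page whose declared charset is missing, or differs from the encoding the file was saved in, is the usual cause of translated text showing up as garbled characters.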
For a more in-depth look at Unicode, the following links will be of much help:
http://www.unicode.org/ - The Unicode Consortium, which will tell you everything you need to know and more.
http://www.cl.cam.ac.uk/~mgk25/unicode.html - All you need to know to use Unicode/UTF-8 on Unix and Linux systems.
http://www.w3.org/TR/2000/NOTE-unicode-xml-20001215 - W3C note on using Unicode in XML and other markup languages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182192 - Google's advice for creating a multilingual website.