UTF-8 Encoding of Non-ASCII Characters

UTF-8 is a Unicode encoding that represents each character as a sequence of one to four octets (earlier definitions of UTF-8 allowed up to six octets; RFC 3629 restricts it to four). UTF-8 encoding plays a role when non-ASCII characters are used. For example, the Japanese word "nihongo" is written as:

日本語

The Unicode code points for the Han characters of "nihongo" are U+65E5, U+672C, and U+8A9E. In UTF-8 they are encoded as the octet sequence E6 97 A5 E6 9C AC E8 AA 9E, three octets per character.
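
The encoding above can be checked with a short Python sketch. It is only an illustration using Python's built-in string encoding; the variable names are chosen for this example and are not part of any DOI tooling.

```python
# Code points quoted in the text above (hexadecimal).
code_points = [0x65E5, 0x672C, 0x8A9E]

# Build the string from the code points, then encode it as UTF-8.
word = "".join(chr(cp) for cp in code_points)
utf8_octets = word.encode("utf-8")

# Format the octets as space-separated uppercase hex, as in the text.
utf8_hex = " ".join(f"{octet:02X}" for octet in utf8_octets)
print(utf8_hex)  # E6 97 A5 E6 9C AC E8 AA 9E
```

Each of the three characters lies between U+0800 and U+FFFF, which is why each one occupies exactly three octets in UTF-8.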

For further information on UTF-8 see RFC 3629.