Le 13 juillet 2016, 08:33 dans Humeurs • 0
Once you've chosen to support Unicode, you must decide on the specific character encoding you want to use, which will be dependent on the application requirements and technologies. UTF-8 is one of the commonly used character encodings defined within the Unicode Standard, which uses a single byte for each character unless it needs more, in which case it can expand up to 4 bytes. People sometimes refer to this The Cobalt Code Review as a variable-width encoding since the width of the character in bytes varies depending upon the character. The advantage of this character encoding is that all English (ASCII) characters will remain as single-bytes, saving data space. This is especially desirable for web content, since the underlying HTML markup will remain in single-byte ASCII. In general, UNIX platforms are optimized for UTF-8 character encoding. Concerning databases, where large amounts of application data are integral to the application, a developer may choose a UTF-8 encoding to save space if most of the data in the database does not need translation and so can remain in English (which requires only a single byte in UTF-8 encoding). Note that some databases will not support UTF-8, specifically Microsoft's SQL Server.
UTF-16 is another widely adopted encoding within the Unicode standard. It assigns two bytes for each character whether you need it or not. So the letter A is 00000000 01000001 or 9 zeros, a one, followed by 5 zeros and a one. If more than 2 bytes are needed for a character, four bytes can be combined, however you must adapt your software to be capable of handling this four-byte combination. Java and .Net internally process strings (text and messages) as UTF-16.For many applications, you can actually support multiple Unicode encodings so that for example your data is stored in your database as UTF-8 but is handled within your code as UTF-16, or vice versa. There are various reasons to do this, such as software limitations (different software components supporting different Unicode encodings), storage or performance advantages, etc.. But whether that's a good idea is one of those "it depends" kinds of questions. Implementing can be tricky and clients pay us good money to solve this.Microsoft's SQL Server is a bit of a special case, in that it supports UCS-2, which is like UTF-16 but without the 4-byte characters (only the 16-bit characters are supported).