UTF-32/UCS-4

UTF-32 (or UCS-4) is a protocol for encoding Unicode characters that uses exactly 32 bits for each Unicode code point. All other Unicode transformation formats use variable-length encodings.
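
The fixed-width property can be illustrated with a minimal Python sketch. It assumes the standard `utf-32-be` codec (big-endian, no byte order mark); the helper name `utf32_units` and the sample string are illustrative only.

```python
def utf32_units(text):
    """Return the 32-bit code values of text as encoded in big-endian UTF-32."""
    data = text.encode("utf-32-be")  # the explicit -be codec writes no byte order mark
    # Every code point occupies exactly 4 bytes, so the data can be cut in fixed steps.
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]

sample = "A\u20ac\U0001F600"  # U+0041, U+20AC and U+1F600 (one non-BMP code point)

print(len(sample.encode("utf-32-be")))        # 12 bytes: 3 code points x 4 bytes
print([hex(u) for u in utf32_units(sample)])  # ['0x41', '0x20ac', '0x1f600']
# The same text in the variable-length encodings:
print(len(sample.encode("utf-8")), len(sample.encode("utf-16-be")))  # 8 8
```

In UTF-32 each code value equals the code point number itself, which is why no decoding step beyond reading a 32-bit integer is needed here.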

Because UTF-32 uses 4 bytes for every code point, it is quite space-inefficient. Non-BMP characters are so rare in most texts that they can be ignored for sizing purposes, so UTF-32 text is typically about twice the size of the same text in UTF-16 and up to four times its size in UTF-8.
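
A rough way to see the size difference is to encode the same text in each form and count bytes. The sketch below uses Python's built-in codecs; the helper `encoded_sizes` and the two sample strings are assumptions made for illustration.

```python
def encoded_sizes(text):
    """Byte counts of text in three Unicode encodings (no byte order mark)."""
    return {name: len(text.encode(name)) for name in ("utf-8", "utf-16-le", "utf-32-le")}

ascii_text = "plain ASCII text"  # 16 code points, all below U+0080
greek_text = "αβγδε"             # 5 BMP code points, 2 bytes each in UTF-8

print(encoded_sizes(ascii_text))  # {'utf-8': 16, 'utf-16-le': 32, 'utf-32-le': 64}
print(encoded_sizes(greek_text))  # {'utf-8': 10, 'utf-16-le': 10, 'utf-32-le': 20}
```

For the ASCII sample UTF-32 is four times the size of UTF-8 and twice the size of UTF-16; for the Greek sample it is twice the size of both.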

Though a fixed number of bytes per code point seems convenient, UTF-32 is used less than the other Unicode encodings. Fixed width makes truncation slightly easier, but not significantly so compared with UTF-8 and UTF-16. It does not make calculating the displayed width of a string any easier except in very limited cases: even with a “fixed width” font, there may be more than one code point per character position (combining marks) or more than one character position per code point (for example, CJK ideographs). Combining marks also mean that editors cannot treat one code point as one unit for editing.
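
The two cases mentioned above, combining marks and wide ideographs, can be inspected with Python's standard `unicodedata` module. This is only a sketch: the helper `describe` is hypothetical, and the East Asian Width property is a hint about display width, not a measurement of what a particular font or terminal will do.

```python
import unicodedata

def describe(text):
    """List (code point, combining class, East Asian width) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.combining(ch), unicodedata.east_asian_width(ch))
        for ch in text
    ]

accented = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT: two code points, one character position
ideograph = "\u6f22"  # one code point, classified as wide ('W')

print(len(accented), describe(accented))
print(len(ideograph), describe(ideograph))
# UTF-32 still stores two 32-bit values for the single visible accented letter.
print(len(accented.encode("utf-32-be")) // 4)  # 2
```

A non-zero combining class marks a code point that attaches to the preceding one, so counting code points (or 32-bit units) does not give the number of character positions.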

History

The original ISO 10646 standard defines a 31-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a code value that fits in 32 bits, in the code space of integers between 0 and hexadecimal 7FFFFFFF.

UCS-4 is sufficient to represent all of the Unicode code space, which has 1114112 (= 2^20 + 2^16) code points and therefore requires only values up to hexadecimal 10FFFF. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the 0 to 10FFFF code space.
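
The arithmetic behind these limits can be checked directly. In the sketch below the constant names and the helper `in_utf32_range` are hypothetical; the helper performs a range check only and is not a full validity test.

```python
import sys

UCS4_MAX = 0x7FFFFFFF  # upper bound of the original 31-bit UCS-4 code space
UTF32_MAX = 0x10FFFF   # upper bound of the Unicode code space

def in_utf32_range(value):
    """Range check only: True if value lies in 0..10FFFF."""
    return 0 <= value <= UTF32_MAX

# 0..10FFFF holds 2^20 + 2^16 = 1114112 code points.
print(UTF32_MAX + 1 == 2**20 + 2**16 == 1114112)  # True
# Python 3.3 and later use the same upper bound for str code points.
print(sys.maxunicode == UTF32_MAX)                 # True
print(in_utf32_range(0x10FFFF), in_utf32_range(0x110000), in_utf32_range(UCS4_MAX))
# True False False
```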

UTF-32 was originally a subset of the UCS-4 standard, but the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, and the former provisions for private-use code positions in groups 60 to 7F and in planes E0 to FF have been removed.

Accordingly, UCS-4 and UTF-32 are now identical, except that the UTF-32 standard has additional Unicode semantics.
