The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF. Because Unicode can be encoded as 16-bit or 32-bit integers, a computer receiving Unicode text from arbitrary sources needs to know which byte order those integers are encoded in. The BOM gives the producer of the text a way to describe the stream's endianness to the consumer without requiring a contract or metadata outside the text stream itself. Once the receiving computer has consumed the text stream, it presumably processes the characters in its own native byte order and no longer needs the BOM. The need for a BOM therefore arises in text interchange rather than in normal text processing within a closed environment.
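As a small illustration of why the byte order matters, the sketch below serialises the BOM code point in both byte orders using Python's built-in codecs. The variable names are chosen for this example only; the snippet shows one way to observe the effect, not a prescribed procedure.

```python
BOM = "\ufeff"  # the byte order mark code point, U+FEFF

# The same code point written as 16-bit and 32-bit integers in each byte order.
utf16_be = BOM.encode("utf-16-be")
utf16_le = BOM.encode("utf-16-le")
utf32_be = BOM.encode("utf-32-be")
utf32_le = BOM.encode("utf-32-le")

print(utf16_be.hex(" "))  # fe ff
print(utf16_le.hex(" "))  # ff fe
print(utf32_be.hex(" "))  # 00 00 fe ff
print(utf32_le.hex(" "))  # ff fe 00 00

# Python's generic "utf-16" decoder reads a leading BOM, picks the byte
# order from it, and removes it from the decoded text.
from_big_endian = (utf16_be + "A".encode("utf-16-be")).decode("utf-16")
from_little_endian = (utf16_le + "A".encode("utf-16-le")).decode("utf-16")
print(from_big_endian, from_little_endian)  # A A
```

Both streams decode to the same text even though their bytes differ, which is the job the BOM does for the consumer.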
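Usage

In UTF-16, a BOM (U+FEFF) may be placed as the first character of a file or stream to indicate the byte order of the 16-bit code units that follow.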
The Unicode value U+FFFE is a noncharacter that is guaranteed never to be assigned, so a reader that finds U+FFFE at the start of a stream can conclude that it is reading the bytes in the wrong order.

While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may nonetheless be encountered. A UTF-8 BOM is explicitly allowed by the Unicode standard[2], but is not recommended[3], as it only identifies a file as UTF-8 and does not state anything about byte order.[4] Many Windows programs (including Windows Notepad) add BOMs to UTF-8 files by default. On Unix-like systems, however, which make heavy use of text files for file formats as well as for inter-process communication, this practice is not recommended, as it interferes with correct processing of important codes such as the shebang at the start of an interpreted script.[5] It may also interfere with source code for programming languages that do not recognise it. For example, gcc reports stray characters at the beginning of a source file. In PHP, if output buffering is disabled, the BOM has the subtle effect of causing the page to start being sent to the browser, which prevents the PHP script from specifying custom headers. The UTF-8 representation of the BOM is the byte sequence EF BB BF.

Although a BOM could be used with UTF-32, this encoding is rarely used for transmission. Otherwise the same rules as for UTF-16 apply.

For the IANA registered charsets UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE a "byte order mark" must not be used; an initial U+FEFF has to be interpreted as a (deprecated) "zero width no-break space", because the names of these charsets already determine the byte order. For the registered charsets UTF-16 and UTF-32, an initial U+FEFF indicates the byte order.

If the BOM character appears in the middle of a data stream, it should, according to Unicode, be interpreted as a "zero-width non-breaking space" (essentially a null character). Its deliberate use for this purpose is deprecated as of Unicode 3.2, however, with the "Word Joiner" character, U+2060, preferred instead.

Representations of byte order marks by encoding
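The byte sequences for UTF-8, UTF-16 and UTF-32 can be listed and checked against the start of a stream, as in the sketch below. It relies on the BOM constants in Python's standard `codecs` module; the names `BOMS`, `sniff_bom` and `decode_without_bom` are hypothetical helpers introduced for illustration, and other encodings are deliberately left out.

```python
import codecs

# (codec name, BOM bytes) pairs. UTF-32 is tested before UTF-16 because the
# UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16 one (FF FE).
BOMS = [
    ("utf-32-be", codecs.BOM_UTF32_BE),  # 00 00 FE FF
    ("utf-32-le", codecs.BOM_UTF32_LE),  # FF FE 00 00
    ("utf-8", codecs.BOM_UTF8),          # EF BB BF
    ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF
    ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (codec name, BOM length), or (None, 0) if no BOM is present.

    Caveat: FF FE 00 00 is reported as UTF-32 LE, although it could also be
    a UTF-16 LE BOM followed by U+0000. This sketch accepts that ambiguity.
    """
    for name, bom in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_without_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode data, dropping a leading BOM; assume `default` if none is found."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or default)


script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hi\n"
print(script.startswith(b"#!"))                     # False: the BOM hides the shebang
print(decode_without_bom(script).startswith("#!"))  # True once the BOM is dropped
```

Only a leading U+FEFF is removed here; one that appears later in the data is left in the decoded text. For UTF-8 input alone, Python's built-in `utf-8-sig` codec performs the same leading-BOM removal.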