DjVu (pronounced like déjà vu) is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal (monochrome) images. This allows high-quality, readable images to be stored in minimal space, so that they can be made available on the web. DjVu has been promoted as an alternative[1] to PDF because it produces smaller files than PDF for most scanned documents. The DjVu developers report[2] that color magazine pages compress to 40–70 KB, black-and-white technical papers to 15–40 KB, and ancient manuscripts to around 100 KB, whereas a JPEG image of satisfactory quality typically requires 500 KB. Like PDF, DjVu can contain a text layer produced by optical character recognition (OCR), making it easy to copy and paste text and to search it.
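The shape-reuse idea behind DjVu's bitonal compression (the JB2 method described under Compression) can be sketched in Python. This is a toy illustration under stated assumptions, not the actual JB2 coder: glyphs are assumed to be already cut out of the mask as small 0/1 bitmaps with known positions, "nearly identical" is approximated by a simple count of differing pixels compared against a threshold, and the real method's arithmetic coding of bitmaps and positions is omitted. All names (`encode_shapes`, `pixel_distance`, `render`, `max_diff`) are hypothetical.

```python
from typing import List, Optional, Tuple

# Toy bilevel bitmap: a tuple of rows, each a tuple of 0/1 pixels (1 = ink).
Bitmap = Tuple[Tuple[int, ...], ...]
# Placement of one shape on the page: (dictionary index, x, y).
Placement = Tuple[int, int, int]


def pixel_distance(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Count differing pixels; return None if the two shapes differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode_shapes(
    glyphs: List[Tuple[Bitmap, int, int]], max_diff: int = 0
) -> Tuple[List[Bitmap], List[Placement]]:
    """Split glyphs into a dictionary of unique shapes plus their placements.

    A glyph within `max_diff` pixels of an existing dictionary entry reuses
    that entry (hypothetical matching rule); any max_diff > 0 is lossy.
    """
    dictionary: List[Bitmap] = []
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        match = None
        for index, shape in enumerate(dictionary):
            distance = pixel_distance(bitmap, shape)
            if distance is not None and distance <= max_diff:
                match = index
                break
        if match is None:
            # No close-enough shape yet: store this bitmap as a new entry.
            dictionary.append(bitmap)
            match = len(dictionary) - 1
        # Only the shape index and position are recorded for each occurrence.
        placements.append((match, x, y))
    return dictionary, placements


def render(dictionary: List[Bitmap], placements: List[Placement],
           width: int, height: int) -> List[List[int]]:
    """Rebuild a page by stamping each dictionary shape at its placements."""
    page = [[0] * width for _ in range(height)]
    for index, x, y in placements:
        for dy, row in enumerate(dictionary[index]):
            for dx, pixel in enumerate(row):
                if pixel:
                    page[y + dy][x + dx] = 1
    return page


# Three toy 3x3 shapes; SHAPE_A_NOISY differs from SHAPE_A by one pixel.
SHAPE_A = ((1, 1, 1), (1, 1, 0), (1, 1, 1))
SHAPE_A_NOISY = ((1, 1, 1), (1, 0, 0), (1, 1, 1))
SHAPE_B = ((1, 1, 1), (1, 0, 1), (1, 1, 1))
glyphs = [(SHAPE_A, 0, 0), (SHAPE_B, 4, 0), (SHAPE_A, 8, 0), (SHAPE_A_NOISY, 12, 0)]

if __name__ == "__main__":
    for max_diff in (0, 1):
        shapes, places = encode_shapes(glyphs, max_diff)
        print(max_diff, len(shapes), places)
```

With `max_diff=0` only exact repeats are shared, so the dictionary holds three shapes for four glyphs. With `max_diff=1` the one-pixel-off copy reuses the first shape, the dictionary shrinks to two entries, and the rebuilt page differs from the original by a single pixel. Trading a few pixels for fewer stored bitmaps is what makes this style of coding lossy.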
History
The DjVu technology was originally developed[2] by Yann LeCun, Léon Bottou, Patrick Haffner, and Paul G. Howard at AT&T Laboratories in 1996. Because of its high compression ratio and the ease with which large volumes of text can be converted to the .djvu format, many of the academic texts circulating on the Scene are in .djvu format, with PDF files a close second.

Compression
DjVu divides a single image into separate layers and compresses each one separately. To create a DjVu file, the initial image is first separated into three images: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images (e.g., 100 dpi); the mask image is a high-resolution bilevel image (e.g., 300 dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44[2]. The mask image is compressed using a method called JB2 (similar to JBIG2). The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately and then encodes the locations where each shape appears on the page. Thus, instead of compressing the letter "e" in a given font many times, it compresses that letter once (as a compressed bit image) and then records every place on the page where it occurs.

Format licensing
DjVu is a free file format. In 2002, the Internet Archive chose DjVu as one of the formats (along with TIFF and PDF) in which its Million Book Project provides scanned public-domain books online.[3] Both the file format specification and the source code for the reference library are published.
The ownership rights to the commercial development of the encoding software have changed hands several times over the years; holders have included AT&T and LizardTech. The original authors maintain a GPL-licensed implementation named DjVuLibre.

References
External links