Compression technology history




For the last 100 years, IT technology has evolved from very large systems built to execute simple mathematical operations to small devices able to execute, generate, and manipulate massive amounts of data. The number of devices able to capture and store data is growing exponentially from year to year. Because data must often be kept over a long period of time for long-term reference, compliance, or security purposes, new ways to optimize capacity utilization are needed. For more information about data growth evolution, see Chapter 1, “The industry requirement for compression” on page 3.

Historically, over the last 200 years, the technologies available to reduce the amount of data stored or transported from one place to another have improved greatly. One of the first methods for reducing the amount of data was the use of symbols and representations in mathematical notation. For example, instead of writing the words “multiplied by,” the asterisk character (*) is used, and in the same way the word “minus” is represented by the dash character (-).

In 1838, the invention of Morse code allowed messages to be transmitted very quickly over long distances. Roman letters and Arabic numerals were replaced with symbols formed from lines and dots. To reduce the number of dots and lines used to represent each letter, a statistical analysis of how common each letter is was performed, and the most common letters were given the shortest combinations of dots and lines. Letter frequency differs from language to language, and so does the Morse code. For example, in the English language the letter “c” is represented in Morse code by 3 dots and the letter “h” by 4 dots, so the combination “ch” consists of 7 dots. In some languages, however, “ch” is a very common combination, so the dots were replaced by lines and “ch” is represented by 4 lines, effectively saving transmission time.

Later in the 20th century, the development of IT technologies raised the need for complex algorithms able to reduce the amount of data by interpreting the information beyond the simple substitution of specific strings or letters. One of the first mathematical data compression techniques was proposed by Claude E. Shannon and Robert Fano in 1949. In Shannon-Fano coding, symbols are sorted from the most probable to the least probable and then encoded with progressively more bits. For example, if the source data contains the symbols A, B, C, D, and E, where A is the most common and E is the least common, the Shannon-Fano codes are 00, 01, 10, 110, and 111.

In 1952, a Ph.D. student at MIT named David A. Huffman proposed a more efficient algorithm for mapping source symbols to unique strings of bits. In fact, Huffman proved that his coding is the most efficient method for this task, producing the smallest average number of output bits per source symbol.

In 1977, Abraham Lempel and Jacob Ziv proposed a method of replacing repeating words with code words; the method also applies to larger patterns of text, such as expressions. This was the dawn of modern data compression. In 1984, Terry Welch improved the algorithm proposed by Lempel and Ziv (known as LZ78) and developed the method known as LZW. Today this algorithm is the basis of modern compression techniques used in PKZIP for general file compression and within the GIF and TIFF formats for images.
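To make the prefix codes described above more concrete, the following Python sketch builds a Huffman code for a small symbol set (A through E, with A the most common) and uses it to encode a short string. The frequencies, the sample text, and the function name are invented for illustration and are not taken from the original text.

import heapq

def huffman_code(frequencies):
    # Build a Huffman code (symbol -> bit string) from a frequency map.
    # Illustrative sketch only; a real compressor also stores the code table
    # (or the tree) with the encoded data so that it can be decoded.
    heap = [(freq, i, [(sym, "")]) for i, (sym, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Prefix "0" to every code in one subtree and "1" in the other, so the
        # least frequent symbols end up deeper in the tree with longer codes.
        merged = [(sym, "0" + code) for sym, code in left] + \
                 [(sym, "1" + code) for sym, code in right]
        heapq.heappush(heap, (f1 + f2, tie_breaker, merged))
        tie_breaker += 1
    return dict(heap[0][2])

# Hypothetical frequencies: A is the most common symbol, E the least common.
freqs = {"A": 40, "B": 25, "C": 20, "D": 10, "E": 5}
codes = huffman_code(freqs)
text = "AABACABADAE"
encoded = "".join(codes[ch] for ch in text)
print(codes)                                   # for example {'A': '0', 'B': '10', ...}
print(len(text) * 8, "bits raw ->", len(encoded), "bits encoded")

As in the Shannon-Fano example, the most common symbol receives the shortest code; the least common symbols receive the longest ones.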
Over time, many data compression algorithms have been developed around the Lempel-Ziv method: LZSS (Lempel-Ziv-Storer-Szymanski), LZARI (LZ with arithmetic encoding), and LZH (LZ plus Huffman encoding, used by the ARJ utility). The IBM Real-time Compression Appliance also uses compression based on LZH. A simplified sketch of the LZW approach is shown after the resource list below.

For more information about both algorithms, see the following resources.

Lempel-Ziv coding:
- Lempel-Ziv explained: www-math.mit.edu/~shor/PAM/lempel_ziv_notes.pdf
- Lempel-Ziv coding: code.ucsd.edu/cosman/NewLempel.pdf

Huffman coding:
- Huffman coding explained: phy.davidson.edu/fachome/dmb/py115/huffman_coding.htm
- Detailed Huffman coding: cs.nyu.edu/~melamed/courses/102/lectures/huffman.ppt
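The following Python sketch shows the idea behind the Lempel-Ziv-Welch family: repeated phrases are replaced with dictionary codes as the dictionary is learned from the data itself. It is a simplified illustration only, not the exact variant used by any of the products or formats named above; the function name and sample input are assumptions for the example.

def lzw_compress(data):
    # Minimal LZW sketch: emit a dictionary code for the longest phrase seen so
    # far, and learn a new phrase each time an unknown extension appears.
    # Real implementations (for example in GIF or TIFF) add variable-width
    # codes, dictionary resets, and bit packing, which are omitted here.
    dictionary = {chr(i): i for i in range(256)}   # single characters get codes 0-255
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the known phrase
        else:
            output.append(dictionary[current])     # emit the longest known phrase
            dictionary[candidate] = next_code      # learn the new, longer phrase
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

# Repeated phrases such as "TOBEOR" are emitted as a single code the second time,
# which is where the space savings come from.
print(lzw_compress("TOBEORNOTTOBEORTOBEORNOT"))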
