This compression system is a very handy invention, especially for Web users, because it lets you reduce the overall number of bits and bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on a disk. Once you download the file, your computer uses a program such as WinZip or StuffIt to expand the file back to its original size. If everything works correctly, the expanded file is identical to the original file before it was compressed.
At first glance, this seems very mysterious. How can you reduce the number of bits and bytes and then add those exact bits and bytes back later? As it turns out, the basic idea behind the process is fairly straightforward. In this article, we'll examine this simple method as we take a very small file through the basic process of compression. Most types of computer files are fairly redundant -- they have the same information listed over and over again.
File-compression programs simply get rid of the redundancy. Instead of listing a piece of information over and over again, a file-compression program lists that information once and then refers back to it whenever it appears in the original file. For example, take this famous line from John F. Kennedy's inaugural address: "Ask not what your country can do for you -- ask what you can do for your country." The quote has 17 words, made up of 61 letters, 16 spaces, one dash and one period.
If each letter, space or punctuation mark takes up one unit of memory, we get a total file size of 79 units. To get the file size down, we need to look for redundancies. Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote.
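The word and letter counts above can be checked with a few lines of Python. This sketch lowercases the quote, which matches the article's choice to ignore capitalization. It then pulls out the words with a simple regular expression and tallies how often each one appears. The regex is an assumption about what counts as a word here.

```python
import re
from collections import Counter

QUOTE = "Ask not what your country can do for you -- ask what you can do for your country."

# Capitalization is ignored, as in the article; only runs of letters count as words.
words = re.findall(r"[a-z]+", QUOTE.lower())
counts = Counter(words)

print(len(words))                   # 17 words
print(sum(len(w) for w in words))   # 61 letters
print(len(counts))                  # 9 distinct words
# Every distinct word except "not" appears twice:
print([w for w, n in counts.items() if n > 1])
```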
To construct the second half of the phrase, we just point to the words in the first half and fill in the spaces and punctuation. We'll look at how file-compression systems deal with redundancy in more detail in the next section. Most compression programs use a variation of the LZ (Lempel-Ziv) adaptive dictionary-based algorithm to shrink files. The system for arranging dictionaries varies, but it could be as simple as a numbered list. When we go through Kennedy's famous words, we pick out the words that are repeated and put them into the numbered index.
Then, we simply write the number instead of writing out the whole word. If you knew the system, you could easily reconstruct the original phrase using only this dictionary and number pattern.
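Here is a minimal Python sketch of that numbered-dictionary idea. It assumes the dictionary numbers each repeated word in the order it first appears. That is one reasonable choice, not the layout of any particular program. The sketch also lowercases the quote first, so the round trip reproduces the lowercased text exactly. The helper names (`tokenize`, `build_dictionary`, `encode`, `decode`) are hypothetical.

```python
import re
from collections import Counter

QUOTE = "Ask not what your country can do for you -- ask what you can do for your country."

def tokenize(text):
    # Lowercase (capitalization is ignored), then split into words, the "--" dash and the period.
    return re.findall(r"[a-z]+|--|\.", text.lower())

def detokenize(tokens):
    # Rejoin with single spaces and reattach the period to the last token.
    return " ".join(tokens).replace(" .", ".")

def build_dictionary(tokens):
    # Number every word that appears more than once, in order of first appearance.
    counts = Counter(tokens)
    dictionary = {}
    for token in tokens:
        if token.isalpha() and counts[token] > 1 and token not in dictionary:
            dictionary[token] = len(dictionary) + 1
    return dictionary

def encode(tokens, dictionary):
    # Replace each dictionary word with its number; keep everything else as-is.
    return [str(dictionary[t]) if t in dictionary else t for t in tokens]

def decode(encoded, dictionary):
    # Swap each number back for its word.
    numbers = {str(n): word for word, n in dictionary.items()}
    return [numbers.get(t, t) for t in encoded]

tokens = tokenize(QUOTE)
dictionary = build_dictionary(tokens)
encoded = encode(tokens, dictionary)
print(dictionary)
print(detokenize(encoded))  # 1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4.
assert detokenize(decode(encoded, dictionary)) == QUOTE.lower()
```

Running it prints the dictionary `{'ask': 1, 'what': 2, 'your': 3, 'country': 4, 'can': 5, 'do': 6, 'for': 7, 'you': 8}` and then the encoded sentence. The `decode` step is the reconstruction: it looks each number up and puts the word back.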
This is what the expansion program on your computer does when it expands a downloaded file. You might also have encountered compressed files that open themselves up. To create this sort of file, the programmer includes a simple expansion program with the compressed file. It automatically reconstructs the original file once it's downloaded.
But how much space have we actually saved with this system? In an actual compression scheme, figuring out the various file requirements would be fairly complicated, but for our purposes, let's go back to the idea that every character and every space takes up one unit of memory. We already saw that the full phrase takes up 79 units. Our compressed sentence (including spaces) takes up 37 units, and the dictionary (words and numbers) also takes up 37 units.
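The arithmetic can be written out as a short Python sketch. It assumes the counting rule implied by the article's totals:

- one unit per letter or digit,
- one space between neighboring words,
- one unit each for the dash and the final period.

It also assumes a hypothetical dictionary that numbers the eight repeated words in order of first appearance. Each entry is stored as the word plus its number.

```python
# Hypothetical dictionary: the eight repeated words, numbered 1-8 by first appearance.
DICTIONARY = ["ask", "what", "your", "country", "can", "do", "for", "you"]
ORIGINAL_WORDS = "ask not what your country can do for you ask what you can do for your country".split()
# Replace each dictionary word with its number; "not" stays as a word.
ENCODED_WORDS = [str(DICTIONARY.index(w) + 1) if w in DICTIONARY else w for w in ORIGINAL_WORDS]

PUNCTUATION_UNITS = 2  # the dash and the final period, one unit each

def sentence_units(slots):
    # Characters in each word or number, plus one space between neighbors, plus punctuation.
    return sum(len(s) for s in slots) + (len(slots) - 1) + PUNCTUATION_UNITS

original_units = sentence_units(ORIGINAL_WORDS)
compressed_units = sentence_units(ENCODED_WORDS)
dictionary_units = sum(len(word) + len(str(n)) for n, word in enumerate(DICTIONARY, start=1))

print(original_units, compressed_units, dictionary_units)  # 79 37 37
print(compressed_units + dictionary_units)                 # 74
```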
That gives us a total of 74 units, only five fewer than the original 79, so we haven't reduced the file size by very much. But this is only one sentence! You can imagine that if the compression program worked through the rest of Kennedy's speech, it would find these words and others repeated many more times. And, as we'll see in the next section, it would also be rewriting the dictionary to get the most efficient organization possible. In our previous example, we picked out all the repeated words and put those in a dictionary.
To us, this is the most obvious way to write a dictionary. But a compression program sees it quite differently: It doesn't have any concept of separate words -- it only looks for patterns. And in order to reduce the file size as much as possible, it carefully selects which patterns to include in the dictionary. As we mentioned above, lossless compression is important in cases where you can't remove any of the original file.
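To make the "patterns, not words" point concrete, here is a toy encoder in the spirit of LZ77, strictly closer to its LZSS variant. At each position it searches the preceding text for the longest repeated run of characters. If the run is at least three characters long, it emits a (distance, length) back-reference in place of the characters themselves. The decoder copies those runs back exactly, so nothing from the original is lost.

The function names, the 4,096-character window and the three-character minimum are illustrative assumptions. They are not the parameters of any real ZIP tool.

```python
QUOTE = "Ask not what your country can do for you -- ask what you can do for your country."

def lz_encode(text, window=4096, min_match=3):
    """Greedy LZ77-style encoder: emits single characters or (distance, length) pairs."""
    tokens, i = [], 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Search the window of already-seen text for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # "copy best_len chars from best_dist back"
            i += best_len
        else:
            tokens.append(text[i])  # no useful match: store the character itself
            i += 1
    return tokens

def lz_decode(tokens):
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            # Copy one character at a time so overlapping matches work.
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(token)
    return "".join(out)

tokens = lz_encode(QUOTE)
print(len(QUOTE), len(tokens))
print(len(QUOTE * 4), len(lz_encode(QUOTE * 4)))
```

The encoder never looks for words. It finds runs such as " can do for you" simply because the same characters occurred earlier.

The second print reflects the earlier point about longer texts. Once the quote has been seen, later copies are covered by back-references, so four copies need far fewer than four times as many tokens as one.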
If you've been curious about how ZIP files work, lossless compression is the answer. When you put a program executable into a ZIP file in Windows, it uses lossless compression. The ZIP file is a more efficient way to store the program, but when you unzip (decompress) it, all the original information is present. If you used lossy compression on an executable, the unzipped version would be damaged and unusable.
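Python's standard `zipfile` module makes it easy to check that ZIP compression is lossless. This sketch writes some bytes into an in-memory ZIP archive with DEFLATE compression, reads them back and compares the two. The repeated byte pattern and the name `program.bin` are made-up stand-ins for a program file, not a real executable.

```python
import io
import zipfile

def zip_round_trip(name: str, payload: bytes):
    """Store payload in an in-memory ZIP archive, then read it back."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    # ZipFile does not close a file object it was handed, so the buffer can be reopened.
    with zipfile.ZipFile(buffer) as archive:
        info = archive.getinfo(name)
        return archive.read(name), info.compress_size, info.file_size

payload = bytes(range(256)) * 64  # hypothetical stand-in for a program's bytes
restored, compressed_size, original_size = zip_round_trip("program.bin", payload)
print(original_size, compressed_size, restored == payload)
```

The archive stores fewer bytes than the original, yet the unzipped bytes compare equal to the input.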
Lossless formats for video are rare, because they would take up massive amounts of space. Now that we've looked at both forms of file compression, you might wonder when you should use one or the other.
As it turns out, there is no "better" form of compression; it all depends on what you're using the files for. In general, you should use lossless compression when you want a perfect copy of the source material, and lossy compression when an imperfect copy is good enough. Let's look at another example to see how they can work in harmony.
Say that you've just dug up your old CD collection and want to digitize it so you have all your music on your computer. Ripping the CDs to a lossless audio format (FLAC is a popular choice) gives you a master copy on your computer that's as good as the original CD.
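For a sense of scale, the size of CD-quality audio follows directly from the CD format: 44,100 samples per second, 16 bits per sample, two channels. The sketch below works out the raw (uncompressed) size of a hypothetical four-minute track. For comparison, it also computes the size of a lossy copy at 128 kbps, one common MP3 bitrate. A lossless format such as FLAC shrinks the raw figure by an amount that depends on the music, which this sketch does not model.

```python
SAMPLE_RATE_HZ = 44_100    # CD audio sample rate
BITS_PER_SAMPLE = 16
CHANNELS = 2               # stereo
TRACK_SECONDS = 4 * 60     # hypothetical four-minute song
MP3_BITRATE_BPS = 128_000  # one common MP3 bitrate, chosen for illustration

cd_bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # bits per second of raw CD audio
cd_bytes = cd_bitrate * TRACK_SECONDS // 8
mp3_bytes = MP3_BITRATE_BPS * TRACK_SECONDS // 8

print(cd_bitrate)                       # 1411200
print(cd_bytes, mp3_bytes)              # 42336000 3840000
print(round(cd_bytes / mp3_bytes, 1))   # 11.0
```

The lossy copy comes out about 11 times smaller than the raw audio, which is the trade the next paragraph describes.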
Later, perhaps you want to put some music on your phone or an old MP3 player so you can listen on the go. Converting those lossless files to a lossy format such as MP3 gives you an audio file that's still perfectly listenable, but doesn't take up as much space on your mobile device. The type of data represented in a file can also dictate which type of compression is best. PNG images use lossless compression, which does especially well on images with large areas of uniform color, like computer screenshots, so those PNGs stay small.
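PNG compresses pixel data with DEFLATE, the same algorithm ZIP commonly uses, after a per-row filtering step. The sketch below skips the filtering and the PNG file structure entirely. It just runs Python's `zlib` on two made-up grayscale "images":

- one where every pixel is the same shade, like a flat screenshot background;
- one filled with pseudo-random values, a crude stand-in for the fine detail of a photo.

It is only meant to show how strongly the result depends on the content.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 100  # hypothetical 8-bit grayscale image size

flat = bytes([240]) * (WIDTH * HEIGHT)  # one uniform shade everywhere
rng = random.Random(42)                  # seeded so the run is repeatable
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

flat_size = len(zlib.compress(flat))
noisy_size = len(zlib.compress(noisy))
print(len(flat), flat_size, noisy_size)
```

The uniform image shrinks to a tiny fraction of its 20,000 bytes, while the noisy one barely shrinks at all, if it shrinks.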
However, you'll notice that PNGs take up much more space when they represent the jumble of colors in real-world photos. As we've seen, converting lossless formats to lossy is fine, as is converting one lossless format to another. However, you should never convert a lossy format to lossless, and should beware converting one lossy format to another. Converting lossy formats to lossless is simply a waste of space.
Remember that lossy formats throw data out; it's impossible to recover that data. Say you have a 3MB MP3 file. Converting it to a lossless format produces a larger file, but it doesn't "recover" the information that the MP3 compression threw out.
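A toy example makes the point. Below, clearing the low four bits of each byte stands in for a lossy codec, a hypothetical choice far cruder than MP3. `zlib` stands in for a lossless format. Storing the lossy result losslessly gives back the lossy version exactly, never the original.

```python
import zlib

original = bytes(range(0, 256, 5))                # a made-up "signal": 0, 5, 10, ..., 255
lossy = bytes(b & 0xF0 for b in original)         # toy lossy step: drop the low 4 bits
restored = zlib.decompress(zlib.compress(lossy))  # lossless round trip of the lossy data

print(restored == lossy)     # True: the lossless step kept everything it was given
print(restored == original)  # False: the detail removed by the lossy step is gone
```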
Finally, as mentioned earlier, converting one lossy format to another or repeatedly saving in the same format will degrade the quality further. Every time you apply the lossy compression, you lose more detail.
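Here is a toy illustration of why converting between lossy formats compounds the damage. Each hypothetical "format" rounds every sample to the nearest multiple of its own step size. Converting from one to the other rounds values that were already rounded. Re-applying the same rounding grid changes nothing in this toy, so it models only the format-to-format case, not repeated saves.

```python
import math

def quantize(samples, step):
    """Toy lossy codec: round each sample to the nearest multiple of step (halves round up)."""
    return [step * math.floor(x / step + 0.5) for x in samples]

def total_error(a, b):
    # Sum of absolute differences between two equally long sample lists.
    return sum(abs(x - y) for x, y in zip(a, b))

original = list(range(12))        # made-up samples 0..11
format_a = quantize(original, 4)  # hypothetical lossy format A
format_b = quantize(original, 3)  # hypothetical lossy format B
a_then_b = quantize(format_a, 3)  # convert A's output into format B

print(total_error(original, format_a))  # 12
print(total_error(original, format_b))  # 8
print(total_error(original, a_then_b))  # 14
```

The converted copy ends up further from the original than either format on its own.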
The damage becomes more and more noticeable with each round until the file is essentially ruined. We've taken a look at both lossy and lossless compression to see how they work. Now you know how it's possible to store a file at a smaller size than its original form, and how to choose the best method for your needs. Of course, the algorithms that decide what data gets thrown out in lossy methods and how to best store redundant data in lossless compression are much more complicated than we've explained here.
There's a lot more to discover on this topic if you're interested. Lossless compression reduces file size without removing any bits of information. Instead, it works by removing redundancies within the data to reduce the overall file size. With lossless compression, it is possible to perfectly reconstruct the original file.
For example, ZIP, one of the most common lossless compression formats, is often used for program files in Windows because it preserves all the original information. Decompressing (unzipping) the file produces a working executable; with lossy compression, the program would be unusable. Lossless formats for video are rare, as the source files would take up massive amounts of space.
Compressing a file into a ZIP may reduce its size, but you can't keep compressing it over and over until it shrinks to nothing. How does data compression work from a technical standpoint?
Well, the actual algorithms that decide what data gets thrown out in lossy methods and how to best store redundant data in lossless compression are extremely complicated.
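One part of the theory is simple enough to state here, though. The point that a file can't be compressed over and over down to nothing follows from a counting (pigeonhole) argument. A lossless compressor must map different inputs to different outputs, or decompression couldn't tell them apart. Yet there are fewer short bit strings than there are inputs of a given length:

```latex
\#\{\text{strings of exactly } n \text{ bits}\} = 2^n,
\qquad
\#\{\text{strings shorter than } n \text{ bits}\} = \sum_{k=0}^{n-1} 2^k = 2^n - 1 < 2^n .
```

So no lossless scheme can shorten every n-bit input. Whatever it shrinks, some other input must stay the same size or grow, and repeating the process cannot keep shrinking a file forever.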
This overview of data compression is meant to cover the basics at a high level and provide context for applying these practices in real-world situations.