# Pizer’s Weblog

programming, DSP, math

## Compression FAC

This is an attempt to answer claims found in the usenet group comp.compression. These kinds of claims pop up very often and represent a rather large fraction of the overall traffic. So, instead of repeating every argument the idea is to point those authors to this site in addition to section 9 of the official FAQ. It is believed that the authors of those claims are not familiar with the relevant mathematical principles and terms which is why I included many links to Wikipedia.

C: I can losslessly compress previously compressed data

A: That’s not very specific. It certainly is possible. For example, “previously compressed data” could be a ZIP archive of many small text files. It’s very likely that you can unpack those files, use another archiver that performs better than ZIP (i.e. supports “solid archives” like 7zip) and end up with a smaller archive which stores exactly the same informations (except maybe the files’ time stamps but this a little technicality that can be worked around).

C: I can losslessly compress random data

A: That’s not very specific. What kind of random data do you mean? It certainly is possible to generate random data with an exploitable low entropy and/or random data where the next symbol’s probability distribution changes depending on the previous symbols and thus gets more predictable. Be more specific. Describe the model of your “random data” and what exectly the meaning of the claim is. For example, if you mention the word “random” and claim to be able to compress “random data” you need to say what you mean by that. Do you mean you’re able to compress every possible sequence or just to reduce the expected file length? The latter one would allow some files to be “compressed” to larger files.

What’s NOT possible is to compress random data on average whose entropy is maximal. For example, if you toss a coin multiple times in a way that makes the coin toss independent from previous coin tosses and the probability of heads is the same as the probability of tails then you have a random sequence of “bits” that can’t be compressed on average because the entropy is as high as it could be. In technical terms this would be a “memoryless source” with an “uniform distribution” of symbols. If you subscribe to the idea of having a deterministic predictor that predicts the next bit correctly with a probability of above (or below) 50% you can be quickly proven wrong: The assumption was that the source is memoryless. So, all the past symbols won’t affect the next one. Since the symbol distribution is uniform it doesn’t matter what symbol is predicted. In any case you have the chance of getting it right 50% of the times. Of course, this extends to alphabets with more than two symbols.

Another popular approach is the “magic function theory”. One might think it’s possible to represent such a random sequence as a function using parameters that can be more compactly coded. While this is true for some data (i.e. image data with special properties and a transform that exploits these properties like the DCT of a DWT) it doesn’t apply to the “worst kind of” random data mentioned above.

C: I can losslessly compress all files

A: Be more specific. Did you mean “all possible files”? If yes, you’d be contradicting the Pigeon principle. Think in terms of sets. You have a set of possible input files A. If you limit the files’ size to multiples of 8 bits (bytes) and files that are no larger than 2^64-1 bytes this set will be finite but still very huge. Now, if you claim to be able to compress all files from this set and by compress you mean map this file to another file that is smaller, then this is simply not possible because this mapping would be a mapping from A to the set B where B is a real subset of A. The cardinality of B would be smaller than A: |A|>|B|. The problem is that no such mapping can be injective. That means there exists at least one pair of possible input files that are mapped to the same output file. The decompressor simply can’t know how to decompress this file again because it is the “image” of more than one input file. Hence, the compression would not be lossless. It’s that simple, really.

Variations of this claim include “I can compress all possible files that are larger than X”. The official comp.compression FAQ includes a mathematical proof that proves this to be impossible. See section 9.

C: I can compress random data but with a slight loss

A: Be more specific. There’s nothing special about having some lossy compression algorithm for random data. How does it compare to other algorithms? What kind of sources is this algorithm tailerd to? Define “slight loss”. What’s the quality measure?

What we’re talking about here is a potential building block for a specialized lossy compression algorithm. One example of such a building block would be a simple scalar quantizer followed by Huffman coding. Usually, the interesting quality measure is the p-norm of the error with p=2. Given a memoryless source and a certain sample distrubution you can maximize the compression (minimize the “rate”) subject to a certain minimum signal-to-noise ratio (keep “distortion” below X) by choosing an appropriate scalar quantizer and an appropriate Huffman code book. Instead of comparing your algorithm to some other algorithm you can also compare it to the theoretical optimum. Be sure to read up on rate-distortion theory.

In case of images, video and audio the quality measure would be linked to human perception. Also, this kind of data is hardly random. The perceptual coding topic might be much more complicated than you think. There is no such thing as a “general purpose lossy compressor” because all lossy compressors are tailored for a specific case. For example, you have JPEG for images and MP3 for audio. You won’t impress people with a “generel purpose” lossy compressor because it won’t probably be able to beat other coders that have been speifically optimized for a certain domain.