Limits of Lossless Compression...?
Reply #8 – 2005-11-04 09:58:15
"A signal's entropy should be quite easy to compute, IIRC. If so, it would be interesting to see this ratio between actual compression and theoretical max compression in the output when you encode something. (Small hint to wavpack and flac developers.)"

Although it's easy to measure entropy, there are many different kinds of entropy to measure. For example, I could give you a string of 10000 numbers that any calculation would show to be extremely random. However, there could easily be a hidden generator that the entropy calculators wouldn't catch. What if my string was the 10000 digits of pi starting from digit 50000? That information could be losslessly encoded extremely well, provided the compression scheme knows about pi.

The result of this is that I can create a codec which makes any one song as small as I want. But there are 256 times as many files of N bytes as of N-1 bytes, so a codec can't shrink everything: for every song that gets one byte smaller, at least one other input has to get larger. I could write a codec which calculates the difference between the input song and "Stairway To Heaven" by Led Zeppelin. If the input song is "Stairway To Heaven", it can be encoded as one bit! All the other songs would suffer, but if you find yourself making lots of encodes of that one song, it would be worth it! (It would be more worth it to use a playlist and get your brain checked, but that's beside the point...)

This relates to what Acid8000 just posted: assuming your filesystem can handle it, you can get exactly one wav file down to one bit losslessly, but it wouldn't be too smart.

The problem with this approach is that your codec would have to have "Stairway To Heaven" built into it so that it knows what to play. However, if each file contained a list of indexes into a multi-gigabyte "most common" database, then it would be possible to make a handful of songs extremely small. (Hey! That sounds a lot like MIDI!)

What I'm trying to say (it's past my bedtime) is that entropy measurements won't be of any use unless they work similarly to the target codec. But that implies that you already know how the new codec will work, so it would be easier to simply use the new codec as the entropy measurement!

(Does any of that make sense? It's very late... I'll deal with readability in the morning.)