How does WAVPACK achieve its compression

2007-06-12 19:40:41

I am finding WAVPACK exceptional at about 300 kbps 44100 Hz 16 bit (gives you about 2 MB per minute of data). Very hard for me to tell the difference between that and lossless. I cannot find much explanation of whether the compression relies on psychoacoustic modelling (deciding what can be thrown away as not relevant to the listening experience) or on something else. Can anyone explain in moderately technical language?
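As a quick check of the figures in the question, the size arithmetic can be sketched as below. It assumes decimal units (1 MB = 1,000,000 bytes) and stereo audio, neither of which the post states, and the function names are made up for illustration.

```python
def megabytes_per_minute(kbps):
    """Size of one minute of audio at a constant bitrate, in decimal MB."""
    bits = kbps * 1000 * 60
    return bits / 8 / 1_000_000

def bits_per_sample(kbps, sample_rate=44100, channels=2):
    """Average bits spent on each sample of each channel (stereo is an assumption)."""
    return kbps * 1000 / (sample_rate * channels)

print(megabytes_per_minute(300))       # 2.25
print(round(bits_per_sample(300), 2))  # 3.4
```

Under those assumptions, 300 kbps works out to about 2.25 MB per minute, and to roughly 3.4 bits per sample per channel on average, against 16 bits in the uncompressed source.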

Reply #1 – 2007-06-13 00:44:46

This is just my understanding of how WavPack (or any lossless-based compressor) works to achieve compression; it may be wrong (and I would welcome any corrections):

WavPack uses no psychoacoustic models, or time-to-frequency domain transformations, of any sort. This is both a strength in some ways and a weakness. It works completely differently from formats such as mp3.

Instead, WavPack looks at previous samples and tries to predict (with linear or polynomial prediction) where the current sample of the wave data will be. In general the "low" and "middle frequencies" can be predicted very well with this method. After the prediction is done there is still error between the predicted value and the actual value of the current sample, mostly due to "high frequencies" being present in the music. These can't be predicted very well. Fortunately, in typical music, the amplitude of these frequencies is very low. It is generally much lower than the amplitude of the low and middle frequencies which got predicted away.
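A minimal sketch of the prediction idea described above, in Python. The fixed second-order predictor and the synthetic test signal are assumptions chosen for simplicity; this is not WavPack's actual predictor, and all function names are hypothetical.

```python
import math

def make_signal(n=2000, rate=44100):
    """A loud low tone (220 Hz) plus a quiet high tone (9000 Hz), as integers."""
    return [
        round(10000 * math.sin(2 * math.pi * 220 * i / rate)
              + 100 * math.sin(2 * math.pi * 9000 * i / rate))
        for i in range(n)
    ]

def residuals(samples):
    """Keep the first two samples as-is, then store only the prediction error."""
    out = list(samples[:2])
    for i in range(2, len(samples)):
        pred = 2 * samples[i - 1] - samples[i - 2]  # straight-line extrapolation
        out.append(samples[i] - pred)
    return out

def reconstruct(res):
    """Invert residuals(): the decoder makes the same prediction and adds the error."""
    out = list(res[:2])
    for i in range(2, len(res)):
        out.append(res[i] + 2 * out[i - 1] - out[i - 2])
    return out

def bits_needed(values):
    """Bits for a signed integer large enough to hold the peak value."""
    peak = max(abs(v) for v in values)
    return peak.bit_length() + 1  # +1 for the sign

samples = make_signal()
res = residuals(samples)
print(bits_needed(samples), bits_needed(res[2:]))
```

In this toy case the low tone is almost entirely predicted away, so the residual is dominated by the quiet high tone and needs noticeably fewer bits than the raw samples, while `reconstruct` still recovers the input exactly.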

Therefore, the remaining lower-amplitude sound can be encoded perfectly with less bit-depth, maybe around 10-bits or so. (In practice, the way WavPack figures out how much bit-depth it needs is via traditional Huffman entropy-based compression, I think...the details don't matter to me).

To achieve further lossy compression, WavPack just throws out the least-significant bits. Therefore at 300 kbits/s you might be using around 4 bits per sample instead of 10 bits, on average. 4-bit audio might seem kind of bad, but several more things really help out.

One is Joint Stereo. Joint Stereo works differently in WavPack than it does in mp3. In some ways it is way better (although in some ways it is worse). Overall I think it really helps. In mp3, Joint Stereo is less useful when the left and right channels are too different. In WavPack, even if the low and middle frequencies are very different, it doesn't matter because they get predicted away. As long as the high frequencies don't have too much stereo separation, WavPack joint stereo can really help out. In practice, at 300 kbits/s your sound might have a bit-depth of closer to 6 bits per sample due to the joint stereo, most of the time.
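The idea of throwing out the least-significant bits of the prediction error can be sketched as follows. This is an illustration under stated assumptions, not WavPack's code: the predictor is a simple second-order extrapolation, the names are hypothetical, and the encoder predicts from the samples the decoder will reconstruct (a common design in lossy predictive coders) so that the rounding error does not accumulate.

```python
import math

def predict(history):
    """Second-order extrapolation from already-reconstructed samples (illustrative)."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    return history[-1] if history else 0

def encode_lossy(samples, shift):
    """Quantize each prediction error to a multiple of 2**shift.

    Returns the small integer codes to store and the decoder's view of the audio.
    """
    half_step = (1 << shift) >> 1
    codes, recon = [], []
    for s in samples:
        pred = predict(recon)
        code = (s - pred + half_step) >> shift  # round to the nearest step
        codes.append(code)
        recon.append(pred + (code << shift))    # what the decoder will rebuild
    return codes, recon

def decode_lossy(codes, shift):
    """Rebuild the audio from the stored codes alone."""
    recon = []
    for code in codes:
        recon.append(predict(recon) + (code << shift))
    return recon

samples = [round(10000 * math.sin(2 * math.pi * 220 * i / 44100)) for i in range(1000)]
codes, recon = encode_lossy(samples, shift=4)
worst = max(abs(a - b) for a, b in zip(samples, recon))
print(worst)  # never more than half a step, i.e. at most 8 for shift=4
```

With `shift=0` the scheme is lossless; each extra dropped bit saves storage and doubles the worst-case error per sample, which is heard as added noise rather than as the artifacts typical of transform codecs.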

6 bits still doesn't seem like great quality, but remember that 10 bits would have been lossless. So in some sense the quality of your WavPack file is now similar to a 9-bit WAV or something. Maybe that still doesn't sound good to you, but if you try to make and listen to real 8-bit or 10-bit WAVs for fun, you'll find that sometimes those sound like CD quality also. 16 bits isn't always needed for transparency; it depends on the kind of sound/music being encoded.

Furthermore, your "9-bit" (estimated) lossy WV file still sounds much better than a 9-bit WAV file. Why? Because WavPack is smart, and that "9 bits" actually varies naturally with time (kind of like variable bit rate, but not really...WavPack is not variable bit rate). When a song is soft overall, a 9-bit WAV file will sound really bad (you can hear hiss). But during those soft passages WavPack compresses better, so at your 300 kbits/s it will be more like a 12-bit file. So it naturally compensates. When a song is loud overall, a 9-bit WAV file will sound transparent. During those loud passages WavPack will be unable to compress well, and it will be the equivalent of a 6-bit file. But lucky for you, during these loud times you still can't hear the difference between a 6-bit and a 16-bit file (that's the way your ear works), and so the quality is more consistent with WavPack.
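The compensation described above can be illustrated with a toy calculation: given a fixed budget of bits per sample, a block with small prediction errors loses fewer low-order bits than a block with large ones. The residual values, the budget, and the helper names below are all made up for illustration; WavPack's real rate control is not shown in this thread and is not reproduced here.

```python
def bits_needed(values):
    """Bits for a signed integer large enough to hold the peak value."""
    peak = max(abs(v) for v in values)
    return peak.bit_length() + 1  # +1 for the sign

def shift_for_budget(residual_block, budget_bits):
    """Number of low-order bits to drop so the block fits the budget."""
    return max(0, bits_needed(residual_block) - budget_bits)

# Hypothetical prediction errors for a soft passage and a loud passage.
quiet_block = [3, -7, 12, -5, 9, -14]
loud_block = [310, -480, 505, -290, 444, -500]

for name, block in (("quiet", quiet_block), ("loud", loud_block)):
    print(name, shift_for_budget(block, budget_bits=4))
# quiet 1
# loud 6
```

At the same budget the soft passage drops one bit and keeps a low noise floor, while the loud passage drops six bits and gets a higher noise floor at the point where, as the post argues, the music itself is most likely to cover it.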

I'm not trying to say that WavPack lossy is perfect though. There are ways to really make it sound terrible (certain kinds of sounds or music). But it does achieve remarkable compression without using any psychoacoustics, and the explanation I just gave is the way I understand it.

Reply #2 – 2007-06-13 01:07:50

Quote from: audioflac on 2007-06-12 19:40:41

I am finding WAVPACK exceptional at about 300 kbps 44100 Hz 16 bit (gives you about 2 MB per minute of data). Very hard for me to tell the difference between that and lossless...

Very hard!!!

It is more likely impossible for you to tell the difference. Did you ABX it?

Reply #3 – 2007-06-13 01:11:25

AFAIK WavPack lossy does not use a psy-model extensively, if at all.

Reply #4 – 2007-06-13 01:20:13

Quote from: Light-Fire on 2007-06-13 01:07:50

Quote from: audioflac on 2007-06-12 19:40:41
I am finding WAVPACK exceptional at about 300 kbps 44100 Hz 16 bit (gives you about 2 MB per minute of data). Very hard for me to tell the difference between that and lossless...

Very hard!!!

It is more likely impossible for you to tell the difference. Did you ABX it?

I'd qualify this by adding, "without having to resort to the synthesis of unrealistic samples."

Reply #5 – 2007-06-15 00:55:40

I am not sure why Light-Fire feels a need to get on audioflac's case. He didn't make a strong statement, nor did he say that he thinks WavPack compression is inadequate. He said he thinks WavPack is exceptional. Why did you need to nitpick at what he said?

Even the WavPack documentation states that 384 kbps is generally required for transparency on real music. audioflac's comment was regarding 300 kbps, that's way lower. Real music samples requiring 350 to 400 kbps for transparency can be regularly found. Real, but extremely rare and unusual samples requiring 450 kbps or more can be found, too. I think most people who use WavPack lossy would encode at 320 to 500 kbps...

Reply #6 – 2007-06-15 03:22:33

[deleted]

Reply #7 – 2007-06-15 06:25:22

Interesting. I believe it comes down to the size/quality tradeoff, music type, listener sensitivity, and transparency vs the convenience of smaller files. Even at low bitrates the added hiss might not bother many users the way artifacts would. I have heard speakers in shopping centers hissing like WavPack at 200k, yet it never struck me as unnatural. At 300k most music has enough movement that masking is very good and the noise is completely inaudible to the listener. A more dynamic signal may require a higher bitrate.
