
Temporal redundancy removal in compressed audio

Hi!

If you were looking for a paper about improving Vorbis compression by using
arithmetic coding and exploiting temporal redundancies (i.e. repeating sound),
you might have a look here:

http://web.interware.hu/rudas

Three Ogg Vorbis files were compressed losslessly, with size reductions
between 2% and 8% and compression speed about 1/500 of realtime (i.e. slow).
The theoretical upper bound of the compression gain for real music is estimated
to be between 10% and 20%; further improvements to the methods are possible.
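As a toy sketch of the general idea of removing temporal redundancy (this is not the actual algorithm from the paper, which works on the transform coefficients with arithmetic coding), one can think of exact repeats of earlier blocks being replaced by back-references:

Code
import hashlib

def dedupe_blocks(blocks):
    """Replace repeated blocks with back-references to their first occurrence.

    Toy illustration only: exact repeats of earlier data are coded as
    index references instead of literals.
    """
    seen = {}          # block digest -> index of first occurrence
    out = []
    for i, block in enumerate(blocks):
        key = hashlib.sha1(block).digest()
        if key in seen:
            out.append(("ref", seen[key]))   # repeat: emit a back-reference
        else:
            seen[key] = i
            out.append(("lit", block))       # first sighting: emit literal
    return out

# A stream with a literally repeated section (e.g. a looped sample).
stream = [b"intro", b"riff", b"riff", b"verse", b"riff"]
print(dedupe_blocks(stream))
# [('lit', b'intro'), ('lit', b'riff'), ('ref', 1), ('lit', b'verse'), ('ref', 1)]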

bye
Denes

Temporal redundancy removal in compressed audio

Reply #1
I assume decompression is also slower than realtime?

Temporal redundancy removal in compressed audio

Reply #2
Quote
I assume decompression is also slower than realtime?

I would not think that decompression would be slow, because the computation time is spent on finding the temporal redundancies during encoding.

 

Temporal redundancy removal in compressed audio

Reply #3
What have we learned from that? (Some may have known this already.)
Extra-long-term temporal prediction isn't really useful or practical.

Kudos to the author for going through the hassle of implementing that stuff, though.

An area I think one could try to explore instead:
(short-term) temporal and inter-channel prediction of the floor curves and of the residue codebook selection side info.
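To make the temporal half of that concrete, here is a toy sketch with made-up floor values (the real Vorbis floor representation is different, and the entropy estimate is illustrative only); it just shows that frame-to-frame deltas of slowly varying curves are cheaper to entropy-code than the raw values:

Code
import numpy as np

# Toy stand-ins for per-frame floor curves (dB values per band).
# Real floor data lives inside the Vorbis packets; this is made up.
rng = np.random.default_rng(1)
n_frames, n_bands = 200, 32
base = 40 + 10 * np.sin(np.linspace(0, 3, n_bands))
floors = np.round(base + rng.normal(0, 1.5, (n_frames, n_bands))).astype(int)

def entropy_bits(values):
    """Empirical zeroth-order entropy of a symbol stream, in bits/symbol."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Temporal prediction: code each floor as a delta from the previous frame.
deltas = np.diff(floors, axis=0)

print(f"raw floors:   {entropy_bits(floors.ravel()):.2f} bits/value")
print(f"frame deltas: {entropy_bits(deltas.ravel()):.2f} bits/value")

For stationary audio the deltas cluster tightly around zero, so a short-term predictor plus an entropy coder should beat coding each frame's side info independently.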

Let me quote this part:
Quote
However, the most striking observation is that the MDCT transform, as used currently, is not well suited for finding and exploiting temporal redundancy. The most likely cause of this is the MDCT's lack of translational invariance.

Possibly transforms that are closer to being translation invariant are necessary; these could be:
- a close model of the ear (not necessarily critically sampled or with the property of perfect reconstruction)
- wavelet packets or similar transforms
- MDCT and MDST, perhaps tuning the angle with respect to the amplitude


While I agree with the first paragraph, I fail to see how a wavelet packet transform is any better than the MDCT when it comes to shift invariance. With MDCT+MDST one can approximate a time-shifted version in the frequency domain for a small shift, though. But the angle needs to be adjusted with respect to the frequency, not the amplitude. Anyhow, I wouldn't encourage anyone to try that.
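To illustrate that point, a small numpy sketch (the window shape, phase convention, and frame size are my own choices, nothing from the paper): combining MDCT and MDST into one complex coefficient per bin makes a delay of d samples look approximately like a phase rotation of pi*(k+0.5)*d/N in bin k, with the approximation degrading as d grows, because the analysis window moves along with the signal.

Code
import numpy as np

N = 512                          # MDCT bins per frame; frames are 2N samples
n = np.arange(2 * N)
k = np.arange(N)

# Sine window, the usual MDCT analysis window shape.
w = np.sin(np.pi * (n + 0.5) / (2 * N))

# One common phase convention for the MDCT/MDST kernels.
phase = np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2)

def mclt(frame):
    """Complex MDCT+MDST coefficients (an 'MCLT') of one windowed frame."""
    wx = w * frame
    return np.cos(phase) @ wx - 1j * (np.sin(phase) @ wx)

rng = np.random.default_rng(0)
x = rng.standard_normal(4 * N)   # long test signal
X0 = mclt(x[:2 * N])             # frame starting at sample 0

# A delay of d samples should rotate bin k by about pi*(k+0.5)*d/N;
# the relation is only approximate because the window moves too.
for d in (1, 4, 32):
    Xd = mclt(x[d:d + 2 * N])    # frame starting d samples later
    Xd_pred = X0 * np.exp(1j * np.pi * (k + 0.5) * d / N)
    err = np.linalg.norm(Xd - Xd_pred) / np.linalg.norm(Xd)
    print(f"shift d={d:3d}: relative prediction error {err:.3f}")

With the MDCT alone (the real part only) there is no such simple relation, since the shifted real coefficients also depend on the missing imaginary part; that matches the quoted observation about the MDCT's lack of translational invariance.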

Let's not forget the amount of memory a decoder would need to keep the previously seen packets around so it can reconstruct the audio data under such extra-long-term prediction.


Sebi

edit: grammar, typos