HydrogenAudio

Lossy Audio Compression => Other Lossy Codecs => Topic started by: przemyslawo on 2023-09-23 00:14:31

Title: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: przemyslawo on 2023-09-23 00:14:31
I came across this new audio codec: https://github.com/descriptinc/descript-audio-codec

Here are some audio samples encoded with it: https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5

Unfortunately I can't do a proper listening test at the moment because my headphones aren't very good, but honestly I couldn't hear any difference between the original audio and the audio encoded with this codec (I listened to the music samples on the demo page).

What do you think?
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: Bryanhoop on 2023-09-23 02:51:09
This isn't meant for acoustic audio compression, it's meant to feed simplified data into AI algos, so I'd imagine it sounds pretty good to a neural network and pretty bad to a human ear.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: Octocontrabass on 2023-09-23 06:55:26
It's definitely meant for acoustic audio compression, and the demos sound pretty good to me.

I notice they make no mention of how long it takes to encode or decode the audio.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: magicgoose on 2023-09-23 08:31:19
This is really impressive.
Although the provided samples are rather simple, and only in mono. On these samples I can't quickly tell where to focus to hear a difference.
(worth noting, the source audio samples have apparently already gone through some sort of lossy compression - but that doesn't necessarily make it easier to mask further losses)

I'll try to install it, hopefully it doesn't require a huge GPU to work.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: magicgoose on 2023-09-23 09:51:50
Has anyone understood if it's only able to work in hard CBR mode, or is there some flexibility possible?
For example, can it use less bandwidth during periods with relatively simple signal, or periods of complete silence?
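
A rough way to reason about this: if the codec emits a fixed number of codebook indices for every frame, the bitrate is constant by construction, silence included. The sketch below assumes the figures I recall from the DAC paper for the 44.1 kHz model (a hop of 512 samples, 9 codebooks, 1024 entries each); treat them as assumptions, they are not verified against the code.

```python
import math

# Assumed model parameters (my reading of the DAC paper, not checked against the repo)
sample_rate = 44_100      # Hz
hop_length = 512          # input samples covered by one latent frame
n_codebooks = 9           # residual quantizer stages per frame
codebook_size = 1024      # entries per codebook, i.e. 10 bits per index

frames_per_second = sample_rate / hop_length
bits_per_frame = n_codebooks * math.log2(codebook_size)
bitrate_bps = frames_per_second * bits_per_frame

print(f"{frames_per_second:.2f} frames/s x {bits_per_frame:.0f} bits = {bitrate_bps / 1000:.2f} kbps")
```

Under those assumptions every frame costs the same number of bits, so any variable-rate behaviour would have to come from dropping codebooks or from entropy coding the indices afterwards.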
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: forart.eu on 2023-09-23 11:28:05
Other interesting "AI-based" audio compression resources:
https://github.com/forart/HyMPS/blob/main/AIaudio.md#codecs-
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: rc55 on 2023-09-23 11:33:26
I had a quick play - had to install CUDA 11.7 and a large amount of python stuff to get this working.

If you're on Windows and struggling with the pytorch not compiled for CUDA error, remove all Nvidia CUDA software and Nvidia drivers, and reinstall both using the CUDA 11.7 installer.

Encoded a WAV file (CD Audio, 1h16m51s) - 69 seconds to encode, 145 seconds to decode on a 3080.

Command line:
python -m dac encode in.wav --output .\
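
For scale, those timings can be turned into a realtime factor (audio duration divided by processing time). A quick sketch using only the numbers above; the helper names are made up for illustration and nothing here comes from the dac package:

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to a number of seconds."""
    return hours * 3600 + minutes * 60 + seconds

def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio handled per second of wall-clock time."""
    return audio_seconds / processing_seconds

audio = to_seconds(1, 16, 51)             # length of the CD-audio WAV
encode_rtf = realtime_factor(audio, 69)   # 69 s to encode
decode_rtf = realtime_factor(audio, 145)  # 145 s to decode

print(f"encode: {encode_rtf:.1f}x realtime, decode: {decode_rtf:.1f}x realtime")
```

So on this one file decoding runs at roughly half the speed of encoding.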
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: przemyslawo on 2023-09-23 23:56:36
I had a quick play - had to install CUDA 11.7 and a large amount of python stuff to get this working.

If you're on Windows and struggling with the pytorch not compiled for CUDA error, remove all Nvidia CUDA software and Nvidia drivers, and reinstall both using the CUDA 11.7 installer.

Encoded a WAV file (CD Audio, 1h16m51s) - 69 seconds to encode, 145 seconds to decode on a 3080.

Command line:
python -m dac encode in.wav --output .\

I tried to install it with pipx on my Debian system, but the download was taking so long that I aborted after 10 minutes. I was also afraid it would install a lot of stuff that would be difficult to remove afterwards. That isn't a real problem, though: I can do a clean installation in an LXC container without messing with the system. I will give it a try when I have some spare time.

My PC is from 2017 and my GPU is an Nvidia GT 1030, which is 12~15x slower than a 3080. My question is whether decoding will be that much slower too.

There is some royalty-free FLAC music on archive.org: https://archive.org/search?query=royalty+free+flac

I will try converting some of it to .dac and post the results here.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: C.R.Helmrich on 2023-09-24 13:40:33
Note that HydrogenAudio user Kamedo2 posted some audio samples for blind listening tests in the following thread, which (I think, given it's a noncommercial kind-of-research study) you could use as well:

https://hydrogenaud.io/index.php/topic,98003.0.html

Chris
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: przemyslawo on 2023-09-24 17:42:22
I successfully installed this encoder in a Python virtual environment (venv); it consumed 9.8GB of disk space.

I succeeded in converting a .wav file to this format, but I couldn't decode it due to an error in the codec (or maybe my GPU is unsupported).

I'm using an AMD Ryzen 5 1400 and an Nvidia GT 1030 graphics card; it took 55 seconds to encode an audio file 6:12 long.

Maybe this codec will become faster in the future.
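
To put that next to the 3080 figure reported earlier in the thread (69 seconds to encode 1h16m51s), here is a rough sketch that normalises both encode times to seconds of compute per minute of audio. The two runs used different files and different CPUs, so this is only a ballpark comparison, and the function name is just for illustration:

```python
def seconds_per_audio_minute(processing_seconds, audio_seconds):
    """Wall-clock seconds needed to process one minute of audio."""
    return processing_seconds / (audio_seconds / 60)

# Encode timings as posted in this thread
rtx3080 = seconds_per_audio_minute(69, 1 * 3600 + 16 * 60 + 51)
gt1030 = seconds_per_audio_minute(55, 6 * 60 + 12)

slowdown = gt1030 / rtx3080
print(f"3080: {rtx3080:.2f} s/min, GT 1030: {gt1030:.2f} s/min, ratio {slowdown:.1f}x")
```

That comes out at roughly a tenfold slowdown for encoding, in the same region as the 12~15x estimate.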
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: forestasia on 2023-09-24 19:49:06
Sounds amazing at 8kbps.

Definitely the best sounding music compression I've heard at that bitrate.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: rutra80 on 2023-09-24 22:34:03
Amazing at 8kbps... Can it be decoded without that 9,8GB of disk space? ;]
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: itisljar on 2023-09-25 09:01:20
Amazing at 8kbps... Can it be decoded without that 9,8GB of disk space? ;]
That's what I wanted to ask - does it have some simple decoder, or does it need full power of AI computing to reconstruct it?
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: rc55 on 2023-09-25 17:53:01
Amazing at 8kbps... Can it be decoded without that 9,8GB of disk space? ;]
That's what I wanted to ask - does it have some simple decoder, or does it need full power of AI computing to reconstruct it?

I briefly experimented with using "Auto Py To Exe" to compile the dac.py script to an executable, and it's made a 40MB exe with over 4GB of support files in a subfolder (the vast majority is PyTorch).

This does not include the weights file which is ~300MB. The weights file contains the model data used to encode and decode the audio and is a necessity, and the model is not interchangeable. You have to use the same model to decode any encoded file.

I anticipate there is plenty of scope to optimise the software size, but it's leaning heavily on the PyTorch baggage. I anticipate as operating system support matures for AI models, it might be that there could be a standard for using models at the OS level so hooking into a model is no different than making sure you have the latest version of DirectX on Windows.

I'm trying to be careful to not violate TOS #8 - the MUSHRA scores on the GitHub should suffice, but I'd just like to comment that the performance is profoundly good. Mods - feel free to redact this last paragraph if I've made a mistake here.


Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: przemyslawo on 2023-09-25 20:12:35
When I buy a decent pair of headphones I will comment on its quality.

I hope that its developers optimize the code for less GPU usage in the future.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: C.R.Helmrich on 2023-09-26 09:41:34
... it's made a 40MB exe ...
This does not include the weights file which is ~300MB. The weights file contains the model data used to encode and decode the audio and is a necessity, and the model is not interchangeable. You have to use the same model to decode any encoded file.
Thanks for the analysis! Out of curiosity: could you 7zip (preset Ultra) that 40MB exe and 300MB weight file and let us know what file size comes out? That would be a rough estimate of how much room for reduction there is.

Thanks,

Chris
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: itisljar on 2023-09-26 09:44:59
That would be a rough estimate of how much room for reduction there is.

You forgot 4 GB of support files there, Chris :)
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: C.R.Helmrich on 2023-09-26 09:57:19
No, I didn't; these apparently represent the Python/PyTorch installation itself and could be avoided in software written e.g. in C or C++.

Chris
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: przemyslawo on 2023-09-26 10:00:01
Python is slower than compiled languages such as C++ or Go: https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/python3-go.html

But efforts have been made for speeding up Python, such as Codon compiler: https://github.com/exaloop/codon

I don't know if Python code would be as slow on a GPU as it is on a CPU, but it would be awesome to have this codec compiled through LLVM.

Codon is still not ready to compile 100% of Python code. But the developers are working to implement the missing features such as metaclass support.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: fooball on 2023-09-26 10:46:12
Hmm... 9GB to store the decoder, or 9GB to store files with a slightly less efficient compression and lightweight demands on hardware.  Tricky...

AI is notorious for making things up in a believable way.  The output might sound beautiful, but is it true?
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: binaryhermit on 2023-09-26 11:04:15
90x smaller than wav, huh?  Assuming they mean 16/44.1 PCM wav files, that'd be somewhere in the ballpark of 16 kbps, right?

I'm going to go out on a limb and say it either sounds bad or is completely impractical for most use cases.  Or requires licensing over 9000 patents to implement.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: Porcus on 2023-09-26 11:24:49
90x smaller than wav, huh?  Assuming they mean 16/44.1 PCM wav files, that'd be somewhere in the ballpark of 16 kbps, right?
Yeah, except: the sample files are mono.
So that means CDDA encoded at 16 kbps as dual mono, without any stereo decorrelation strategy. I have not bothered to look up whether they have any stereo decorrelation algorithm (yet), but obviously that is room for improvement - and also an opportunity to spend more processing power.

I'm going to go out on a limb and say it either sounds bad
Well you can test it ... ? Although the samples are not that interesting ...
or is completely impractical for most use cases.
As of now? Sure.
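
For anyone who wants to check the arithmetic behind the 90x claim, a small sketch (plain PCM bitrate maths, nothing codec-specific; the function name is just for illustration):

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000

stereo = pcm_bitrate_kbps(44_100, 16, 2)   # CD audio
mono = pcm_bitrate_kbps(44_100, 16, 1)     # what the demo samples use

print(f"stereo: {stereo} kbps, /90 = {stereo / 90:.2f} kbps")
print(f"mono:   {mono} kbps, /90 = {mono / 90:.2f} kbps")
print(f"mono PCM vs 8 kbps: {mono / 8:.1f}x")
```

So 90x against a mono 16/44.1 file lands at about 8 kbps, and against stereo CD audio at about 16 kbps, which matches treating stereo as dual mono.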
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: rc55 on 2023-09-26 18:01:38
... it's made a 40MB exe ...
This does not include the weights file which is ~300MB. The weights file contains the model data used to encode and decode the audio and is a necessity, and the model is not interchangeable. You have to use the same model to decode any encoded file.
Thanks for the analysis! Out of curiosity: could you 7zip (preset Ultra) that 40MB exe and 300MB weight file and let us know what file size comes out? That would be a rough estimate of how much room for reduction there is.

Thanks,

Chris

Happy to oblige!
dac.exe 41,865,797 bytes
dac.7z 41,456,443 bytes

weights.pth 306,720,768 bytes
weights.7z 278,740,892 bytes
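
Expressed as percentages, using only the byte counts above (a quick sketch, the variable names are arbitrary):

```python
# (original bytes, 7z Ultra bytes) as listed above
sizes = {
    "dac.exe": (41_865_797, 41_456_443),
    "weights.pth": (306_720_768, 278_740_892),
}

def percent_saved(original, compressed):
    """Share of the original size removed by compression, in percent."""
    return 100 * (original - compressed) / original

for name, (original, compressed) in sizes.items():
    print(f"{name}: {percent_saved(original, compressed):.2f}% saved")

total_original = sum(o for o, _ in sizes.values())
total_compressed = sum(c for _, c in sizes.values())
total_saved = percent_saved(total_original, total_compressed)
print(f"combined: {total_saved:.2f}% saved")
```

That is about 1% for the exe and about 9% for the weights, so 7z finds very little redundancy in either file.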
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: binaryhermit on 2023-09-27 11:00:28
I'm going to go out on a limb and say it either sounds bad
Well you can test it ... ? Although the samples are not that interesting ...
Ever since the "64 kbps WMA sounds as good as 128 kbps mp3" stuff, I don't trust a codec developer to not cherrypick a codec implementation that's subpar for the opposing format or cherrypick samples, or otherwise be somewhat dishonest in things like this.

And it sounds like you currently need a nVidia GPU to work with this codec?  That's something I don't have to work with.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: cid42 on 2023-09-27 15:25:52
I'm going to go out on a limb and say it either sounds bad
Well you can test it ... ? Although the samples are not that interesting ...
Ever since the "64 kbps WMA sounds as good as 128 kbps mp3" stuff, I don't trust a codec developer to not cherrypick a codec implementation that's subpar for the opposing format or cherrypick samples, or otherwise be somewhat dishonest in things like this.

And it sounds like you currently need a nVidia GPU to work with this codec?  That's something I don't have to work with.

You're right to distrust 1st party benchmarks always.

The following is vague guessing because I only have vague awareness of the tech, so pinch of salt:

There appears to be a CPU and GPU mode so you can run on the CPU, but likely the GPU mode is CUDA, which is proprietary Nvidia lock-in tech. It may be possible for AMD GPUs to run the CUDA code (or a reasonably simple port of it to HIP) using ROCm. On the other hand the repo contains mostly Python (albeit it does reference CUDA), the readme makes reference to torchrun which is presumably PyTorch, and I know PyTorch works on AMD, so maybe it wouldn't take too much for AMD GPUs to work. Intel dGPUs I have less of a clue about; they have oneAPI that they're trying to push as an interoperable standard, and apparently can also do PyTorch. If you have AMD/Intel GPUs and want to try then godspeed, it's likely the way of pain even if it is possible.
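
To make the CPU/GPU point a bit more concrete, this is the usual way PyTorch code picks a device. It is a generic sketch, not taken from the dac repo, and `pick_device` is a name I made up. ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API, which is why a check like this can pass on AMD hardware too.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a usable GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all, so nothing could run on a GPU anyway
        return "cpu"
    # True for CUDA builds with an Nvidia GPU, and for ROCm builds with a supported AMD GPU
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"would run on: {device}")
```

Whether the codec actually runs acceptably on whatever device this reports is a separate question.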
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: bennetng on 2023-09-27 16:02:58
I am still using a 2015 GTX 950, which is about 1.88x as fast as a GT 1030 according to some gaming benchmarks, but still slow. I mean, when this thing becomes feasible at the consumer level it could be used on streaming services and such, or before this happens I can already plug myself into The Matrix.

Probably much more interesting if they can make every 64kbps wma file sound like lossless without cheating, like looking up some existing lossless music catalogs to find the same song.
Title: Re: Descript Audio Codec (.dac) - 90x smaller than .wav?
Post by: rc55 on 2024-06-07 14:15:15
DAC-JAX: A JAX Implementation of the Descript Audio Codec
https://arxiv.org/abs/2405.11554

Quote
We present an open-source implementation of the Descript Audio Codec (DAC) using Google's JAX ecosystem of Flax, Optax, Orbax, AUX, and CLU. Our codebase enables the reuse of model weights from the original PyTorch DAC, and we confirm that the two implementations produce equivalent token sequences and decoded audio if given the same input. We provide a training and fine-tuning script which supports device parallelism, although we have only verified it using brief training runs with a small dataset. Even with limited GPU memory, the original DAC can compress or decompress a long audio file by processing it as a sequence of overlapping "chunks." We implement this feature in JAX and benchmark the performance on two types of GPUs. On a consumer-grade GPU, DAC-JAX outperforms the original DAC for compression and decompression at all chunk sizes. However, on a high-performance, cluster-based GPU, DAC-JAX outperforms the original DAC for small chunk sizes but performs worse for large chunks.
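
The "overlapping chunks" idea from the abstract, as a minimal sketch. This only shows how a long file can be cut into overlapping sample ranges; how DAC or DAC-JAX actually sizes the chunks and stitches the decoded audio back together is not covered here, and the function name and parameters are hypothetical.

```python
def overlapping_chunks(n_samples, chunk_size, overlap):
    """Return (start, end) sample ranges covering n_samples, where
    consecutive ranges share `overlap` samples."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < n_samples:
        end = min(start + chunk_size, n_samples)
        chunks.append((start, end))
        if end == n_samples:
            break  # the last chunk reached the end of the file
        start += step
    return chunks

# Toy numbers so the result is easy to read
print(overlapping_chunks(10, 4, 1))
```

Each chunk can then be pushed through the model on its own, which keeps GPU memory use bounded regardless of file length.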