HydrogenAudio

Lossless Audio Compression => Lossless / Other Codecs => Topic started by: mycroft on 2024-12-13 11:19:18

Title: New lossless audio codec in development
Post by: mycroft on 2024-12-13 11:19:18
I got the idea to make a new lossless audio codec.

The main idea is to allow non-intra frames, as done in MLP/TrueHD, but better and with bigger frame sizes.

It's currently in the R&D phase only.

What do you think? Can using non-intra frames really improve the compression ratio?
Usually lossless audio codecs use just LPC for prediction. I think this is not always the optimal solution for compression.
Title: Re: New lossless audio codec in development
Post by: Klymins on 2024-12-13 12:05:52
I'm actually not interested in lossless audio codecs, as I think that lossy codecs can achieve total transparency, but I agree that lossless audio codecs have some good use cases (preventing generation loss in some cases, and scientific scenarios), so that's good. As for your question, I don't have enough knowledge to answer it. But I have three questions:

1: Will it support low bit depths like 8 bps?
2: Which channel combinations will it support?
3: Will it use the frequency domain? Could it provide better compression ratios even for lossless?
Title: Re: New lossless audio codec in development
Post by: ktf on 2024-12-13 12:43:24
Well, yes, I think this is a great idea, and tremendous if you could pull it off.

There have been several attempts at exploiting the as of yet mostly untapped potential of FLAC files with a variable blocksize. It turns out nobody has come up with a good and fast algorithm to determine how to split the audio in blocks in such a way that it improves compression. I would think an algorithm to determine where to use inter frames would be even more challenging, and potentially inspiring in solving the problem I just mentioned.
Title: Re: New lossless audio codec in development
Post by: magicgoose on 2024-12-13 14:20:56
I have a suspicion that some of the newer machine learning techniques could be used to make initial guesses about where the sections are that are similar enough to be worth referencing to reduce redundancy.
Even then, there's the problem that pieces can sound very similar yet have rather different waveforms, so even with those references I think it won't be an easy task to really benefit from them on the many types of recordings where the repetition is not due to simple copy-paste.
Title: Re: New lossless audio codec in development
Post by: genuine on 2024-12-13 15:46:43
There have been several attempts at exploiting the as of yet mostly untapped potential of FLAC files with a variable blocksize. It turns out nobody has come up with a good and fast algorithm to determine how to split the audio in blocks in such a way that it improves compression.
If processing time is not a problem, this can be done even if not perfectly. However, it will definitely not be suitable for practical use. And as side information, the size of each block will also need to be stored, which takes away some of the gain.
Title: Re: New lossless audio codec in development
Post by: mycroft on 2024-12-13 15:58:59
I'm actually not interested in lossless audio codecs, as I think that lossy codecs can achieve total transparency, but I agree that lossless audio codecs have some good use cases (preventing generation loss in some cases, and scientific scenarios), so that's good. As for your question, I don't have enough knowledge to answer it. But I have three questions:

1: Will it support low bit depths like 8 bps?
2: Which channel combinations will it support?
3: Will it use the frequency domain? Could it provide better compression ratios even for lossless?

8-bit should be trivially supported, and it could benefit the most because it only has 256 different states to work with.

As for channel combinations, I'm currently trying to make mono encoding compress well; if and when it reaches a state where it outperforms FLAC/TAK/WavPack, I will try to make >1 channel possible, but for anything >2 it would be challenging to reach fast and really good compression at the same time.

Currently I'm working purely in the time domain and using the frequency domain for picking split points.

The first idea is to compress all kinds of fixed sin/cos waves really well at zero extra cost.
The next step is to do something similar with more complex sounds.

It would be nice to use only the frequency domain, but once you get magnitude and phase you are more or less stuck.
Magnitude changes are less demanding to compress, while phase looks like pure noise.
Unwrapping the phase could help, but that also has other problems.

Currently I'm concentrating just on residue coding for simple sine waves; later I will add and experiment with LPC prediction for less trivial waves. The most problematic part is compressing pure noise; my idea is to split out where the noise is and encode it in its own subframe.
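
To illustrate the zero-extra-cost case on a toy signal (just a numpy sketch, not my encoder): if the period is an integer number of samples, predicting each sample from the one exactly a period earlier leaves an all-zero residual, so only the first period and the lag need to be stored.

Code:
import numpy as np

# One exact integer period of a sine, quantized to 16 bits, then repeated.
period = 100
one_period = np.round(32767 * np.sin(2 * np.pi * np.arange(period) / period)).astype(np.int64)
x = np.tile(one_period, 20)          # perfectly periodic "recording"

# Long-term prediction with lag = period: the residual is exactly zero everywhere.
residual = x[period:] - x[:-period]
print(int(np.abs(residual).max()))   # 0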
Title: Re: New lossless audio codec in development
Post by: Porcus on 2024-12-13 17:06:35
I guess your purpose is something clever to improve compression without taking aeons. And the following might be worth something yes:

and using the frequency domain for picking split points.

Yeah, as @ktf points out, it is not an easy task to do in variable-block FLAC without taking ages, and so it may call for some fast signal analysis trick to differentiate the strategy by input properties.
You probably know the block-length switching scheme of MPEG-4 ALS, see 3.5 of http://elvera.nue.tu-berlin.de/files/1216Liebchen2009.pdf - or at least IIRC something similar in TAK. The ALS strategy is confined to "successive halvings", not unlike what FLAC does with partitioning with different Rice exponents - which sometimes makes it outcompress the competition (https://hydrogenaud.io/index.php/topic,123025.msg1018251.html#msg1018251) even if FLAC actually has a "design flaw" there, ruling out the combination of any significant prediction length with extremely fine partition.

Also you could try to allow for more residual encoding methods and selection between them.  Or other methods for LPC analysis, like Burg's algorithm, or to specify coefficients on the nth order difference rather than the nth past (why try that? FLAC can make small gains from setting precision, i.e. how many bits each needs - if you want to switch it more often, then what? Storing the predictor will matter more?)

There are of course other possible uses than plainly compressing mono or stereo:
 * One for the possible future, is to facilitate native handling of object-based audio.
 * One that should have been around twenty years ago, is a format that stores CD rips with subchannel/correction data that a ripping app could read upon re-reading on a different drive - and could store several concurrent rips, which are for most samples bit-identical (up to offset?). Actually for CD use, a good frame size would be say 2^N * 588, though it needs a rule to subdivide below 147.

The first idea is to compress all kinds of fixed sin/cos waves really well at zero extra cost.
That is something that doesn't necessarily follow from being good at "real-world audio" (a sine could need to tweak one parameter to arbitrary resolution) - but if you start at that from a bottom-up perspective and get it to work ...
I did a test on upsamples, which could give an idea of how strange "artificially smooth" signals act: https://hydrogenaud.io/index.php/topic,125607.0.html , I mean watch the difference between codecs. 
Who knows what fraction of hi-rez audio has just noise, upsampling artefacts, and maybe actual overtones ... Again, if you get something out of signal analysis in the frequency domain, you might improve compression of such space-wasters in the online retail stores. (Just don't expect them to come running to pay you for exposing their signals as tons of empty bits ...)
Title: Re: New lossless audio codec in development
Post by: mudlord on 2024-12-13 17:56:48
Please remove my account from this forum.

I thought you wanted to leave? Or are you trolling? I really don't care if you are, just curious.
Title: Re: New lossless audio codec in development
Post by: Squeller on 2024-12-13 18:38:43
I'm actually not interested in lossless audio codecs
Then don't taint this thread with your presence.
Title: Re: New lossless audio codec in development
Post by: mycroft on 2024-12-13 19:55:03
I guess your purpose is something clever to improve compression without taking aeons. And the following might be worth something yes:

and using the frequency domain for picking split points.

Yeah, as @ktf points out, it is not an easy task to do in variable-block FLAC without taking ages, and so it may call for some fast signal analysis trick to differentiate the strategy by input properties.
You probably know the block-length switching scheme of MPEG-4 ALS, see 3.5 of http://elvera.nue.tu-berlin.de/files/1216Liebchen2009.pdf - or at least IIRC something similar in TAK. The ALS strategy is confined to "successive halvings", not unlike what FLAC does with partitioning with different Rice exponents - which sometimes makes it outcompress the competition (https://hydrogenaud.io/index.php/topic,123025.msg1018251.html#msg1018251) even if FLAC actually has a "design flaw" there, ruling out the combination of any significant prediction length with extremely fine partition.

Also you could try to allow for more residual encoding methods and selection between them.  Or other methods for LPC analysis, like Burg's algorithm, or to specify coefficients on the nth order difference rather than the nth past (why try that? FLAC can make small gains from setting precision, i.e. how many bits each needs - if you want to switch it more often, then what? Storing the predictor will matter more?)

Maybe you are right about rudimentary sin/cos waves; this initial approach would probably work only for such simple cases, and without dramatic noise corruption.
Splitting audio into similar chunks is not that trivial; at least I'm not aware of any robust and fast algorithm.
Currently I use (normalized) cross-correlation and autocorrelation computed via RDFT, and that is too crude an approach.

I will also explore the integer MDCT; maybe it can provide something more useful.
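
For reference, the autocorrelation part looks roughly like this (a generic numpy sketch of the RDFT trick via Wiener-Khinchin; the function name and min_lag are just placeholders, not my actual code):

Code:
import numpy as np

def candidate_period(x, min_lag=32):
    # Autocorrelation via real FFT, zero-padded to avoid circular wrap-around.
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(x, nfft)
    acf = np.fft.irfft(spec * np.conj(spec), nfft)[:n]
    acf /= acf[0] + 1e-12                           # normalize so that acf[0] == 1
    return min_lag + int(np.argmax(acf[min_lag:]))  # lag of the strongest repetition

As said, this is crude: it only finds the strongest lag and says nothing about where to actually split.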
Title: Re: New lossless audio codec in development
Post by: C.R.Helmrich on 2024-12-16 07:50:39
... allow non-intra frames, as done in MLP/TrueHD, but better and with bigger frame sizes.

It's currently in the R&D phase only.

What do you think? Can using non-intra frames really improve the compression ratio?
Usually lossless audio codecs use just LPC for prediction. I think this is not always the optimal solution for compression.
I don't know much about MLP, but the non-Intra frame concept you're proposing sounds, to me, like a long-term prediction (LTP) approach, i.e., predicting samples in a given block from samples in a previous block. MPEG-4 ALS supports this, and it does seem to work well on some musical audio. From http://elvera.nue.tu-berlin.de/files/1216Liebchen2009.pdf:

Chris
Title: Re: New lossless audio codec in development
Post by: mycroft on 2024-12-16 09:33:32
I have little experience with MPEG-4 ALS, and it's a very over-engineered codec IMHO.

My idea is as follows (maybe it's exactly LTP, maybe not):

Take for example a sine wave, find the pitch, and split at the correct zero crossing; if the sine period is not fractional but an integer, you get zero difference with the previous sine period. Now you can compress a sine wave, or any wave that just repeats over and over again (if you find the correct period), at almost zero extra cost.
If there is no exact match, just pick the period with maximum correlation and store the difference via LPC + entropy + residue.
Note that both the number of samples and the lag/offset are variable here, because using fixed-size frames and then doing lags is pointless IMHO.
So each frame would be of variable length (in number of samples) when encoding a single channel.

For >1 channels, INTRA+INTER frames within one big super-frame come to mind, because L/R/... channels may not be very correlated most of the time, so the lags and sizes are different for each channel.

This idea of picking variable-length periods works very well (at least) with the ascale filter (tempo adjuster), but it has some limitations with extremely low-frequency content (and it does not handle background-noise periods as I would like), and multi-channel filtering has sync issues if one does not change the periods to match across all channels.

I think this splitting of audio into periods of equal correlation is similar to the YIN algorithm?
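
For comparison, the core of YIN (the difference function with cumulative-mean normalization) can be sketched like this - a direct, slow transcription with an arbitrary threshold, not anything from my encoder; the real algorithm (de Cheveigné/Kawahara) also adds parabolic interpolation for fractional lags:

Code:
import numpy as np

def yin_lag(frame, min_lag, max_lag, threshold=0.1):
    frame = np.asarray(frame, dtype=np.float64)
    lags = np.arange(1, max_lag + 1)
    # Squared difference function d(lag) between the frame and its shifted copy.
    d = np.array([np.sum((frame[:-lag] - frame[lag:]) ** 2) for lag in lags])
    # Cumulative-mean-normalized difference: removes the bias towards lag 0.
    cmnd = d * lags / (np.cumsum(d) + 1e-12)
    # First lag below the threshold, otherwise the overall minimum.
    below = np.where(cmnd[min_lag - 1:] < threshold)[0]
    offset = below[0] if below.size else int(np.argmin(cmnd[min_lag - 1:]))
    return min_lag + offset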
Title: Re: New lossless audio codec in development
Post by: Porcus on 2024-12-16 16:57:45
I tested my usual 38 CDs with the "-p" long-term prediction switch in MPEG-4 ALS, comparing -l (which searches for wasted bits and otherwise is the default, prediction order 10) with -l -p. In grand total it saved 0.6 percent. That's percent of the -l compressed size, not points relative to WAVE size.

As one could expect, there is more to save in classical music: 1.2 percent (ranging from 0.6 to 2.3, the latter being flute).
The rest averaged 0.3 percent, ranging from 0.07 percent (Psycroptic and Sodom, that's thrash and tech death metal) to two loners up at .9 and .8 (Sopor Aeternus and Springsteen, that's darkwave and singer/songwriter).

How much that is ... well. Up to opinion.

I also find ALS to be a bit "over-engineered", but back then they didn't know which over-engineering ideas would work out well. For example, the format allows for order-1023 prediction, and that's ... a lot.
They have an alternative entropy encoding method too, and a selection of different such ones might be worth checking out.
Title: Re: New lossless audio codec in development
Post by: dave_swanson on 2024-12-16 17:34:34
Since I am a layman on this topic, my perception of it comes from a different angle. I would treat the incoming audio data stream with a delta-sigma approach for each audio channel: create a static buffer for, let's say, 4 seconds of audio at whatever sample rate you decide, then perform a run-length encoding algorithm, perhaps Group 4 compression (ITU-T T.6). Then feed that output to another algorithm that adapts dynamically to patterns in the frequency domain (FFT). Common tones or frequencies would reach optimal compression. Further break the frequencies into the most common frequency bands, each with its own data compression group. I am thinking this type of approach is for archival purposes and not for real-time streaming. I hope that makes sense.
Title: Re: New lossless audio codec in development
Post by: Porcus on 2024-12-17 10:23:05
There was a lossless codec posted here this autumn, which tried to get into the TAK ballpark by LPC and then a residual encoding scheme that differs from FLAC: https://hydrogenaud.io/index.php/topic,126582.0.html
Title: Re: New lossless audio codec in development
Post by: btc on 2024-12-18 17:36:51
Anyone is free to work on anything that he/she likes.

It's just my opinion that compression gains for a new lossless format will be very limited: +10% vs. FLAC, or less.
The lossless part of the MPEG-D Audio format (xHE-AAC) improved only by 3-6% compared to older formats.

A new lossy format is a different thing. Many patents have expired or will be expiring, as in the case of HE-AAC: SBR patents will expire in 2025 and Parametric Stereo patents in 2026. It's possible to model these parametric tools to scale very well at high bitrates. Also, there is AI.
A +20-30% compression gain over a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.
Title: Re: New lossless audio codec in development
Post by: Hakan Abbas on 2024-12-19 11:01:35
Anyone is free to work on anything that he/she likes.

It's just my opinion that compression gains for a new lossless format will be very limited: +10% vs. FLAC, or less.
The lossless part of the MPEG-D Audio format (xHE-AAC) improved only by 3-6% compared to older formats.

A new lossy format is a different thing. Many patents have expired or will be expiring, as in the case of HE-AAC: SBR patents will expire in 2025 and Parametric Stereo patents in 2026. It's possible to model these parametric tools to scale very well at high bitrates. Also, there is AI.
A +20-30% compression gain over a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.

Yes, the compression ratio in lossless audio compression is really limited. FLAC (default mode) can be improved on by 5% on average. But we know that even this is not suitable for practical use. That's why running speed becomes more important.

If my glasses don't deceive me, my own lossless codec is currently able to offer FLAC's highest level of compression at the cost of halving the processing speed. But even with this small compression gain, I can't accept halving the speed, because I don't think it's worth it.

On the other hand, with lossy codecs we always have more options. It's the same with image data as with audio data: if the end user is not bothered and does not notice anything, the tricks can continue.

At first glance, AI (neural networks and deep neural networks) may seem suitable for audio compression. Compared to traditional methods, AI can achieve slightly better compression on specially selected and trained data by expending enormous energy and time. But it is currently not suitable for practical use in the real world. We can find many academic papers on this topic, and interestingly, the majority of them only talk about the compressed result. Of course, they don't add the size of the codec itself to the compressed result. They don't talk much about the processing time, nor about the size of the model, nor about the size of the decoder.
Title: Re: New lossless audio codec in development
Post by: Klymins on 2024-12-19 11:11:10
Yes, the compression ratio in lossless audio compression is really limited. FLAC (default mode) can be improved on by 5% on average. But we know that even this is not suitable for practical use. That's why running speed becomes more important.

If my glasses don't deceive me, my own lossless codec is currently able to offer FLAC's highest level of compression at the cost of halving the processing speed. But even with this small compression gain, I can't accept halving the speed, because I don't think it's worth it.

On the other hand, with lossy codecs we always have more options. It's the same with image data as with audio data: if the end user is not bothered and does not notice anything, the tricks can continue.

At first glance, AI (neural networks and deep neural networks) may seem suitable for audio compression. Compared to traditional methods, AI can achieve slightly better compression on specially selected and trained data by expending enormous energy and time. But it is currently not suitable for practical use in the real world. We can find many academic papers on this topic, and interestingly, the majority of them only talk about the compressed result. Of course, they don't add the size of the codec itself to the compressed result. They don't talk much about the processing time, nor about the size of the model, nor about the size of the decoder.

I agree with you, and I hate artificial intelligence in almost everything including codecs. I think it's completely soulless.
Title: Re: New lossless audio codec in development
Post by: mudlord on 2024-12-21 18:02:44
Anyone is free to work on anything that he/she likes.

I am gonna risk being banned, but I call BS. Countless times over the years people pestered me to work on very specific things, in very specific ways. And if I deviate in any form, I am Worse Than Hitler. This is why I have grown to hate things.

So no. At some point, though, you have to learn not to care about what the public wants or even thinks of you, for your own sanity, since some of them are really acting in utter bad faith, constantly.
Title: Re: New lossless audio codec in development
Post by: Porcus on 2024-12-21 21:51:13
Anyone is free to work on anything that he/she likes.

I am gonna risk being banned, but I call BS. Countless times over the years people pestered me to work on very specific things, in very specific ways. And if I deviate in any form, I am Worse Than Hitler.

Hey. You are free to work on anything you like in addition!  ;)
Title: Re: New lossless audio codec in development
Post by: mudlord on 2024-12-22 02:34:56
Anyone is free to work on anything that he/she likes.

I am gonna risk being banned, but I call BS. Countless times over the years people pestered me to work on very specific things, in very specific ways. And if I deviate in any form, I am Worse Than Hitler.

Hey. You are free to work on anything you like in addition!  ;)

Which is pretty much exactly how it is. I get assigned to work on garbage, yet I have the *privilege* to work on things I also like.

Fancy that.
Title: Re: New lossless audio codec in development
Post by: C.R.Helmrich on 2024-12-22 15:50:14
My idea is as follows (maybe it's exactly LTP, maybe not):
... if the sine period is not fractional but an integer, you get zero difference with the previous sine period. ...
If there is no exact match, just pick the period with maximum correlation and store the difference via LPC + entropy + residue.
Note that both the number of samples and the lag/offset are variable here, because using fixed-size frames and then doing lags is pointless IMHO.
That sounds exactly like the LTP approach, except for the variable frame size. But why should fixed frame sizes be pointless in that case? You don't have to start or end at a zero-crossing, do you? Just try to find the "best" lag for the waveform segment in the given frame.
Quote
I think this splitting of audio into periods of equal correlation is similar to the YIN algorithm?
Yes, I think so. YIN is a lag-search algorithm.

Quote from: Hakan Abbas
On the other hand, with lossy codecs we always have more options. It's the same with image data as with audio data: if the end user is not bothered and does not notice anything, the tricks can continue.
Well, as someone involved in that craft for 20 years, I can tell you that many things have been tried and that it's also really hard to make progress over state-of-the-art lossy codec solutions like MPEG-H Audio, at least at medium-low to high bit-rates. At very low rates, AI (more precisely, machine learning) does show considerable benefit, but as mentioned, it requires much more computational resources.

A +20-30% compression gain over a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.
Given my above comments, I have to say I doubt that.

Chris
Title: Re: New lossless audio codec in development
Post by: btc on 2024-12-26 03:28:46
A +20-30% compression gain over a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.
Given my above comments, I have to say I doubt that.
Opus is a low-delay format; moving to high delay can save +10% (and that's being conservative; AAC-LD loses much more than 10% to the non-LD AAC family).
Another +10% is reasonably achievable, as the processing budget of modern SoCs has increased significantly since MPEG-D/H and Opus were released.
Title: Re: New lossless audio codec in development
Post by: mycroft on 2025-04-11 09:54:09
I found some Python "magic" code that computes the least-squares solution for an impulse response given two input vectors.
I think this could be useful for a lossless/lossy encoder, and maybe it works better than the LPC already used in many other lossless encoders. Is there any know-how on this kind of approach?
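
What I mean, in plain numpy (the dense textbook version of the problem; the code I found solves it iteratively and scales much better - the names ls_fir and taps below are just made up for illustration), is something like:

Code:
import numpy as np

def ls_fir(x, y, taps):
    # Build the convolution matrix: column k is x delayed by k samples.
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = len(x)
    X = np.zeros((n, taps))
    for k in range(taps):
        X[k:, k] = x[:n - k]
    # Least-squares solution of X @ z ~= y, i.e. y ~= x * z (convolution).
    z, *_ = np.linalg.lstsq(X, y[:n], rcond=None)
    return z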
Title: Re: New lossless audio codec in development
Post by: SebastianL on 2025-04-13 17:47:09
I found some Python "magic" code that computes the least-squares solution for an impulse response given two input vectors.
I think this could be useful for a lossless/lossy encoder, and maybe it works better than the LPC already used in many other lossless encoders. Is there any know-how on this kind of approach?

Least squares is used in almost every audio codec.
LPC in FLAC is (static) least squares using Levinson recursion (https://en.wikipedia.org/wiki/Levinson_recursion)
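
For reference, the recursion itself is short - a generic textbook sketch in Python (input is the autocorrelation r[0..order] of a windowed block with r[0] > 0; this is not FLAC's actual implementation):

Code:
import numpy as np

def levinson_durbin(r, order):
    a = np.zeros(order)       # prediction coefficients: x_hat[t] = sum_j a[j-1] * x[t-j]
    err = r[0]                # prediction error power
    for i in range(order):
        # Reflection coefficient for step i+1.
        k = (r[i + 1] - np.dot(a[:i], r[1:i + 1][::-1])) / err
        a_prev = a[:i].copy()
        a[i] = k
        a[:i] = a_prev - k * a_prev[::-1]
        err *= 1.0 - k * k
    return a, err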

I guess we'll have to wait a bit for your new codec  :D
Title: Re: New lossless audio codec in development
Post by: mycroft on 2025-04-13 17:59:23
I was referring to this little gem: https://github.com/maka89/LeastSquaresFIR
It solves y = x * z, where x and y are known.
It is not Levinson recursion.
It is multiple algorithms with similar output; the latest one, as used in the example script, is more powerful.
It's an iterative algorithm, so not as fast as the "trivial" Levinson in FLAC.
I have done only limited testing in my spare time, and it is pretty good.
Maybe it's my bias, as I did not do more testing...
Title: Re: New lossless audio codec in development
Post by: SebastianL on 2025-04-13 19:39:47
I was referring to this little gem: https://github.com/maka89/LeastSquaresFIR
It solves y = x * z, where x and y are known.
It is not Levinson recursion.
It is multiple algorithms with similar output; the latest one, as used in the example script, is more powerful.
It's an iterative algorithm, so not as fast as the "trivial" Levinson in FLAC.
I have done only limited testing in my spare time, and it is pretty good.
Maybe it's my bias, as I did not do more testing...

As I already wrote, nearly all audio codecs model audio linearly as y = x*z.
It all boils down to how you calculate the weights z:
least squares using Cholesky decomposition
least squares using Levinson-Durbin
recursive least squares
iteratively reweighted least squares
(normalized) least mean squares
etc...

He is using the conjugate gradient method with FFTs.
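
That combination - CG where each matrix-vector product with the (symmetric Toeplitz) autocorrelation matrix is done via circulant embedding and an FFT - looks roughly like this in numpy (a generic sketch, not the code from that repository):

Code:
import numpy as np

def toeplitz_matvec(c, v):
    # Symmetric Toeplitz (first column c) times v, via circulant embedding + FFT.
    c = np.asarray(c, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    n = len(c)
    circ = np.concatenate([c, [0.0], c[1:][::-1]])   # circulant of size 2n
    prod = np.fft.irfft(np.fft.rfft(circ) * np.fft.rfft(np.concatenate([v, np.zeros(n)])), 2 * n)
    return prod[:n]

def cg_toeplitz(c, b, iters=64, tol=1e-12):
    # Conjugate gradient for R z = b, R symmetric positive-definite Toeplitz.
    b = np.asarray(b, dtype=np.float64)
    z = np.zeros_like(b)
    r = b - toeplitz_matvec(c, z)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = toeplitz_matvec(c, p)
        alpha = rs / (p @ Ap)
        z += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return z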

Title: Re: New lossless audio codec in development
Post by: mycroft on 2025-04-13 19:45:47
But with this one, the impulse response generated is so good that its residuals are very, very small.
I'm not arguing about linear audio modeling. I think this approach is superior to anything produced before.
My dream is to code a very, very efficient floating-point hybrid (lossless+lossy) audio codec.
Title: Re: New lossless audio codec in development
Post by: Porcus on 2025-04-13 20:09:17
Which of the ones mentioned? If you use the Toeplitz solver, I guess it must then make the same kind of approximations that the Levinson-Durbin method uses? And if it then improves, it "must" be about numerical stability (or maybe not "stability" in the strict sense, but in roundoffs)?
There are other methods than least squares available in statistics packages, of course, but if different approaches to least squares actually make a sizeable impact, that is maybe interesting in its own right.
(And then: have you tried Burg's method? It's used in Opus.)

Also, a minor point could be a constant offset. Say, for Rice-coded residuals (like FLAC), a residual of N is occasionally worse than -N if N happens to be k*(2^r) for the Rice exponent used. It likely happens only every now and then, and there is likely no big benefit in tweaking it, but you may want to keep residuals centered not on "0" but on a small negative value. -1/4?
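
To make the bit counting concrete, a tiny sketch of FLAC-style zigzag folding plus Rice length (the parameters here are arbitrary):

Code:
def rice_len(e, r):
    # Fold the signed residual (0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...),
    # then count unary quotient bits + stop bit + r remainder bits.
    u = (e << 1) if e >= 0 else -(e << 1) - 1
    return (u >> r) + 1 + r

r = 3
for n in (4, 8, 12):
    print(n, rice_len(n, r), rice_len(-n, r))   # here +N costs one bit more than -N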
Title: Re: New lossless audio codec in development
Post by: SebastianL on 2025-04-14 10:13:51
And if it then improves, it "must" be about numerical stability (or maybe not "stability" in the strict sense, but in roundoffs)?

Not necessarily - it's usually about proper "regularization".
E.g. for ordinary least squares (as used in Sac), you first make an estimate of the autocorrelation matrix and then invert this matrix.
The estimation error can be compensated for by different techniques, e.g. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=433840

On the other hand, you don't really want to do "least squares" (L2) but "least absolute deviation" (L1).
The latter usually cannot be calculated in closed form, so you (again) rely on regularization or other methods like "iteratively reweighted least squares".
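
A generic IRLS-for-L1 sketch (not Sac's or ffmpeg's exact procedure): reweighting each row by 1/|residual| turns the squared penalty into an absolute one over the iterations. It assumes a design matrix X of lagged samples and a target vector y:

Code:
import numpy as np

def irls_l1(X, y, iters=20, eps=1e-6):
    w = np.linalg.lstsq(X, y, rcond=None)[0]       # start from the ordinary L2 solution
    for _ in range(iters):
        resid = np.abs(y - X @ w) + eps            # eps avoids division by zero
        sw = 1.0 / np.sqrt(resid)                  # row weights sqrt(1/|r|) for weighted LS
        w = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return w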
Title: Re: New lossless audio codec in development
Post by: Porcus on 2025-04-14 18:37:29
And if it then improves, it "must" be about numerical stability (or maybe not "stability" in the strict sense, but in roundoffs)?

Not necessarily - it's usually about proper "regularization".

... of the matrix to invert, then? But - as long as you stick to Toeplitzes - that is before the choice of solver? Or are the matrices so often so close to singular that method choice has impact far beyond what you would call roundoffs?

Edit: Though, if that repo offers a straightforward method that does better than the other (straightforward) methods for this particular application, then that is interesting in itself.


On the other hand, you don't really want to do "least squares" (L2) but "least absolute deviation" (L1).

ffmpeg's FLAC encoder can do an IRLS procedure. A similar attempt was made with the reference implementation a few years ago, trying to approach an L1 estimate: https://hydrogenaud.io/index.php/topic,120158.msg1003256.html#msg1003256 .
The impact was surprisingly small, at least for the computational cost.
(And, testing ffmpeg's encoder, I found that when running more than, say, six or seven reweighting passes, the files would often start to grow again. If I remember correctly, it would do the N iterations and then pick the final result - rather than the best of them.)
Title: Re: New lossless audio codec in development
Post by: SebastianL on 2025-04-15 07:38:47
... of the matrix to invert, then? But - as long as you stick to Toeplitzes - that is before the choice of solver? Or are the matrices so often so close to singular that method choice has impact far beyond what you would call roundoffs?

The key focus of regularization is to prevent overfitting and improve generalizability by introducing a small bias.
In the case of ordinary least squares, a common technique is ridge regression:
https://en.wikipedia.org/wiki/Ridge_regression

In code it is just adding a constant to the diagonal; check Sac:
https://github.com/slmdev/sac/blob/72de44ba2eb7fe91409a04c8b9835e60418bdd4c/src/common/utils.h#L190

The effects vary with the signal but are not to be underestimated.
Determining the optimal regularization parameter is difficult.
If it's too large, you have a well-posed problem, but the solution may be too far away from the noise-free solution.
If it's too small, you are near the noise-contaminated solution, which may also be too far away from the noise-free solution.
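
In textbook form (a minimal sketch, not the Sac code linked above) the ridge-regularized normal equations are just:

Code:
import numpy as np

def ridge_solve(X, y, lam=1e-3):
    # (X^T X + lam * I) w = X^T y ; the constant on the diagonal is the regularization.
    XtX = X.T @ X
    XtX[np.diag_indices_from(XtX)] += lam * np.trace(XtX) / XtX.shape[0]  # one way to scale lam to the signal energy
    return np.linalg.solve(XtX, X.T @ y)

Choosing lam is exactly the trade-off described above.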

Quote
ffmpeg's FLAC encoder can do an IRLS procedure. A similar attempt was made with the reference implementation a few years ago, trying to approach an L1 estimate: https://hydrogenaud.io/index.php/topic,120158.msg1003256.html#msg1003256 .
The impact was surprisingly small, at least for the computational cost.
(And, testing ffmpeg's encoder, I found that when running more than, say, six or seven reweighting passes, the files would often start to grow again. If I remember correctly, it would do the N iterations and then pick the final result - rather than the best of them.)

Thanks for the link. I cannot tell for sure why it is not improving the results more, but IMO
FLAC's architecture is too limited to really benefit from more accurate coefficients.
IRLS is better used with very high orders (~2^10), because you are trying to determine a sparse system.
It's unlikely you will find a lot of "sparseness" in the last 32 samples - or whatever the maximum for FLAC is.

Also, you really have to work on your heuristics - and test them -
because in the end you don't want to reduce a p-norm but the result of residual encoding.

I tested a lot of different cost functions for "how much benefit in compression does this parameter change make"
and came to the conclusion that L1 and L2/RMS (which are most often used) are not a good approximation for the entropy encoding stage in lossless audio compression.

You can try it yourself using Sac.
It supports L1, RMS, o0-entropy, Golomb, and bitplane cost functions.

All profiles except --best use "entropy":
https://github.com/slmdev/sac/blob/72de44ba2eb7fe91409a04c8b9835e60418bdd4c/src/libsac/cost.h#L69

which is surprising, because o0-entropy does not care about the size of coefficients, only their distribution.
Title: Re: New lossless audio codec in development
Post by: C.R.Helmrich on 2025-04-15 08:42:20
Oh, a very detailed technical discussion, nice!
... o0-entropy does not care about the size of coefficients, only their distribution.
Can you clarify what you mean by "distribution" in this case? Something related to the variance of the (quantized) coefficient values?

Chris
Title: Re: New lossless audio codec in development
Post by: SebastianL on 2025-04-15 09:31:51
Can you clarify what you mean by "distribution" in this case? Something related to the variance of the (quantized) coefficient values?

Sorry, that was a typo - I was not talking about coefficients but about the errors = residuals.
When you make an approximation of the coding cost, you usually use
L1: sum of absolute values (assumes Laplacian errors)
L2: sum of squared values (assumes Gaussian errors)

which makes sense, as smaller residuals tend to result in smaller encodings.
My observation was that (at least for Sac) there are better approximations of the encoding cost.

Entropy (in information theory) measures the uncertainty of the errors.
Thus, if the errors are e.g. constant, it does not matter whether they are large or small - the information content is zero.
You can check https://en.wikipedia.org/wiki/Entropy_(information_theory)

and see the formula in Sac using an order-0 model
https://github.com/slmdev/sac/blob/72de44ba2eb7fe91409a04c8b9835e60418bdd4c/src/libsac/cost.h#L90
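
A minimal stand-alone version of that idea (the real cost function is behind the link above): the cost depends only on the histogram of the residuals, so a constant residual costs nothing regardless of its magnitude.

Code:
import numpy as np

def o0_entropy_bits(residuals):
    _, counts = np.unique(residuals, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p * np.log2(1.0 / p)) * len(residuals)   # bits = H (bits/sample) * N

print(o0_entropy_bits(np.full(4096, 1000)))   # 0.0    (constant, however large)
print(o0_entropy_bits(np.arange(4096) % 2))   # 4096.0 (one bit per sample)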

Title: Re: New lossless audio codec in development
Post by: mycroft on 2025-04-15 11:11:16
The last one is a trivial entropy calculation from a histogram, right?
Title: Re: New lossless audio codec in development
Post by: SebastianL on 2025-04-15 11:20:52
Yes, order-0 Markov implies "counting the number of occurrences".

The cost function has its limits - using a Golomb coder as an approximation is better but a lot slower.
You can try it out yourself using e.g. --optimize=0.2,250,c where c is from [L1,RMS,ENT,GLB,BPN].

I advise using --verbose to see the actual value of the objective function after optimization,
because a lower objective function does not automatically result in a smaller file.
Title: Re: New lossless audio codec in development
Post by: C.R.Helmrich on 2025-04-15 11:47:19
Sorry, that was a typo - I was not talking about coefficients but about the errors = residuals.
...
if the errors are e.g. constant, it does not matter whether they are large or small - the information content is zero.
Ah, the prediction residual coefficients, thanks for clarifying. Makes sense, and the histogram thing for "more trivial" residuals is a good idea. Thanks!

Chris
Title: Re: New lossless audio codec in development
Post by: Porcus on 2025-04-15 15:45:20
ffmpeg's FLAC encoder can do an IRLS procedure. A similar attempt was made with the reference implementation a few years ago, trying to approach an L1 estimate: https://hydrogenaud.io/index.php/topic,120158.msg1003256.html#msg1003256 .
The impact was surprisingly small, at least for the computational cost.
(And, testing ffmpeg's encoder, I found that when running more than, say, six or seven reweighting passes, the files would often start to grow again. If I remember correctly, it would do the N iterations and then pick the final result - rather than the best of them.)

Thanks for the link. I cannot tell for sure why it is not improving the results more, but IMO
FLAC's architecture is too limited to really benefit from more accurate coefficients.
IRLS is better used with very high orders (~2^10), because you are trying to determine a sparse system.
It's unlikely you will find a lot of "sparseness" in the last 32 samples - or whatever the maximum for FLAC is.
32 is the max for FLAC the format, but "all the development" has focused on up to 12, because that ensures it stays within the streamable subset. Making the reference encoder select well between orders all the way up to 32? That territory is only lightly explored.

One observation is that reweighting and iterating appear expensive compared to trying different shots at it. One is how reference FLAC tries different weightings (first part of the block, and last part of the block - even more for -8) and thus can often "handle" a difficult part of the signal by simply trying with and without it. Another is that it partitions the block ("subframe", i.e. one channel - typically 4096 samples) into 2^r partitions, each with its own Golomb-Rice exponent - that sometimes does wonders.

Maybe FLAC is quite close to how well you can actually model the signal by order 12 (and no dynamic update). TAK does 32 to 160 IIRC. ALS can go up to the 2^10 you mention.


in the end you don't want to reduce a p-norm but the result of residual encoding.
Plus the size of the predictor. IIRC reference FLAC already has a heuristic that sometimes chooses a lower prediction order to save the coefficient bits. Also, FLAC can save ~a per-mille point by trading off the resolution of the predictor vector. How much is that? Hardly enough to care about for storage cost. Enough to care about for the sport.
(For 16-bit signals, reference FLAC will also limit the resolution to facilitate decoding with 32-bit words. I don't think ffmpeg's implementation cares about that.)


But given the order and the residual coding parameter(s) (the Golomb-Rice exponent for FLAC), it could in principle make sense, at the end, to do a slight reweighting with high weight on those residuals that are close to gaining/losing a bit, and low weight on those that could be left slightly bigger without spending any bits. Not at all a convex optimization program, so I guess the weights would have to be set heuristically. And how much would it gain? Maybe surprisingly little if you have flexibility in the residual coding method?
Title: Re: New lossless audio codec in development
Post by: Hakan Abbas on 2025-04-16 14:35:44
I'm not arguing about linear audio modeling. I think this approach is superior to anything produced before.
My dream is to code a very, very efficient floating-point hybrid (lossless+lossy) audio codec.
https://github.com/Hakan-Abbas/Dynamic-Linear-Prediction
You can try DLP. Maybe you can see what I can't. This is a new linear prediction method that I worked on a long time ago. It works much more efficiently, especially for small blocks. For larger blocks (over 4,000 samples) the efficiency may decrease.

Apart from this test tool, I haven't had time to measure its performance in a real application yet. The main reason is that the processing speed will decrease a bit: the current best parameters need to be updated at regular intervals, and how long that interval should be needs to be determined accurately. While it is efficient to go with the best parameters for each small block, there will be a significant penalty in processing speed. If I can integrate it into a real application, I plan to share how it works; I need to be completely sure of the results first.

Here's briefly how it works. First a CSV file (no, value) or a 16-bit, 2-channel WAV file is loaded. If the data is more than 100,000 samples, no graph is drawn; to see the graph, narrow the slider at the bottom by pulling it from the right and left to the desired range. For DLP training, samples in the range of 100-1000 are sufficient. The smallest error sum for the relevant block is determined manually by changing the parameters. Then we can see that these parameters work equally well for many previous and following blocks (according to my observations). So with a quick experiment/training on a small dataset, we can create a very good predictor for a larger dataset. DLP gives better results than Levinson-Durbin in most cases, which we can see by testing, except for very large blocks and very complex data. In addition, we only need to store the 2-3 parameters obtained for DLP.

Various tests can be performed with fixed estimators, Levinson-Durbin or DLP. The parametric data on absolute error and squared error sums can be seen at the bottom.
Title: Re: New lossless audio codec in development
Post by: Porcus on 2025-04-16 15:03:27
What does it do, really? Sure fixed predictors, then a Levinson-Durbin on who knows how you have (or have not) windowed the data - and then: Does the machine learning start to learn from there on, or from scratch?

And are those coefficients at the end really ... did a clever third-order really improve that much over a different least squares algorithm? It shouldn't ... should it? 
Title: Re: New lossless audio codec in development
Post by: Hakan Abbas on 2025-04-16 15:33:28
What does it do, really? Sure fixed predictors, then a Levinson-Durbin on who knows how you have (or have not) windowed the data - and then: Does the machine learning start to learn from there on, or from scratch?

And are those coefficients at the end really ... did a clever third-order really improve that much over a different least squares algorithm? It shouldn't ... should it? 
DLP is completely different from Levinson-Durbin. However, it can be thought of as a dynamic version of the fixed estimators. The learning mechanism in DLP actually tries to find the best case for a block selected from scratch (e.g. 512 samples), i.e. the case with the least error. Here a decision is made by looking at only 2 or 3 samples, and the mistakes made are then improved upon. This is not very interesting.

But what is interesting is that once the appropriate parameters are set, the best result can be obtained with the same parameters in the previous or subsequent blocks. Even though this depends on the shape of the data, in my tests it can sometimes be valid for hundreds of blocks before or after; you can see this by trying it right away. Maybe we will also see the negative aspects of the method.
Title: Re: New lossless audio codec in development
Post by: SebastianL on 2025-04-17 08:38:34
The next step in lossless audio compression should involve
1. merging prediction + residual encoding -> model p(x_t | x_{t-1}, x_{t-2}, ...) directly
2. using LSTM/GRU for prediction, e.g. WaveNet: https://arxiv.org/abs/1609.03499

The issue with 2. is that, for a frame, the NN does not converge fast enough using (stochastic) gradient descent.
Possible solutions:
- use a hybrid approach, where you load a (large) base model from disk and update it during prediction
- train a (small) NN on every frame and save it -> you have to find a compromise between model size, accuracy and final file size

Having a large base model opens a discussion of how to compare such variants to classical (statistical) codecs.
See https://en.wikipedia.org/wiki/Kolmogorov_complexity



Title: Re: New lossless audio codec in development
Post by: genuine on 2025-04-17 14:12:03
DLP is completely different from Levinson-Durbin. However, it can be thought of as a dynamic version of the fixed estimators. The learning mechanism in DLP actually tries to find the best case for a block selected from scratch (e.g. 512 samples), i.e. the case with the least error. Here a decision is made by looking at only 2 or 3 samples, and the mistakes made are then improved upon. This is not very interesting.

But what is interesting is that once the appropriate parameters are set, the best result can be obtained with the same parameters in the previous or subsequent blocks. Even though this depends on the shape of the data, in my tests it can sometimes be valid for hundreds of blocks before or after; you can see this by trying it right away. Maybe we will also see the negative aspects of the method.
In my experiments I used blocks of 500 samples. In DLP, if the best parameters are determined for each block, the average errors can be much lower. However, even with the same parameters, many consecutive blocks seem to operate with similar efficiency. If there are no inaccuracies, I have also found that the fixed estimators give very good results, relative to Levinson, for the music I selected. And they are inexpensive. But Levinson gives better results than the fixed estimators, and close to DLP, on larger blocks.

https://www.rarewares.org/test_samples
ATrain.wav
DLP 1.9 / 0.9 and Levinson Degree 10 (|E| = average absolute error, RMSE = root mean square error)

Quote
Range 120,000 - 120,500
P-1: |E| = 443.106   RMSE = 562.444
P-2: |E| = 207.323   RMSE = 278.034
P-3: |E| = 233.894   RMSE = 303.262
P-4: |E| = 337.748   RMSE = 428.854
L-D: |E| = 385.817   RMSE = 512.501
DLP: |E| = 149.771   RMSE = 196.561

Range 123,000 - 123,500
P-1: |E| = 310.696   RMSE = 405.519
P-2: |E| = 125.683   RMSE = 167.74
P-3: |E| = 121.777   RMSE = 153.605
P-4: |E| = 163.332   RMSE = 200.383
L-D: |E| = 301.466   RMSE = 393.957
DLP: |E| = 81.6198   RMSE = 104.565

Range 130,000 - 130,500
P-1: |E| = 257.808   RMSE = 346.281
P-2: |E| = 134.14   RMSE = 185.864
P-3: |E| = 153.179   RMSE = 197.29
P-4: |E| = 225.078   RMSE = 281.94
L-D: |E| = 260.342   RMSE = 345.381
DLP: |E| = 106.201   RMSE = 134.686

Range 150,000 - 150,500
P-1: |E| = 191.308   RMSE = 247.628
P-2: |E| = 128.058   RMSE = 159.02
P-3: |E| = 175.956   RMSE = 222.19
P-4: |E| = 270.7   RMSE = 338.797
L-D: |E| = 196.845   RMSE = 253.754
DLP: |E| = 96.0289   RMSE = 121.365

Range 200,000 - 200,500
P-1: |E| = 174.672   RMSE = 214.922
P-2: |E| = 171.539   RMSE = 213.147
P-3: |E| = 257.679   RMSE = 320.909
P-4: |E| = 407.173   RMSE = 510.26
L-D: |E| = 179.434   RMSE = 222.833
DLP: |E| = 145.787   RMSE = 181.068

And Range 200,000 - 200,500 for DLP 1.2 / 0.9
DLP: |E| = 119.179   RMSE = 146.419

https://www.rarewares.org/test_samples
Bachpsichord.wav
DLP 0.9 / 0.6 and Levinson Degree 10 (|E| = average absolute error, RMSE = root mean square error)

Quote
Range 781,000 - 781,500
P-1: |E| = 323.39   RMSE = 401.816
P-2: |E| = 246.164   RMSE = 308.946
P-3: |E| = 264.817   RMSE = 332.567
P-4: |E| = 319.684   RMSE = 406.375
L-D: |E| = 328.579   RMSE = 401.791
DLP: |E| = 236.809   RMSE = 295.039

Range 783,500 - 784,000
P-1: |E| = 576.748   RMSE = 732.896
P-2: |E| = 739.136   RMSE = 923.171
P-3: |E| = 1188.61   RMSE = 1482.96
P-4: |E| = 2095.01   RMSE = 2596.91
L-D: |E| = 565.347   RMSE = 722.512
DLP: |E| = 599.783   RMSE = 758.233

Range 785,000 - 785,500
P-1: |E| = 858.63   RMSE = 1076.89
P-2: |E| = 1068.71   RMSE = 1312.39
P-3: |E| = 1539.71   RMSE = 1930.56
P-4: |E| = 2525.31   RMSE = 3107.55
L-D: |E| = 801.279   RMSE = 1003.89
DLP: |E| = 774.577   RMSE = 961.112

Range 789,000 - 789,500
P-1: |E| = 541.868   RMSE = 674.853
P-2: |E| = 560.549   RMSE = 700.241
P-3: |E| = 763.349   RMSE = 948.568
P-4: |E| = 1119.4   RMSE = 1435.04
L-D: |E| = 533.588   RMSE = 670.675
DLP: |E| = 439.466   RMSE = 548.961

Range 795,500 - 796,000
P-1: |E| = 305.056   RMSE = 387.991
P-2: |E| = 269.337   RMSE = 337.022
P-3: |E| = 321.195   RMSE = 393.435
P-4: |E| = 412.865   RMSE = 506.12
L-D: |E| = 306.578   RMSE = 390.907
DLP: |E| = 229.701   RMSE = 290.912

And Range 795,500 - 796,000 for DLP 1.5 / 0.8
DLP: |E| = 174.152   RMSE = 207.577