Linear PCM is crap, SACD is best
Reply #51 – 2004-04-05 11:07:46
Low bit resolution affects the perceived audible quality, in your opinion? The first thing that pops up is stereo image. Distortion is very likely (it can be in the frequency and/or phase domain as well as in the amplitude), but I will try to investigate this more with mathematical tools. Thanks for your reply.

So let's have a closer look at stereo image. 3D perception of sounds is caused by:
- Delay between channels (-> sound from the left arrives earlier at the left ear than at the right ear)
- The sound travelling around the head and being reflected by the ear, causing direction-dependent
  - frequency-dependent attenuation/amplification (= "equalization")
  - frequency-dependent phase shift (= "phase distortion")

Taking these things into account is only necessary for creating binaural recordings for listening with headphones; for playback through an amp -> speakers setup it can be simplified to:
- Delay between channels
- Phase distortion (obvious case: invert one channel (= 180° phase distortion) -> stereo image is moved)
- Attenuation between channels

Now, to find out for sure how many bits of resolution are enough, we would need to know the smallest delay, phase and volume difference that leads to noticeable differences in stereo perception. Does anyone have reliable figures about this?

Some clarification about the resolution vs. dynamic range thing: The often quoted number "16 bit = 96 dB dynamic range" means this (simplified): If a single sample with max. value 32767 (at 16-bit resolution) is rounded from higher resolution, the max. error can be 0.5. 20*log10(0.5/32767) = -96.3. If you want to know how much resolution is "left" for a signal with sample values lower than +/- 32767, you have to choose: either you take the sample values as they are, or you transform to the frequency domain, as you (dekkersj) do in your argumentation.
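A quick sanity check of that -96.3 dB figure, using only the numbers from the post (worst-case rounding error of half a step against full scale):

```python
import math

# Worst-case rounding error when quantizing to 16 bit: half of one step.
max_error = 0.5
full_scale = 32767  # largest positive 16-bit sample value

# Error level relative to full scale, in dB
error_db = 20 * math.log10(max_error / full_scale)
print(round(error_db, 1))  # -96.3
```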
If you split the signal into single (1+A)cos(omega*t+phi) waves (with much lower peak volume compared to the peak volume of the time-domain signal), you must take into account that the resulting waves depend on many samples, depending on the FFT length used for the time->frequency transformation.

Example: With an FFT length of 1024, you get 512 (1+A)cos(omega*t+phi) waves + DC. As an easy example, let the original signal be a repetition of -40 dB dirac impulses, one every 1024 samples, i.e. the sample values of the impulses are 328. This signal can be transformed to 512 waves with amplitudes of 328/512 = 0.64. In your argumentation, we are at a point of (much) too low resolution here, because the next possible steps would be 1 (= 3.9 dB louder than 0.64) or 0 (= silence).

But now try this: take any sample and change its value by 1. When you look at the result in some frequency analysis and compare it to the plain dirac impulse, you'll notice that most of the frequencies have a different energy now, but these differences are much smaller than one would expect when following your argumentation.

Conclusion: When looking at frequency (and phase, which only exists related to frequencies), the "resolution" is, depending on the FFT size you use, much bigger than you suggested. (With increasing FFT size the energy of the single waves becomes lower, but this is compensated by increasing "resolution".) You can look at it from another direction: if you do a time->frequency transformation, the error a single sample (rounded to 16 bit) can contain (+/- 0.5) must be regarded as distributed over all (e.g. 512) waves. So for every single one of those waves, the error caused by one sample is < +/- 0.5/512 on average.
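The dirac-impulse experiment above can be sketched in a few lines of pure Python (no libraries; the "FFT" is a naive DFT, and perturbing sample index 100 is my arbitrary choice, any sample would do):

```python
import cmath

N = 1024  # FFT length from the example

def dft_amplitudes(x):
    """Cosine amplitudes of bins 1..N/2-1 of a real length-N signal."""
    amps = []
    for k in range(1, N // 2):
        # Naive DFT; summing only the nonzero samples keeps it fast here.
        X = sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N) if x[n])
        amps.append(2 * abs(X) / N)
    return amps

# -40 dB dirac impulse: a single sample of value 328 per 1024-sample window
impulse = [0.0] * N
impulse[0] = 328.0
a0 = dft_amplitudes(impulse)
print(round(a0[0], 2))  # 0.64 -- every one of the 512 waves carries ~328/512

# Now change a single sample by one 16-bit step
perturbed = list(impulse)
perturbed[100] += 1.0
a1 = dft_amplitudes(perturbed)

# Most bins change, but each only by at most 2/1024 -- far below the
# 0.64 "step" a per-wave quantization argument would predict.
max_change = max(abs(b - a) for a, b in zip(a0, a1))
print(max_change)
```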
Using dither (which, BTW, is done extensively by DSD, as said before) you can make sure that the errors caused by all the samples in the FFT window (1024 here) don't add up at certain frequencies (that would be truncation distortion) but are equally distributed (or shifted to less audible frequency ranges = noise shaping) and cancelled to some degree.
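A toy illustration of what dither buys you (my own sketch; I'm assuming TPDF dither of 2 LSB peak-to-peak, the post doesn't name a dither type): a constant level of 0.3 steps is completely lost by plain rounding, but with dither it survives on average as signal-independent noise.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def quantize(x, dither=False):
    """Round x to the nearest integer step, optionally with TPDF dither."""
    if dither:
        # Difference of two uniform variables = triangular PDF, 2 steps wide
        x = x + (random.random() - random.random())
    return round(x)

true_value = 0.3   # a level below one quantization step
n = 100_000
plain = sum(quantize(true_value) for _ in range(n)) / n
dithered = sum(quantize(true_value, dither=True) for _ in range(n)) / n

print(plain)               # 0.0 -- the sub-step detail is simply gone
print(round(dithered, 1))  # 0.3 -- preserved as the mean of noisy samples
```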