Linear PCM is crap, SACD is best
Reply #51 – 2004-04-05 11:07:46
Low bit resolution affects the perceived audible quality, in your opinion? The first thing that pops up is stereo image. Distortion is very likely (it can be in the frequency and/or phase domain as well as in the amplitude), but I will try to investigate this more with mathematical tools. Thanks for your reply.

So let's have a closer look at stereo image. 3D perception of sounds is caused by:
- Delay between channels (-> sound from the left arrives earlier at the left ear than at the right ear)
- The sound travelling around the head and being reflected by the ear, causing direction-dependent
  - frequency-dependent attenuation/amplification (= "equalization")
  - frequency-dependent phase shift (= "phase distortion")

Taking these things into account is only necessary for creating binaural recordings for listening with headphones; for playback through an amp -> speakers setup it can be simplified to:
- Delay between channels
- Phase distortion (obvious case: invert one channel (= 180° phase distortion) -> stereo image is moved)
- Attenuation between channels

Now, to find out for sure how many bits of resolution are enough, we would need to know the smallest delay, phase and volume difference that leads to noticeable differences in stereo perception. Does anyone have reliable figures about this?

Some clarification about the resolution vs. dynamic range thing: The often quoted number "16 bit = 96 dB dynamic range" means this (simplified): If a single sample with max. value 32767 (at 16-bit resolution) is rounded from higher resolution, the max. error can be 0.5. 20*log10(0.5/32767) = -96.3. If you want to know how much resolution is "left" for a signal with sample values lower than +/- 32767, you have to choose: either you take the sample values as they are, or you transform to the frequency domain, as you (dekkersj) do in your argumentation.
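A quick sanity check of that -96.3 dB figure, using only the numbers from the post (worst-case rounding error of half a step against full scale):

```python
import math

# Worst-case rounding error when quantizing to 16 bit: half of one step.
max_error = 0.5
full_scale = 32767  # largest positive 16-bit sample value

# Error level relative to full scale, in dB
error_db = 20 * math.log10(max_error / full_scale)
print(round(error_db, 1))  # -96.3
```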
If you split the signal into single (1+A)cos(omega*t+phi) waves (with much lower peak volume compared to the peak volume of the time-domain signal), you must take into account that the resulting waves depend on many samples, depending on the FFT length used for the time->frequency transformation.

Example: With an FFT length of 1024, you get 512 (1+A)cos(omega*t+phi) waves + DC. As an easy example, let the original signal be a repetition of -40 dB dirac impulses, one every 1024 samples, i.e. the sample values of the impulses are 328. This signal can be transformed to 512 waves with amplitudes of 328/512 = 0.64. In your argumentation, we are at a point of (much) too low resolution here, because the next possible steps would be 1 (= 3.9 dB louder than 0.64) or 0 (= silence).

But now try this: take any sample and change its value by 1. When you look at the result in some frequency analysis and compare it to the plain dirac impulse, you'll notice that most of the frequencies have a different energy now, but these differences are much smaller than one would expect when following your argumentation.

Conclusion: When looking at frequency (and phase, which only exists related to frequencies), the "resolution" is, depending on the FFT size you use, much bigger than you suggested. (With increasing FFT size the energy of the single waves becomes lower, but this is compensated by increasing "resolution".) You can look at it from another direction: if you do a time->frequency transformation, the error a single sample (rounded to 16 bit) can contain (+/- 0.5) must be regarded as distributed over all (e.g. 512) waves. So for every single one of those waves, the error caused by one sample is < +/- 0.5/512 on average.
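The dirac-impulse experiment above can be sketched in a few lines of pure Python (no libraries; the "FFT" is a naive DFT, and perturbing sample index 100 is my arbitrary choice, any sample would do):

```python
import cmath

N = 1024  # FFT length from the example

def dft_amplitudes(x):
    """Cosine amplitudes of bins 1..N/2-1 of a real length-N signal."""
    amps = []
    for k in range(1, N // 2):
        # Naive DFT; summing only the nonzero samples keeps it fast here.
        X = sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N) if x[n])
        amps.append(2 * abs(X) / N)
    return amps

# -40 dB dirac impulse: a single sample of value 328 per 1024-sample window
impulse = [0.0] * N
impulse[0] = 328.0
a0 = dft_amplitudes(impulse)
print(round(a0[0], 2))  # 0.64 -- every one of the 512 waves carries ~328/512

# Now change a single sample by one 16-bit step
perturbed = list(impulse)
perturbed[100] += 1.0
a1 = dft_amplitudes(perturbed)

# Most bins change, but each only by at most 2/1024 -- far below the
# 0.64 "step" a per-wave quantization argument would predict.
max_change = max(abs(b - a) for a, b in zip(a0, a1))
print(max_change)
```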
Using dither (which, BTW, is done extensively by DSD, as said before) you can make sure that the errors caused by all the samples in the FFT window (1024 here) don't add up at certain frequencies (that would be truncation distortion) but are equally distributed (or shifted to less audible frequency ranges = noise shaping) and cancelled to some degree.
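A toy illustration of what dither buys you (my own sketch; I'm assuming TPDF dither of 2 LSB peak-to-peak, the post doesn't name a dither type): a constant level of 0.3 steps is completely lost by plain rounding, but with dither it survives on average as signal-independent noise.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def quantize(x, dither=False):
    """Round x to the nearest integer step, optionally with TPDF dither."""
    if dither:
        # Difference of two uniform variables = triangular PDF, 2 steps wide
        x = x + (random.random() - random.random())
    return round(x)

true_value = 0.3   # a level below one quantization step
n = 100_000
plain = sum(quantize(true_value) for _ in range(n)) / n
dithered = sum(quantize(true_value, dither=True) for _ in range(n)) / n

print(plain)               # 0.0 -- the sub-step detail is simply gone
print(round(dithered, 1))  # 0.3 -- preserved as the mean of noisy samples
```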