
Normalization of PCM audio data

I have written a small command line tool that normalizes multiple WAV files according to the highest peak across all the files. So, basically, it is an album normalizer. The tool can handle 32-bit float and 16-bit integer.

Normalizing 16-bit integer values introduces varying rounding errors, because after multiplying by the normalization factor (which has to be a floating point value) the normalized sample values have to be converted back to integer values.

Here are the exact steps that my program does:

1. Scan for the highest peak of all files as an absolute value: Integer values between 0 and 32768

2. Divide the result of step 1 by 32768.0: Floating point values between 0.0 and 1.0

3. Divide the sample values of all files by the result of step 2 (or multiply by the reciprocal value): Floating point values between -32768.0 and 32767.0

4. Round the result of step 3 to integer values: Integer values between -32768 and 32767

I wonder if the rounding errors are negligible or if some kind of dithering should be applied.
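
In code, the four steps look roughly like this (a minimal sketch; the variable names are illustrative, not taken from the actual tool):

Code: [Select]
// Step 1: scan all files for the highest absolute peak (0 .. 32768).
// The cast to double happens before Math.Abs, because Math.Abs on a
// 16-bit value of -32768 would overflow.
double peak = 0.0;
foreach (short s in allSamples)
    peak = Math.Max(peak, Math.Abs((double)s));

// Step 2: peak relative to full scale (0.0 .. 1.0).
double factor = peak / 32768.0;

// Steps 3 + 4: scale every sample and round back to integer.
// Note: a positive peak of 32767 can scale up to +32768, which has
// no 16-bit representation; see the clipping discussion below.
double gain = 1.0 / factor;
for (int i = 0; i < samples.Length; i++)
    samples[i] = (short)Math.Round(samples[i] * gain);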

Normalization of PCM audio data

Reply #1
Without going straight into *why* you want to normalize WAVs (see this post: http://www.hydrogenaudio.org/forums/lofive...php/t42608.html): yes, dithering would be reasonable (as it is in any final conversion step from floating point to integer).

Normalization of PCM audio data

Reply #2
I have chosen to use triangular dithering. Is it correct to add two independent random numbers between 0.0 and 1.0 to each sample value and then use floor instead of round to get integer values?

Normalization of PCM audio data

Reply #3
The method you describe simply does round to nearest, as opposed to round toward zero (truncation). It should be better than truncation, because it reduces the range of the error.

Normalization of PCM audio data

Reply #4
I think I was wrong in my last post, because I assumed that "flooring" values between 0.0 and 2.0 would be the same as rounding values between -1.0 and +1.0. But applying the first approach to a silent signal would yield values between 0 and 2, while the second approach would yield values between -1 and +1, which should be the correct behaviour.
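
In code, the corrected approach would look something like this (a sketch; `rng` is assumed to be a `System.Random` and `x` the scaled floating point sample):

Code: [Select]
// u1 + u2 gives triangular (TPDF) noise centered at 1.0; subtracting
// 0.5 shifts it so that flooring behaves like dithered round-to-nearest:
// a silent signal now yields -1, 0 or +1 instead of 0 .. 2.
double dither = rng.NextDouble() + rng.NextDouble() - 0.5;
int quantized = (int)Math.Floor(x + dither);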

But I found one more thing that puzzles me a bit. The 16-bit (signed) integer format allows values ranging from -32768 to +32767, so there is one more negative value than there are positive ones. When I create a new file in Adobe Audition and generate a signal at 100% volume, the peaks range from -32768 to +32767 (doesn't that introduce a DC offset of 0.5, by the way?). Thus I assume that a negative peak value of -32768 means 100% amplitude and a positive peak value of +32767 also means 100% amplitude. That means there are two values that stand for 100% amplitude.

So shouldn't I make a case differentiation while scanning for the highest peak value? foobar2000 doesn't seem to do that. It just seems to divide the highest peak value by 32768, so that only negative peak values can be interpreted as 100% amplitude, and positive peak values can reach 99.9969482421875% amplitude at best.

A case differentiation would look like this:
Code: [Select]
if (sample_value_int < 0)        // negative values
    sample_value_float = sample_value_int / 32768.0;
else if (sample_value_int > 0)   // positive values
    sample_value_float = sample_value_int / 32767.0;
else
    sample_value_float = 0.0;

So, what is the better approach?

Normalization of PCM audio data

Reply #5
As you might know, that's a consequence of the binary (2^x) representation. The truth is that 100% of the positive range cannot be reached, but this fact is usually ignored due to its minimal impact (it is not one bit, just one value).

In the end, what happens is that if your application allows amplifying the output, you will have to ensure you clip it at those top values. Concretely, you can forget about the distinction when converting to float, and clip the values when converting back to integer instead.
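
For example, clipping only on the way back to integer might look like this (a sketch, not taken from any particular implementation):

Code: [Select]
// Converting to float: simply divide by 32768, ignoring the asymmetry.
double sample_value_float = sample_value_int / 32768.0;

// Converting back: round first, then clip the one value (+32768) that
// has no 16-bit representation.
double rounded = Math.Round(sample_value_float * 32768.0);
short result = (short)Math.Max(-32768.0, Math.Min(32767.0, rounded));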

Normalization of PCM audio data

Reply #6
So you mean that I shouldn't make a case differentiation but always just divide by 32768?

I can imagine where that could lead to problems. Imagine a file where the highest positive sample value is 32767, and it is also the highest absolute sample value. If I divided that value by 32768, the result would be less than 1.0, so the program would "think" it has to increase the volume because the maximum possible peak is not reached. But in fact it is reached, because there is no +32768. This way the program would do nothing beneficial; it would just clip the peak and introduce noise through the dithering process. If I instead defined a positive sample value of 32767 as the maximum possible peak, the program would do nothing, because the file already appears normalized.

Example 1 (dividing by 32768):
32767 / 32768 = 0.999969482421875
32767 / 0.999969482421875 = 32768
You see that the old value 32767 becomes 32768. Because that value is not allowed, it is clipped back to 32767.

Example 2 (dividing by 32767 - for positive values only):
32767 / 32767 = 1
32767 / 1 = 32767
In this case, nothing is done to the signal. A little check in the program can also prevent any dithering from being applied in this case.
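
Expressed as code, the case differentiation while scanning could look like this (a sketch; `maxPos` and `maxNeg` are hypothetical names for the highest positive and lowest negative sample found):

Code: [Select]
// Treat both +32767 and -32768 as 100% amplitude (assumes the files
// are not completely silent):
double peak = Math.Max(maxPos / 32767.0, maxNeg / -32768.0);
double gain = 1.0 / peak; // exactly 1.0 if either extreme is already reached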

Normalization of PCM audio data

Reply #7
That's correct. But it would seem odd that there's a peak at 32767 and none at -32768 (possible, of course).

On a side note: avoid doing divisions; multiply by 1/x instead.

Normalization of PCM audio data

Reply #8
Quote
On a side note: avoid doing divisions; multiply by 1/x instead.
Why? Do you think that a division decreases decimal precision? That doesn't seem to be the case in C#.

Code: [Select]
double result1 = 28739.0 / 32747.0;

double temp = 1.0 / 32747.0;
double result2 = 28739.0 * temp;

Both results are exactly the same:
result1: 0.877607109048157
result2: 0.877607109048157

Comparing the two values (==) also proves that they are identical.

Normalization of PCM audio data

Reply #9
It's not about precision. It's about speed.
A processor cannot divide as fast as it multiplies.

Edit:

Obviously, I mean to do "result2 = 28739.0 * 0.00003053714844", not to calculate the 1/x each time.

Normalization of PCM audio data

Reply #10
I have compared division vs. multiplication where the reciprocal value is only computed once before the actual test.

Factor and divisor:
Code: [Select]
double factor = 1.0 / 32747.0;
double divisor = 32747.0;

Calculations:
Code: [Select]
double result1 = 28739.0 / divisor;
double result2 = 28739.0 * factor;

I repeated each calculation 1 billion times and measured the time before and after. This way I was able to measure how long 1 billion divisions and 1 billion multiplications take.

There really is a significant difference, but it's actually the division that is faster, not the multiplication. The multiplication is about 50% slower.

Here are 5 results of the test:
Code: [Select]
Division:       00:00:01.0106775
Multiplication: 00:00:01.5038100

Division:       00:00:01.0067715
Multiplication: 00:00:01.5067395

Division:       00:00:01.0077480
Multiplication: 00:00:01.5018570

Division:       00:00:01.0272780
Multiplication: 00:00:01.5653295

Division:       00:00:01.0575495
Multiplication: 00:00:01.5995070

Surprised?

P.S.: I googled a bit and found a site that also stated that multiplications are faster than divisions, even specifically for C#. But I double-checked the results, tested with float values instead of doubles, and used a value with more decimals. The multiplications always took about 50% longer.

P.P.S.: A friend has run the test program, and his results are quite different: on his computer, division and multiplication take virtually identical times. So maybe there are computers where the multiplication is faster than the division.

Normalization of PCM audio data

Reply #11
Quote
I have compared division vs. multiplication where the reciprocal value is only computed once before the actual test. [...] There really is a significant difference, but it's actually the division that is faster, not the multiplication. The multiplication is about 50% slower. [...] Surprised?


Your results are without any doubt wrong.

Any x86 CPU I know of needs more than 30 clock cycles (there are some rare exceptions where it is faster) to perform a double-precision division, and division is not pipelined. Let's calculate how fast your CPU would have to be:

  1,000,000,000 * 30 / 1.0106775 seconds = 29.683059 GHz

Amazing!

Most likely your compiler performs some optimization: it only calculates the expression once. What you are measuring is the loop overhead, and because of memory alignment issues the very similar loops may take different times to execute. Because different CPUs react differently to such alignment issues, different timing results are possible.

Normalization of PCM audio data

Reply #12
I sent the program to a third person. On his computer, the multiplication is much faster than the division. So: three computers, three totally different results. The same binary was used on all computers, but maybe the version of the .NET Framework is also a factor.

Normalization of PCM audio data

Reply #13
I don't know if I have made it clear enough:
Quote
Most likely your compiler performs some optimization: it only calculates the expression once. What you are measuring is the loop overhead, and because of memory alignment issues the very similar loops may take different times to execute. Because different CPUs react differently to such alignment issues, different timing results are possible.

This means that the compiler most probably replaces the calculation (division or multiplication) inside the loop with a simple assignment. The calculation itself is performed only once, outside of the loop. What you are measuring is the loop overhead, the assignment (possibly the loop is even empty) and the effect of the code alignment of the loop entry point, but not the time it takes to perform a division or multiplication.
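
A simple way to defeat that optimization is to give every iteration a different operand and to consume the result afterwards, e.g. (a sketch):

Code: [Select]
double sum = 0.0;
for (int i = 0; i < 1000000000; i++)
    sum += i / divisor;   // a different numerator on every iteration
Console.WriteLine(sum);   // consuming the sum keeps the loop body alive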

Normalization of PCM audio data

Reply #14
Okay, this time I used random values that were generated inside the loop. The calculation process took much longer. The result on my computer is that both division and multiplication took nearly the same time when calculating the reciprocal value during each run of the loop.

Result:
Code: [Select]
Division:       00:00:31.8153465
Multiplication: 00:00:31.4823600

Division:       00:00:32.0565420
Multiplication: 00:00:31.6053990

But if the reciprocal value is precalculated, so that no division is necessary inside the loop, the multiplication is indeed faster.

Result:
Code: [Select]
Division:       00:00:31.9129965
Multiplication: 00:00:22.8862305

Division:       00:00:31.8260880
Multiplication: 00:00:23.1186375

So it seems that [JAZ] is right after all.


Normalization of PCM audio data

Reply #16
Apart from the discussion about the technical details of the calculations: why would you want to do peak-based normalization at all? The maximum peak amplitude of an audio file tells very little about its perceived overall loudness.
From the HA wiki:
Quote
Replaygain is different from peak normalization. In peak normalization, you merely ensure that the peak amplitude reaches a certain level. This does not ensure equal loudness. The replaygain technique measures the effective power (i.e. taking RMS after an equal loudness contour) of the waveform, and amplifies the waveform accordingly. The result is that replaygained waveforms are usually more uniformly amplified than peak-normalized waveforms.

Replay gain calculation

Normalization of PCM audio data

Reply #17
I use ReplayGain but this is a special case. I would not peak normalize my "normal" music.

In this case, I downmix 5.1 DTS to 2.0 MP3. Depending on the plugin that I use for the downmixing, the peaks are either too high (my own plugin, which doesn't lower the volume of the output, to save one multiplication) or too low (Channel Mixer, because it is designed for clipping-free playback).

So I was looking for a convenient way to normalize those files, as it would be stupid to "master" files that have a maximum peak of about 65%. 16-bit integer normalization is in fact only a by-product, because all the steps are done in 32-bit float. But I wanted to have the feature in, because a program that can only work with 32-bit float would be a bit limited. Then I saw that working with 16-bit integer files is much more complicated than working with 32-bit float files (converting to float for processing, different positive and negative maxima, dithering). So, to be absolutely sure about all the aspects of normalizing 16-bit integer files, I created this thread.

P.S.: I modified my multiplication vs. division test program so that the random values aren't calculated inside the loop; they are now filled into an array beforehand. Now the multiplication is about 4 times faster than the division. But I could not test it with the same number of values, because the array would then need about 7.4 GiB of memory.
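
A sketch of that arrangement (using `System.Diagnostics.Stopwatch` for the timing; the names and the array size are mine):

Code: [Select]
// Prefill random values so the timed loop measures only the arithmetic.
double[] values = new double[10000000];
Random rng = new Random();
for (int i = 0; i < values.Length; i++)
    values[i] = rng.NextDouble() * 32767.0;

double divisor = 32747.0;
double factor = 1.0 / divisor; // precomputed reciprocal

System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
double sum = 0.0;
foreach (double v in values)
    sum += v * factor;         // compare against: sum += v / divisor;
sw.Stop();
Console.WriteLine("{0} (checksum: {1})", sw.Elapsed, sum); // checksum defeats dead-code elimination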