MP3 vs RM/RA

2005-09-18 04:06:41

Can anyone explain to me why 16 kbps/16 kHz RealAudio mono sounds better for spoken audio than 32 kbps/22.05 kHz MP3? The difference is shocking. The RealAudio is encoded using RealProducer, and the MP3 is encoded using multiple versions of LAME or Adobe Audition's Fraunhofer MP3 codec. The only explanation I can come up with is that RealAudio must have much more accurate bandwidth limiters.

Reply #1 – 2005-09-18 05:36:32

RealAudio's voice codec (as well as WMA Voice, Speex, and other speech-specific codecs) is fundamentally different from MP3/WMA/OGG/etc. It goes far beyond just bandwidth limiting. When you select the "Voice" setting in RA, it is actually using a different codec than it does for music, not just scaling down its standard music codec, which is what MP3 must do. IIRC, RA's voice codecs are based on ACELP, a codec used in telecommunications (I think I read that in the setup guide for RealProducer).

MP3 is not optimized for extremely low bitrates, nor for voice-only encoding, so its performance will always fall short of the voice-only codecs mentioned above in this application. There are several LAME command lines to try for squeezing out the best voice performance (a quick search of HA should turn up a few), but here's a starting point:

-b 32 -q 1 -m m --resample 16 --lowpass 5.5

All major versions of LAME (3.90.3, 3.96.1, and 3.97b1) will give similar results, but I've always found 3.90.3 to be slightly cleaner at these very low bitrates - 3.97 in particular begins to impart a bit of hollowness to the voice.

Cheers,
Mix

Edit: Removed my original "edit" and made it a separate post...still new to forums...

Reply #2 – 2005-09-18 06:12:26

Did a quick search of HA myself, and came up with these command lines:

from http://www.hydrogenaudio.org/forums/index....showtopic=35214 :

-V3 --vbr-new --lowpass 8
(bitrate ~48kb)

from http://www.hydrogenaudio.org/forums/index....ST&f=31&t=36879 :

--abr 16 -a --resample 11 --lowpass 5 --athtype 2 -X3

--alt-preset 24 -a --resample 22 --lowpass 7

The "--abr 16" one is indeed "impressive" as the original poster said - tweaking it to this:

--abr 24 -a --resample 16 --lowpass 5.5 --athtype 2 -X3

gives results that are damn near the same as my sample command line, but at ~6 kbps less.

Reply #3 – 2005-09-18 13:45:51

Thanks for the excellent information and for the suggestions. As I understand it, Internet streaming of MP3 files requires that VBR not be used. Does that same limitation apply to ABR? What is the difference between CBR and ABR? I assume that ABR is a form of VBR.

All that being said, what command line makes sense for CBR 32 kbps/22.05 kHz?

Reply #4 – 2005-09-18 20:12:02

Quote

Thanks for the excellent information and for the suggestions. As I understand it, Internet streaming of MP3 files requires that VBR not be used. Does that same limitation apply to ABR? What is the difference between CBR and ABR? I assume that ABR is a form of VBR.

I don't have much experience with streaming, but I think the technically correct answer to VBR/ABR streaming is "no." The practical answer is more like "it depends," as basically every Internet stream is VBR from the client's side of things - that's what buffering is for. I think the limitation comes from the server side, because with VBR/ABR streams, you could, at any given moment, have the stream call for a bitrate much higher than the average.

For example, if you had 10 clients connected to a server and you were streaming a file that had been encoded with LAME at -V2/--alt-preset standard, at any given moment, the stream could need to be as high as 320kb/sec, which means that in order to ensure dropout-free transmission, your connection to the Internet would need to be at least 3.2 Mb/sec + overhead, so maybe 3.5 Mb/sec.

Again, this is just a guess on my part...and yes, ABR is a form of VBR - it's basically very strict VBR, where the bitrate isn't allowed to vary nearly as much.

Of course, with speech, not only would it be highly unlikely that you would ever see a 320kb/sec frame, you can invoke the -B switch in LAME to limit the maximum bitrate that VBR/ABR encoding could use. This is usually a very bad thing to do with music because it really defeats the purpose of VBR when maximum sound quality is the goal, but with bandwidth-limited speech, it can be useful...

That is obviously the long answer. The short answer is that, since CBR is indeed constant, you always know exactly what the required bitrate will be for each client connection, and can therefore calculate your required bandwidth with no surprises.
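
The arithmetic in the 10-client example can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the 10% overhead figure is an assumption standing in for the "+ overhead" mentioned above.

```python
def required_uplink_kbps(clients, peak_kbps, overhead_fraction=0.10):
    """Worst-case server uplink (kbps) if every client needs the peak bitrate at once.

    overhead_fraction is an assumed allowance for protocol overhead.
    """
    return clients * peak_kbps * (1 + overhead_fraction)


# VBR: any frame may be as large as 320 kbps, so plan for that peak.
vbr_worst_case = required_uplink_kbps(clients=10, peak_kbps=320)

# CBR 32: every frame is 32 kbps, so the peak equals the average.
cbr_32 = required_uplink_kbps(clients=10, peak_kbps=32)

print(f"VBR worst case: {vbr_worst_case / 1000:.2f} Mbps")
print(f"CBR 32 kbps:    {cbr_32 / 1000:.2f} Mbps")
```

With CBR the peak and the average are the same number, which is why the server-side calculation holds no surprises.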

Quote

All that being said, what command line makes sense for CBR 32 kbps/22.05 kHz?

Do you want 22.05 kHz sampling for increased frequency response, or for compatibility reasons? I'm going to assume it's for compatibility reasons (I know some MP3 players don't support all of the valid MPEG sample rates), so the command line would be the one I gave in my first post, but with the 22.05 kHz sample rate, so

-b 32 -q 1 --resample 22.05 --lowpass 5.5

Note that this assumes a mono input file - you'll need to add "-a" if the input file is stereo.
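
For anyone scripting this, here is a minimal sketch of assembling that command line in Python, adding "-a" only for stereo input. The function name and the file names are hypothetical; the switches are the ones given above, and the sketch only builds the argument list rather than running LAME.

```python
def lame_cbr_voice_args(infile, outfile, stereo_input=False,
                        bitrate_kbps=32, sample_rate_khz=22.05, lowpass_khz=5.5):
    """Build the argument list for the CBR voice command line from this post."""
    args = ["lame",
            "-b", str(bitrate_kbps),            # constant bitrate in kbps
            "-q", "1",                          # encoder quality setting used above
            "--resample", str(sample_rate_khz),
            "--lowpass", str(lowpass_khz)]
    if stereo_input:
        args.append("-a")                       # downmix stereo input to mono
    return args + [infile, outfile]


# Hypothetical file names, for illustration only.
print(" ".join(lame_cbr_voice_args("speech.wav", "speech.mp3", stereo_input=True)))
```

The resulting list can be passed to subprocess.run() as-is if LAME is on the PATH.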

Reply #5 – 2005-09-18 21:43:28

Quote

Do you want 22.05 kHz sampling for increased frequency response, or for compatibility reasons? I'm going to assume it's for compatibility reasons (I know some MP3 players don't support all of the valid MPEG sample rates), so the command line would be the one I gave in my first post, but with the 22.05 kHz sample rate, so

-b 32 -q 1 --resample 22.05 --lowpass 5.5

Note that this assumes a mono input file - you'll need to add "-a" if the input file is stereo.

Yes, the 22.05 is for compatibility, or so I'm told. One more question. What is the difference between "-a" and "-m m"?

Thanks!

Reply #6 – 2005-09-18 22:18:40

Quote

One more question. What is the difference between "-a" and "-m m"?

Sorry, I got a little careless on that... "-m m" forces WAV or AIFF files to be encoded as mono, whereas "-a" forces incoming raw PCM data to be encoded as mono.

Since I've always done my encoding of voice files from the command line, and have always used WAV files as the input, I've just used "-m m" if they were stereo, or nothing if they were mono (LAME can determine if it's mono from the WAV header). "-a" should be used when encoding to mono from stdin (for example, when using an app like Audiograbber to record straight to an MP3 from your soundcard's line input).

Note that when using WAV or AIFF (or MP3) files and encoding from the command line, "-m m" and "-a" can be used interchangeably. When sending command line parameters from within an app like dbPowerAmp's "mp3 (Lame.exe)" codec, it may or may not work the same, depending on what additional parameters that app adds to the command line.

Reply #7 – 2005-10-20 17:08:12

If someone were going to take the best advice on using MP3 for voice at a low bit rate (say 16 kbps), what settings would you use in Oddcast to get the best quality at the lower rate?

Thanks,
Steven Clift

Reply #8 – 2005-10-20 18:11:16

Quote

Thanks for the excellent information and for the suggestions. As I understand it, Internet streaming of MP3 files requires that VBR not be used.

It's not a hard and fast rule. Streams are buffered, so as long as the maximum sustained bitrate of a VBR stream is unlikely to 'burst' the usual buffer sizes, you can benefit from VBR's efficiency.
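
To make the buffering argument concrete, here is a toy Python model. Everything in it is an assumption for illustration: audio is counted in one-second chunks, the client downloads at a fixed link rate, and playback starts after a fixed pre-buffer delay. Real players and servers are more complicated.

```python
def stalls(kbits_per_second_of_audio, link_kbps, prebuffer_seconds):
    """Return True if playback would run out of data at some point.

    Second i of audio starts playing at time prebuffer_seconds + i, and all
    data up to and including that second must have arrived by then.
    """
    needed_kbits = 0.0
    for second, kbits in enumerate(kbits_per_second_of_audio):
        needed_kbits += kbits
        received_kbits = link_kbps * (prebuffer_seconds + second)
        if needed_kbits > received_kbits:
            return True
    return False


# Hypothetical VBR stream: mostly 24 kbps with a two-second burst at 64 kbps.
bursty = [24, 24, 64, 64, 24, 24, 24, 24]

print(stalls(bursty, link_kbps=40, prebuffer_seconds=1))  # short buffer
print(stalls(bursty, link_kbps=40, prebuffer_seconds=2))  # longer buffer
```

In this model the same burst that stalls a one-second buffer is absorbed by a two-second one, which is the sense in which a sustained bitrate can "burst" a buffer.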

Quote

Does that same limitation apply to ABR? What is the difference between CBR and ABR? I assume that ABR is a form of VBR.

ABR is VBR that is specially managed so that it does not sustain too high or too low a bitrate for too long. A less graceful way to manage the sustained bitrate is to use -B to set a maximum frame size. The actual bitrate will never reach that size unless the signal's bandwidth is overloading the encoder.

Quote

All that being said, what command line makes sense for CBR 32 kbps/22.05 kHz?

--preset cbr 32 --resample 22

I've used
lame.exe -V8 -m m --resample 22 --lowpass 10 --lowpass-width -B64

for 25-40 kbps streams,

or --abr 32 -m m --resample 22 --lowpass 10 --lowpass-width 3 -B80

for ~32 kbps streams.

I prefer to hear the odd restrained artifact and keep the lowpass above 8 kHz, rather than take all the crispness out of the sound with a lowpass around 5 kHz.

hth

Reply #9 – 2005-10-20 18:15:44

Quote

If someone were going to take the best advice on using MP3 for voice at a low bit rate (say 16 kbps), what settings would you use in Oddcast to get the best quality at the lower rate?

Thanks,
Steven Clift

Pretty basic - 11025 for your sample rate, 1 channel...and that's pretty much it.

About the only thing I can think of that might help is to roll off the low end of your incoming files/source a little bit. At 11.025 kHz sampling, you're losing a lot of top end, so low frequencies can become overpowering. Rolling off/shelving down the low frequencies can often help with intelligibility (the oft-cited example is a standard telephone, with an upper frequency limit of only 3 kHz, but also with a low-end rolloff starting at 300 Hz).

If your source is line-in from your soundcard, then it would probably be simplest to use some external EQ. If it's from files on your hard drive, then you could either use Winamp's built-in EQ (not really recommended as it's not the best quality, especially with MP3s, but it's quick and easy), or use a DSP stacker plugin (search winamp.com) and Shibatch's Super Equalizer (from RareWares' Others page, http://www.rarewares.org/others.html), which is a very clean EQ, and cut-only, so it absolutely can't cause clipping. I've never tried stacking DSPs myself, but "it should work fine."

Note that with Shibatch's EQ, a little goes a long ways - just 3-6 dB of cut from 300 Hz on down can make a substantial difference.
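
As a rough illustration of what a low-end rolloff does, here is a simple first-order high-pass filter in Python. To be clear, this is a stand-in, not Shibatch's EQ or a shelving cut: the filter design, the function names, and the test tones are all assumptions made for the sketch.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order (one-pole) high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def tone(freq_hz, sample_rate_hz, seconds=0.5):
    """Generate a unit-amplitude sine wave."""
    count = int(sample_rate_hz * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(count)]


def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rate = 11025  # the sample rate suggested above
low = rms(high_pass(tone(100, rate), rate, cutoff_hz=300))
mid = rms(high_pass(tone(2000, rate), rate, cutoff_hz=300))
print(f"100 Hz tone level after filter:  {low:.2f}")
print(f"2000 Hz tone level after filter: {mid:.2f}")
```

The 100 Hz tone comes out much quieter than the 2000 Hz tone, which is the kind of balance shift the rolloff is meant to produce.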

Reply #10 – 2005-10-20 18:17:33

Quote

If someone were going to take the best advice on using MP3 for voice at a low bit rate (say 16 kbps), what settings would you use in Oddcast to get the best quality at the lower rate?

Thanks,
Steven Clift

For ~16 kbps,
-V9 -m m --resample 22 --lowpass 10 --lowpass-width 7 -B24

- could sound pretty bad, actually.

There's always
--preset 16

Reply #11 – 2005-10-20 18:45:43

Quote

Quote
If someone were going to take the best advice on using MP3 for voice at a low bit rate (say 16 kbps), what settings would you use in Oddcast to get the best quality at the lower rate?

Thanks,
Steven Clift

For ~16 kbps,
-V9 -m m --resample 22 --lowpass 10 --lowpass-width 7 -B24

- could sound pretty bad, actually.

There's always
--preset 16

Unfortunately, you can't send command-line parameters in Oddcast. You can edit the .cfg file for each defined encoder, and in theory can get ABR/VBR to work via that method, but I've had unpredictable results. Still, might be worth a shot - the latest version of Oddcast (3.1.5) may do a better job of sending switches to the DLL (I guess I should try it myself).

The only sure-fire setting I've been able to get to work in the .cfg file is changing LAMEPreset from the default of -1 to 12, on advice from the author of Oddcast. This enables the CBR preset, and with the LAME 3.97b1 DLL and a bitrate of 128, it does indeed seem to be the equivalent of "-b 128" on the command line.

Reply #12 – 2005-10-20 18:58:59

Quote

Unfortunately, you can't send command-line parameters in Oddcast. You can edit the .cfg file for each defined encoder, and in theory can get ABR/VBR to work via that method, but I've had unpredictable results. Still, might be worth a shot - the latest version of Oddcast (3.1.5) may do a better job of sending switches to the DLL (I guess I should try it myself).

Thanks for filling me in. I was just eyeing up LAME's --highpass switch for reducing the low frequencies, as you suggested.
I take it you found the low sampling rate necessary to keep your streams acceptable.
I bow to your superior experience in this area.

Reply #13 – 2005-10-20 20:39:18

Quote

Thanks for filling me in. I was just eyeing up LAME's --highpass switch for reducing the low frequencies, as you suggested.

I've tried that, and let's just say that it is not nearly as well-tuned as the lowpass filter...yikes. Even at its lowest allowed frequency (which, at 11.025 kHz sampling, appears to be 133 Hz), it produces some very interesting, vocoder-type artifacts.

Quote

I take it you found the low sampling rate necessary to keep your streams acceptable.

Correct. I completely understand your statement about wanting to keep the sound "crisp" and being able to tolerate some artifacts, and indeed, if you have the ability to tweak settings for individual voices and/or recording conditions, you can sometimes get by with a substantially higher lowpass setting at a given bitrate. However, I capture everything live, and can have as many as 12-15 people speaking over the course of one recording, so my settings are designed to be as "bomb-proof" as possible (my usual setting is the -b32 --resample 22.05 --lowpass 5.5 command line from earlier in the thread). Also, many of my recordings end up being used for transcription, and since the agency we use does the majority of their transcriptions live from a phone line, even a 5.5 kHz bandwidth sounds excellent to them.

Quote

I bow to your superior experience in this area
