
MP3 Bytes per frame?

I'm currently writing an MP3 frame parser (because I have too much spare time) and I'm wondering how to figure out how many bits are used per frame.

The relevant info must be stored in the side information, but where? I assumed you just add up all the Part2_3 lengths in the frame, but it looks like that's not quite right. It works sometimes, but other times it looks like there is just some additional junk past where the Part2_3 lengths say the data ends.

For example, I made a simple CBR 320 file, and the first frame is 1045 bytes long. It's a fairly simple frame, so it looks like (*) only 547 of them are used. Subtracting the 36 bytes for the header and side info, I get 511 bytes used for the bitstream. However, simply adding up the Part2_3 lengths gives me 495 bytes. 16 bytes are unaccounted for!

What is the correct way for finding out how much data is used per frame?

Thanks!


PS. I'm using LAME 3.96.1 and 3.97a7, both with the --nores switch. The problem doesn't seem to happen on some other random MP3s I have sitting around. Does LAME add junk after the end of the frame data for some reason?
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2

Reply #1
You need the bitrate, samples per frame, frequency (in Hz), padding bit, and padding size to get the framesize.

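FrameSize = floor(Bitrate * 1000/8 * SamplesPerFrame / Frequency) + IsPadding * PaddingSize

(Bitrate in kbps, Frequency in Hz, result truncated to whole bytes. Layer-I frames are counted in 4-byte slots, so there you truncate 12 * Bitrate * 1000 / Frequency first and then multiply by 4.)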

SamplesPerFrame depends on both the MPEG version and Layer. For MPEG-1.0 Layer-III and Layer-II it's 1152 samples; for MPEG-1.0 Layer-I it's 384 samples. For MPEG-2.0 and MPEG-2.5, Layer-I is 384 samples, Layer-II is 1152 samples, and Layer-III is 576 samples.

PaddingSize is determined by the layer number.  Layer-I is 4 bytes and Layer-II and Layer-III are 1 byte.  It's the same across MPEG-1.0, 2.0, and 2.5.

IsPadding is used on a per-frame basis, you have to check it for each frame.
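A minimal Python sketch of the formula above, assuming the bitrate is given in kbps and the sample rate in Hz (the function name `frame_size` is hypothetical). It truncates to whole bytes and handles Layer-I separately, because Layer-I frame lengths are counted in 4-byte slots:

```python
def frame_size(mpeg_version: str, layer: int, bitrate_kbps: int,
               sample_rate_hz: int, padding: bool) -> int:
    """Frame length in bytes, header included.

    mpeg_version is "1", "2" or "2.5"; free-format streams are not handled.
    """
    if mpeg_version not in ("1", "2", "2.5"):
        raise ValueError(f"unknown MPEG version {mpeg_version!r}")
    pad = 1 if padding else 0
    if layer == 1:
        # Layer I is counted in 4-byte slots: 384 samples / 8 / 4 = 12.
        slots = 12 * bitrate_kbps * 1000 // sample_rate_hz + pad
        return slots * 4
    if layer == 2 or (layer == 3 and mpeg_version == "1"):
        samples_per_frame = 1152
    elif layer == 3:  # MPEG-2 / MPEG-2.5 Layer III
        samples_per_frame = 576
    else:
        raise ValueError(f"unknown layer {layer}")
    # bytes per second times seconds per frame, truncated to whole bytes
    return samples_per_frame // 8 * bitrate_kbps * 1000 // sample_rate_hz + pad
```

For MPEG-1 Layer-III at 320 kbps and 44.1 kHz with the padding bit set, this gives 1045 bytes, the frame length from the first post.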

The tables for bitrate and frequency can be found here http://www.mp3-tech.org/programmer/frame_header.html

Going the extra mile here, you may also want to consider the possibility of bad data, as happens a lot with 'random' MP3s. A false frame sync is often present in the mix of bad data, so you may want to do a sanity check: did the MPEG version or Layer suddenly change? Is there another frame sync at exactly the position this supposed frame header indicates? Have I encountered a tag? There's lots to think about.
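One way to implement the "is there another frame sync exactly where this header says the next frame starts?" check, sketched in Python for MPEG-1 Layer-III only. The tables are the MPEG-1 Layer-III rows of the header tables linked above; the function names are hypothetical, and free-format streams (bitrate index 0) are simply rejected:

```python
from typing import Optional, Tuple

# MPEG-1 Layer III rows of the header tables (None = free format / reserved).
BITRATES_KBPS = (None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None)
SAMPLE_RATES_HZ = (44100, 48000, 32000, None)

def parse_mpeg1_l3_header(buf: bytes, pos: int) -> Optional[Tuple[int, dict]]:
    """Return (frame_size, info) for an MPEG-1 Layer III header at pos, else None."""
    if pos < 0 or pos + 4 > len(buf):
        return None
    h = int.from_bytes(buf[pos:pos + 4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:      # 11-bit frame sync
        return None
    if ((h >> 19) & 0b11) != 0b11:        # version bits: 11 = MPEG-1
        return None
    if ((h >> 17) & 0b11) != 0b01:        # layer bits: 01 = Layer III
        return None
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    rate = SAMPLE_RATES_HZ[(h >> 10) & 0b11]
    if bitrate is None or rate is None:
        return None
    padding = (h >> 9) & 1
    info = {
        "crc": ((h >> 16) & 1) == 0,      # protection bit 0 => 16-bit CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate_hz": rate,
        "padding": bool(padding),
        "channels": 1 if ((h >> 6) & 0b11) == 0b11 else 2,  # mode 11 = mono
    }
    return 144 * bitrate * 1000 // rate + padding, info

def looks_like_frame(buf: bytes, pos: int) -> bool:
    """Accept a header only if another compatible header sits where the next frame should start."""
    first = parse_mpeg1_l3_header(buf, pos)
    if first is None:
        return False
    size, info = first
    nxt = parse_mpeg1_l3_header(buf, pos + size)
    # Next frame must also be MPEG-1 Layer III at the same sample rate.
    return nxt is not None and nxt[1]["sample_rate_hz"] == info["sample_rate_hz"]
```

Requiring the following header to agree is cheap and rejects most false syncs; it does mean the last frame of a file (or the one before a tag) needs separate handling.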

Reply #2
@Jud -
I think you may have misunderstood me a bit. I can find how many bytes are in each frame, but what I'm looking for is how many of those are actually used. It's a bit more complicated than finding the actual bitrate. Because only a limited number of frame sizes is available, MP3 uses a "bit reservoir" so that unused space in one frame can hold data for the following frames. Thus it is important to know how much of the current frame is actually filled with current-frame data.

What I'm eventually going to make is an "MP3 repacker", which will take an MP3 and futz around with the frames, hopefully reducing the size. This would help the most for high-bitrate CBR, like --alt-preset insane. API gives CBR 320, but on one of my files, only ~290kbps were actually used. I'm almost positive that it would be possible to losslessly convert this to a VBR 290 file, shaving off ~10% of the file size without impacting quality at all.
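The repacking idea can be sketched like this: for each frame, pick the smallest bitrate whose frame still holds that frame's header, side info and main data, then sum the new frame sizes. This assumes MPEG-1 Layer-III, no bit reservoir (each frame's data must fit in its own frame), no padding in the new frames, and that the used byte count per frame is already known; all names are hypothetical:

```python
# MPEG-1 Layer III bitrates in kbps (free format excluded)
BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)

def frame_bytes(kbps: int, sample_rate_hz: int) -> int:
    """Unpadded MPEG-1 Layer III frame length in bytes."""
    return 144 * kbps * 1000 // sample_rate_hz

def smallest_bitrate(needed_bytes: int, sample_rate_hz: int) -> int:
    """Smallest bitrate whose unpadded frame can hold needed_bytes."""
    for kbps in BITRATES_KBPS:
        if frame_bytes(kbps, sample_rate_hz) >= needed_bytes:
            return kbps
    raise ValueError("data does not fit even in a 320 kbps frame")

def repacked_size(used_bytes_per_frame, sample_rate_hz: int) -> int:
    """Total size if every frame is rewritten at its smallest fitting bitrate."""
    return sum(frame_bytes(smallest_bitrate(n, sample_rate_hz), sample_rate_hz)
               for n in used_bytes_per_frame)
```

With the 547 used bytes from the first post at 44.1 kHz, that frame would shrink from 1045 bytes to a 626-byte 192 kbps frame.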

It's written entirely in Perl, so it will be of limited use to the world at large. But like I said, I was mainly doing this because I'm bored (I don't even use MP3s any more... I'm a FLAC guy now  )

Reply #3
Quote
Does LAME add junk after the end of the frame data for some reason?

Yes, that's right. The area beyond the audio data is called "auxiliary data", not "junk". It's only there to fill up the fixed size of the frame.

It is correct: the side info area of the frame contains the number of bits used for audio data (called "part2_3 length" in the reference decoder).
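A sketch of summing those fields, assuming MPEG-1 Layer-III and that `side_info` holds the bytes right after the 4-byte header (after the 2-byte CRC, if the protection bit says one is present). In MPEG-1 the side info starts with `main_data_begin` (9 bits), `private_bits` (5 bits mono, 3 bits stereo) and 4 `scfsi` bits per channel; then comes one 59-bit block per granule and channel, each beginning with the 12-bit `part2_3_length`. The helper names are hypothetical:

```python
def read_bits(data: bytes, bit_pos: int, n_bits: int) -> int:
    """Read n_bits starting at bit_pos, most significant bit first."""
    value = 0
    for i in range(bit_pos, bit_pos + n_bits):
        bit = (data[i >> 3] >> (7 - (i & 7))) & 1
        value = (value << 1) | bit
    return value

GR_CH_BLOCK_BITS = 59  # per-granule, per-channel side info block (MPEG-1 Layer III)

def sum_part2_3_bits(side_info: bytes, channels: int) -> int:
    """Sum of part2_3_length (in bits) over both granules and all channels."""
    if channels == 2:
        first_block, side_info_len = 9 + 3 + 2 * 4, 32  # main_data_begin, private_bits, scfsi
    elif channels == 1:
        first_block, side_info_len = 9 + 5 + 4, 17
    else:
        raise ValueError("channels must be 1 or 2")
    if len(side_info) < side_info_len:
        raise ValueError("side info too short")
    total = 0
    for gr in range(2):                 # granule is the outer loop
        for ch in range(channels):
            start = first_block + (gr * channels + ch) * GR_CH_BLOCK_BITS
            total += read_bits(side_info, start, 12)  # 12-bit part2_3_length
    return total
```

The result is in bits; rounding up to whole bytes gives the main data size that the first post compares against the frame length.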

If you force the encoder not to use the bit reservoir (using the --nores option), then the size of the encoded audio data is always less than or equal to the maximum space inside a single frame. How many bits are actually used depends on the data itself and on the encoder's quality settings.
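Without the reservoir, this frame's main data starts right after its side info (`main_data_begin` is 0), so the size of the auxiliary data follows from simple arithmetic. A sketch, assuming MPEG-1 Layer-III (32 bytes of side info for stereo, 17 for mono); `unused_bytes_nores` is a hypothetical name:

```python
def unused_bytes_nores(frame_size: int, crc: bool, channels: int,
                       part2_3_bits_total: int) -> int:
    """Bytes left after header, CRC, side info and this frame's main data.

    Assumes MPEG-1 Layer III with main_data_begin == 0 (no bit reservoir).
    """
    header = 4
    crc_bytes = 2 if crc else 0
    side_info = 32 if channels == 2 else 17
    main_data = (part2_3_bits_total + 7) // 8   # round a partial byte up
    unused = frame_size - header - crc_bytes - side_info - main_data
    if unused < 0:
        raise ValueError("main data does not fit; is the bit reservoir in use?")
    return unused
```

For a 1045-byte stereo frame without CRC holding 495 bytes of main data, this gives 1045 - 36 - 495 = 514 bytes of auxiliary data.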

If you would allow the use of the bit reservoir then the sizes of the audio data could be smaller, equal or larger than the size of a single frame.

To summarise: MPEG audio frames can have only a few distinct sizes (indicated by the bitrate field in the header). The actual audio contents can vary in size (indicated by the side info fields).

Btw, padding as mentioned by Jud varies the frame size by only one byte, in order to match the bitrate setting for the stream. IMHO that is not related to your question.


I suggest that you have a look at the sources of a working decoder to find out all the details. This is usually more helpful than just reading some explanations, at least it is for me. 

Reply #4
Quote
What I'm eventually going to make is an "MP3 repacker", which will take an MP3 and futz around with the frames, hopefully reducing the size.

Yeah, finally someone is picking up this task! I thought about writing such a program a long time ago but never found enough time/motivation to do it...

Please keep informing us about the progress!

Reply #5
Quote
It is correct: the side info area of the frame contains the number of bits used for audio data (called "part2_3 length" in the reference decoder).
So the number of bits used by the frame is simply the sum of all the part2_3 lengths? This is the crux of my question...

Quote
I suggest that you have a look at the sources of a working decoder to find out all the details. This is usually more helpful than just reading some explanations, at least it is for me. 
I've been brooding over LAME's bitstream.c to try to figure out the structure of a frame, but I'm kind of stuck with this problem.

Quote
If you would allow the use of the bit reservoir then the sizes of the audio data could be smaller, equal or larger than the size of a single frame.

If you force the encoder not to use the bit reservoir (using the --nores option), then the size of the encoded audio data is always less than or equal to the maximum space inside a single frame. How many bits are actually used depends on the data itself and on the encoder's quality settings.
Yup, which is why this will probably do nothing for VBR and low-bitrate CBR files. However, for high-bitrate CBR files, there may be stretches of data which can't fill up the frames. If this goes on long enough, part of the free space may be pushed past the 511-byte limit of the reservoir, rendering it inaccessible. (If that makes any sense...)
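A simplified model of that effect: treat the space each frame offers for main data as fixed, carry the leftover forward, and count anything further back than a 9-bit `main_data_begin` can point (511 bytes in MPEG-1) as lost. The per-frame capacities and usage would come from the parser; the function name is hypothetical, and the model ignores how a real encoder decides how much to borrow:

```python
MAX_MAIN_DATA_BEGIN = 511  # largest value of the 9-bit MPEG-1 main_data_begin field

def simulate_reservoir(capacities, used) -> int:
    """Return the number of free bytes that end up out of the reservoir's reach."""
    reservoir = 0
    wasted = 0
    for cap, need in zip(capacities, used):
        available = reservoir + cap
        if need > available:
            raise ValueError("frame needs more than reservoir + its own space")
        leftover = available - need
        # Free bytes further back than main_data_begin can reach are lost.
        wasted += max(0, leftover - MAX_MAIN_DATA_BEGIN)
        reservoir = min(leftover, MAX_MAIN_DATA_BEGIN)
    return wasted
```

Two frames with 1008 bytes of room each but only 500 bytes of data lose 505 bytes: 508 free bytes carry over after the first frame, and 1016 are free after the second, of which only 511 remain reachable.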

Plus, the bit reservoir is completely useless for CBR 320 files, as the maximum amount of data must not be more than can fit in a 320kbps frame.

To check which files would benefit the most from this, I figured out how much space was just padding. In an APS file, the savings would only be 0.08%. In a CBR 128 file, the savings would be 0.1%. However, for API you might be able to save 10% due to the wasted space.

Quote
Yes, that's right. The area beyond the audio data is called "auxiliary data", not "junk". It's only there to fill up the fixed size of the frame.
I haven't ruled out the possibility of it being simple padding, but it doesn't really look like padding. Most LAME padding says stuff like "LAME 3.96.1UUUUUUU...", or is just a bunch of nulls. However, for one file, the extra data is:
31 05 35 14 CC B8 E4 DC 80 A1 85 B1 C1 A1 84 A4 00 00 00 00 00 00 00 00 ...
(*)
which doesn't spell anything, and doesn't have any nice binary properties (like "U") It looks like it actually means something, but I can't figure out what.

Quote
Btw. padding as mentioned by Jud varies the frame sizes only by one byte, in order to match the bitrate setting for the stream. IMHO that is not related to your question.
Yup. I figured out how to parse the frame header, and how to detect padding, CRC, and all the other tidbits about the frame. The problem now is the data within the frame...

Reply #6
AH! Just solved it!  Rejoice!

In my code snippet above, the data for the frame actually ends on the 6th bit of the byte before the one I posted. So I suppose there should be a "01" before the rest of the bytes I posted.

Once I figured that out, I thought that since MP3 is a bitstream, the extra data might mean something and just be unaligned. So I converted "(01)31053514CCB8E4DC80A185B1C1A184A4" to binary, shifted it right by two bits, and ended up with "4C414D45332E39372028616C70686129", which is hex for "LAME3.97 (alpha)". Looks like standard LAME padding to me. It was just throwing me off because it was unaligned. Huzzah!
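The realignment trick can be written as a small helper that reads whole bytes starting at an arbitrary bit offset. The upper six bits of the byte before the posted dump weren't given, so the sketch assumes a leading 0x01 byte (only its low bits "01" matter); `realign` is a hypothetical name:

```python
def realign(data: bytes, bit_offset: int) -> bytes:
    """Return the whole bytes that begin bit_offset bits into data (MSB first).

    Bits at the end that don't fill a complete byte are dropped.
    """
    total_bits = len(data) * 8 - bit_offset
    if total_bits < 0:
        raise ValueError("bit_offset is past the end of data")
    n_bytes = total_bits // 8
    value = int.from_bytes(data, "big")
    value &= (1 << total_bits) - 1        # drop the first bit_offset bits
    value >>= total_bits - n_bytes * 8    # drop the incomplete tail
    return value.to_bytes(n_bytes, "big")


dump = bytes.fromhex("01" "31053514CCB8E4DC80A185B1C1A184A4")
print(realign(dump, 6))  # b'LAME3.97 (alpha)'
```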

Reply #7
Quote
So the number of bits used by the frame is simply the sum of all the part2_3 lengths? This is the crux of my question...

Yes, it is.

Quote
I've been brooding over LAME's bitstream.c to try to figure out the structure of a frame, but I'm kind of stuck with this problem.

So, just look at a DEcoder source code, then. (e.g. MAD)  Advantage: the decoder parses the data just as your program has to do it.

Quote
Plus, the bit reservoir is completely useless for CBR 320 files, as the maximum amount of data must not be more than can fit in a 320kbps frame.

Is it, really? Do you have a link to the specs for this?
I always thought that the maximum size of a single frame's audio data is the maximum frame size (320kbps setting) PLUS the maximum bit reservoir capacity (511 bytes). This is the way I implemented the parser in my own MPEG player, anyway.

Quote
To check which files would benefit the most from this, I figured out how much space was just padding. In an APS file, the savings would only be 0.08%. In a CBR 128 file, the savings would be 0.1%. However, for API you might be able to save 10% due to the wasted space.

Which is still a nice achievement, especially when you consider that this is a lossless transformation and 100% compliant with the MPEG standard.



Do you know if LAME already optimizes the Huffman coding? If not, this would be another option to slightly reduce the size of MP3 data in a lossless way.
(this idea is similar to a Vorbis tool called "rehuff", IIRC. the difference is that in MP3 there is only a fixed set of Huffman tables to be used - the optimization would be simply to pick the table that yields the fewest encoded bits.)
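The selection step itself is simple. A sketch with two made-up code-length tables (hypothetical, NOT the MP3 tables: the real MP3 Huffman tables code pairs and quadruples of values, use linbits for large values and are chosen per region, none of which is modelled here):

```python
from typing import Dict, Sequence

# Hypothetical toy tables: symbol -> code length in bits (NOT the MP3 tables).
TOY_TABLES: Dict[str, Dict[int, int]] = {
    "A": {0: 1, 1: 2, 2: 3, 3: 3},
    "B": {0: 2, 1: 2, 2: 2, 3: 2},
}

def cost_in_bits(symbols: Sequence[int], code_lengths: Dict[int, int]) -> float:
    """Encoded size of symbols with one table; infinite if a symbol is missing."""
    total = 0
    for s in symbols:
        if s not in code_lengths:
            return float("inf")
        total += code_lengths[s]
    return total

def pick_table(symbols: Sequence[int], tables: Dict[str, Dict[int, int]]) -> str:
    """Name of the table giving the fewest bits (the first one wins on ties)."""
    return min(tables, key=lambda name: cost_in_bits(symbols, tables[name]))
```

After switching tables, a real repacker would also have to rewrite the `table_select` and `part2_3_length` fields, since the coded length changes.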


Have you already thought of the option to "inflate" an MP3 stream to a higher bitrate by adding auxiliary data? This could be useful to embed any data inside the stream (text, images) WITHOUT the need to use tags...

Reply #8
Quote
Do you know if LAME already optimizes the Huffman coding? If not, this would be another option to slightly reduce the size of MP3 data in a lossless way.

Yes, but other encoders might not fully try to reach the most efficient Huffman coding.

Quote
Have you already thought of the option to "inflate" an MP3 stream to a higher bitrate by adding auxiliary data? This could be useful to embed any data inside the stream (text, images) WITHOUT the need to use tags...

...or to create multichannel mp3 files

Reply #9
Quote
Quote
Plus, the bit reservoir is completely useless for CBR 320 files, as the maximum amount of data must not be more than can fit in a 320kbps frame.

Is it, really?    Do you have a link to the specs for this?
I always thought that the maximum size of a single frame's audio data is the maximum frame size (320kbps setting) PLUS the maximum bit reservoir capacity (511 bytes). This is the way I implemented the parser in my own MPEG player, anyway.
No specs, but that's what I've heard from various people. Here's Gabriel pointing it out (post #9). I've just seen it come up here and there, usually involved in a "why doesn't API use the reservoir?" question.

Quote
Do you know if LAME already optimizes the Huffman coding? If not, this would be another option to slightly reduce the size of MP3 data in a lossless way.
(this idea is similar to a Vorbis tool called "rehuff", IIRC. the difference is that in MP3 there is only a fixed set of Huffman tables to be used - the optimization would be simply to pick the table that yields the fewest encoded bits.)
I remember toying with rehuff when it came out. I probably won't implement it, for a couple of reasons:
* I don't know any "real" programming languages. Perl's the only thing I can code in, and I don't think programming a Huffman (re)coder in Perl will be very fast, or very fun.
* I have no idea what the actual MP3 specs are. A lot of pages say what the header bits do, but I had to look at the LAME source code and some Chinese PDF to figure out how the side-info works. (Do you know where I can get them? Do you have to buy them, or what?)
* From what I remember, "rehuff" was fairly buggy, and didn't really reduce size much. I don't think an MP3 version will be much better.

Quote
Have you already thought of the option to "inflate" an MP3 stream to a higher bitrate by adding auxiliary data? This could be useful to embed any data inside the stream (text, images) WITHOUT the need to use tags...
It might be nice, but it wouldn't be supported at all, and tags would probably be better anyway. I'll probably add a "minimum bitrate" option to limit the size of frames. I remember something about certain portables not supporting lower bitrates, although maybe I just made that up... 

HOWEVER, I was thinking of making a "magic" MP3 enhancer based on this. It would go something like this:
1. Make a program which will take any MP3, and pad it so it's CBR 320
2. Go onto various fora (the ones which still teach things like "joint stereo is evil" and "smearing goo on your CDs makes them sound better")
3. Post there saying I found this great program that utilizes the latest in "spectral enhancement Markov chains" and "non-deterministic B-spline generators", etc. to increase the quality of your MP3s
4. See how many people believe me. I'm sure there will be responses like "It sounds warmer, yet clearer and deeper, without the harsh boominess of the original. Plus, I found $5 under my sofa."

Reply #10
My understanding of the spec is that the maximum size of relevant data for a frame is the size of a "320kbps frame without bit reservoir". This limit indicates how much memory a hardware player must have.
That is why we do not use bit reservoir with 320kbps cbr.
However, this rule is a little relaxed in LAME. Since the limit defines how much memory a player needs, and because the lowest sampling frequency of a given MPEG mode has the biggest frame sizes, a decoder must have at least enough memory to decode a 320kbps file at the lowest sampling frequency of that MPEG mode.
We are using this value as a limit.
As an example, an MPEG-1 44.1/48kHz file will have frames with no more data than the size of a 32kHz 320kbps frame, which is a little higher than the strict limit (the size of a 44.1/48kHz 320kbps frame).
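To put numbers on that example, the frame sizes involved can be computed with the MPEG-1 Layer-III frame-size formula (144 × bitrate / sample rate, truncated). These are whole-frame sizes including header and side info; the post doesn't say exactly how LAME subtracts those, so the sketch only compares frame sizes, and the function name is hypothetical:

```python
def mpeg1_l3_frame_bytes(kbps: int, sample_rate_hz: int) -> int:
    """Unpadded MPEG-1 Layer III frame length in bytes (header included)."""
    return 144 * kbps * 1000 // sample_rate_hz

# Strict limit: a 320 kbps frame at the stream's own sample rate.
strict = {rate: mpeg1_l3_frame_bytes(320, rate) for rate in (44100, 48000)}
# Relaxed limit described above: a 320 kbps frame at 32 kHz.
relaxed = mpeg1_l3_frame_bytes(320, 32000)
print(strict, relaxed)  # {44100: 1044, 48000: 960} 1440
```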

Reply #11
Quote
HOWEVER, I was thinking of making a "magic" MP3 enhancer based on this. It would go something like this:
...
4. See how many people believe me. I'm sure there will be responses like "It sounds warmer, yet clearer and deeper, without the harsh boominess of the original. Plus, I found $5 under my sofa."

Hehe, sounds like a good idea! 
Just make sure that you use really wild claims ("even better than CD quality!!!") and lots of marketing buzz words to promote it.