How does gapless playback work?

2006-02-01 23:38:27

Hi!
I'm planning to convert my CD collection to Ogg Vorbis. One reason for this decision is the much-cited "native support for gapless playback". That sounds nice, but I would like to know how it actually works. I took a look at the Vorbis I specification, asked Google, searched these forums and followed some links like that "Vorbis illuminated" article, but I still don't have a clue why Vorbis is "natively gapless".

As far as I understand the specification, Vorbis streams consist of audio packets with two different frame sizes, which can be freely defined but have to be powers of two. That leads me to my first question: What happens to the samples at the end of an input wav file? In 99.9% of cases the input length cannot be reproduced exactly using chunks of only two possible lengths. Is the last frame padded with silence, as the MP3 codec does? If so, how is that any more native than MP3? Or is the last frame allowed to be of arbitrary size? But wouldn't that be a problem for the MDCT algorithm?

My second question refers to the transition between two songs. Doesn't a seamless transition require that one frame spans the end of the first song and the beginning of the second one? How can Vorbis guarantee that the end and beginning of two subsequent songs match exactly, without introducing any audible clicks?

Reply #1 – 2006-02-01 23:43:16

http://wiki.hydrogenaudio.org/index.php?title=Gapless

Reply #2 – 2006-02-01 23:47:34

I've read that article before registering to post in this forum, but it didn't answer my questions.
How does Vorbis handle gapless playback in detail?

Reply #3 – 2006-02-02 00:10:49

Quote

I've read that article before registering to post in this forum, but it didn't answer my questions.
How does Vorbis handle gapless playback in detail?

I believe Vorbis provides gapless playback simply by not introducing a gap at the start or end of playback, by design of the format itself (in contrast, many formats do introduce a gap, which then has to be skipped).

I could be confusing Vorbis and MPC though.

Reply #4 – 2006-02-02 00:31:40

Well, this is the only Vorbis-specific information that I could find.

http://www.xiph.org/vorbis/doc/vorbisfile/crosslap.html

Quote

I thought Vorbis was gapless
It is. Vorbis introduces no extra samples at the beginning or end of a stream, nor does it remove any samples. Gapless encoding eliminates 99% of the click, pop or outright blown speaker that would occur if boundaries had gaps or made no effort to align transitions. However, gapless encoding is not enough to entirely eliminate stairstep discontinuities all the time for exactly the reasons described above.

Frame lapping, like Vorbis performs internally during continuous playback, is necessary to eliminate that last epsilon of trouble.

Reply #5 – 2006-02-02 02:33:15

Ogg Vorbis files have gapless metadata in the Ogg layer, according to Gabriel, AFAIK.

Reply #6 – 2006-02-02 06:50:53

Vorbis is not gapless by itself. Ogg Vorbis can be, as you can store encoder delay and padding in the Ogg container.
So it is exactly the same as Nero AAC or LAME MP3: values are stored and must be read by the decoder to remove unnecessary samples.
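
To make the "values are stored and must be read by the decoder" part concrete, here is a minimal Python sketch of the trimming step. The function name and both parameters are hypothetical and not taken from any real decoder; the sketch only assumes that the player has somehow obtained the number of leading delay samples and trailing padding samples for a track.

```python
def trim_gapless(pcm, encoder_delay, padding):
    """Return the decoded samples without the encoder's extra samples.

    pcm            -- list of decoded samples for one track
    encoder_delay  -- hypothetical count of junk samples at the start
    padding        -- hypothetical count of junk samples at the end
    """
    if encoder_delay < 0 or padding < 0:
        raise ValueError("delay and padding must be non-negative")
    end = len(pcm) - padding
    # max() keeps the slice empty (rather than wrapping around)
    # if the stored values exceed the amount of decoded audio.
    return pcm[encoder_delay:max(end, encoder_delay)]


# Example: 2 delay samples in front, 3 padding samples at the end.
decoded = [0, 0, 1, 2, 3, 4, 5, 0, 0, 0]
original = trim_gapless(decoded, encoder_delay=2, padding=3)
```

Once both ends are trimmed like this, consecutive tracks can be sent to the output back to back.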

Reply #7 – 2006-02-02 08:00:53

Unfortunately this is IMHO not well documented. It's somehow achieved through the "granule pos" (contained in the page headers). I think the document to examine is this one:
http://www.xiph.org/vorbis/doc/framing.html

Sebi

Edit: at least it used to be badly documented. I haven't checked the current documentation.

Edit2: I think this is it.

Reply #8 – 2006-02-02 09:20:46

Thanks a lot, the links provided by Zoom and SebastianG are great.
The last frame is padded and the exact position of the stream end is stored in the metadata, as far as I can tell from chapter A of the specification:

Quote

A granule position on the final page in a stream that indicates less audio data than the final packet would normally return is used to end the stream on other than even frame boundaries. The difference between the actual available data returned and the declared amount indicates how many trailing samples to discard from the decoding process.
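
As a rough illustration of the rule quoted above, the number of trailing samples to discard could be worked out as follows. This is not code from libvorbis; all names and the example numbers are made up, and the sketch assumes the granule position counts PCM samples from a stream that starts at zero.

```python
def final_packet_trim(final_granule_pos, samples_before_final, final_packet_samples):
    """Split the final packet's output into (keep, discard) sample counts.

    final_granule_pos     -- granule position declared on the last page
    samples_before_final  -- samples already returned by earlier packets
    final_packet_samples  -- samples the final packet would normally return
    """
    declared = final_granule_pos - samples_before_final
    if declared < 0:
        raise ValueError("granule position lies before the final packet")
    # Never keep more than the packet actually decodes to.
    keep = min(declared, final_packet_samples)
    discard = final_packet_samples - keep
    return keep, discard


# Example: the stream should end at sample 1,000,000, earlier packets
# produced 999,424 samples, and the last packet decodes to 1,024 samples.
keep, discard = final_packet_trim(1_000_000, 999_424, 1_024)
```

In this example 576 samples of the last packet belong to the track and the remaining 448 are the padding that gets cut off.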

And if I understand this paragraph correctly...

Quote

Ideally, vorbisfile internally reads an extra frame of audio from the old stream/position to perform lapping into the new stream/position. However, automagic crosslapping works properly even if the old stream/position is at EOF. In this case, the synthetic post-extrapolation generated by the encoder to pad out the last block with appropriate data (and avoid encoding a stairstep, which is inefficient) is used for crosslapping purposes. Although this is synthetic data, the result is still usually completely unnoticeable even in careful listening (and always preferable to a click or pop).

...then the Vorbis encoder doesn't pad the last frame with zeroes, but with some data derived from the beginning of that frame, the data it is actually meant to encode. This allows for a good compression of the last frame and that extra "imaginary" data doesn't matter, because it's usually cut off by the decoder. However, when crossfading two streams, this extra data can actually be used to crossfade with the new stream, efficiently eating up any clicks that might occur by simply concatenating two decoded frames.
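
A minimal sketch of that crossfading idea, under loud assumptions: the function name is hypothetical, and a plain linear ramp is used for simplicity, whereas the real vorbisfile library may well use a different window shape. `old_tail` stands for the extra "imaginary" samples decoded past the end of the first stream, `new_head` for the first samples of the following stream.

```python
def crosslap(old_tail, new_head):
    """Blend the leftover tail of the old stream into the start of the new one.

    Returns a list as long as new_head: the overlapping part is a
    linear crossfade, the rest is new_head unchanged.
    """
    n = min(len(old_tail), len(new_head))
    mixed = []
    for i in range(n):
        # Weight of the new stream rises from near 0 to near 1.
        w = (i + 0.5) / n
        mixed.append((1.0 - w) * old_tail[i] + w * new_head[i])
    return mixed + list(new_head[n:])


# Example: the old stream ends at level 1.0, the new one starts at 0.0.
# Simple concatenation would be a hard step; the crossfade ramps down.
blended = crosslap([1.0, 1.0, 1.0, 1.0], [0.0] * 6)
```

With an empty `old_tail` (no crossfade reserve available) the function returns `new_head` unchanged, which is the plain-concatenation case.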

If I've got that completely wrong, someone start barking right now, please. :-)

On the other hand, this means that gapless support is no more native in Ogg Vorbis than it is in MP3. The granule position of the stream end is stored in the metadata, just like the LAME tag. Apart from that crossfading trick with the extra data introduced by the encoder, no special measures are taken to avoid clicks between two different streams. So one can actually hope that a stream doesn't end exactly on a frame boundary, because that would almost certainly introduce clicks for lack of the "crossfade reserve data".
So gapless playback is not a native feature of the Vorbis format, but depends entirely on the container (Ogg) and on functionality in the decoder (like the crossfading in the vorbisfile library).

Again, if I got this totally wrong, start barking now, please. :-)

@Gabriel:
Can you tell me how LAME encodes the last frame of a stream? Does it just pad it with zeroes or does it fill in some "encoder friendly" data before encoding the whole frame, like the Vorbis encoder seems to do?
If the LAME encoder doesn't, and we're encoding at a high bitrate, there won't be any crossfading reserve in the decoded last frame, I assume. So gapless MP3 would rely entirely on the hope that the last sample of the first stream matches the first sample of the second stream to avoid clicking, is that correct?

Reply #9 – 2006-02-02 10:12:32

Quote

On the other hand this means that gapless support is no more native in Ogg Vorbis than it is in MP3.

I could be wrong, but as far as I know, to get gapless MP3 one must use the corresponding switch, which is outside the standard.
However, if you want gapless Ogg Vorbis playback on a hardware player, you should know that only two hardware players support it:

1. Rio Karma - which is out of production
2. iPods with RockBox firmware

Anyway, that is still more than zero (by which I mean the hardware support for gapless MP3 or AAC playback).

Reply #10 – 2006-02-02 11:34:34

Quote

However, if you want gapless Ogg Vorbis playback on a hardware player, you should know that only two hardware players support it:

1. Rio Karma - which is out of production
2. iPods with RockBox firmware

Rockbox on any player without hardware decoders will support gapless playback, not only on iPods. Rockbox on the iriver H1x0 and H3x0 has supported gapless for a long time now.

Reply #11 – 2006-02-02 12:26:27

Quote

Rockbox on any player without hardware decoders will support gapless playback, not only on iPods. Rockbox on the iriver H1x0 and H3x0 has supported gapless for a long time now.

Great news! Your work makes me (and I believe not only me) think about buying an iPod.

Reply #12 – 2006-02-02 13:06:13

Quote

Can you tell me how LAME encodes the last frame of a stream? Does it just pad it with zeroes or does it fill in some "encoder friendly" data before encoding the whole frame, like the Vorbis encoder seems to do?
If the LAME encoder doesn't, and we're encoding at a high bitrate, there won't be any crossfading reserve in the decoded last frame, I assume. So gapless MP3 would rely entirely on the hope that the last sample of the first stream matches the first sample of the second stream to avoid clicking, is that correct?

LAME only adds zeros at the end of the time data. I never thought about duplicating actual samples to be more "encoder friendly", but it seems to be a good idea. However, I do not understand how this would make gapless decoding easier. To me it would only save a few bits, but nothing else.
Yes, gapless MP3 relies on the hope that the last sample of part A is close to the first sample of part B, but this does not seem any different from gapless playing of raw wav files.

Quote

I could be wrong, but as far as I know, to get gapless MP3 one must use the corresponding switch, which is outside the standard.

No extra switch is required, but yes, this is not specified in the MP3 standard. MP3 is only an audio compression format, like Vorbis, so there are no standard provisions for gapless playback.

Reply #13 – 2006-02-02 15:06:48

Quote

LAME only adds zeros at the end of the time data. I never thought about duplicating actual samples to be more "encoder friendly", but it seems to be a good idea.

But it would break compatibility with players not aware of the LAME tag. They wouldn't crop the duplicate data, which might lead to undesired artifacts.

Quote

However, I do not understand how this would make gapless decoding easier.

Well, it would partly replace the missing overlapping frame, the one consisting 50% of data from the end of the first stream and 50% of data from the beginning of the following stream. It could be used for seamlessly crossfading two tracks. Of course that's not a clean solution, but it can avoid clicks if the amount of overlapping data is large enough.

Quote

Yes, gapless MP3 relies on the hope that the last sample of part A is close to the first sample of part B, but this does not seem any different from gapless playing of raw wav files.

The difference is that properly ripped wav files will always match across file boundaries. How well do decoded MP3 / Vorbis frames match at their boundaries at the bitrates commonly used for encoding music?
I guess a _perfect_ match between two tracks is not that important, because the transitions are usually not as loud as the music itself. But if the decoder is capable of properly recreating the frame borders, I ask myself why we actually need those overlapping windows at all. Couldn't the bandwidth wasted on the double encoding of overlapping frames be better spent on some method to ensure a perfect match at the frame boundaries?
Well, I guess not. If so someone else would probably already have implemented this lightyears ago. :-)

Reply #14 – 2006-02-03 08:22:43

Quote

LAME only adds zeros at the end of the time data. I never thought about duplicating actual samples to be more "encoder friendly", but it seems to be a good idea.

This is a bit off topic, but here it goes: I believe the Vorbis encoders use linear prediction to extrapolate beyond the borders, which is IMHO a neat idea.

Sebi
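
For readers who have not seen linear-prediction extrapolation before, here is a small self-contained Python sketch of the general technique. It is not taken from any Vorbis encoder and makes no claim about how the reference encoder does it; the function names, the least-squares fit and the choice of predictor order are all assumptions made for illustration.

```python
import math


def fit_lpc(samples, order):
    """Least-squares fit of x[n] ~ sum(a[k] * x[n-1-k] for k in range(order))."""
    # Build the normal equations R * a = r from all available samples.
    R = [[0.0] * order for _ in range(order)]
    r = [0.0] * order
    for n in range(order, len(samples)):
        past = [samples[n - 1 - k] for k in range(order)]
        for i in range(order):
            r[i] += past[i] * samples[n]
            for j in range(order):
                R[i][j] += past[i] * past[j]
    # Solve by Gaussian elimination with partial pivoting.
    # (Raises ZeroDivisionError for degenerate input such as pure silence.)
    for col in range(order):
        pivot = max(range(col, order), key=lambda row: abs(R[row][col]))
        R[col], R[pivot] = R[pivot], R[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, order):
            f = R[row][col] / R[col][col]
            for j in range(col, order):
                R[row][j] -= f * R[col][j]
            r[row] -= f * r[col]
    a = [0.0] * order
    for i in reversed(range(order)):
        s = r[i] - sum(R[i][j] * a[j] for j in range(i + 1, order))
        a[i] = s / R[i][i]
    return a


def extrapolate(samples, order, count):
    """Predict `count` samples beyond the end of `samples`."""
    a = fit_lpc(samples, order)
    out = list(samples)
    for _ in range(count):
        out.append(sum(a[k] * out[-1 - k] for k in range(order)))
    return out[len(samples):]


# Example: continue a pure tone past the end of the available data.
tone = [math.sin(0.1 * n) for n in range(256)]
tail = extrapolate(tone, order=2, count=32)
```

For a pure tone an order-2 predictor continues the waveform almost exactly; real music would need a higher order and the continuation would only be plausible, not exact, which is all that padding a final block requires.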
