| This document describes some things to know about the Ogg format, as well |
| as implementation details in GStreamer. |
| |
| INTRODUCTION |
| ============ |
| |
| ogg and the granulepos |
| ---------------------- |
| |
| An ogg stream contains pages with a serial number and a granulepos. |
| The granulepos is a 64 bit signed integer. It is a value that in some way |
| represents a time since the start of the stream. |
| The interpretation as such is however both codec-specific and |
| stream-specific. |
| |
| ogg has no notion of time: it only knows about bytes and granulepos values |
| on pages. |
| |
| The granule position is just a number; the only guarantee for a valid ogg |
| stream is that within a logical stream, this number never decreases. |
| |
| While logically a granulepos value can be constructed for every ogg packet, |
| the page is marked with only one granulepos value: the granulepos of the |
| last packet to end on that page. |
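The page-marking rule above can be sketched as follows. This is an illustrative model only: `page_granulepos` and its list argument are hypothetical names, not part of libogg or GStreamer. It relies on the Ogg convention (also mentioned in the muxing section) that -1 marks a page on which no packet ends.

```python
from typing import List

NO_GRANULEPOS = -1  # Ogg convention: no packet ends on this page


def page_granulepos(ending_packet_granulepos: List[int]) -> int:
    """Return the granulepos a page header would carry.

    `ending_packet_granulepos` lists, in order, the granulepos of each
    packet that ends on the page. Packets that continue on the next
    page are not included.
    """
    if not ending_packet_granulepos:
        return NO_GRANULEPOS
    # Only the last packet to end on the page determines the page value.
    return ending_packet_granulepos[-1]
```

For a page on which three packets end, only the third packet's value is kept; the other two can only be reconstructed with help from the codec.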
| |
| theora and the granulepos |
| ------------------------- |
| |
| The granulepos in theora is an encoding of the frame number of the last |
| key frame ("i frame"), and the number of frames since the last key frame |
| ("p frame"). The granulepos is constructed as the sum of the first number, |
| shifted to the left by granuleshift bits, and the second number: |
| granulepos = (iframe << granuleshift) + pframe |
| |
| (This means that given a framenumber or a timestamp, one cannot generate |
| the one and only granulepos for that page; several granulepos possibilities |
| correspond to this frame number. You also need the last keyframe, as well |
| as the granuleshift. |
| However, given a granulepos, the theora codec can still map that to a |
| unique timestamp and frame number for that theora stream) |
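A minimal sketch of this encoding, following the prose description (key frame number in the upper bits, frames since the key frame in the lower `granuleshift` bits). The function names are hypothetical and the frame numbering follows the "presentation time" convention noted below; this is not the libtheora API.

```python
from typing import Tuple


def theora_granulepos(keyframe: int, frames_since_keyframe: int,
                      granuleshift: int) -> int:
    """Pack a key frame number and an offset into one granulepos."""
    if frames_since_keyframe >= (1 << granuleshift):
        raise ValueError("offset does not fit in granuleshift bits")
    return (keyframe << granuleshift) + frames_since_keyframe


def theora_split(granulepos: int, granuleshift: int) -> Tuple[int, int]:
    """Unpack a granulepos into (key frame number, frames since it)."""
    keyframe = granulepos >> granuleshift
    offset = granulepos & ((1 << granuleshift) - 1)
    return keyframe, offset


def theora_frame_number(granulepos: int, granuleshift: int) -> int:
    """Map a granulepos to its unique frame number."""
    keyframe, offset = theora_split(granulepos, granuleshift)
    return keyframe + offset
```

With a granuleshift of 6, frame 67 can be encoded as key frame 64 plus 3 frames, or as key frame 60 plus 7 frames. Both values decode to the same frame number, which is why the frame number alone does not determine the granulepos.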
| |
| Note: currently theora stores the "presentation time" as the granulepos; |
| ie. a first data page with one packet contains one video frame and |
| will be marked with 0/0. Changing that to be 1/0 (so that it |
| represents the number of decodable frames up to that point, like |
| for Vorbis) is being discussed. |
| |
| vorbis and granulepos |
| --------------------- |
| |
| In Vorbis, the granulepos represents the number of samples that can be |
| decoded from all packets up to that point. |
| |
| In GStreamer, the vorbisenc element produces a stream where: |
| - OFFSET is the time corresponding to the granulepos |
| - OFFSET_END is the granulepos of the produced vorbis buffer |
| - TIMESTAMP is the timestamp matching the begin of the buffer |
| - DURATION is set to the length in time of the buffer |
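The four fields listed above can be illustrated with a small calculation. This is a simplified sketch: it assumes the number of samples each packet contributes is already known, and the helper names are hypothetical rather than taken from vorbisenc. Times are in nanoseconds, as in GStreamer.

```python
from typing import Dict

GST_SECOND = 1_000_000_000  # nanoseconds per second


def samples_to_ns(samples: int, rate: int) -> int:
    """Convert a sample count to nanoseconds at the given sample rate."""
    return samples * GST_SECOND // rate


def vorbis_buffer_fields(samples_before: int, samples_in_packet: int,
                         rate: int) -> Dict[str, int]:
    """Compute the buffer fields for one encoded packet (illustrative)."""
    # The granulepos counts all samples decodable up to and including
    # this packet.
    granulepos = samples_before + samples_in_packet
    timestamp = samples_to_ns(samples_before, rate)
    end_time = samples_to_ns(granulepos, rate)
    return {
        "OFFSET": end_time,        # time corresponding to the granulepos
        "OFFSET_END": granulepos,  # the granulepos itself
        "TIMESTAMP": timestamp,    # begin of the buffer
        "DURATION": end_time - timestamp,
    }
```

At 48000 Hz, a first packet of 960 samples gets granulepos 960, OFFSET 20 ms, TIMESTAMP 0 and DURATION 20 ms.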
| |
| Ogg media mapping |
| ----------------- |
| |
| Ogg defines a mapping for each media type that it embeds. |
| |
| For Vorbis: |
| |
| - 3 header pages, with granulepos 0. |
| - 1 page with 1 packet header identification |
| - N pages with 2 packets comments and codebooks |
| - granulepos is samplenumber of next page |
| - one packet can contain a variable number of samples but one frame |
| that should be handed to the vorbis decoder. |
| |
| For Theora: |
| |
| - 3 header packets, on pages with granulepos 0. |
| - 1 page with 1 packet header identification |
| - N pages with 2 packets comments and codebooks |
| - granulepos is framenumber of last packet in page, where framenumber |
| is a combination of keyframe number and p frames since keyframe. |
| - one packet contains 1 frame |
| |
| |
| |
| |
| DEMUXING |
| ======== |
| |
| ogg demuxer |
| ----------- |
| |
| The ogg demuxer has two modes of operation, which both share a significant |
| amount of code. The first mode is the streaming mode which is automatically |
| selected when the demuxer is connected to a non-getrange based element. When |
| connected to a getrange based element the ogg demuxer can do full seeking |
| with great efficiency. |
| |
| 1) the streaming mode. |
| |
| In this mode, the ogg demuxer receives buffers in the _chain() function which |
| are then simply submitted to the ogg sync layer. Pages are then processed when |
| the sync layer detects them, pads are created for new chains and packets are |
| sent to the peer elements of the pads. |
| |
| In this mode, no seeking is possible. This is the typical case when the |
| stream is read from a network source. |
| |
| In this mode, no setup is done at startup; the pages are just read and decoded. |
| A new logical chain is detected when one of the pages has the BOS flag set. At |
| this point the existing pads are removed and new pads are created for all the |
| logical streams in this new chain. |
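The chain detection described above can be sketched with a simplified page model. The `Page` class and `split_chains` function are hypothetical, not the oggdemux code. The sketch relies on the Ogg rule that all BOS pages of a chain appear together at its start, so a run of consecutive BOS pages announces the logical streams of one new chain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    serial: int
    bos: bool = False
    eos: bool = False


def split_chains(pages: List[Page]) -> List[List[int]]:
    """Return, per chain, the serial numbers announced by its BOS pages."""
    chains: List[List[int]] = []
    in_headers = False  # True while a run of consecutive BOS pages is read
    for page in pages:
        if page.bos:
            if not in_headers:
                # First BOS page after data pages: a new chain starts,
                # so the old pads would be removed here.
                chains.append([])
                in_headers = True
            # One pad would be created per logical stream.
            chains[-1].append(page.serial)
        else:
            in_headers = False
    return chains
```

Pages that arrive before any BOS page (see the testcases below) are not assigned to a chain in this sketch.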
| |
| |
| 2) the random access mode. |
| |
| In this mode, the ogg file is first scanned to detect the position and length |
| of all chains. This scanning is performed using a recursive binary search |
| algorithm that is explained below. |
| |
| find_chains(start, end) |
| { |
| ret1 = read_next_pages (start); |
| ret2 = read_prev_page (end); |
| |
| if (WAS_HEADER (ret1)) { |
| } |
| else { |
| } |
| |
| } |
| |
| a) read first and last pages |
| |
| start end |
| V V |
| +-----------------------+-------------+--------------------+ |
| | 111 | 222 | 333 | |
| BOS BOS BOS EOS |
| |
| |
| after reading start, serial 111, BOS, chain[0] = 111 |
| after reading end, serial 333, EOS |
| |
| start serialno != end serialno, binary search start, (end-start)/2 |
| |
| start bisect end |
| V V V |
| +-----------------------+-------------+--------------------+ |
| | 111 | 222 | 333 | |
| |
| |
| after reading start, serial 111, BOS, chain[0] = 111 |
| after reading end, serial 222, EOS |
| |
| while ( |
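The bisection idea illustrated in the diagrams can be sketched on a simplified model in which the file is a list of page serial numbers and a list index stands in for a byte offset. This is not the demuxer's algorithm as implemented: it assumes one logical stream per chain, contiguous chains, and a distinct serial number per chain, and `find_chain_starts` is a hypothetical name.

```python
from typing import List


def find_chain_starts(serials: List[int]) -> List[int]:
    """Return the index of the first page of every chain."""
    if not serials:
        return []
    starts = [0]

    def search(lo: int, hi: int) -> None:
        # Same serial at both ends: assume no chain boundary in between.
        if serials[lo] == serials[hi]:
            return
        # Adjacent pages with different serials: hi starts a new chain.
        if hi - lo == 1:
            starts.append(hi)
            return
        mid = (lo + hi) // 2
        search(lo, mid)
        search(mid, hi)

    search(0, len(serials) - 1)
    return starts
```

For the three-chain layout shown above (serials 111, 222, 333), the search reads the first and last page, finds that the serials differ, and bisects until each boundary is pinned between two adjacent pages. A real implementation works on byte offsets and has to resynchronise on page boundaries after each jump.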
| |
| |
| |
| testcases |
| --------- |
| |
| a) stream without BOS |
| |
| +----------------------------------------------------------+ |
| 111 | |
| EOS |
| |
| b) chained stream, first chain without BOS |
| |
| +-------------------+--------------------------------------+ |
| 111 | 222 | |
| BOS EOS |
| |
| |
| c) chained stream |
| |
| +-------------------+--------------------------------------+ |
| | 111 | 222 | |
| BOS BOS EOS |
| |
| |
| d) chained stream, second without BOS |
| |
| +-------------------+--------------------------------------+ |
| | 111 | 222 | |
| BOS EOS |
| |
| What can an ogg demuxer do? |
| --------------------------- |
| |
| An ogg demuxer can read pages and get the granulepos from them. |
| It can ask the decoder elements to convert a granulepos to time. |
| |
| An ogg demuxer can also get the granulepos of the first and the last page of a |
| stream to get the start and end timestamp of that stream. |
| It can also get the length in bytes of the stream |
| (when the peer is seekable, that is). |
| |
| An ogg demuxer is therefore basically able to seek to any byte position and |
| timestamp. |
| |
| When asked to seek to a given granulepos, the ogg demuxer should always convert |
| the value to a timestamp using the peer decoder element conversion function. It |
| can then binary search the file to eventually end up on the page with the given |
| granule pos or a granulepos with the same timestamp. |
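A binary search of this kind can be sketched as below. The sketch assumes the page granulepos values have already been collected into a list, that pages without a valid granulepos have been left out, and that `to_time` stands in for the decoder's conversion function. All names are hypothetical.

```python
from typing import Callable, List


def seek_page(granulepos_list: List[int], target_ns: int,
              to_time: Callable[[int], int]) -> int:
    """Return the index of the first page whose time is >= target_ns.

    Returns len(granulepos_list) when every page lies before the target.
    Relies on the granulepos never decreasing within a logical stream.
    """
    lo, hi = 0, len(granulepos_list)
    while lo < hi:
        mid = (lo + hi) // 2
        if to_time(granulepos_list[mid]) < target_ns:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

Because the comparison is done on converted timestamps rather than on raw granulepos values, the same search works for codecs whose granulepos is not a plain counter.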
| |
| Seeking in ogg currently |
| ------------------------ |
| |
| When seeking in an ogg, the decoders can choose to forward the seek event as a |
| granulepos or a timestamp to the ogg demuxer. |
| |
| In the case of a granulepos, the ogg demuxer will seek back to the beginning of |
| the stream and skip pages until it finds one with the requested timestamp. |
| |
| In the case of a timestamp, the ogg demuxer also seeks back to the beginning of |
| the stream. For each page it reads, it asks the decoder element to convert the |
| granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until |
| the page has a timestamp bigger or equal to the requested one. |
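The linear scan described here can be sketched in a few lines, under the same simplifying assumptions as before: a list of page granulepos values and a hypothetical `to_time` callback standing in for the decoder's conversion.

```python
from typing import Callable, List, Optional


def scan_to_time(granulepos_list: List[int], target_ns: int,
                 to_time: Callable[[int], int]) -> Optional[int]:
    """Walk pages from the start; return the first index at or past target."""
    for index, granulepos in enumerate(granulepos_list):
        if granulepos < 0:
            continue  # page without a valid granulepos: keep skipping
        if to_time(granulepos) >= target_ns:
            return index
    return None  # the target lies beyond the last page
```

The cost grows with the distance of the target from the start of the stream, which is the main drawback compared with a bisecting search.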
| |
| It is therefore important that the decoder elements (vorbis, for example) can |
| either convert a granulepos into a timestamp, or never seek by timestamp on |
| the ogg demuxer. |
| |
| The default format on the ogg demuxer source pads is currently defined as the |
| granulepos of the packets; it is also the value of the OFFSET field in the |
| GstBuffer. |
| |
| MUXING |
| ====== |
| |
| Oggmux |
| ------ |
| |
| The ogg muxer's job is to output complete Ogg pages such that the absolute |
| time represented by the valid (ie, not -1) granulepos values on those pages |
| never decreases. This has to be true for all logical streams in the group at |
| the same time. |
| |
| To achieve this, encoders are required to pass along the exact time that the |
| granulepos represents for each ogg packet that they push to the ogg muxer. |
| This is ESSENTIAL: without this exact time representation of the granulepos, |
| the muxer can not produce valid streams. |
| |
| The ogg muxer has a packet queue per sink pad. From this queue a page can |
| be flushed when: |
| - total byte size of queued packets exceeds a given value |
| - total time duration of queued packets exceeds a given value |
| - total byte size of queued packets exceeds maximum Ogg page size |
| - eos of the pad |
| - encoder sent a command to flush out an ogg page after this new packet |
| (in 0.8, through a flush event; in 0.10, with a GstOggBuffer) |
| - muxer wants a flush to happen (so it can output pages) |
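The flush conditions listed above can be combined into one predicate. This is a sketch, not the oggmux code: the state fields and limit parameters are hypothetical names. The maximum page payload follows from the Ogg page layout (at most 255 segments of at most 255 bytes each).

```python
from dataclasses import dataclass

OGG_MAX_PAGE_PAYLOAD = 255 * 255  # 255 segments of at most 255 bytes


@dataclass
class PacketQueueState:
    queued_bytes: int
    queued_ns: int
    at_eos: bool = False
    encoder_requested_flush: bool = False
    muxer_requested_flush: bool = False


def should_flush(state: PacketQueueState, max_bytes: int,
                 max_ns: int) -> bool:
    """Return True when any of the listed flush conditions holds."""
    return (
        state.queued_bytes > max_bytes
        or state.queued_ns > max_ns
        or state.queued_bytes > OGG_MAX_PAGE_PAYLOAD
        or state.at_eos
        or state.encoder_requested_flush
        or state.muxer_requested_flush
    )
```

The configured byte limit would normally sit below the page maximum, so the third test only matters when the limit is set very high.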
| |
| The ogg muxer also has a page queue per sink pad. This queue collects |
| Ogg pages from the corresponding packet queue. Each page is also marked |
| with the timestamp that the granulepos in the header represents. |
| |
| A page can be flushed from this collection of page queues when: |
| - ideally, every page queue has at least one page with a valid granulepos |
| -> choose the page, from all queues, with the lowest timestamp value |
| - if not, muxer can wait if the following limits aren't reached: |
| - total byte size of any page queue exceeds a limit |
| - total time duration of any page queue exceeds a limit |
| - if this limit is reached, then: |
| - request a page flush from packet queue to page queue for each queue |
| that does not have pages |
| - now take the page from all queues with the lowest timestamp value |
| - make sure all later-coming data is marked as old, either to be still |
| output (but producing an invalid stream, though it can be fixed later) |
| or dropped (which means it's gone forever) |
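The ideal case of this selection (every page queue holds at least one page) can be sketched as follows. The queues are modelled as lists of page timestamps keyed by a pad name; `pick_next_page` is a hypothetical helper. The forced-flush path for empty queues is left out.

```python
from typing import Dict, List, Optional, Tuple


def pick_next_page(
        page_queues: Dict[str, List[int]]) -> Optional[Tuple[str, int]]:
    """Pop and return (pad, timestamp) of the page to output next.

    Returns None when some queue is empty, in which case the muxer has
    to wait or force a flush from the packet queues.
    """
    if not page_queues or any(not queue for queue in page_queues.values()):
        return None
    # Choose the queue whose head page has the lowest timestamp.
    pad = min(page_queues, key=lambda name: page_queues[name][0])
    return pad, page_queues[pad].pop(0)
```

Repeatedly calling this on a video queue [200, 400] and an audio queue [100, 300] yields pages in the order 100, 200, 300, after which the audio queue is empty and the muxer has to wait.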
| |
| The oggmuxer uses the offset fields to fill in the granulepos in the pages. |
| |
| GStreamer implementation details |
| -------------------------------- |
| As said before, the basic rule is that the ogg muxer needs an exact time |
| representation for each granulepos. This needs to be provided by the encoder. |
| |
| Potential problems are: |
| - initial offsets for a raw stream need to be preserved somehow. Example: |
| if the first audio sample has time 0.5, the granulepos in the vorbis encoder |
| needs to be adjusted to take this into account. |
| - initial offsets may need to be on rate boundaries. Example: |
| if the framerate is 5 fps, and the first video frame has time 0.1 s, the |
| granulepos cannot correctly represent this timestamp. |
| This can be handled out-of-band (initial offset in another muxing format, |
| skeleton track with initial offsets, ...) |
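The rate-boundary problem in the second example can be checked with exact arithmetic. The helper names below are hypothetical, and the sketch only models the simple case where a granulepos counts whole frame periods.

```python
import math
from fractions import Fraction
from typing import Tuple


def representable(t: Fraction, fps: Fraction) -> bool:
    """True when time t (seconds) falls exactly on a frame boundary."""
    return (t * fps).denominator == 1


def frame_boundaries_around(t: Fraction,
                            fps: Fraction) -> Tuple[Fraction, Fraction]:
    """Return the frame boundary at or before t, and the one after it."""
    n = math.floor(t * fps)
    return Fraction(n) / fps, Fraction(n + 1) / fps
```

At 5 fps the representable times are 0, 0.2, 0.4 and so on; a first frame at 0.1 s falls halfway between 0 and 0.2, so no frame count maps to it exactly.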
| |
| Given that the basic rule for muxing is that the muxer needs an exact timestamp |
| matching the granulepos, we need some way of communicating this time value |
| from encoders to the Ogg muxer. So we need a mechanism to communicate |
| a granulepos and its time representation for each GstBuffer. |
| |
| (This is an instance of a more generic problem - having a way to attach |
| more fields to a GstBuffer) |
| |
| Possible ways: |
| - setting TIMESTAMP to this value: bad - this value represents the end time |
| of the buffer, and thus conflicts with GStreamer's idea of what TIMESTAMP |
| is. This would cause problems muxing the encoded stream in other muxing |
| formats, or for streaming. Note that this is what was done in GStreamer 0.8 |
| - setting DURATION to GP_TIME - TIMESTAMP: bad - this breaks the concept of |
| duration for this frame. Take the video example above; each buffer would |
| have a correct timestamp, but always a 0.1 s duration as opposed to the |
| correct 0.2 s duration |
| - subclassing GstBuffer: clean, but requires a common header used between |
| ogg muxer and all encoders that can be muxed into ogg. Also, what if |
| a format can be muxed into more than one container, and they each have |
| their own "extra" info to communicate ? |
| - adding key/value pairs to GstBuffer: clean, but requires changes to |
| core. Also, the overhead of allocating e.g. a GstStructure for *each* buffer |
| may be expensive. |
| - "cheating": |
| - abuse OFFSET to store the timestamp matching this granulepos |
| - abuse OFFSET_END to store the granulepos value |
| The drawback here is that before, it made sense to use OFFSET and OFFSET_END |
| to store a byte count. Given that this is not used for anything critical |
| (you can't store a raw theora or vorbis stream in a file anyway), |
| this is what's being done for now. |
| |
| In practice |
| ----------- |
| - all encoders of formats that can be muxed into Ogg produce a stream where: |
| - OFFSET is abused to be the timestamp corresponding exactly to the |
| granulepos |
| - OFFSET_END is abused to be the granulepos of the encoded buffer |
| - TIMESTAMP is the timestamp matching the begin of the buffer |
| - DURATION is the length in time of the buffer |
| |
| - initial delays should be handled in the GStreamer encoders by mangling |
| the granulepos of the encoded packet to take the delay into account as |
| best as possible and store that in OFFSET; |
| this then brings TIMESTAMP + DURATION to within less |
| than a frame period of the granulepos's time representation |
| The ogg muxer will then create new ogg packets with this OFFSET as |
| the granulepos. So in effect, the granulepos produced by the encoders |
| does not get used directly. |
| |
| TODO |
| ---- |
| - decide on a proper mechanism for communicating extra per-buffer fields |
| - the ogg muxer sets timestamp and duration on outgoing ogg pages based on |
| timestamp/duration of incoming ogg packets. |
| Note that: |
| - since the ogg muxer *has* to output pages sorted by gp time, representing |
| end time of the page, this means that the buffer's timestamps are not |
| necessarily monotonically increasing |
| - timestamp + duration of buffers don't match up; the duration represents |
| the length of the ogg page *for that stream*. Hence, for a normal |
| two-stream file, the sum of all durations is twice the length of the |
| muxed file. |
| |
| TESTING |
| ------- |
| Proper muxing can be tested by generating test files with command lines like: |
| - video and audio start from 0: |
| gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg |
| |
| - video starts after audio: |
| gst-launch -v videotestsrc timestamp-offset=500000000 ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg |
| |
| - audio starts after video: |
| gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc timestamp-offset=500000000 ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg |
| |
| The resulting files can be verified with oggz-validate for correctness. |