| Encoding and Muxing |
| ------------------- |
| |
| Summary |
| ------- |
| A. Problems |
| B. Goals |
| 1. EncodeBin |
| 2. Encoding Profile System |
| 3. Helper Library for Profiles |
| I. Use-cases researched |
| |
| |
| A. Problems this proposal attempts to solve |
| ------------------------------------------- |
| |
| * Duplication of pipeline code for gstreamer-based applications |
| wishing to encode and or mux streams, leading to subtle differences |
| and inconsistencies across those applications. |
| |
| * No unified system for describing encoding targets for applications |
| in a user-friendly way. |
| |
| * No unified system for creating encoding targets for applications, |
| resulting in duplication of code across all applications, |
| differences and inconsistencies that come with that duplication, |
| and applications hardcoding element names and settings resulting in |
| poor portability. |
| |
| |
| |
| B. Goals |
| -------- |
| |
| 1. Convenience encoding element |
| |
| Create a convenience GstBin for encoding and muxing several streams, |
| hereafter called 'EncodeBin'. |
| |
| This element will only contain one single property, which is a |
| profile. |
| |
| 2. Define a encoding profile system |
| |
| 2. Encoding profile helper library |
| |
| Create a helper library to: |
| * create EncodeBin instances based on profiles, and |
| * help applications to create/load/save/browse those profiles. |
| |
| |
| |
| |
| 1. EncodeBin |
| ------------ |
| |
| 1.1 Proposed API |
| ---------------- |
| |
| EncodeBin is a GstBin subclass. |
| |
| It implements the GstTagSetter interface, by which it will proxy the |
| calls to the muxer. |
| |
| Only two introspectable property (i.e. usable without extra API): |
| * A GstEncodingProfile* |
| * The name of the profile to use |
| |
| When a profile is selected, encodebin will: |
| * Add REQUEST sinkpads for all the GstStreamProfile |
| * Create the muxer and expose the source pad |
| |
| Whenever a request pad is created, encodebin will: |
| * Create the chain of elements for that pad |
| * Ghost the sink pad |
| * Return that ghost pad |
| |
| This allows reducing the code to the minimum for applications |
| wishing to encode a source for a given profile: |
| |
| ... |
| |
| encbin = gst_element_factory_make("encodebin, NULL); |
| g_object_set (encbin, "profile", "N900/H264 HQ", NULL); |
| gst_element_link (encbin, filesink); |
| |
| ... |
| |
| vsrcpad = gst_element_get_src_pad(source, "src1"); |
| vsinkpad = gst_element_get_request_pad (encbin, "video_%u"); |
| gst_pad_link(vsrcpad, vsinkpad); |
| |
| ... |
| |
| |
| 1.2 Explanation of the Various stages in EncodeBin |
| -------------------------------------------------- |
| |
| This describes the various stages which can happen in order to end |
| up with a multiplexed stream that can then be stored or streamed. |
| |
| 1.2.1 Incoming streams |
| |
| The streams fed to EncodeBin can be of various types: |
| |
| * Video |
| * Uncompressed (but maybe subsampled) |
| * Compressed |
| * Audio |
| * Uncompressed (audio/x-raw) |
| * Compressed |
| * Timed text |
| * Private streams |
| |
| |
| 1.2.2 Steps involved for raw video encoding |
| |
| (0) Incoming Stream |
| |
| (1) Transform raw video feed (optional) |
| |
| Here we modify the various fundamental properties of a raw video |
| stream to be compatible with the intersection of: |
| * The encoder GstCaps and |
| * The specified "Stream Restriction" of the profile/target |
| |
| The fundamental properties that can be modified are: |
| * width/height |
| This is done with a video scaler. |
| The DAR (Display Aspect Ratio) MUST be respected. |
| If needed, black borders can be added to comply with the target DAR. |
| * framerate |
| * format/colorspace/depth |
| All of this is done with a colorspace converter |
| |
| (2) Actual encoding (optional for raw streams) |
| |
| An encoder (with some optional settings) is used. |
| |
| (3) Muxing |
| |
| A muxer (with some optional settings) is used. |
| |
| (4) Outgoing encoded and muxed stream |
| |
| |
| 1.2.3 Steps involved for raw audio encoding |
| |
| This is roughly the same as for raw video, expect for (1) |
| |
| (1) Transform raw audo feed (optional) |
| |
| We modify the various fundamental properties of a raw audio stream to |
| be compatible with the intersection of: |
| * The encoder GstCaps and |
| * The specified "Stream Restriction" of the profile/target |
| |
| The fundamental properties that can be modifier are: |
| * Number of channels |
| * Type of raw audio (integer or floating point) |
| * Depth (number of bits required to encode one sample) |
| |
| |
| 1.2.4 Steps involved for encoded audio/video streams |
| |
| Steps (1) and (2) are replaced by a parser if a parser is available |
| for the given format. |
| |
| |
| 1.2.5 Steps involved for other streams |
| |
| Other streams will just be forwarded as-is to the muxer, provided the |
| muxer accepts the stream type. |
| |
| |
| |
| |
| 2. Encoding Profile System |
| -------------------------- |
| |
| This work is based on: |
| * The existing GstPreset system for elements [0] |
| * The gnome-media GConf audio profile system [1] |
| * The investigation done into device profiles by Arista and |
| Transmageddon [2 and 3] |
| |
| 2.2 Terminology |
| --------------- |
| |
| * Encoding Target Category |
| A Target Category is a classification of devices/systems/use-cases |
| for encoding. |
| |
| Such a classification is required in order for: |
| * Applications with a very-specific use-case to limit the number of |
| profiles they can offer the user. A screencasting application has |
| no use with the online services targets for example. |
| * Offering the user some initial classification in the case of a |
| more generic encoding application (like a video editor or a |
| transcoder). |
| |
| Ex: |
| Consumer devices |
| Online service |
| Intermediate Editing Format |
| Screencast |
| Capture |
| Computer |
| |
| * Encoding Profile Target |
| A Profile Target describes a specific entity for which we wish to |
| encode. |
| A Profile Target must belong to at least one Target Category. |
| It will define at least one Encoding Profile. |
| |
| Ex (with category): |
| Nokia N900 (Consumer device) |
| Sony PlayStation 3 (Consumer device) |
| Youtube (Online service) |
| DNxHD (Intermediate editing format) |
| HuffYUV (Screencast) |
| Theora (Computer) |
| |
| * Encoding Profile |
| A specific combination of muxer, encoders, presets and limitations. |
| |
| Ex: |
| Nokia N900/H264 HQ |
| Ipod/High Quality |
| DVD/Pal |
| Youtube/High Quality |
| HTML5/Low Bandwith |
| DNxHD |
| |
| 2.3 Encoding Profile |
| -------------------- |
| |
| An encoding profile requires the following information: |
| |
| * Name |
| This string is not translatable and must be unique. |
| A recommendation to guarantee uniqueness of the naming could be: |
| <target>/<name> |
| * Description |
| This is a translatable string describing the profile |
| * Muxing format |
| This is a string containing the GStreamer media-type of the |
| container format. |
| * Muxing preset |
| This is an optional string describing the preset(s) to use on the |
| muxer. |
| * Multipass setting |
| This is a boolean describing whether the profile requires several |
| passes. |
| * List of Stream Profile |
| |
| 2.3.1 Stream Profiles |
| |
| A Stream Profile consists of: |
| |
| * Type |
| The type of stream profile (audio, video, text, private-data) |
| * Encoding Format |
| This is a string containing the GStreamer media-type of the encoding |
| format to be used. If encoding is not to be applied, the raw audio |
| media type will be used. |
| * Encoding preset |
| This is an optional string describing the preset(s) to use on the |
| encoder. |
| * Restriction |
| This is an optional GstCaps containing the restriction of the |
| stream that can be fed to the encoder. |
| This will generally containing restrictions in video |
| width/heigh/framerate or audio depth. |
| * presence |
| This is an integer specifying how many streams can be used in the |
| containing profile. 0 means that any number of streams can be |
| used. |
| * pass |
| This is an integer which is only meaningful if the multipass flag |
| has been set in the profile. If it has been set it indicates which |
| pass this Stream Profile corresponds to. |
| |
| 2.4 Example profile |
| ------------------- |
| |
| The representation used here is XML only as an example. No decision is |
| made as to which formatting to use for storing targets and profiles. |
| |
| <gst-encoding-target> |
| <name>Nokia N900</name> |
| <category>Consumer Device</category> |
| <profiles> |
| <profile>Nokia N900/H264 HQ</profile> |
| <profile>Nokia N900/MP3</profile> |
| <profile>Nokia N900/AAC</profile> |
| </profiles> |
| </gst-encoding-target> |
| |
| <gst-encoding-profile> |
| <name>Nokia N900/H264 HQ</name> |
| <description> |
| High Quality H264/AAC for the Nokia N900 |
| </description> |
| <format>video/quicktime,variant=iso</format> |
| <streams> |
| <stream-profile> |
| <type>audio</type> |
| <format>audio/mpeg,mpegversion=4</format> |
| <preset>Quality High/Main</preset> |
| <restriction>audio/x-raw,channels=[1,2]</restriction> |
| <presence>1</presence> |
| </stream-profile> |
| <stream-profile> |
| <type>video</type> |
| <format>video/x-h264</format> |
| <preset>Profile Baseline/Quality High</preset> |
| <restriction> |
| video/x-raw,width=[16, 800],\ |
| height=[16, 480],framerate=[1/1, 30000/1001] |
| </restriction> |
| <presence>1</presence> |
| </stream-profile> |
| </streams> |
| |
| </gst-encoding-profile> |
| |
| 2.5 API |
| ------- |
| A proposed C API is contained in the gstprofile.h file in this directory. |
| |
| |
| 2.6 Modifications required in the existing GstPreset system |
| ----------------------------------------------------------- |
| |
| 2.6.1. Temporary preset. |
| |
| Currently a preset needs to be saved on disk in order to be |
| used. |
| |
| This makes it impossible to have temporary presets (that exist only |
| during the lifetime of a process), which might be required in the |
| new proposed profile system |
| |
| 2.6.2 Categorisation of presets. |
| |
| Currently presets are just aliases of a group of property/value |
| without any meanings or explanation as to how they exclude each |
| other. |
| |
| Take for example the H264 encoder. It can have presets for: |
| * passes (1,2 or 3 passes) |
| * profiles (Baseline, Main, ...) |
| * quality (Low, medium, High) |
| |
| In order to programmatically know which presets exclude each other, |
| we here propose the categorisation of these presets. |
| |
| This can be done in one of two ways |
| 1. in the name (by making the name be [<category>:]<name>) |
| This would give for example: "Quality:High", "Profile:Baseline" |
| 2. by adding a new _meta key |
| This would give for example: _meta/category:quality |
| |
| 2.6.3 Aggregation of presets. |
| |
| There can be more than one choice of presets to be done for an |
| element (quality, profile, pass). |
| |
| This means that one can not currently describe the full |
| configuration of an element with a single string but with many. |
| |
| The proposal here is to extend the GstPreset API to be able to set |
| all presets using one string and a well-known separator ('/'). |
| |
| This change only requires changes in the core preset handling code. |
| |
| This would allow doing the following: |
| gst_preset_load_preset (h264enc, |
| "pass:1/profile:baseline/quality:high"); |
| |
| 2.7 Points to be determined |
| --------------------------- |
| |
| This document hasn't determined yet how to solve the following |
| problems: |
| |
| 2.7.1 Storage of profiles |
| |
| One proposal for storage would be to use a system wide directory |
| (like $prefix/share/gstreamer-0.10/profiles) and store XML files for |
| every individual profiles. |
| |
| Users could then add their own profiles in ~/.gstreamer-0.10/profiles |
| |
| This poses some limitations as to what to do if some applications |
| want to have some profiles limited to their own usage. |
| |
| |
| 3. Helper library for profiles |
| ------------------------------ |
| |
| These helper methods could also be added to existing libraries (like |
| GstPreset, GstPbUtils, ..). |
| |
| The various API proposed are in the accompanying gstprofile.h file. |
| |
| 3.1 Getting user-readable names for formats |
| |
| This is already provided by GstPbUtils. |
| |
| 3.2 Hierarchy of profiles |
| |
| The goal is for applications to be able to present to the user a list |
| of combo-boxes for choosing their output profile: |
| |
| [ Category ] # optional, depends on the application |
| [ Device/Site/.. ] # optional, depends on the application |
| [ Profile ] |
| |
| Convenience methods are offered to easily get lists of categories, |
| devices, and profiles. |
| |
| 3.3 Creating Profiles |
| |
| The goal is for applications to be able to easily create profiles. |
| |
| The applications needs to be able to have a fast/efficient way to: |
| * select a container format and see all compatible streams he can use |
| with it. |
| * select a codec format and see which container formats he can use |
| with it. |
| |
| The remaining parts concern the restrictions to encoder |
| input. |
| |
| 3.4 Ensuring availability of plugins for Profiles |
| |
| When an application wishes to use a Profile, it should be able to |
| query whether it has all the needed plugins to use it. |
| |
| This part will use GstPbUtils to query, and if needed install the |
| missing plugins through the installed distribution plugin installer. |
| |
| |
| I. Use-cases researched |
| ----------------------- |
| |
| This is a list of various use-cases where encoding/muxing is being |
| used. |
| |
| * Transcoding |
| |
| The goal is to convert with as minimal loss of quality any input |
| file for a target use. |
| A specific variant of this is transmuxing (see below). |
| |
| Example applications: Arista, Transmageddon |
| |
| * Rendering timelines |
| |
| The incoming streams are a collection of various segments that need |
| to be rendered. |
| Those segments can vary in nature (i.e. the video width/height can |
| change). |
| This requires the use of identiy with the single-segment property |
| activated to transform the incoming collection of segments to a |
| single continuous segment. |
| |
| Example applications: PiTiVi, Jokosher |
| |
| * Encoding of live sources |
| |
| The major risk to take into account is the encoder not encoding the |
| incoming stream fast enough. This is outside of the scope of |
| encodebin, and should be solved by using queues between the sources |
| and encodebin, as well as implementing QoS in encoders and sources |
| (the encoders emitting QoS events, and the upstream elements |
| adapting themselves accordingly). |
| |
| Example applications: camerabin, cheese |
| |
| * Screencasting applications |
| |
| This is similar to encoding of live sources. |
| The difference being that due to the nature of the source (size and |
| amount/frequency of updates) one might want to do the encoding in |
| two parts: |
| * The actual live capture is encoded with a 'almost-lossless' codec |
| (such as huffyuv) |
| * Once the capture is done, the file created in the first step is |
| then rendered to the desired target format. |
| |
| Fixing sources to only emit region-updates and having encoders |
| capable of encoding those streams would fix the need for the first |
| step but is outside of the scope of encodebin. |
| |
| Example applications: Istanbul, gnome-shell, recordmydesktop |
| |
| * Live transcoding |
| |
| This is the case of an incoming live stream which will be |
| broadcasted/transmitted live. |
| One issue to take into account is to reduce the encoding latency to |
| a minimum. This should mostly be done by picking low-latency |
| encoders. |
| |
| Example applications: Rygel, Coherence |
| |
| * Transmuxing |
| |
| Given a certain file, the aim is to remux the contents WITHOUT |
| decoding into either a different container format or the same |
| container format. |
| Remuxing into the same container format is useful when the file was |
| not created properly (for example, the index is missing). |
| Whenever available, parsers should be applied on the encoded streams |
| to validate and/or fix the streams before muxing them. |
| |
| Metadata from the original file must be kept in the newly created |
| file. |
| |
| Example applications: Arista, Transmaggedon |
| |
| * Loss-less cutting |
| |
| Given a certain file, the aim is to extract a certain part of the |
| file without going through the process of decoding and re-encoding |
| that file. |
| This is similar to the transmuxing use-case. |
| |
| Example applications: PiTiVi, Transmageddon, Arista, ... |
| |
| * Multi-pass encoding |
| |
| Some encoders allow doing a multi-pass encoding. |
| The initial pass(es) are only used to collect encoding estimates and |
| are not actually muxed and outputted. |
| The final pass uses previously collected information, and the output |
| is then muxed and outputted. |
| |
| * Archiving and intermediary format |
| |
| The requirement is to have lossless |
| |
| * CD ripping |
| |
| Example applications: Sound-juicer |
| |
| * DVD ripping |
| |
| Example application: Thoggen |
| |
| |
| |
| * Research links |
| |
| Some of these are still active documents, some other not |
| |
| [0] GstPreset API documentation |
| http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html |
| |
| [1] gnome-media GConf profiles |
| http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html |
| |
| [2] Research on a Device Profile API |
| http://gstreamer.freedesktop.org/wiki/DeviceProfile |
| |
| [3] Research on defining presets usage |
| http://gstreamer.freedesktop.org/wiki/PresetDesign |
| |