2. Encoding Profile System
3. Helper Library for Profiles
I. Use-cases researched

A. Problems this proposal attempts to solve
-------------------------------------------

* Duplication of pipeline code for GStreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

* No unified system for describing encoding targets for applications
  in a user-friendly way.

* No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication,
  and applications hardcoding element names and settings resulting in

1. Convenience encoding element

  Create a convenience GstBin for encoding and muxing several streams,
  hereafter called 'EncodeBin'.

  This element will have a single property, which is a

2. Define an encoding profile system

3. Encoding profile helper library

  Create a helper library to:
  * create EncodeBin instances based on profiles, and
  * help applications to create/load/save/browse those profiles.

EncodeBin is a GstBin subclass.

It implements the GstTagSetter interface, by which it will proxy the

Only two introspectable properties (i.e. usable without extra API):
* A GstEncodingProfile*
* The name of the profile to use

When a profile is selected, encodebin will:
* Add REQUEST sinkpads for each GstStreamProfile
* Create the muxer and expose the source pad

Whenever a request pad is created, encodebin will:
* Create the chain of elements for that pad

* Return that ghost pad

This reduces the code to a minimum for applications wishing to encode
a source for a given profile:

  /* Create encodebin and select the target profile by name */
  encbin = gst_element_factory_make ("encodebin", NULL);
  g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
  gst_element_link (encbin, filesink);

  /* Request a video sink pad on encodebin and link the source to it */
  vsrcpad = gst_element_get_static_pad (source, "src1");
  vsinkpad = gst_element_get_request_pad (encbin, "video_%d");
  gst_pad_link (vsrcpad, vsinkpad);

1.2 Explanation of the Various stages in EncodeBin
--------------------------------------------------

This describes the various stages which can happen in order to end
up with a multiplexed stream that can then be stored or streamed.

1.2.1 Incoming streams

The streams fed to EncodeBin can be of various types:

  * Uncompressed (but maybe subsampled)

  * Uncompressed (audio/x-raw-{int|float})

1.2.2 Steps involved for raw video encoding

(1) Transform raw video feed (optional)

  Here we modify the various fundamental properties of a raw video
  stream to be compatible with the intersection of:
  * The encoder GstCaps and
  * The specified "Stream Restriction" of the profile/target

  The fundamental properties that can be modified are:

    This is done with a video scaler.
    The DAR (Display Aspect Ratio) MUST be respected.
    If needed, black borders can be added to comply with the target DAR.

  * format/colorspace/depth
    All of this is done with a colorspace converter.

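The scaling rule above (respect the DAR, add black borders when needed)
can be sketched as follows. This is only an illustration: it assumes
square pixels on both the source and the target, and the names
`LetterboxFit` and `letterbox_fit` are hypothetical, not part of any
GStreamer API.

```c
#include <stdint.h>

/* Hypothetical result type: size of the scaled picture plus the total
 * amount of black border needed to fill the target frame.
 * Assumption: square pixels on both source and target. */
typedef struct {
  int width, height;       /* size of the scaled picture */
  int border_x, border_y;  /* total horizontal / vertical border */
} LetterboxFit;

/* Scale a source of sw x sh into a target frame of tw x th without
 * changing its display aspect ratio. Leftover space becomes border. */
LetterboxFit
letterbox_fit (int sw, int sh, int tw, int th)
{
  LetterboxFit fit;

  /* Compare sw/sh with tw/th by cross-multiplying (no floats). */
  if ((int64_t) sw * th >= (int64_t) tw * sh) {
    /* Source is at least as wide as the target: width is the limit. */
    fit.width = tw;
    fit.height = (int) (((int64_t) sh * tw) / sw);
  } else {
    /* Source is narrower than the target: height is the limit. */
    fit.height = th;
    fit.width = (int) (((int64_t) sw * th) / sh);
  }
  fit.border_x = tw - fit.width;
  fit.border_y = th - fit.height;
  return fit;
}
```

For example, fitting a 1920x1080 source into an 800x480 target gives an
800x450 picture with 30 lines of border in total. The integer division
truncates; a real scaler setup would probably also round to sizes the
encoder accepts.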
(2) Actual encoding (optional for raw streams)

  An encoder (with some optional settings) is used.

(3) Muxing

  A muxer (with some optional settings) is used.

(4) Outgoing encoded and muxed stream

1.2.3 Steps involved for raw audio encoding

This is roughly the same as for raw video, except for (1)

(1) Transform raw audio feed (optional)

  We modify the various fundamental properties of a raw audio stream to
  be compatible with the intersection of:
  * The encoder GstCaps and
  * The specified "Stream Restriction" of the profile/target

  The fundamental properties that can be modified are:

  * Type of raw audio (integer or floating point)
  * Depth (number of bits required to encode one sample)

1.2.4 Steps involved for encoded audio/video streams

Steps (1) and (2) are replaced by a parser if a parser is available
for the given format.

1.2.5 Steps involved for other streams

Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.

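The choice between the paths described in 1.2.2 to 1.2.5 can be
summarised in a small decision function. This is a sketch only:
`ChainKind` and `chain_for_stream` are hypothetical names, raw streams
are recognised by the `video/x-raw` and `audio/x-raw` media-type
prefixes, and an encoded stream without a parser is assumed to go
straight to the muxer.

```c
#include <string.h>

/* Hypothetical classification of the work needed for one incoming
 * stream before it reaches the muxer. */
typedef enum {
  CHAIN_TRANSFORM_AND_ENCODE,  /* raw audio/video: steps (1) and (2) */
  CHAIN_PARSE,                 /* encoded audio/video with a parser */
  CHAIN_PASSTHROUGH            /* forwarded as-is to the muxer */
} ChainKind;

static int
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Pick the chain for a stream from its media type. parser_available
 * tells whether a parser exists for that format. */
ChainKind
chain_for_stream (const char *media_type, int parser_available)
{
  if (has_prefix (media_type, "video/x-raw")
      || has_prefix (media_type, "audio/x-raw"))
    return CHAIN_TRANSFORM_AND_ENCODE;

  if ((has_prefix (media_type, "video/")
          || has_prefix (media_type, "audio/")) && parser_available)
    return CHAIN_PARSE;

  /* Assumption: everything else is handed to the muxer unchanged. */
  return CHAIN_PASSTHROUGH;
}
```

Whether the muxer actually accepts a passed-through stream still has to
be checked separately, as noted in 1.2.5.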
2. Encoding Profile System
--------------------------

This work is based on:
* The existing GstPreset system for elements [0]
* The gnome-media GConf audio profile system [1]
* The investigation done into device profiles by Arista and
  Transmageddon [2 and 3]

* Encoding Target Category
  A Target Category is a classification of devices/systems/use-cases

  Such a classification is required in order to:
  * Let applications with a very specific use-case limit the number of
    profiles they offer the user. A screencasting application has
    no use for the online services targets, for example.
  * Offer the user some initial classification in the case of a
    more generic encoding application (like a video editor or a

  Intermediate Editing Format

* Encoding Profile Target
  A Profile Target describes a specific entity for which we wish to

  A Profile Target must belong to at least one Target Category.
  It will define at least one Encoding Profile.

  Nokia N900 (Consumer device)
  Sony PlayStation 3 (Consumer device)
  Youtube (Online service)
  DNxHD (Intermediate editing format)

  A specific combination of muxer, encoders, presets and limitations.

An encoding profile requires the following information:

  This string is not translatable and must be unique.
  A recommendation to guarantee uniqueness of the naming could be:

  This is a translatable string describing the profile.

  This is a string containing the GStreamer media-type of the

  This is an optional string describing the preset(s) to use on the

  This is a boolean describing whether the profile requires several

* List of Stream Profile

2.3.1 Stream Profiles

A Stream Profile consists of:

  The type of stream profile (audio, video, text, private-data)

  This is a string containing the GStreamer media-type of the encoding
  format to be used. If encoding is not to be applied, the raw audio
  media type will be used.

  This is an optional string describing the preset(s) to use on the

  This is an optional GstCaps containing the restriction of the
  stream that can be fed to the encoder.
  This will generally contain restrictions on video
  width/height/framerate or audio depth.

  This is an integer specifying how many streams can be used in the
  containing profile. 0 means that any number of streams can be

  This is an integer which is only meaningful if the multipass flag
  has been set in the profile. If it has been set, it indicates which
  pass this Stream Profile corresponds to.

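The information listed above could be held in plain C structures such
as the following. This is an illustrative sketch only and not the API
proposed in gstprofile.h: all type, field, and function names are
hypothetical, and the presence value is read as an upper bound on the
number of streams.

```c
#include <stddef.h>

typedef enum {
  STREAM_AUDIO,
  STREAM_VIDEO,
  STREAM_TEXT,
  STREAM_PRIVATE_DATA
} StreamType;

/* Hypothetical in-memory form of one Stream Profile. */
typedef struct {
  StreamType type;
  const char *format;       /* media type, e.g. "video/x-h264" */
  const char *preset;       /* optional, may be NULL */
  const char *restriction;  /* optional caps string, may be NULL */
  unsigned int presence;    /* 0 means any number of streams */
  unsigned int pass;        /* only meaningful for multipass profiles */
} StreamProfile;

/* Hypothetical in-memory form of an Encoding Profile. */
typedef struct {
  const char *name;         /* unique, not translatable */
  const char *description;  /* translatable */
  const char *format;       /* media type stored in the profile */
  const char *preset;       /* optional, may be NULL */
  int multipass;            /* boolean */
  const StreamProfile *streams;
  size_t n_streams;
} EncodingProfile;

/* Returns 1 if one more stream may use this stream profile when
 * in_use streams already do, 0 otherwise. */
int
stream_profile_can_add (const StreamProfile *sp, unsigned int in_use)
{
  return sp->presence == 0 || in_use < sp->presence;
}
```

An element exposing request pads could use a check like
`stream_profile_can_add` to refuse a pad once the presence limit of the
matching stream profile has been reached.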
The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.

<gst-encoding-target>
  <name>Nokia N900</name>
  <category>Consumer Device</category>

  <profile>Nokia N900/H264 HQ</profile>
  <profile>Nokia N900/MP3</profile>
  <profile>Nokia N900/AAC</profile>

</gst-encoding-target>

<gst-encoding-profile>
  <name>Nokia N900/H264 HQ</name>

  High Quality H264/AAC for the Nokia N900

  <format>video/quicktime,variant=iso</format>

  <format>audio/mpeg,mpegversion=4</format>
  <preset>Quality High/Main</preset>
  <restriction>audio/x-raw-int,channels=[1,2]</restriction>
  <presence>1</presence>

  <format>video/x-h264</format>
  <preset>Profile Baseline/Quality High</preset>

  video/x-raw-yuv,width=[16, 800],\
    height=[16, 480],framerate=[1/1, 30000/1001]

  <presence>1</presence>

</gst-encoding-profile>

A proposed C API is contained in the gstprofile.h file in this directory.

2.6 Modifications required in the existing GstPreset system
-----------------------------------------------------------

2.6.1. Temporary preset.

  Currently a preset needs to be saved on disk in order to be

  This makes it impossible to have temporary presets (that exist only
  during the lifetime of a process), which might be required in the
  new proposed profile system.

2.6.2 Categorisation of presets.

  Currently presets are just aliases of a group of property/value
  pairs, without any meaning or explanation as to how they exclude each

  Take for example the H264 encoder. It can have presets for:
  * passes (1, 2 or 3 passes)
  * profiles (Baseline, Main, ...)
  * quality (Low, Medium, High)

  In order to programmatically know which presets exclude each other,
  we here propose the categorisation of these presets.

  This can be done in one of two ways:
  1. in the name (by making the name be [<category>:]<name>)
     This would give for example: "Quality:High", "Profile:Baseline"
  2. by adding a new _meta key
     This would give for example: _meta/category:quality

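For the first option, splitting a preset name of the form
`[<category>:]<name>` is straightforward. The helper below is a minimal
sketch with a hypothetical name (`preset_split`). It is not part of the
GstPreset API, and it assumes that the first ':' separates the category
from the name.

```c
#include <string.h>

/* Split "[<category>:]<name>" into its two parts.
 * category receives "" when no category is present.
 * Returns 0 on success, -1 if the name is empty or a part does not
 * fit in the buffer provided for it. */
int
preset_split (const char *preset, char *category, size_t cat_size,
    char *name, size_t name_size)
{
  const char *colon = strchr (preset, ':');
  size_t cat_len = colon ? (size_t) (colon - preset) : 0;
  const char *name_start = colon ? colon + 1 : preset;
  size_t name_len = strlen (name_start);

  if (name_len == 0 || cat_len >= cat_size || name_len >= name_size)
    return -1;

  memcpy (category, preset, cat_len);
  category[cat_len] = '\0';
  memcpy (name, name_start, name_len + 1);  /* includes the final NUL */
  return 0;
}
```

With this, "Quality:High" yields the category "Quality" and the name
"High", while an uncategorised preset keeps an empty category.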
2.6.3 Aggregation of presets.

  More than one preset may need to be chosen for an element (quality,
  profile, pass).

  This means that the full configuration of an element currently
  cannot be described with a single string; several are needed.

  The proposal here is to extend the GstPreset API to be able to set
  all presets using one string and a well-known separator ('/').

  This change only requires changes in the core preset handling code.

  This would allow doing the following:
    gst_preset_load_preset (h264enc,
        "pass:1/profile:baseline/quality:high");

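Combined with the categorisation from 2.6.2, an aggregated string makes
it possible to detect presets that exclude each other before applying
them. The sketch below uses hypothetical names and is not existing
GstPreset behaviour. It assumes that two components conflict when they
share the same non-empty category, compared case-sensitively.

```c
#include <string.h>

/* Length of the category part of one "[<category>:]<name>" component
 * of length len, or 0 if the component has no category. */
static size_t
category_len (const char *comp, size_t len)
{
  const char *colon = memchr (comp, ':', len);
  return colon ? (size_t) (colon - comp) : 0;
}

/* Returns 1 if two '/'-separated components of presets share the same
 * non-empty category, 0 otherwise. */
int
presets_conflict (const char *presets)
{
  const char *a = presets;

  while (*a != '\0') {
    size_t a_len = strcspn (a, "/");
    size_t a_cat = category_len (a, a_len);
    const char *b = a + a_len;

    /* Compare component a with every component after it. */
    while (*b == '/') {
      size_t b_len;

      b++;  /* skip the separator */
      b_len = strcspn (b, "/");
      if (a_cat > 0 && category_len (b, b_len) == a_cat
          && strncmp (a, b, a_cat) == 0)
        return 1;
      b += b_len;
    }

    a += a_len;
    if (*a == '/')
      a++;
  }
  return 0;
}
```

A string such as "quality:high/quality:low" is reported as conflicting,
whereas the example from the text, with one preset per category, is
not.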
2.7 Points to be determined
---------------------------

  This document hasn't determined yet how to solve the following

2.7.1 Storage of profiles

  One proposal for storage would be to use a system-wide directory
  (like $prefix/share/gstreamer-0.10/profiles) and store one XML file
  per profile.

  Users could then add their own profiles in ~/.gstreamer-0.10/profiles

  This poses some limitations as to what to do if some applications
  want to have some profiles limited to their own usage.

3. Helper library for profiles
------------------------------

  These helper methods could also be added to existing libraries (like
  GstPreset, GstPbUtils, ..).

  The proposed APIs are in the accompanying gstprofile.h file.

3.1 Getting user-readable names for formats

  This is already provided by GstPbUtils.

3.2 Hierarchy of profiles

  The goal is for applications to be able to present to the user a list
  of combo-boxes for choosing their output profile:

  [ Category ]       # optional, depends on the application
  [ Device/Site/.. ] # optional, depends on the application

  Convenience methods are offered to easily get lists of categories,
  devices, and profiles.

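Filling the second combo-box from the first amounts to filtering
targets by category. The following sketch shows one way such a
convenience method might look. The `Target` type and both functions are
hypothetical and are not taken from gstprofile.h. The fixed-size
category array is a simplification.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CATEGORIES 4

/* Hypothetical target: a name plus the categories it belongs to.
 * Unused category slots are NULL. */
typedef struct {
  const char *name;
  const char *categories[MAX_CATEGORIES];
} Target;

/* Returns 1 if the target belongs to the given category. */
int
target_in_category (const Target *t, const char *category)
{
  size_t i;

  for (i = 0; i < MAX_CATEGORIES && t->categories[i] != NULL; i++)
    if (strcmp (t->categories[i], category) == 0)
      return 1;
  return 0;
}

/* Stores up to max targets of the given category in out and returns
 * how many were stored. */
size_t
targets_for_category (const Target *targets, size_t n,
    const char *category, const Target **out, size_t max)
{
  size_t i, found = 0;

  for (i = 0; i < n && found < max; i++)
    if (target_in_category (&targets[i], category))
      out[found++] = &targets[i];
  return found;
}
```

An application would call `targets_for_category` whenever the selected
category changes and repopulate the device combo-box from the result.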
3.3 Creating Profiles

  The goal is for applications to be able to easily create profiles.

  The application needs to have a fast/efficient way to:
  * select a container format and see all compatible streams it can use

  * select a codec format and see which container formats it can use

  The remaining parts concern the restrictions to encoder

3.4 Ensuring availability of plugins for Profiles

  When an application wishes to use a Profile, it should be able to
  query whether it has all the needed plugins to use it.

  This part will use GstPbUtils to query, and if needed install, the
  missing plugins through the installed distribution plugin installer.

I. Use-cases researched
-----------------------

  This is a list of various use-cases where encoding/muxing is being

  The goal is to convert any input file for a target use with as
  little loss of quality as possible.
  A specific variant of this is transmuxing (see below).

  Example applications: Arista, Transmageddon

* Rendering timelines

  The incoming streams are a collection of various segments that need

  Those segments can vary in nature (i.e. the video width/height can

  This requires the use of identity with the single-segment property
  activated to transform the incoming collection of segments to a
  single continuous segment.

  Example applications: PiTiVi, Jokosher

* Encoding of live sources

  The major risk to take into account is the encoder not encoding the
  incoming stream fast enough. This is outside of the scope of
  encodebin, and should be solved by using queues between the sources
  and encodebin, as well as implementing QoS in encoders and sources
  (the encoders emitting QoS events, and the upstream elements
  adapting themselves accordingly).

  Example applications: camerabin, cheese

* Screencasting applications

  This is similar to encoding of live sources.
  The difference is that, due to the nature of the source (size and
  amount/frequency of updates), one might want to do the encoding in

  * The actual live capture is encoded with an 'almost-lossless' codec

  * Once the capture is done, the file created in the first step is
    then rendered to the desired target format.

  Fixing sources to only emit region-updates and having encoders
  capable of encoding those streams would remove the need for the
  first step but is outside of the scope of encodebin.

  Example applications: Istanbul, gnome-shell, recordmydesktop

  This is the case of an incoming live stream which will be
  broadcast/transmitted live.
  One issue to take into account is to reduce the encoding latency to
  a minimum. This should mostly be done by picking low-latency

  Example applications: Rygel, Coherence

  Given a certain file, the aim is to remux the contents WITHOUT
  decoding into either a different container format or the same

  Remuxing into the same container format is useful when the file was
  not created properly (for example, the index is missing).
  Whenever available, parsers should be applied on the encoded streams
  to validate and/or fix the streams before muxing them.

  Metadata from the original file must be kept in the newly created

  Example applications: Arista, Transmageddon

  Given a certain file, the aim is to extract a certain part of the
  file without going through the process of decoding and re-encoding

  This is similar to the transmuxing use-case.

  Example applications: PiTiVi, Transmageddon, Arista, ...

* Multi-pass encoding

  Some encoders allow doing a multi-pass encoding.
  The initial pass(es) are only used to collect encoding estimates and
  are not actually muxed and output.
  The final pass uses previously collected information, and the output
  is then muxed and output.

* Archiving and intermediary format

  The requirement is to have lossless

  Example applications: Sound-juicer

  Example application: Thoggen

  Some of these are still active documents, some others are not.

[0] GstPreset API documentation
    http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html

[1] gnome-media GConf profiles
    http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html

[2] Research on a Device Profile API
    http://gstreamer.freedesktop.org/wiki/DeviceProfile

[3] Research on defining presets usage
    http://gstreamer.freedesktop.org/wiki/PresetDesign