| ELEMENTS (v4lsrc, alsasrc, osssrc) |
| -------- |
| - capturing elements should not do fps/sample rate correction themselves |
| they should timestamp buffers according to "a clock", period. |
| |
| - if the element is the clock provider: |
| - timestamp buffers based on the internals of the clock it's providing, |
| without calling the exposed clock functions |
| - do this by getting a measure of elapsed time based on the internal clock |
| that is being wrapped. Ie., count the number of samples the *device* |
| has processed/dropped/... |
| If there are no underruns, the produced buffers are a contiguous data |
| stream. |
| - possibilities: |
| - the device has a method to query for the absolute time related to |
| a buffer you're about to capture or just have captured: |
| Use that time as the timestamp on the capture buffer |
| (it's important that this time is related to the capture buffer; |
| ie. it's a time that "stands still" if you're not capturing) |
| - since you're providing the clocking, but don't have the previous method, |
| you should open the device with a given rate and continuously read |
| samples from it, even in PAUSED. This allows you to update an internal |
| clock. |
| You use this internal clock as well to timestamp the buffers going out, |
| so you again form a contiguous set of buffers. |
| The only acceptable way to continuously read samples then is in a private |
| thread. |
| - as long as no underruns happen, the flow being output is a perfect stream: |
| the flow is data-contiguous and time-contiguous. |
| - underruns should be handled like this: |
| - if the code can detect how many samples it dropped, it should just |
| send the next buffer with the new correct offset. Ie, it produced |
| a data gap, and since it provides the clock, it produces a perfect |
| data gap (the timestamp will be correctly updated too). |
| - if it cannot detect how many samples it dropped, there's a fallback |
| algorithm. The element uses another GstClock (for example, system clock) |
| on which it corrects the skew and drift continuously as long as it |
| doesn't drop. When it detected a drop, it can get the time delta |
| on the other GstClock since the last time it captured and the current |
| time, and use that delta to guesstimate the number of samples dropped. |
| |
| - if the element is not the clock provider |
| - the element should always respect the clock it is given. |
| - the element should timestamp outgoing buffers based on time given by |
| the provided clock, by querying for the time on that clock, and |
| comparing to the base time. |
| - the element should NOT drop/add frames. Rather, it should just |
| - timestamp the buffers with the current time according to the provided |
| clock |
| - set the duration according to the *theoretical/nominal* framerate |
| - when underruns happen (the device has lost capture data because our |
| element is not handling them quickly enough), this should be detectable |
| by the element through the device. On underrun, the offset of your |
| next buffer will not match the end_offset of your previous one |
| (ie, the data flow is no longer contiguous). |
| If the exact number of samples dropped is detectable, this is the |
| difference between new offset and old offset_end. |
| If it's not detectable, it should be guessed based on the elapsed time |
| between now and the last capture. |
| |
| - a second element can be responsible for making the stream time-contiguous. |
| (ie, T1 + D1 = T2 for all buffers). This way they are made |
| acceptable for gapless presentation (which is useful for audio). |
| - The element treats the incoming stream as data-contiguous but not |
| necessarily time-contiguous. |
| - If the timestamps are contiguous as well, then everything is fine and |
| nothing needs to be done. This is the case where a file is being read |
| from disk, or capturing was done by an element that provided the clock. |
| - If they are not contiguous, then this element must make them so. |
| Since it should respect the nominal framerate, it has to stretch or |
| shorten the incoming data to match the timestamps set on the data. |
| For audio and video, this means it could interpolate or add/drop samples. |
| For audio, resampling/interpolation is preferred. |
| For video, a simple mechanism that chooses the frame with a timestamp as |
| close as possible to the theoretical timestamp could be used. |
| - When it receives a new buffer that is not data-contiguous with the |
| previous one, the capture element dropped samples/frames. |
| The adjuster can correct this by sending out as much "no-signal" data |
| (for audio, e.g. silence or background noise; for video, sending out |
| black frames) as it wants, since a data discontinuity is unrepairable. |
| So it can use these to catch up more aggressively. |
| It should just make sure that the next buffer it gets again goes |
| back to respecting the nominal framerate. |
| |
| - To achieve the best possible long-time capture, the following can be done: |
| - audiosrc captures audio and provides the clock. It does contiguous |
| timestamping by default. |
| - videosrc captures video timestamped with the audiosrc's clock. This data |
| feed doesn't match the nominal framerate. If there is an encoding format |
| that supports storing the actual timestamps instead of pretending the |
| data flow respects the nominal framerate, this can be corrected after |
| recording. |
| - at the end of recording, the absolute length in time of both streams, |
| measured against a common clock, is the same or can be made the same by |
| chopping off data. |
| - the nominal rate of both audio and video is also known. |
| - given the length and the nominal rate, we have an evenly spaced list |
| of theoretical sampling points. |
| - video frames can now be matched to these theoretical sampling points by |
| interpolating or reusing/dropping frames. It can choose the best |
| possible algorithm for this to decrease the visible effects |
| (interpolating results in blur, add/drop frames results in jerkiness). |
| - with the video resampled at the theoretical framerate, and the audio |
| already correct, the recording can now be muxed correctly into a format |
| that implicitly assumes a data rate matching the nominal framerate. |
| - One possibility is to use the GDP to store the recording, because that |
| retains all of the timestamping information. |
| - The process is symmetrical; if you want to use the clock provided by |
| the video capturer, you can stretch/shrink the audio at the end of |
| recording to match. |
| |
| TERMINOLOGY |
| ----------- |
| - nominal rate |
| the framerate/samplerate |
| exposed in the caps; ie. the theoretical framerate of the |
| data flow. This is the fps reported by the device or set for the encoder, |
| or the sampling rate of the audio device. |
| - contiguous data flow |
| offset_end of old buffer matches offset of new buffer |
| for audio, this is a more important requirement, since you configure |
| output devices for a contiguous data flow. |
| - contiguous time flow |
| T1 + D1 = T2 |
| for video, this is a more important requirement, because the sampling |
| period is bigger, so it is more important to match the presentation time |
| - "perfect stream" |
| data and time are contiguous and match the nominal rate |
| videotestsrc, sinesrc, filesrc ! decoder produce this |
| |
| NETWORK |
| ------- |
| - elements can be synchronized by writing a NTP clock subclass that listens |
| to an ntp server, and tries to match its own clock against the NTP server |
| by doing gradual rate adjustment, compared with the own system clock. |
| - sending audio and video over the network using tcpserversink is possible |
| when the streams are made to be perfect streams and synchronized. |
| Since the streams are perfect and synchronized, the timestamps transmitted |
| along with the buffers can be trusted. The client just has to make |
| sure that it respects the timestamps. |
| - One good way of doing that is to make an element that provides a clock |
| based on the timestamps of the data stream, interpolating using another |
| GstClock inbetween those time points. This allows you to create |
| a perfect network stream player (one that doesn't lag (increasing buffers)) |
| or play too fast (having an empty network queue). |
| - On the client side, a GStreamer-ish way to do that is to cut the playback |
| pipeline in half, and have a decoupled element that converts |
| timestamps/durations (by resampling/interpolating/...) so that the sinks |
| consume data at the same rate the tcp sources provide it. |
| tcpclientsrc ! theoradec ! clocker name=clocker { clocker. ! xvimagesink } |
| |
| SYNCHRONISATION |
| --------------- |
| - low rate source with high rate source: |
| the high rate source can drop samples so it starts with the same phase |
| as the low rate source. This could be done in a synchronizer element. |
| example: |
| - audio, 8000 Hz, and video, 5 fps |
| - pipeline goes to playing |
| - video src does capture and receives its first frame 50 ms after playing |
| -> phase is -90 or 270 degrees |
| - to compensate, the equivalent of 150 ms of audio could be dropped so |
| that the first videoframe's timestamp coincides with the timestamp of |
| the first audio buffer |
| - this should be done in the raw audio domain since it's typically not |
| possible to chop off samples in the encoded domain |
| |
| - two low rate sources: |
| not possible to do this correctly, maybe something in the middle can be |
| found ? |
| |
| IMPROVING QUALITY |
| ----------------- |
| - video src can capture at a higher framerate than will be encoded |
| - this gives the corrector more frames to choose from or interpolate with |
| to match the target framerate, reducing jerkiness. |
| e.g. capturing at 15 fps for 5 fps framerate. |
| |
| LIVE CHANGES IN PIPELINE |
| ------------------------ |
| - case 1: video recording for some time, user wants to add audio recording on |
| the fly |
| - user sets complete pipeline to paused |
| - user adds element for audio recording |
| - new element gets same base time as video element |
| - on PLAYING, new element will be in sync and the first buffer produced |
| will have a non-zero timestamp that is the same as the first new video |
| buffer |
| |
| - case 2: video recording for some time, user wants to add in an audio file |
| from disk. |
| - two possible expectations: |
| A) user expects the audio file to "start playing now" and be muxed |
| together with the current video frames |
| B) user expects the audio file to "start playing from the point where the |
| video currently is" (ie, video is at 10 seconds, so mux with audio |
| starting from 10 secs) |
| - case A): |
| - complete pipeline gets paused |
| - filesrc ! dec added |
| - both get base_time same as video element |
| - pipeline to playing |
| - all elements receive new "now" as base_time so timestamps are reset |
| - muxer will receive synchronized data from both |
| - case B): |
| nothing gets paused |
| - filesrc ! dec added |
| - both get base_time that is the current clock time |
| - pipeline to playing |
| - core sets |
| 1) - new audio part starts sending out data with timestamp 0 from start |
| of file |
| - muxer receives a whole set of frames from the audio side that are late |
| (since the timestamps start at 0), so keeps dropping until it has |
| caught up with the current set). |
| OR |
| 2) - audio part does clock query |
| |
| THINGS TO DIG UP |
| ---------------- |
| - is there a better way to get at "when was this frame captured" then doing |
| a clock query after capturing ? |
| Imagine a video device with a hardware buffer of four frames. If you |
| haven't asked for a frame from it in a while, three frames could be |
| queued up. So three consecutive frame gets result in immediate returns |
| with pretty much the same clock query for each of them. |
| So we should find a way to get "a comparable clock time" corresponding |
| to the captured frame. |
| |
| - v4l2 api returns a gettimeofday() timestamp with each buffer. |
| Given that, you can timestamp the buffer by subtracting the delta |
| between the buffer's clock timestamp with the current system clock time, |
| from the current time reported by the provided clock. |