totalProcessingDelay

inbound-rtp, inbound, audio, video

The sum of the time, in seconds, that each audio sample or video frame takes from the moment its first RTP packet is received to the moment the corresponding sample or frame is decoded.

Description

Real number; in seconds

It is the sum of the time, in seconds, that each audio sample or video frame takes from the moment its first RTP packet is received (the reception timestamp) to the moment the corresponding sample or frame is decoded (the decoded timestamp). At that point the audio sample or video frame is ready for playout by the MediaStreamTrack. Typically, "ready for playout" means that the decoder has fully decoded the audio sample or video frame.

Given the complexities involved, the time of arrival (the reception timestamp) is measured as close to the network layer as possible, and the decoded timestamp is measured as soon as the complete sample or frame is decoded.

In the case of audio, several samples are received in the same RTP packet, so all of those samples share the same reception timestamp but have different decoded timestamps. In the case of video, a frame is received over several RTP packets. The reception timestamp is then the arrival time of the earliest packet containing part of the frame, and the decoded timestamp corresponds to when the complete frame is decoded.
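
As an illustration of the video case described above, the following is a minimal sketch of how per-frame delays could add up to the total. The `DecodedFrame` type and both function names are hypothetical and exist only for this example. They are not part of any WebRTC API, and real implementations measure these timestamps internally.

```typescript
// Hypothetical record of one decoded video frame (not a real WebRTC type).
interface DecodedFrame {
  // Arrival time, in seconds, of each RTP packet that carried part of the frame.
  // Assumed to contain at least one entry.
  packetArrivalTimes: number[];
  // Time, in seconds, at which the complete frame was decoded.
  decodedTime: number;
}

// Delay for one frame: decoded timestamp minus the earliest packet arrival.
function frameProcessingDelay(frame: DecodedFrame): number {
  const receptionTime = Math.min(...frame.packetArrivalTimes);
  return frame.decodedTime - receptionTime;
}

// Sum of the per-frame delays. Only decoded frames are passed in,
// so dropped frames never contribute to the total.
function sumProcessingDelay(frames: DecodedFrame[]): number {
  return frames.reduce((sum, frame) => sum + frameProcessingDelay(frame), 0);
}
```

For audio the same subtraction would apply per sample, with every sample from one RTP packet sharing that packet's arrival time.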

This metric is not incremented for frames that are not decoded, i.e. those counted in framesDropped. The average processing delay can be calculated by dividing totalProcessingDelay by framesDecoded for video (or by totalSamplesDecoded, from the provisional stats spec, for audio).
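
The division described above can be sketched as follows for video. The `InboundSnapshot` interface and the two function names are assumptions made for this example. Only the field names `totalProcessingDelay` and `framesDecoded` come from the metric definitions. The second function is an optional variation that averages over the interval between two snapshots rather than over the whole lifetime of the stream.

```typescript
// Minimal subset of an inbound-rtp stats entry; only the fields used here.
interface InboundSnapshot {
  totalProcessingDelay: number; // cumulative, in seconds
  framesDecoded: number;        // cumulative count of decoded video frames
}

// Lifetime average processing delay per decoded frame, in seconds.
// Returns undefined when nothing has been decoded, to avoid dividing by zero.
function averageProcessingDelay(s: InboundSnapshot): number | undefined {
  return s.framesDecoded > 0
    ? s.totalProcessingDelay / s.framesDecoded
    : undefined;
}

// Average processing delay for frames decoded between two snapshots.
function intervalProcessingDelay(
  prev: InboundSnapshot,
  curr: InboundSnapshot
): number | undefined {
  const frames = curr.framesDecoded - prev.framesDecoded;
  if (frames <= 0) return undefined;
  return (curr.totalProcessingDelay - prev.totalProcessingDelay) / frames;
}
```

In a browser, the snapshots would typically be taken from the entries of `RTCPeerConnection.getStats()` whose `type` is `"inbound-rtp"` and whose `kind` is `"video"`. For audio, the divisor would be the sample count instead, where the implementation exposes it.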

Notes

  • An audio sample refers to having a sample in any channel of an audio track. If multiple audio channels are used, metrics based on samples do not increment at a higher rate: having samples in multiple channels simultaneously counts as a single sample.