Internet draft                                  Packetization of H.261

                 Packetization of H.261 video streams

                           Mon Mar 8, 1993
                        Expires: October 1993

                 Thierry Turletti, Christian Huitema
                                INRIA
                  Christian.Huitema@sophia.inria.fr
                  Thierry.Turletti@sophia.inria.fr

1. Status of this Memo

This document is an Internet draft. Internet drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. (Note that other groups may also distribute
working documents as Internet Drafts.)

Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress".

Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
Internet Draft.

Distribution of this document is unlimited.

Turletti, Huitema                                             [Page 1]

2. Purpose of this document

The CCITT recommendation H.261 [1] specifies the encodings used by
CCITT compliant video-conference codecs. Although these encodings
were originally specified for fixed data rate ISDN circuits,
experiments [2] have shown that they can also be used over the
Internet.

The purpose of this memo is to specify how H.261 video streams can be
carried over UDP and IP, using the RTP protocol [3].

3. Structure of the packet stream

H.261 codecs produce a bit stream. In fact, H.261 and its companion
recommendations specify several levels of encoding:

(1)  Images are first separated into blocks of 8x8 pixels. Blocks
     which have moved are encoded by computing their discrete cosine
     transform (DCT); the resulting coefficients are then quantized
     and Huffman encoded.

(2)  The bits resulting from the Huffman encoding are then arranged
     in 512-bit frames, containing 2 bits of synchronization, 492
     bits of data and 18 bits of error correcting code.
(3)  The 512-bit frames are then interlaced with an audio stream and
     transmitted over p x 64 kbit/s circuits according to
     specification H.261.

When transmitting over the Internet, we will directly consider the
output of the Huffman encoding. We will not carry the 512-bit frames,
as protection against errors can be obtained by other means.
Similarly, we will not attempt to multiplex audio and video signals
in the same packets, as UDP and RTP provide a much more efficient way
to achieve multiplexing.

Directly transmitting the result of the Huffman encoding over an
unreliable stream of UDP datagrams would however have very poor error
resistance characteristics. The H.261 coding is in fact organized as
a sequence of images, or frames, which are themselves organized as a
set of Groups of Blocks (GOB). Each GOB holds a set of 3 lines of 11
macro blocks (MB). Each MB carries information on a group of 16x16
pixels: luminance information is specified for 4 blocks of 8x8
pixels, while chrominance information is only given by two 8x8 "red"
and "blue" blocks.

This grouping is used to specify information at each level of the
hierarchy:

-    At the frame level, one specifies information such as the delay
     from the previous frame, the image format, and various
     indicators.

-    At the GOB level, one specifies the GOB number and the default
     quantizer that will be used for the MBs.

-    At the MB level, one specifies which blocks are present and
     which did not change, and optionally a quantizer, as well as
     details of the coding such as movement vectors.

The result of this structure is that one needs to receive the
information present in the frame header to decode the GOBs, as well
as the information present in the GOB header to decode the MBs.
Without precautions, this would mean that one has to receive all the
packets that carry an image in order to properly decode its
components.
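As a worked example of the hierarchy described above, the following C fragment derives the size of a GOB from the figures in the text and counts the GOBs needed to cover an image. It is only a sketch: the names are ours, and the CIF (352x288) and QCIF (176x144) dimensions used in the usage note are taken from H.261 rather than from this memo.

```c
/* Figures from the text: a GOB is 3 lines of 11 macro blocks,
   and a macro block covers 16x16 pixels. */
enum {
    MB_SIZE    = 16,
    MB_PER_ROW = 11,
    MB_ROWS    = 3,
    MB_PER_GOB = MB_PER_ROW * MB_ROWS
};

/* Number of GOBs needed to tile a width x height image.
   Assumes both dimensions are whole multiples of the GOB size. */
static int gobs_per_image(int width, int height)
{
    int gob_width  = MB_PER_ROW * MB_SIZE;  /* 176 pixels */
    int gob_height = MB_ROWS * MB_SIZE;     /* 48 pixels  */

    return (width / gob_width) * (height / gob_height);
}
```

With these definitions, gobs_per_image(176, 144) evaluates to 3 and gobs_per_image(352, 288) evaluates to 12, each GOB carrying MB_PER_GOB = 33 macro blocks.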
Experience has shown that:

(1)  It would be unrealistic to carry an image in a single packet:
     video images can sometimes be very large.

(2)  GOB information would most often be correctly sized to fit in a
     packet. In fact, several GOBs can often be grouped in a packet.

Once we have taken the decision to correlate GOB synchronization and
packetization, a number of decisions remain to be taken, due to the
following conditions:

(1)  The algorithm should be easy to implement when packetizing the
     output stream of a hardware codec.

(2)  The algorithm should not induce rendition delays: we should not
     have to wait for a following packet to display an image.

(3)  The algorithm should allow for efficient resynchronization in
     case of packet losses.

(4)  It should be easy to depacketize the data stream and direct it
     to a hardware codec's input.

(5)  When the hardware decoder operates at a fixed bit rate, one
     should be able to maintain synchronization, e.g. by adding
     padding bits when the packet arrival rate is slower than the bit
     rate.

The H.261 Huffman encoding includes a special "GOB start" pattern,
composed of 15 zeroes followed by a single 1, that cannot be imitated
by any other code words. That pattern marks the separation between
two GOBs, and is in fact used as an indicator that the current GOB is
terminated. The encoding also includes a stuffing pattern, composed
of seven zeroes followed by four ones; that stuffing pattern can only
be entered between the encoding of MBs, or just before the GOB
separator.

The first conclusion of the analysis is that the packets should
contain all the GOB data, including the "GOB start" pattern that
separates the current block from its follower. In fact, as this
pattern is well known, we could as well use a single bit in the data
header to indicate its presence.
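To illustrate how the "GOB start" pattern can be located in a bit stream that is not octet aligned, the following sketch scans a buffer bit by bit. The function names are hypothetical, and we assume that the bits of each octet are taken most significant bit first.

```c
#include <stddef.h>

/* Read bit number pos of buf, counting from the most significant
   bit of buf[0] (an assumption about the bit order of the stream). */
static int bit_at(const unsigned char *buf, size_t pos)
{
    return (buf[pos / 8] >> (7 - pos % 8)) & 1;
}

/* Return the bit offset of the first "GOB start" pattern (15 zero
   bits followed by a one) within the first nbits bits of buf, or -1
   if the pattern does not occur. */
static long find_gob_start(const unsigned char *buf, size_t nbits)
{
    size_t zeros = 0;   /* length of the current run of zero bits */
    size_t pos;

    for (pos = 0; pos < nbits; pos++) {
        if (bit_at(buf, pos) == 0) {
            zeros++;
        } else {
            if (zeros >= 15)
                return (long)(pos - 15);  /* first of the 15 zeroes */
            zeros = 0;
        }
    }
    return -1;
}
```

A bit-by-bit scan is the simplest form of the search; the program of Appendix A obtains the same result one octet at a time by using tables of leading and trailing zero counts.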
Not encoding the GOB-start pattern has two advantages:

(1)  It reduces the number of bits in the packets, and avoids the
     possibility of splitting packets in the middle of a GOB
     separator.

(2)  It allows gateways to hardware decoders to insert the stuffing
     pattern in front of the GOB, in order to meet the fixed bit rate
     requirement.

Another problem posed by the specificities of the H.261 compression
is that the GOB data have no particular reason to fit in an integer
number of octets. The data header will thus contain two three-bit
integers, EBIT and SBIT:

SBIT indicates the number of bits that should be ignored in the first
     (start) data octet.

EBIT indicates the number of bits that should be ignored in the last
     (end) data octet.

Although only the EBIT counter would really be needed for software
coders, the SBIT counter was inserted to ease the packetization of
the output of hardware coders. A sample packetization procedure is
found in Appendix A.

At the receiving sites, the GOB synchronization can be used in
conjunction with the synchronization service of the RTP protocol. In
case of losses, the decoders could become desynchronized. The "S" bit
of the RTP header will be set to indicate that the packet includes
the beginning of the encoding of a GOB, i.e. the quantizer common to
all macro blocks. The receiver will detect losses by looking at the
RTP sequence numbers. In case of losses, it will ignore all packets
whose "S" bit is null. Once an S bit packet has been received, it
will prepend the GOB start code to that packet, and resume decoding.

4. Usage of RTP

The H.261 information is carried as data within the RTP protocol,
using the following header fields:

 _____________________________________________
| Ver       | Protocol version (1).          |
|___________|________________________________|
| Flow      | Identifies one particular      |
|           | video stream.                  |
|___________|________________________________|
| Content   | H.261 encoded video (31).      |
|___________|________________________________|
| Sequence  | Identifies the packet within   |
| number    | a stream.                      |
|___________|________________________________|
| Sync      | Set if the packet is           |
|           | synchronized on an image or    |
|           | on a group of blocks.          |
|___________|________________________________|
| Timestamp | The date at which the          |
|           | image was grabbed.             |
|___________|________________________________|

The definition of these settings implies that the beginning of an
image shall always be synchronized with a packet.

The RTP sequence number can be used to detect missing packets. In
this case, one shall ignore all incoming packets until the next
synchronization mark is received.

The H.261 data will follow the RTP options, as in:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|   flow    |F|S|  content  |        sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      timestamp (seconds)      |     timestamp (fraction)      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
.                    RTP options (optional)                     .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         H.261 options         |        H.261 stream...        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The H.261 options field is defined as follows:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|SBIT |E|EBIT |C|I|V|0|  FMT  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 ______________________________________________________
| S (1 bit)     | Start of GOB. Set if                 |
|               | the packet is a start of GOB.        |
|_______________|______________________________________|
| SBIT (3 bits) | Start bit position:                  |
|               | number of bits that should           |
|               | be ignored in the first              |
|               | (start) data octet.                  |
|_______________|______________________________________|
| E (1 bit)     | End of GOB. Set if                   |
|               | the packet is an end of GOB.         |
|_______________|______________________________________|
| EBIT (3 bits) | End bit position:                    |
|               | number of bits that should           |
|               | be ignored in the last               |
|               | (end) data octet.                    |
|_______________|______________________________________|
| C (1 bit)     | Color flag. Set if                   |
|               | color is encoded.                    |
|_______________|______________________________________|
| I (1 bit)     | Full Intra Image flag.               |
|               | Set if it is the first packet        |
|               | of a full intra image.               |
|_______________|______________________________________|
| V (1 bit)     | Movement Vector flag.                |
|               | Set if movement vectors              |
|               | are encoded.                         |
|_______________|______________________________________|
| FMT (4 bits)  | Image format:                        |
|               | QCIF, CIF or number of CIF in SCIF.  |
|_______________|______________________________________|

The image format (4 bits) is defined as follows:

 ____________________________
| QCIF               | 0000  |
|____________________|_______|
| CIF                | 0001  |
|____________________|_______|
| SCIF 0             |       |
| upper left corner  | 0100  |
| CIF in SCIF image  |       |
|____________________|_______|
| SCIF 1             |       |
| upper right corner | 0101  |
| CIF in SCIF image  |       |
|____________________|_______|
| SCIF 2             |       |
| lower left corner  | 0110  |
| CIF in SCIF image  |       |
|____________________|_______|
| SCIF 3             |       |
| lower right corner | 0111  |
| CIF in SCIF image  |       |
|____________________|_______|

5. Usage of RTP parameters

When sending or receiving H.261 streams through the RTP protocol, the
stations should be ready to:

(1)  process or ignore all generic RTP parameters,

(2)  send or receive H.261 specific "Reverse Application Data"
     parameters, to request a video resynchronization.

This memo describes two "RAD" item types, "Full Intra Request" and
"Negative Acknowledge".

5.1. Controlling the reverse flow

Support of the reverse application data by the H.261 sender is
optional; in particular, early experiments have shown that the usage
of this feature could have very negative effects when the number of
recipients is very large.

Recipients learn the return address where RAD information may be sent
from the Content description (CDESC) item, which may be included as
an RTP option in any of the video packets. The CDESC item includes a
Return port number value. A value of zero indicates that no reverse
control information should be returned.

A recipient shall never send a RAD item if it has not yet received a
CDESC item from the source, or if the port number received in the
last CDESC item was null. Emitters should identify themselves by
sending CDESC items at regular intervals.

5.2. Full Intra Request

The "Full Intra Request" items are identified by the item Type "FIR"
(0).

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     RAD     |  length = 1   |     Type      | Z |   Flow    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

These packets indicate that a recipient has lost all video
synchronization, and request the emitter to send the next image in
"Intra" coding mode, i.e. without using differential coding.
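The receiver-side rules stated so far (loss detection through the RTP sequence number, discarding packets until the next synchronization mark, and the CDESC condition of section 5.1) can be sketched as follows. The structure and function names are hypothetical; we assume a 16-bit sequence number that increases by one per packet, and we do not attempt to handle reordered packets.

```c
#include <stdint.h>

/* Hypothetical receiver state; none of these names come from RTP. */
struct h261_rx {
    int      have_cdesc;   /* a CDESC item has been received         */
    uint16_t return_port;  /* port of the last CDESC, 0 = no return  */
    int      started;      /* at least one packet has been seen      */
    uint16_t next_seq;     /* sequence number expected next          */
    int      in_sync;      /* packets can currently be decoded       */
};

/* Section 5.1: a RAD item (FIR or NACK) may only be sent once a
   CDESC item carrying a non-zero return port has been received. */
static int rad_allowed(const struct h261_rx *rx)
{
    return rx->have_cdesc && rx->return_port != 0;
}

/* Examine one packet header. Returns 1 if the payload should be
   handed to the decoder, 0 if it must be ignored. */
static int accept_packet(struct h261_rx *rx, uint16_t seq, int sync_bit)
{
    if (rx->started && seq != rx->next_seq)
        rx->in_sync = 0;          /* gap: at least one packet lost */
    rx->started = 1;
    rx->next_seq = (uint16_t)(seq + 1);

    if (sync_bit)
        rx->in_sync = 1;          /* resynchronize on this packet  */
    return rx->in_sync;
}
```

A receiver built this way would start from a zeroed structure, call accept_packet() for every arriving packet, and consult rad_allowed() before emitting any request towards the source.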
The fields of the Full Intra Request item are defined as follows:

 _______________________________________________
| F      | Last option bit, as defined by RTP.  |
|________|______________________________________|
| RAD    | RAD option type (65)                 |
|________|______________________________________|
| Length | One 32-bit word.                     |
|________|______________________________________|
| Type   | FIR (0).                             |
|________|______________________________________|
| Z      | Must be zero                         |
|________|______________________________________|
| Flow   | The flow id of the incoming packets  |
|________|______________________________________|

5.3. Negative Acknowledge

Packet losses are detected using the RTP sequence number. After a
packet loss, the receiver will resynchronize on the next GOB.
However, as H.261 uses differential encoding, parts of the images may
remain blurred for a rather long time.

As all GOBs belonging to a given video image carry the same time
stamp, the receiver can determine the list of GOBs which were
actually received for that time stamp, and thus identify the "missing
blocks". Requesting a specific reinitialization of these missing
blocks is more efficient than requesting a complete reinitialization
of the image through the "Full Intra Request" item.

The format of the video-nack option is as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     RAD     |  length = 3   |     Type      | Z |   Flow    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     FGOBL     |     LGOBL     |      MBZ      |      FMT      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      timestamp (seconds)      |     timestamp (fraction)      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The different fields have the following values:

 _______________________________________________________
| F             | Last option bit, as defined by RTP.   |
|_______________|_______________________________________|
| RAD           | RAD option type (65)                  |
|_______________|_______________________________________|
| Length        | Three 32-bit words.                   |
|_______________|_______________________________________|
| Type          | NACK (1).                             |
|_______________|_______________________________________|
| Z             | Must be zero                          |
|_______________|_______________________________________|
| Flow          | The flow id of the incoming packets   |
|_______________|_______________________________________|
| FGOBL         | First GOB Lost:                       |
|               | the number of the first GOB lost.     |
|_______________|_______________________________________|
| LGOBL         | Last GOB Lost:                        |
|               | the number of the last GOB lost.      |
|_______________|_______________________________________|
| MBZ           | Must be zero                          |
|_______________|_______________________________________|
| FMT           | Repeats the format indicator of the   |
|               | received image, including the number  |
|               | of the SCIF subimage if present.      |
|_______________|_______________________________________|
| Timestamp     | The RTP timestamp of the              |
|               | original image.                       |
|_______________|_______________________________________|

6. References

[1]  "Video codec for audiovisual services at p x 64 kbit/s", CCITT
     Recommendation H.261.

[2]  Thierry Turletti, "H.261 software codec for videoconferencing
     over the Internet", INRIA Research Report no 1834.

[3]  Henning Schulzrinne, "A Transport Protocol for Real-Time
     Applications", INTERNET-DRAFT, December 15, 1992.
Appendix A

The following code can be used to packetize the output of an H.261
codec:

   #include <stdio.h>

   #define BUFFER_MAX 512

   /* Supplied by the application: sends the first len octets of
      buf as one packet, with the given H.261 option values. */
   extern void send_message(unsigned char *buf, int len, int sbit,
                            int ebit, int end_of_group,
                            int start_of_group);

   /* right[c]: number of leading (most significant) zero bits of
      octet c, i.e. how far c extends a run of zeroes coming from
      the previous octets. */
   static const int right[256] = {
   8,7,6,6,5,5,5,5,4,4,4,4,4,4,4,4,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,
   2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
   1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
   1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};

   /* left[c]: number of trailing (least significant) zero bits of
      octet c, i.e. the run of zeroes that c leaves for the
      following octets. */
   static const int left[256] = {
   8,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
   5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
   6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
   5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
   7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
   5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
   6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
   5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0};

   void h261_sync(FILE *F)
   {
       int i, ebit, sbit, start_of_group, end_of_group, c, nz;
       unsigned char buf[BUFFER_MAX];

       i = 0;
       ebit = 0;
       sbit = 0;
       start_of_group = 1;
       nz = 0;             /* zero bits seen since the last one bit */

       /* Compare with EOF explicitly: a zero octet is valid data. */
       while ((c = getc(F)) != EOF) {
           buf[i++] = c;
           if (c == 0) {
               nz += 8;
           } else {
               nz += right[c];
               if (nz >= 15) {
                   /* The GOB start pattern ends inside c. It began
                      16 bits earlier, so the current GOB ends in
                      octet i - 3, whose last 7 - right[c] bits
                      belong to the pattern. */
                   end_of_group = 1;
                   ebit = 7 - right[c];
                   if (i > 2)  /* nothing precedes the first pattern */
                       send_message(buf, i - 2, sbit, ebit,
                                    end_of_group, start_of_group);
                   i = 0;
                   sbit = 0;
                   if (right[c] != 7) {
                       /* c also holds the first bits of the next
                          GOB, after the pattern. */
                       buf[i++] = c;
                       sbit = right[c] + 1;
                   }
                   start_of_group = 1;
                   nz = left[c];
                   continue;
               }
               nz = left[c];
           }
           if (i >= BUFFER_MAX) {
               /* Buffer full in the middle of a GOB. Keep the last
                  two octets, which may hold the beginning of a GOB
                  start pattern. */
               ebit = 0;
               end_of_group = 0;
               send_message(buf, i - 2, sbit, ebit,
                            end_of_group, start_of_group);
               buf[0] = buf[i - 2];
               buf[1] = buf[i - 1];
               i = 2;
               sbit = 0;
               start_of_group = 0;
           }
       }

       /* End of input: send what remains of the last GOB. */
       if (i > 0)
           send_message(buf, i, sbit, 0, 1, start_of_group);
   }

Table of Contents

1    Status of this Memo
2    Purpose of this document
3    Structure of the packet stream
4    Usage of RTP
5    Usage of RTP parameters
5.1  Controlling the reverse flow
5.2  Full Intra Request
5.3  Negative Acknowledge
6    References
     Appendix A