






INTERNATIONAL  TELECOMMUNICATION  UNION







CCITT	H.261

THE  INTERNATIONAL
TELEGRAPH  AND  TELEPHONE
CONSULTATIVE  COMMITTEE











LINE  TRANSMISSION
ON  NON-TELEPHONE  SIGNALS





VIDEO  CODEC  FOR  AUDIOVISUAL  SERVICES
AT  p  ×   64  kbit/s











Recommendation  H.261







Geneva, 1990







FOREWORD

	The CCITT (the International Telegraph and Telephone Consultative 
Committee) is the permanent organ of the International
Telecommunication Union (ITU). CCITT is responsible for studying technical, 
operating and tariff questions and issuing Recommendations on them 
with a view to standardizing telecommunications on a worldwide 
basis.

	The Plenary Assembly of CCITT, which meets every four years, 
establishes the topics for study and approves Recommendations 
prepared by its Study Groups. The approval of Recommendations by the 
members of CCITT between Plenary Assemblies is covered by the 
procedure laid down in CCITT Resolution No. 2 (Melbourne, 1988).

	Recommendation H.261 was prepared by Study Group XV and was 
approved under the Resolution No. 2 procedure on 14 December 1990.





___________________





CCITT  NOTE

	In this Recommendation, the expression “Administration” is used for 
conciseness to indicate both a telecommunication Administration and 
a recognized private operating agency.



















© ITU 1990

All rights reserved. No part of this publication may be reproduced or
utilized in any form or by any means, electronic or mechanical, including
photocopying and microfilm, without permission in writing from the ITU.


Recommendation H.261

VIDEO CODEC FOR AUDIOVISUAL SERVICES AT p × 64 kbit/s


(revised 1990)

	The CCITT,

considering

	(a)	that there is significant customer demand for videophone,
videoconference and other audiovisual services;

	(b)	that circuits to meet this demand can be provided by digital
transmission using the B, H0 rates or their multiples up to the primary rate
or H11/H12 rates;

	(c)	that ISDNs are likely to be available in some countries that provide a 
switched transmission service at the B, H0 or H11/H12 rate;

	(d)	that the existence of different digital hierarchies and different
television standards in different parts of the world complicates the problems
of specifying coding and transmission standards for international connections;

	(e)	that a number of audiovisual services are likely to appear using basic 
and primary rate ISDN accesses and that some means of intercommunication
among these terminals should be possible;

	(f)	that the video codec provides an essential element of the infrastructure 
for audiovisual services which allows such intercommunication in the 
framework of Recommendation H.200;

	(g)	that Recommendation H.120 for videoconferencing using primary 
digital group transmission was the first in an evolving series of
Recommendations,

appreciating

	that advances have been made in research and development of video 
coding and bit rate reduction techniques which lead to the use of lower bit 
rates down to 64 kbit/s so that this may be considered as the second in the 
evolving series of Recommendations,

and noting

	that it is the basic objective of the CCITT to recommend unique
solutions for international connections,

recommends

	that in addition to those codecs complying with 
Recommendation H.120, codecs having signal processing and transmission 
coding characteristics described below should be used for international 
audiovisual services.

	Note 1 – Codecs of this type are also suitable for some television services 
where full broadcast quality is not required.

	Note 2 – Equipment for transcoding from and to codecs according to 
Recommendation H.120 is under study.

1	Scope

	This Recommendation describes the video coding and decoding 
methods for the moving picture component of audiovisual services at the 
rates of p × 64 kbit/s, where p is in the range 1 to 30.

2	Brief specification

	An outline block diagram of the codec is given in Figure 1/H.261.

FIGURE 1/H.261



2.1	Video input and output

	To permit a single Recommendation to cover use in and between 
regions using 625- and 525-line television standards, the source coder
operates on pictures based on a common intermediate format (CIF). The
standards of the input and output television signals, which may, for example,
be composite or component, analogue or digital, and the methods of performing 
any necessary conversion to and from the source coding format are not
subject to recommendation.

2.2	Digital output and input

	The video coder provides a self-contained digital bit stream which 
may be combined with other multi-facility signals (for example as defined 
in Recommendation H.221). The video decoder performs the reverse
process.

2.3	Sampling frequency

	Pictures are sampled at an integer multiple of the video line rate.  This 
sampling clock and the digital network clock are asynchronous.

2.4	Source coding algorithm

	A hybrid of inter-picture prediction to utilize temporal redundancy 
and transform coding of the remaining signal to reduce spatial redundancy 
is adopted. The decoder has motion compensation capability, allowing 
optional incorporation of this technique in the coder.

2.5	Bit rate

	This Recommendation is primarily intended for use at video bit rates 
between approximately 40 kbit/s and 2 Mbit/s.

2.6	Symmetry of transmission

	The codec may be used for bidirectional or unidirectional visual
communication.

2.7	Error handling

	The transmitted bit-stream contains a BCH (511,493) forward error
correction code. Use of this by the decoder is optional.

2.8	Multipoint operation

	Features necessary to support switched multipoint operation are 
included.

3	Source coder

3.1	Source format

	The source coder operates on non-interlaced pictures occurring 
30000/1001 (approximately 29.97) times per second. The tolerance on
picture frequency is ±50 ppm.

	Pictures are coded as luminance and two colour difference
components (Y, CB and CR). These components and the codes representing their 
sampled values are as defined in CCIR Recommendation 601.

	Black = 16

	White = 235

	Zero colour difference = 128

	Peak colour difference = 16 and 240.

	These values are nominal ones and the coding algorithm functions 
with input values of 1 through to 254.

	Two picture scanning formats are specified.

	In the first format (CIF), the luminance sampling structure is 352 pels 
per line, 288 lines per picture in an orthogonal arrangement. Sampling of 
each of the two colour difference components is at 176 pels per line, 
144 lines per picture, orthogonal. Colour difference samples are sited such 
that their block boundaries coincide with luminance block boundaries as 
shown in Figure 2/H.261. The picture area covered by these numbers of pels 
and lines has an aspect ratio of 4:3 and corresponds to the active portion of 
the local standard video input.

	Note – The number of pels per line is compatible with sampling the 
active portions of the luminance and colour difference signals from 525- or 
625-line sources at 6.75 and 3.375 MHz respectively. These frequencies 
have a simple relationship to those in CCIR Recommendation 601.

FIGURE 2/H.261



	The second format, quarter-CIF (QCIF), has half the number of pels and 
half the number of lines stated above. All codecs must be able to operate 
using QCIF. Some codecs can also operate with CIF.

	Means shall be provided to restrict the maximum picture rate of encoders by 
having at least 0, 1, 2 or 3 non-transmitted pictures between transmitted 
ones. Selection of this minimum number and CIF or QCIF shall be by
external means (for example via Recommendation H.221).

3.2	Video source coding algorithm

	The source coder is shown in generalized form in Figure 3/H.261. The 
main elements are prediction, block transformation and quantization.

	The prediction error (INTER mode) or the input picture (INTRA 
mode) is subdivided into 8 pel by 8 line blocks which are segmented as 
transmitted or non-transmitted. Further, four luminance blocks and the two 
spatially corresponding colour difference blocks are combined to form a 
macroblock as shown in Figure 10/H.261.

	The criteria for choice of mode and transmitting a block are not
subject to recommendation and may be varied dynamically as part of the coding 
control strategy. Transmitted blocks are transformed and resulting
coefficients are quantized and variable length coded.

3.2.1	Prediction

	The prediction is inter-picture and may be augmented by motion
compensation (see §3.2.2) and a spatial filter (see §3.2.3).

FIGURE 3/H.261



3.2.2	Motion compensation

	Motion compensation (MC) is optional in the encoder. The decoder 
will accept one vector per macroblock. Both horizontal and vertical
components of these motion vectors have integer values not exceeding ±15. The 
vector is used for all four luminance blocks in the macroblock. The motion 
vector for both colour difference blocks is derived by halving the component 
values of the macroblock vector and truncating the magnitude parts towards 
zero to yield integer components.

	A positive value of the horizontal or vertical component of the motion 
vector signifies that the prediction is formed from pels in the previous
picture which are spatially to the right or below the pels being predicted.

 	Motion vectors are restricted such that all pels referenced by them are 
within the coded picture area.
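
	As an informal illustration (not part of this Recommendation), the 
derivation of a colour difference vector component from the macroblock 
vector can be sketched in C as below. The function name 
chroma_mv_component and the constant MV_MAX are hypothetical. The sketch 
relies on C99 integer division, which truncates towards zero.

```c
#include <assert.h>

/* Limit on each luminance vector component (section 3.2.2). */
#define MV_MAX 15

/* Hypothetical helper: halve one macroblock vector component and
 * truncate the magnitude towards zero. C99 integer division already
 * truncates towards zero, so -15 becomes -7 rather than -8. */
int chroma_mv_component(int luma_component)
{
    assert(luma_component >= -MV_MAX && luma_component <= MV_MAX);
    return luma_component / 2;
}
```

	An arithmetic right shift would round negative values towards minus 
infinity and is therefore not equivalent.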

3.2.3	Loop filter

	The prediction process may be modified by a two-dimensional spatial 
filter (FIL) which operates on pels within a predicted 8 by 8 block.

	The filter is separable into one-dimensional horizontal and vertical 
functions. Both are non-recursive with coefficients of 1/4, 1/2, 1/4 except at 
block edges where one of the taps would fall outside the block. In such cases 
the 1-D filter is changed to have coefficients of 0, 1, 0. Full arithmetic
precision is retained with rounding to 8 bit integer values at the 2-D filter
output. Values whose fractional part is one half are rounded up.

	The filter is switched on/off for all six blocks in a macroblock
according to the macroblock type (see §4.2.3 MTYPE).
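
	The filter described above can be sketched in C as follows. This is an 
illustration under stated assumptions, not a normative implementation: 
the names loop_filter_8x8 and tap4 are hypothetical, and the block is 
assumed to be stored row by row in a 64-element array. Full precision is 
kept by carrying intermediate values scaled by 4 after the horizontal pass 
and by 16 after the vertical pass, so that rounding happens only once.

```c
#include <stdint.h>

#define BLK 8

/* One 1-D pass, result scaled by 4: interior taps (1, 2, 1),
 * edge taps (0, 4, 0), i.e. coefficients 1/4, 1/2, 1/4 or 0, 1, 0. */
static int tap4(int prev, int cur, int next, int at_edge)
{
    return at_edge ? 4 * cur : prev + 2 * cur + next;
}

/* Hypothetical helper: filter one predicted 8 by 8 block. */
void loop_filter_8x8(const uint8_t in[BLK * BLK], uint8_t out[BLK * BLK])
{
    int h[BLK * BLK];   /* horizontally filtered values, scaled by 4 */

    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++) {
            int edge = (x == 0 || x == BLK - 1);
            int i = y * BLK + x;
            h[i] = tap4(edge ? 0 : in[i - 1], in[i],
                        edge ? 0 : in[i + 1], edge);
        }

    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++) {
            int edge = (y == 0 || y == BLK - 1);
            int i = y * BLK + x;
            int sum16 = tap4(edge ? 0 : h[i - BLK], h[i],
                             edge ? 0 : h[i + BLK], edge);
            /* sum16 is the exact result scaled by 16; adding 8 before
             * the shift rounds a fractional part of one half upwards. */
            out[i] = (uint8_t)((sum16 + 8) >> 4);
        }
}
```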

3.2.4	Transformer

	Transmitted blocks are first processed by a separable two-dimensional 
discrete cosine transform of size 8 by 8. The output from the inverse
transform ranges from -256 to +255 after clipping to be represented with 9 bits. 
The transfer function of the inverse transform is given by:

	f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v) \cos\left[\frac{\pi (2x+1) u}{16}\right] \cos\left[\frac{\pi (2y+1) v}{16}\right]

with u, v, x, y = 0, 1, 2, ..., 7

where	x, y = spatial coordinates in the pel domain,

	u, v = coordinates in the transform domain,

	C(u) = 1/\sqrt{2} for u = 0, otherwise 1,

	C(v) = 1/\sqrt{2} for v = 0, otherwise 1.

	Note – Within the block being transformed, x = 0 and y = 0 refer to the pel 
nearest the left and top edges of the picture respectively.

	The arithmetic procedures for computing the transforms are not defined, but 
the inverse one should meet the error tolerance specified in Annex A.
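
	For orientation only, a direct (slow) evaluation of the 8 by 8 inverse 
discrete cosine transform with output clipping to the range -256 to +255 
might look as follows in C. The names idct_8x8, cos_pi16 and COS16 are 
hypothetical, the coefficient F(u,v) is assumed to be stored at index 
v*8 + u, and round-to-nearest is an assumption of this sketch because the 
Recommendation leaves the arithmetic procedure undefined. The cosine 
values are tabulated so that no maths library is required.

```c
/* cos(k*pi/16) for k = 0..8 */
static const double COS16[9] = {
    1.0, 0.98078528040323, 0.92387953251129, 0.83146961230255,
    0.70710678118655, 0.55557023301960, 0.38268343236509,
    0.19509032201613, 0.0
};

/* cos(n*pi/16) for a non-negative integer n, via cosine symmetries. */
static double cos_pi16(int n)
{
    n %= 32;                    /* period of 2*pi */
    if (n > 16) n = 32 - n;     /* cos(2*pi - a) = cos(a) */
    return (n > 8) ? -COS16[16 - n] : COS16[n];  /* cos(pi - a) = -cos(a) */
}

/* Hypothetical helper: F holds F(u,v) at index v*8 + u,
 * f receives f(x,y) at index y*8 + x. */
void idct_8x8(const double F[64], int f[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++)
                for (int u = 0; u < 8; u++) {
                    double cu = (u == 0) ? COS16[4] : 1.0; /* 1/sqrt(2) */
                    double cv = (v == 0) ? COS16[4] : 1.0;
                    sum += cu * cv * F[v * 8 + u]
                         * cos_pi16((2 * x + 1) * u)
                         * cos_pi16((2 * y + 1) * v);
                }
            sum /= 4.0;
            /* Round to nearest integer (assumption), then clip to 9 bits. */
            int r = (int)(sum < 0 ? sum - 0.5 : sum + 0.5);
            if (r < -256) r = -256;
            if (r > 255) r = 255;
            f[y * 8 + x] = r;
        }
}
```

	A practical codec would use a fast factorization instead; any such 
implementation still has to satisfy the accuracy test of Annex A.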

3.2.5	Quantization

	The number of quantizers is 1 for the INTRA dc coefficient and 31 for 
all other coefficients. Within a macroblock the same quantizer is used for all 
coefficients except the INTRA dc one. The decision levels are not defined. 
The INTRA dc coefficient is nominally the transform value linearly
quantized with a step size of 8 and no dead-zone. Each of the other 31 quantizers 
is also nominally linear but with a central dead-zone around zero and with a 
step size of an even value in the range 2 to 62.

	The reconstruction levels are as defined in §4.2.4.

	Note – For the smaller quantization step sizes, the full dynamic range 
of the transform coefficients cannot be represented.

3.2.6	Clipping of reconstructed picture

	To prevent quantization distortion of transform coefficient amplitudes 
causing arithmetic overflow in the encoder and decoder loops, clipping 
functions are inserted. The clipping function is applied to the reconstructed 
picture which is formed by summing the prediction and the prediction error 
as modified by the coding process. This clipper operates on resulting pel 
values less than 0 or greater than 255, changing them to 0 and 255
respectively.
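
	A minimal C sketch of this clipper is given below for illustration. The 
function names clip_pel and reconstruct_pel are hypothetical.

```c
#include <stdint.h>

/* Clip a reconstructed pel value to the range 0 to 255. */
uint8_t clip_pel(int value)
{
    if (value < 0) return 0;
    if (value > 255) return 255;
    return (uint8_t)value;
}

/* Hypothetical helper: sum the prediction and the decoded prediction
 * error for one pel, then apply the clipper. */
uint8_t reconstruct_pel(uint8_t prediction, int error)
{
    return clip_pel((int)prediction + error);
}
```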

3.3	Coding control

	Several parameters may be varied to control the rate of generation of 
coded video data. These include processing prior to the source coder, the 
quantizer, block significance criterion and temporal subsampling. The
proportions of such measures in the overall control strategy are not subject to 
recommendation.

	When invoked, temporal subsampling is performed by discarding 
complete pictures.

3.4	Forced updating

	This function is achieved by forcing the use of the INTRA mode of 
the coding algorithm. The update pattern is not defined. To control the
accumulation of inverse transform mismatch error, a macroblock should be
forcibly updated at least once every 132 times it is transmitted.

4	Video multiplex coder

4.1	Data structure

	Unless specified otherwise the most significant bit is transmitted first. 
This is bit 1 and is the leftmost bit in the code tables in this
Recommendation. Unless specified otherwise all unused or spare bits are set to “1”. Spare 
bits must not be used until their functions are specified by the CCITT.

4.2	Video multiplex arrangement

	The video multiplex is arranged in a hierarchical structure with four 
layers. From top to bottom the layers are:

–	Picture.

–	Group of blocks (GOB).

–	Macroblock (MB).

–	Block.

	A syntax diagram of the video multiplex coder is shown in
Figure 4/H.261. Abbreviations are defined in later sections.

FIGURE 4/H.261



4.2.1	Picture layer

	Data for each picture consists of a picture header followed by data for 
GOBs. The structure is shown in Figure 5/H.261. Picture headers for 
dropped pictures are not transmitted.



4.2.1.1	Picture Start Code (PSC) (20 bits)

	A word of 20 bits. Its value is 0000 0000 0000 0001 0000.

4.2.1.2	Temporal Reference (TR) (5 bits)

	A 5-bit number which can have 32 possible values. It is formed by 
incrementing its value in the previously transmitted picture header by one 
plus the number of non-transmitted pictures (at 29.97 Hz) since that last 
transmitted one. The arithmetic is performed with only the five LSBs.
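
	The modulo-32 arithmetic can be illustrated by the following C sketch. 
The function names are hypothetical; the second function shows how a 
decoder could recover the number of non-transmitted pictures, assuming 
that fewer than 32 picture periods have elapsed.

```c
#include <assert.h>

/* Encoder side: TR for the next transmitted picture, given the TR of
 * the previously transmitted picture and the number of pictures that
 * were not transmitted in between. Only the five LSBs are kept. */
unsigned next_temporal_reference(unsigned prev_tr, unsigned skipped)
{
    assert(prev_tr < 32u);
    return (prev_tr + 1u + skipped) & 0x1Fu;
}

/* Decoder side: number of non-transmitted pictures implied by two
 * consecutive TR values (valid only if fewer than 32 periods elapsed). */
unsigned pictures_skipped(unsigned prev_tr, unsigned tr)
{
    assert(prev_tr < 32u && tr < 32u);
    return (tr - prev_tr - 1u) & 0x1Fu;
}
```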

4.2.1.3	Type Information (PTYPE) (6 bits)

	Information about the complete picture:

Bit 1		Split screen indicator, “0” off, “1” on.

Bit 2		Document camera indicator. “0” off, “1” on.

Bit 3		Freeze Picture Release. “0” off, “1” on.

Bit 4		Source Format. “0” QCIF, “1” CIF.

Bits 5 to 6	Spare.

4.2.1.4	Extra Insertion Information (PEI) (1 bit)

	A bit which when set to “1” signals the presence of the following 
optional data field.

4.2.1.5	Spare Information (PSPARE) (0/8/16 ... bits)

	If PEI is set to “1”, then 9 bits follow consisting of 8 bits of data 
(PSPARE) and then another PEI bit to indicate if a further 9 bits follow and 
so on. Encoders must not insert PSPARE until specified by the CCITT. 
Decoders must be designed to discard PSPARE if PEI is set to 1. This will 
allow the CCITT to specify future backward compatible additions in 
PSPARE.
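
	The PEI/PSPARE chain can be skipped by a decoder with a simple loop, 
sketched below in C. The BitReader type and the functions read_bits and 
skip_spare are hypothetical and assume a byte buffer holding the bit 
stream most significant bit first. The same loop applies to the GEI/GSPARE 
fields of the GOB header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader over a byte buffer, MSB transmitted first. */
typedef struct {
    const uint8_t *data;
    size_t nbits;   /* number of valid bits in data */
    size_t pos;     /* index of the next unread bit */
} BitReader;

/* Read n bits (n <= 16). Returns -1 if the stream is exhausted. */
static int read_bits(BitReader *br, unsigned n)
{
    int v = 0;
    if (n > br->nbits - br->pos) return -1;
    for (unsigned i = 0; i < n; i++, br->pos++) {
        unsigned shift = 7u - (unsigned)(br->pos & 7u);
        v = (v << 1) | ((br->data[br->pos >> 3] >> shift) & 1);
    }
    return v;
}

/* Discard spare information. Called with the reader positioned at the
 * first extra insertion bit. Returns the number of spare bytes
 * discarded, or -1 on a truncated stream. */
int skip_spare(BitReader *br)
{
    int count = 0;
    for (;;) {
        int ei = read_bits(br, 1);          /* PEI or GEI */
        if (ei < 0) return -1;
        if (ei == 0) return count;          /* no further spare data */
        if (read_bits(br, 8) < 0) return -1; /* PSPARE/GSPARE, ignored */
        count++;
    }
}
```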

4.2.2	Group of blocks layer

	Each picture is divided into groups of blocks (GOBs). A group of 
blocks (GOB) comprises one twelfth of the CIF or one third of the QCIF 
picture areas (see Figure 6/H.261). A GOB relates to 176 pels by 48 lines of 
Y and the spatially corresponding 88 pels by 24 lines of each of CB and CR.



	Data for each group of blocks consists of a GOB header followed by 
data for macroblocks. The structure is shown in Figure 7/H.261. Each GOB 
header is transmitted once between picture start codes in the CIF or QCIF 
sequence numbered in Figure 6/H.261, even if no macroblock data is 
present in that GOB.

4.2.2.1	Group of blocks start code (GBSC) (16 bits)

	A word of 16 bits, 0000 0000 0000 0001.



4.2.2.2	Group number (GN) (4 bits)

	Four bits indicating the position of the group of blocks. The bits are 
the binary representation of the number in Figure 6/H.261. Group 
numbers 13, 14 and 15 are reserved for future use. Group number 0 is used 
in the PSC.

4.2.2.3	Quantizer Information (GQUANT) (5 bits)

	A fixed length codeword of 5 bits which indicates the quantizer to be 
used in the group of blocks until overridden by any subsequent MQUANT. 
The codewords are the natural binary representations of the values of 
QUANT (§4.2.4) which, being half the step sizes, range from 1 to 31.

4.2.2.4	Extra insertion information (GEI) (1 bit)

	A bit which when set to “1” signals the presence of the following 
optional data field.

4.2.2.5	Spare information (GSPARE) (0/8/16 ... bits)

	If GEI is set to “1”, then 9 bits follow consisting of 8 bits of data 
(GSPARE) and then another GEI bit to indicate if a further 9 bits follow and 
so on. Encoders must not insert GSPARE until specified by the CCITT. 
Decoders must be designed to discard GSPARE if GEI is set to 1. This will 
allow the CCITT to specify future “backward” compatible additions in 
GSPARE.

	Note – Emulation of start codes may occur if the future specification 
of GSPARE has no restrictions on the final GSPARE data bits.

4.2.3	Macroblock layer

	Each GOB is divided into 33 macroblocks as shown in
Figure 8/H.261. A macroblock relates to 16 pels by 16 lines of Y and the spatially 
corresponding 8 pels by 8 lines of each of CB and CR.
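
	As an illustration, the luminance position of a macroblock inside its 
GOB can be computed as in the C sketch below. It assumes that macroblocks 
are numbered 1 to 33 row by row, 11 per row, which follows from the GOB 
size of 176 pels by 48 lines but should be checked against Figure 8/H.261. 
The type PelOffset and the function macroblock_offset are hypothetical.

```c
#include <assert.h>

#define MB_PER_GOB 33
#define MB_PER_ROW 11   /* 176 pels / 16 pels per macroblock */

/* Offset of a macroblock's top-left luminance pel within its GOB. */
typedef struct { int x, y; } PelOffset;

/* Hypothetical helper: mb_address is the absolute address 1..33,
 * assumed to run left to right, then top to bottom. */
PelOffset macroblock_offset(int mb_address)
{
    PelOffset o;
    assert(mb_address >= 1 && mb_address <= MB_PER_GOB);
    o.x = ((mb_address - 1) % MB_PER_ROW) * 16;
    o.y = ((mb_address - 1) / MB_PER_ROW) * 16;
    return o;
}
```

	The corresponding colour difference offsets are half these values in 
each direction.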



	Data for a macroblock consists of a MB header followed by data for 
blocks (see Figure 9/H.261). MQUANT, MVD and CBP are present when 
indicated by MTYPE.



4.2.3.1	Macroblock address (MBA) (Variable length)

	A variable length codeword indicating the position of a macroblock 
within a group of blocks. The transmission order is as shown in
Figure 8/H.261. For the first transmitted macroblock in a GOB, MBA is the absolute 
address in Figure 8/H.261. For subsequent macroblocks, MBA is the
difference between the absolute addresses of the macroblock and the last
transmitted macroblock. The code table for MBA is given in Table 1/H.261.

	An extra codeword is available in the table for bit stuffing
immediately after a GOB header or a coded macroblock (MBA stuffing). This
codeword should be discarded by decoders.

	The VLC for start code is also shown in Table 1/H.261.



	MBA is always included in transmitted macroblocks.

	Macroblocks are not transmitted when they contain no information 
for that part of the picture.

4.2.3.2	Type information (MTYPE) (Variable length)

	Variable length codewords giving information about the macroblock 
and which data elements are present. Macroblock types, included elements 
and VLC words are listed in Table 2/H.261.

	MTYPE is always included in transmitted macroblocks.



4.2.3.3	Quantizer (MQUANT) (5 bits)

	MQUANT is present only if so indicated by MTYPE.

	A codeword of 5 bits signifying the quantizer to be used for this and 
any following blocks in the group of blocks until overridden by any
subsequent MQUANT.

	Codewords for MQUANT are the same as for GQUANT.

4.2.3.4	Motion vector data (MVD) (Variable length)

	Motion vector data is included for all MC macroblocks. MVD is 
obtained from the macroblock vector by subtracting the vector of the
preceding macroblock. For this calculation the vector of the preceding
macroblock is regarded as zero in the following three situations:

1)	evaluating MVD for macroblocks 1, 12 and 23;

2)	evaluating MVD for macroblocks in which MBA does not represent 
a difference of 1;

3)	MTYPE of the previous macroblock was not MC.

	MVD consists of a variable length codeword for the horizontal
component followed by a variable length codeword for the vertical component. 
Variable length codes are given in Table 3/H.261.

	Advantage is taken of the fact that the range of motion vector values is 
constrained. Each VLC word represents a pair of difference values. Only 
one of the pair will yield a macroblock vector falling within the permitted 
range.
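
	The prediction rule and the resolution of the paired difference values 
can be sketched in C as follows. The function names are hypothetical. The 
sketch assumes that the two difference values sharing one codeword differ 
by 32, as in Table 3/H.261; under that assumption, wrapping the sum back 
into the permitted range selects the valid member of the pair.

```c
/* Predictor for one vector component of the current macroblock.
 * mb_address:     absolute address of the current macroblock (1..33)
 * mba_difference: decoded MBA difference for this macroblock
 * prev_was_mc:    non-zero if the previous macroblock's MTYPE was MC
 * prev_component: that macroblock's vector component */
int mv_predictor(int mb_address, int mba_difference,
                 int prev_was_mc, int prev_component)
{
    if (mb_address == 1 || mb_address == 12 || mb_address == 23)
        return 0;                       /* situation 1 */
    if (mba_difference != 1)
        return 0;                       /* situation 2 */
    if (!prev_was_mc)
        return 0;                       /* situation 3 */
    return prev_component;
}

/* Recover a vector component from the predictor and either member of
 * the decoded difference pair (assumed to differ by 32). For a valid
 * stream the result lies in -15..+15; callers should reject anything
 * else as a bit stream error. */
int mv_from_mvd(int predictor, int mvd)
{
    int v = predictor + mvd;
    if (v > 15) v -= 32;
    else if (v < -15) v += 32;
    return v;
}
```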

4.2.3.5	 Coded block pattern (CBP) (Variable length)

	CBP is present if indicated by MTYPE. The codeword gives a pattern 
number signifying those blocks in the macroblock for which at least one 
transform coefficient is transmitted. The pattern number is given by:

	32·P1+16·P2+8·P3+4·P4+2·P5+P6

where Pn = 1 if any coefficient is present for block n, else 0. Block numbering 
is given in Figure 10/H.261.

	The codewords for CBP are given in Table 4/H.261.
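
	For illustration, the pattern number and its inverse can be computed as 
in the following C sketch. The function names are hypothetical, and the 
array coded[] is assumed to hold the flags P1 to P6 at indices 0 to 5.

```c
#include <assert.h>

/* Pattern number 32*P1 + 16*P2 + 8*P3 + 4*P4 + 2*P5 + P6, where
 * coded[n-1] is non-zero if block n has a transmitted coefficient. */
unsigned cbp_pattern(const int coded[6])
{
    unsigned pattern = 0;
    for (int n = 0; n < 6; n++)
        pattern = (pattern << 1) | (coded[n] ? 1u : 0u);
    return pattern;
}

/* Returns 1 if block n (1..6) is flagged in the pattern number. */
int block_is_coded(unsigned pattern, int n)
{
    assert(n >= 1 && n <= 6);
    return (int)((pattern >> (6 - n)) & 1u);
}
```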





4.2.4	Block layer

	A macroblock comprises four luminance blocks and one of each of 
the two colour difference blocks (see Figure 10/H.261).

	Data for a block consists of codewords for transform coefficients
followed by an end of block marker (see Figure 11/H.261). The order of block 
transmission is as in Figure 10/H.261.





4.2.4.1	Transform coefficients (TCOEFF)

	Transform coefficient data is always present for all six blocks in a 
macroblock when MTYPE indicates INTRA. In other cases MTYPE and 
CBP signal which blocks have coefficient data transmitted for them. The 
quantized transform coefficients are sequentially transmitted according to 
the sequence given in Figure 12/H.261.



 

	The most commonly occurring combinations of successive zeros 
(RUN) and the following value (LEVEL) are encoded with variable length 
codes. Other combinations of (RUN, LEVEL) are encoded with a 20-bit 
word consisting of 6 bits ESCAPE, 6 bits RUN and 8 bits LEVEL. For the 
variable length encoding there are two code tables, one being used for the 
first transmitted LEVEL in INTER, INTER+MC and INTER+MC+FIL 
blocks, the second for all other LEVELs except the first one in INTRA 
blocks which is fixed length coded with 8 bits.

	Codes are given in Table 5/H.261.

	The most commonly occurring combinations of zero-run and the
following value are encoded with variable length codes as listed in the table 
below. End of block (EOB) is in this set. Because CBP indicates those 
blocks with no coefficient data, EOB cannot occur as the first coefficient. 
Hence EOB can be removed from the VLC table for the first coefficient.

	The last bit “s” denotes the sign of the level, “0” for positive and “1” 
for negative.



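	The reconstruction levels tabulated below correspond to the following 
formulae, with results limited to the range -2048 to 2047:

	REC = QUANT · (2 · level + 1); level > 0, QUANT odd

	REC = QUANT · (2 · level - 1); level < 0, QUANT odd

	REC = QUANT · (2 · level + 1) - 1; level > 0, QUANT even

	REC = QUANT · (2 · level - 1) + 1; level < 0, QUANT even

	REC = 0; level = 0

	Note – QUANT ranges from 1 to 31 and is transmitted by either 
GQUANT or MQUANT.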





Reconstruction levels (REC)

                                            QUANT
 Level      1      2      3      4  ...      8      9  ...     17     18  ...     30     31
  -127   -255   -509   -765  -1019  ...  -2039  -2048  ...  -2048  -2048  ...  -2048  -2048
  -126   -253   -505   -759  -1011  ...  -2023  -2048  ...  -2048  -2048  ...  -2048  -2048
   ...    ...    ...    ...    ...  ...    ...    ...  ...    ...    ...  ...    ...    ...
    -2     -5     -9    -15    -19  ...    -39    -45  ...    -85    -89  ...   -149   -155
    -1     -3     -5     -9    -11  ...    -23    -27  ...    -51    -53  ...    -89    -93
     0      0      0      0      0  ...      0      0  ...      0      0  ...      0      0
     1      3      5      9     11  ...     23     27  ...     51     53  ...     89     93
     2      5      9     15     19  ...     39     45  ...     85     89  ...    149    155
     3      7     13     21     27  ...     55     63  ...    119    125  ...    209    217
     4      9     17     27     35  ...     71     81  ...    153    161  ...    269    279
     5     11     21     33     43  ...     87     99  ...    187    197  ...    329    341
   ...    ...    ...    ...    ...  ...    ...    ...  ...    ...    ...  ...    ...    ...
    56    113    225    339    451  ...    903   1017  ...   1921   2033  ...   2047   2047
    57    115    229    345    459  ...    919   1035  ...   1955   2047  ...   2047   2047
    58    117    233    351    467  ...    935   1053  ...   1989   2047  ...   2047   2047
    59    119    237    357    475  ...    951   1071  ...   2023   2047  ...   2047   2047
    60    121    241    363    483  ...    967   1089  ...   2047   2047  ...   2047   2047
   ...    ...    ...    ...    ...  ...    ...    ...  ...    ...    ...  ...    ...    ...
   125    251    501    753   1003  ...   2007   2047  ...   2047   2047  ...   2047   2047
   126    253    505    759   1011  ...   2023   2047  ...   2047   2047  ...   2047   2047
   127    255    509    765   1019  ...   2039   2047  ...   2047   2047  ...   2047   2047

Note – Reconstruction levels are symmetrical with respect to the sign of 
level except for 2047/-2048.
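
	The tabulated values can be reproduced by the short C sketch below, 
given for illustration. The rule it applies (an odd multiple of QUANT, 
moved one unit towards zero when QUANT is even, then limited to the range 
-2048 to 2047) is inferred here from the table entries. The function name 
reconstruction_level is hypothetical.

```c
#include <assert.h>

/* Reconstruction level for a quantized coefficient other than the
 * INTRA dc one. level is in -127..127, quant (QUANT) in 1..31. */
int reconstruction_level(int level, int quant)
{
    int rec;

    assert(quant >= 1 && quant <= 31);
    assert(level >= -127 && level <= 127);

    if (level == 0)
        return 0;

    rec = (level > 0) ? quant * (2 * level + 1)
                      : quant * (2 * level - 1);

    /* For even QUANT the magnitude is reduced by one, so that all
     * non-zero reconstruction levels are odd before limiting. */
    if ((quant & 1) == 0)
        rec += (level > 0) ? -1 : 1;

    if (rec > 2047) rec = 2047;
    if (rec < -2048) rec = -2048;
    return rec;
}
```

	For example, level 56 with QUANT 18 gives 18 · 113 - 1 = 2033, matching 
the table, whereas level 57 with the same quantizer is limited to 2047.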





	For INTRA blocks the first coefficient is nominally the transform dc value 
linearly quantized with a step size of 8 and no dead-zone. The resulting
values are represented with 8 bits. A nominally black block will give 0001 
0000 and a nominally white one 1110 1011. The code 0000 0000 is not 
used. The code 1000 0000 is not used, the reconstruction level of 1024 being 
coded as 1111 1111 (see Table 6/H.261).

	Coefficients after the last non-zero one are not transmitted. EOB (end of 
block code) is always the last item in blocks for which coefficients are
transmitted.
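
	The mapping between the 8-bit INTRA dc code and its reconstruction 
level can be illustrated in C as follows. The function name is 
hypothetical, and the mapping is written from the text above (step size 
of 8, with 1111 1111 standing for 1024) rather than copied from 
Table 6/H.261.

```c
#include <assert.h>

/* Reconstruction level of the INTRA dc coefficient from its 8-bit
 * fixed length code. Codes 0000 0000 and 1000 0000 are not used. */
int intra_dc_reconstruction(unsigned code)
{
    assert(code <= 0xFFu && code != 0x00u && code != 0x80u);
    return (code == 0xFFu) ? 1024 : (int)code * 8;
}
```

	A nominally black block (all pels 16) has a transform dc value of 128 
and is coded 0001 0000; a nominally white one (all pels 235) has a dc 
value of 1880 and is coded 1110 1011.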

4.3	Multipoint considerations

	The following facilities are provided to support switched multipoint 
operation.

4.3.1	Freeze picture request

	Causes the decoder to freeze its displayed picture until a freeze
picture release signal is received or a timeout period of at least six seconds has 
expired. The transmission of this signal is via external means (for example 
by Recommendation H.221).





4.3.2	Fast update request

	Causes the encoder to encode its next picture in INTRA mode with 
coding parameters such as to avoid buffer overflow. The transmission 
method for this signal is via external means (for example by 
Recommendation H.221).

4.3.3	Freeze picture release

	A signal from an encoder which has responded to a fast update 
request and allows a decoder to exit from its freeze picture mode and display 
decoded pictures in the normal manner. This signal is transmitted by bit 3 of 
PTYPE (see §4.2.1) in the picture header of the first picture coded in 
response to the fast update request.

5	Transmission coder

5.1	Bit rate

	The transmission clock is provided externally (for example from an 
I.420 interface).

5.2	Video data buffering

	The encoder must control its output bitstream to comply with the 
requirements of the hypothetical reference decoder defined in Annex B.

	When operating with CIF the number of bits created by coding any 
single picture must not exceed 256 K bits, where K = 1024.

	When operating with QCIF the number of bits created by coding any 
single picture must not exceed 64 K bits.

	In both the above cases the bit count includes the picture start code 
and all other data related to that picture including PSPARE, GSPARE and 
MBA stuffing. The bit count does not include error correction framing bits, 
fill indicator (Fi), fill bits or error correction parity information described in 
§5.4 below.

	Video data must be provided on every valid clock cycle. This can be 
ensured by the use of either the fill bit indicator (Fi) and subsequent fill 
bits (all 1s) in the error corrector block framing (see Figure 13/H.261), or 
MBA stuffing (§4.2.3), or both.

FIGURE 13/H.261



5.3	Video coding delay

	This item is included in this Recommendation because the video 
encoder and video decoder delays need to be known to allow audio
compensation delays to be fixed when H.261 is used to form part of a conversational 
service. This will allow lip synchronization to be maintained. Annex C
recommends a method by which the delay figures are established. Other delay 
measurement methods may be used but they must be designed in a way to 
produce similar results to the method given in Annex C.

5.4	Forward error correction for coded video signal

5.4.1	Error correcting code

	The transmitted bitstream contains a BCH (511,493) forward error 
correction code. Use of this by the decoder is optional.

5.4.2	Generator polynomial

	g(x) = (x^9 + x^4 + 1)(x^9 + x^6 + x^4 + x^3 + 1)

	Example: for the input data of “01111...11” (493 bits) the resulting 
correction parity bits are “011011010100011011” (18 bits).
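
	As an informal sketch, the 18 parity bits can be obtained by polynomial 
division over GF(2), as in the C fragment below. The function names 
gf2_mul and bch_parity are hypothetical. The sketch assumes systematic 
encoding in which the first transmitted bit is the highest order 
coefficient and the parity is the remainder of the data polynomial 
multiplied by x^18; this bit ordering is an assumption to be confirmed 
against the example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Factors of the generator polynomial as bit masks:
 * x^9 + x^4 + 1 and x^9 + x^6 + x^4 + x^3 + 1. */
#define BCH_F1 0x211u
#define BCH_F2 0x259u

/* Multiply two polynomials over GF(2) (carry-less multiplication). */
uint32_t gf2_mul(uint32_t a, uint32_t b)
{
    uint32_t p = 0;
    while (b) {
        if (b & 1u) p ^= a;
        a <<= 1;
        b >>= 1;
    }
    return p;
}

/* Remainder of data(x) * x^18 divided by g(x). bits[] holds one data
 * bit per element, first transmitted bit first. The 18-bit result has
 * the first parity bit in bit 17. */
uint32_t bch_parity(const uint8_t *bits, size_t nbits)
{
    const uint32_t g_low = gf2_mul(BCH_F1, BCH_F2) & 0x3FFFFu;
    uint32_t reg = 0;

    for (size_t i = 0; i < nbits; i++) {
        uint32_t feedback = ((reg >> 17) & 1u) ^ (bits[i] & 1u);
        reg = (reg << 1) & 0x3FFFFu;
        if (feedback)
            reg ^= g_low;   /* subtract g(x) when x^18 appears */
    }
    return reg;
}
```

	Expanding the product gives g(x) = x^18 + x^15 + x^12 + x^10 + x^8 + 
x^7 + x^6 + x^3 + 1. A decoder that only checks for errors can run the 
same division over all 511 bits of a block; a zero remainder indicates a 
valid code word.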

5.4.3	Error correction framing

	To allow the video data and error correction parity information to be 
identified by a decoder, an error correction framing pattern is included. This 
consists of a multiframe of eight frames, each frame comprising 1 framing 
bit, 1 fill indicator bit (Fi), 492 bits of coded data (or fill, all 1s) and 
18 parity bits. The frame alignment pattern is:

	(S1S2S3S4S5S6S7S8) = (00011011).

	See Figure 13/H.261 for the frame arrangement. The parity is
calculated against the 493 bits including the fill indicator (Fi).

	The fill indicator (Fi) can be set to zero by an encoder. In this case 
only 492 consecutive fill bits (fill all 1s) plus parity are sent and no coded 
data is transmitted. This may be used to meet the requirement in §5.2 to 
provide video data on every valid clock cycle.

5.4.4	Relock time for error corrector framing

	Three consecutive error correction framing sequences (24 bits) should 
be received before frame lock is deemed to have been achieved. The decoder 
should be designed such that frame lock will be re-established within 
34000 bits after an error corrector framing phase change.

	Note – This assumes that the video data does not contain three
correctly phased emulations of the error correction framing sequence during 
the relocking period.
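
	A frame lock test along these lines is sketched below in C, for 
illustration only. It assumes a frame length of 512 bits (1 + 1 + 492 + 18) 
and a buffer holding one received bit per array element; the names 
frame_lock_at, FRAME_BITS and ALIGN are hypothetical. A real decoder would 
search over all 512 × 8 possible phases and continue monitoring the 
pattern after lock.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_BITS 512   /* 1 framing + 1 Fi + 492 data + 18 parity */

/* Frame alignment pattern S1..S8. */
static const uint8_t ALIGN[8] = {0, 0, 0, 1, 1, 0, 1, 1};

/* Returns 1 if the 24 framing bits found at offset, offset + 512, ...
 * form three consecutive alignment sequences starting with S1. */
int frame_lock_at(const uint8_t *bits, size_t nbits, size_t offset)
{
    /* The last framing bit examined is at offset + 23 * FRAME_BITS. */
    if (offset > nbits || nbits - offset < (size_t)23 * FRAME_BITS + 1)
        return 0;

    for (size_t k = 0; k < 24; k++)
        if ((bits[offset + k * FRAME_BITS] & 1u) != ALIGN[k % 8])
            return 0;
    return 1;
}
```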

ANNEX A

(to Recommendation H.261)

Inverse transform accuracy specification

A.1	Generate random integer pel data values in the range -L to +H according 
to the random number generator given below (“C” version). Arrange into 
8 by 8 blocks. Data sets of 10 000 blocks each should be generated for 
(L = 256, H = 255), (L = H = 5) and (L = H = 300).

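A.2	For each 8 by 8 block, perform a separable, orthonormal, matrix
multiply, forward discrete cosine transform using at least 64-bit floating point 
accuracy.

	F(u,v) = \frac{1}{4} C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{\pi (2x+1) u}{16}\right] \cos\left[\frac{\pi (2y+1) v}{16}\right]

with u, v, x, y = 0, 1, 2, ..., 7

where	x, y = spatial coordinates in the pel domain,

	u, v = coordinates in the transform domain,

	C(u) = 1/\sqrt{2} for u = 0, otherwise 1,

	C(v) = 1/\sqrt{2} for v = 0, otherwise 1.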

A.3	For each block, round the 64 resulting transformed coefficients to the 
nearest integer values. Then clip them to the range -2048 to +2047. This is 
the 12-bit input data to the inverse transform.

A.4	For each 8 by 8 block of 12-bit data produced by §A.3, perform a separable, 
orthonormal, matrix multiply, inverse discrete cosine transform (IDCT) 
using at least 64-bit floating point accuracy. Round the resulting pels to the 
nearest integer and clip to the range -256 to +255. These blocks of 8 × 8 pels 
are the reference IDCT output data.
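
	The rounding and clipping of §A.3 and the reference inverse transform of §A.4 might be sketched as follows. The sketch assumes the orthonormal inverse f(x,y) = 1/4 ΣΣ C(u) C(v) F(u,v) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16] with C(0) = 1/√2, evaluated directly in double precision. The function names, the flat array layout and the tie-breaking rule (halves rounded away from zero) are assumptions; the text only says "nearest integer".

```c
#include <assert.h>

/* cos(k*pi/16) for k = 0..8; other angles follow by symmetry. */
static const double COS_PI16[9] = {
    1.0,                    0.98078528040323044913,
    0.92387953251128675613, 0.83146961230254523708,
    0.70710678118654752440, 0.55557023301960222474,
    0.38268343236508977173, 0.19509032201612826785,
    0.0
};

static double cos_pi16(int k)
{
    k %= 32;
    if (k > 16)
        k = 32 - k;
    return (k > 8) ? -COS_PI16[16 - k] : COS_PI16[k];
}

/* Round to the nearest integer (halves away from zero), then clip to [lo, hi]. */
static long round_clip(double v, long lo, long hi)
{
    long r = (long)(v >= 0.0 ? v + 0.5 : v - 0.5);
    return r < lo ? lo : (r > hi ? hi : r);
}

/* Step A.3: turn forward-DCT output into the 12-bit IDCT input. */
void coeffs_to_12bit(const double F[64], long out[64])
{
    int i;
    for (i = 0; i < 64; i++)
        out[i] = round_clip(F[i], -2048, 2047);
}

/* Step A.4: double-precision IDCT, rounded and clipped to -256..+255. */
void idct8x8_ref(const long F[64], long pel[64])
{
    int u, v, x, y;

    assert(F != 0 && pel != 0);
    for (x = 0; x < 8; x++) {
        for (y = 0; y < 8; y++) {
            double sum = 0.0;
            for (u = 0; u < 8; u++) {
                for (v = 0; v < 8; v++) {
                    double cu = (u == 0) ? COS_PI16[4] : 1.0; /* 1/sqrt(2) */
                    double cv = (v == 0) ? COS_PI16[4] : 1.0;
                    sum += cu * cv * (double)F[8 * u + v]
                         * cos_pi16((2 * x + 1) * u)
                         * cos_pi16((2 * y + 1) * v);
                }
            }
            pel[8 * x + y] = round_clip(0.25 * sum, -256, 255);
        }
    }
}
```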

A.5	For each 8 by 8 block produced by §A.3, apply the IDCT under test and 
clip the output to the range -256 to +255. These blocks of 8 × 8 pels are the 
test IDCT output data.

A.6	For each of the 64 IDCT output pels, and for each of the 10,000 block 
data sets generated above, measure the peak, mean and mean square error 
between the reference and the test data.

A.7	For any pel, the peak error should not exceed 1 in magnitude.

	For any pel, the mean square error should not exceed 0.06.

	Overall, the mean square error should not exceed 0.02.

	For any pel, the mean error should not exceed 0.015 in magnitude.

	Overall, the mean error should not exceed 0.0015 in magnitude.
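
	The limits of §A.7 can be checked mechanically once the reference and test outputs of one data set are available. The following is a minimal sketch under assumed conventions: both arrays hold the blocks back to back, 64 pels per block, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if the test IDCT output meets all five limits of A.7 for one
 * data set, otherwise 0. `ref` and `test` each hold nblocks * 64 pels.
 */
int idct_accuracy_ok(const long *ref, const long *test, size_t nblocks)
{
    double sum_all = 0.0, sq_all = 0.0;
    double total;
    size_t p, b;

    assert(ref != NULL && test != NULL && nblocks > 0);
    for (p = 0; p < 64; p++) {
        double sum = 0.0, sq = 0.0, mean, mse;
        for (b = 0; b < nblocks; b++) {
            long e = test[b * 64 + p] - ref[b * 64 + p];
            if (e > 1 || e < -1)
                return 0;                    /* peak error exceeds 1 */
            sum += (double)e;
            sq += (double)(e * e);
        }
        mean = sum / (double)nblocks;
        mse = sq / (double)nblocks;
        if (mse > 0.06)
            return 0;                        /* per-pel mean square error */
        if (mean > 0.015 || mean < -0.015)
            return 0;                        /* per-pel mean error */
        sum_all += sum;
        sq_all += sq;
    }
    total = 64.0 * (double)nblocks;
    if (sq_all / total > 0.02)
        return 0;                            /* overall mean square error */
    if (sum_all / total > 0.0015 || sum_all / total < -0.0015)
        return 0;                            /* overall mean error */
    return 1;
}
```

	The check would be run once per data set, that is for each (L, H) pair and again for the sign-inverted data.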

A.8	All zeros in must produce all zeros out.

A.9	Re-run the measurements using exactly the same data values of step 1, 
but change the sign on each pel.

	"C" program for random number generation

	/* L and H must be long, that is 32 bits */
	long rand(L, H)
	long L, H;
	{

		static long randx = 1;		/* long is 32 bits */
		static double z = (double) 0x7fffffff;

		long i, j;
		double x;			/* double is 64 bits */

		randx = (randx * 1103515245) + 12345;
		i = randx & 0x7ffffffe;		/* keep 30 bits */
		x = ((double) i) / z;		/* range 0 to 0.99999... */
		x *= (L + H + 1);		/* range 0 to < L+H+1 */
		j = x;				/* truncate to integer */
		return (j - L);			/* range -L to H */
	}
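
	The listing above relies on long being exactly 32 bits and on the multiplication wrapping around. On platforms where long is wider, a fixed-width variant such as the sketch below should reproduce the same sequence. The name h261_rand and the explicit state argument (replacing the static variable randx, to be initialised to 1) are assumptions introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fixed-width variant of the generator. `*state` plays the role of randx
 * and must start at 1 to reproduce the sequence of the original listing.
 */
long h261_rand(uint32_t *state, long L, long H)
{
    uint32_t i;
    double x;

    assert(state != 0 && L >= 0 && H >= 0);
    /* Unsigned arithmetic wraps modulo 2^32, like a 32-bit long would. */
    *state = *state * UINT32_C(1103515245) + UINT32_C(12345);
    i = *state & UINT32_C(0x7ffffffe);     /* keep 30 bits */
    x = (double)i / (double)0x7fffffff;    /* range 0 to 0.99999... */
    x *= (double)(L + H + 1);              /* range 0 to < L+H+1 */
    return (long)x - L;                    /* truncate, range -L to H */
}
```

	With the state initialised to 1, the first call with L = 256 and H = 255 returns 7.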

ANNEX B

(to Recommendation H.261)

Hypothetical reference decoder

	The hypothetical reference decoder (HRD) is defined as follows:

B.1	The HRD and the encoder have the same clock frequency as well as the 
same CIF rate, and are operated synchronously.

B.2	The HRD receiving buffer size is (B + 256 Kbits). The value of B is 
defined as follows:

	$B = 4R_{max}/29.97$, where $R_{max}$ is the maximum video bit rate to be used in 
the connection.

B.3	The HRD buffer is initially empty.

B.4	The HRD buffer is examined at CIF intervals (approximately 33 ms). If at least one complete 
coded picture is in the buffer then all the data for the earliest picture is 
instantaneously removed (e.g. at $t_{n+1}$ in Figure B-1/H.261). Immediately 
after removing the above data the buffer occupancy must be less than B. 
This is a requirement on the coder output bitstream including coded picture 
data and MBA stuffing but not error correction framing bits, fill indicator 
(Fi), fill bits or error correction parity information described in §5.4.

FIGURE B-1/H.261



	To meet this requirement the number of bits for the (n+1)th coded picture, 
$d_{n+1}$, must satisfy:

$$ d_{n+1} > b_n + \int_{t_n}^{t_{n+1}} R(t)\,dt - B $$

where:

	$b_n$ is the buffer occupancy just after the time $t_n$,

	$t_n$ is the time the nth coded picture is removed from the HRD buffer,

	$R(t)$ is the video bit rate at the time t.
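
	The occupancy requirement of §B.4 can be illustrated for the simple case of a constant bit rate. In this sketch the function names are hypothetical, bit counts are held as doubles, and the bits arriving between two removals are taken as the rate multiplied by a whole number of CIF intervals of 1/29.97 s.

```c
#include <assert.h>

#define CIF_RATE_HZ 29.97

/* B = 4 Rmax / 29.97, in bits, for Rmax in bit/s. */
double hrd_B(double rmax)
{
    return 4.0 * rmax / CIF_RATE_HZ;
}

/* Bits entering the buffer over `intervals` CIF intervals at a constant rate. */
double hrd_bits_arrived(double rate, unsigned intervals)
{
    return rate * (double)intervals / CIF_RATE_HZ;
}

/*
 * Returns 1 if, after removing a picture of d_next bits, the occupancy
 * b_n + arrived - d_next is below B; otherwise returns 0.
 */
int hrd_picture_ok(double b_n, double arrived, double d_next, double B)
{
    assert(b_n >= 0.0 && arrived >= 0.0 && d_next >= 0.0);
    return (b_n + arrived - d_next) < B;
}
```

	For example, with B = 4000 bits, an occupancy of 3500 bits and 1000 bits arriving before the next removal, a picture of 600 bits passes the check while a picture of 400 bits does not.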

 

ANNEX C

(to Recommendation H.261)

Codec delay measurement method



	The video encoder and video decoder delays will vary depending on imple-
mentation. The delay will also depend on the picture format (QCIF, CIF) 
and data rate in use. This annex specifies the method by which the delay fig-
ures are established for a particular design. To allow correct audio delay 
compensation the overall video delay needs to be established from a user 
perception point of view under typical viewing conditions.

FIGURE C-1/H.261



	Point A is the video input to the video coder. Point B is the channel output 
from the video terminal (i.e. including any FEC, channel framing, etc.). 
Point C is the video output from the decoder.

	A video sequence lasting more than 100 seconds is connected to the video 
coder input (point A) in Figure C-1/H.261 above. The video sequence 
should have the following characteristics:

–	it should contain a typical moving scene consistent with the intended 
purpose of the video codec;

–	it should produce a minimum coded picture rate of 7.5 Hz at the bit 
rate in use;

–	it should contain a visible identification mark at intervals throughout 
the length of the sequence. The visible identification should 
change every 97 video input frames and be located within the 
picture area represented by the first GOB in the picture. For example, 
the first block in the picture could change from black to white at 
intervals of 97 video frame periods. The identification mark should 
be chosen so that it can be detected at point B and does not 
significantly contribute to the overall coding performance.

	The codec and video sequence should be arranged so that the bitstream 
contains less than 10% stuffing (MBA stuffing + error correction fill 
bits).

	The encoder delay is obtained by measuring the time from when the 
visible identification changes at point A to the time that the change is 
detected at point B. Similarly, the decoder delay is obtained by taking 
measurements at points B and C.

	Several measurements should be made during the sequence length 
and the average period obtained. Several tests should be made to ensure that 
a consistent average figure can be obtained for both encoder and decoder 
delay times.
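
	The averaging step can be sketched as follows, assuming the transition times of the identification mark have already been recorded, in seconds, at two of the measurement points. The function name and the array representation are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mean of (t_out[i] - t_in[i]) over n observed mark transitions.
 * For the encoder delay, t_in holds times at point A and t_out at point B;
 * for the decoder delay, t_in holds times at point B and t_out at point C.
 */
double mean_delay(const double *t_in, const double *t_out, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(t_in != NULL && t_out != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += t_out[i] - t_in[i];
    return sum / (double)n;
}
```

	Transitions observed at 0 s, 4 s and 8 s at the input and at 0.25 s, 4.5 s and 8.75 s at the output would give a mean delay of 0.5 s.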

	Average results should be obtained for each combination of picture 
format and bit rate within the capability of the particular codec design.

	Note – Due to pre- and post-temporal processing it may be necessary 
to take a mid-level for establishing the transition of the identification mark 
at points B and C.