
       2 Codecs not requiring  separate  television  standards  conversion
       when used on interregional connections


               A codec for 525-line, 60 fields/s and 1544 kbit/s transmission
       for intra-regional use and capable of interworking with the
       codec of § 1


       2.1         Introduction


            Section 2 indicates the changes and additions which must be
       made to the text of § 1 in order to define the version of the codec
       for use with 525-line, 60 fields/s television standards and
       transmission at 1544 kbit/s. The two versions are capable of
       interworking via a re-multiplexing unit which can convert the
       Recommendation G.704, § 2.1 compatible frame structure on one side
       to the Recommendation G.704, § 2.3 compatible frame structure
       (with 6 time slots empty) on the other side.

            The two versions of the codec are identical in most respects,
       the important differences (apart from the obvious ones arising from
       different input and output signals) being confined to the digital
       pre- and post-filters and the signals for the control of the
       buffers. Moreover, the detailed algorithms of the pre- and
       post-filters do not need to be specified to permit interworking.
       Only an outline of their mode of operation, together with the few
       necessary specifications, is therefore provided.


       2.2         Brief specification



       2.2.1         Video input/output


            The video input and output are standard 525-line,  60 fields/s
       colour  or monochrome television signals. The colour signals are in
       component form. Colour and monochrome operation are fully  compati-
       ble.


       2.2.2         Digital output/input


            The digital output and input are  at  1544 kbit/s,  compatible
       with the frame structure of Recommendation G.704.


       2.2.3         Sampling frequency


            The video sampling frequency and 1544 kbit/s network clock are
       asynchronous.


       2.2.4         Coding techniques


            Conditional  replenishment  coding  supplemented  by  adaptive
       digital  filtering, differential PCM and variable-length coding are
       used to achieve low bit-rate transmission.


       2.2.5         Audio channel


            An audio channel using 64 kbit/s is included. At present, cod-
       ing  is  A-law  according to Recommendation G.711, but provision is
       made for future use of more efficient coding.


       2.2.6         Mode of operation


            The normal mode of operation is full duplex.


       2.2.7         Codec-to-network signalling


            An  optional  channel  for  codec-to-network   signalling   is
       included.


       2.2.8         Data channels


            Optional 2 x 64 kbit/s and 1 x 32  kbit/s  data  channels  are
       available. These are used for video if not required for data.



       2.2.9         Forward error correction


            Optional forward error correction is available. This is
       required only if the long-term error rate of the channel is worse
       than 1 in 10^6.


       2.2.10         Additional facilities


            Provision is made in  the  digital  frame  structure  for  the
       future  introduction  of  encryption, a graphic mode and multipoint
       facilities.

            2.2.11 When the coder buffer is empty and the decoder buffer
       full, the coder delay is 31 ± 5 ms and the decoder delay is
       176 ± 31 ms.



       2.3         Video interface


            The normal video input is a 525-line, 60 fields/s signal in
       accordance with CCIR Report 624. When colour is being transmitted,
       the input (and output) video signals are in component form. The
       luminance and colour-difference components, E'Y, (E'R - E'Y) and
       (E'B - E'Y), are as defined in CCIR Report 624. The video interface
       is as recommended in CCIR Recommendation 567.


       2.4         Source coder



       2.4.1         Luminance component or monochrome



       2.4.1.1         Analogue-to-digital conversion


            The signal is sampled  to  produce  256  picture  samples  per
       active  line  (320 samples per complete line). The sampling pattern
       is orthogonal and line,  field  and  picture  repetitive.  For  the
       525-line  input,  the  sampling frequency is 5.0 MHz, locked to the
       video waveform.

            Uniformly quantized PCM with 8 bits/sample is used.

            Black level corresponds to level 16 (00010000).

            White level corresponds to level 239 (11101111).

            PCM code words outside this range  are  forbidden  (the  codes
       being  used for other purposes). For the purposes of prediction and
       interpolation, the  final  picture  element  in  each  active  line
       (i.e. picture  element 255) is set to level 128 in both encoder and
       decoder.

            In all arithmetic operations, 8-bit arithmetic is used and the
       bits  below  the  binary point are truncated at each stage of divi-
       sion.
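
            The level conventions above can be sketched as follows. This
       is an illustration only: the helper names are hypothetical, and the
       two-sample average is merely an example operation chosen to show
       truncation on division. The intermediate sum is held at full
       precision here, which the text does not specify.

```python
BLACK_LEVEL = 16            # 00010000
WHITE_LEVEL = 239           # 11101111
LAST_PEL_LEVEL = 128        # value forced onto picture element 255
ACTIVE_PELS_PER_LINE = 256


def is_legal_code(level):
    """True if an 8-bit PCM code lies in the permitted video range."""
    return BLACK_LEVEL <= level <= WHITE_LEVEL


def prepare_line(samples):
    """Copy an active line and set its final picture element to 128."""
    if len(samples) != ACTIVE_PELS_PER_LINE:
        raise ValueError("an active line holds 256 picture samples")
    line = list(samples)
    line[-1] = LAST_PEL_LEVEL
    return line


def truncated_mean(a, b):
    """Average two non-negative levels, discarding the fractional bit."""
    return (a + b) // 2
```

            Both encoder and decoder would apply the same treatment to the
       final picture element so that prediction and interpolation stay in
       step.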


       2.4.1.2         Pre- and post-filtering



            Note  - The delay figures given in § 2.2.11 are typical. The
       delays depend upon the detailed implementation used.










       2.4.1.2.1         Spatial filtering


            A digital filter reduces the 242 1/2 active lines per field of
       the 525-line signal to 143 lines per field, the same number as in
       the 625-line version of the codec. In the decoder, the digital
       post-filter uses interpolation to restore the signal to 525 lines
       per picture.


       2.4.1.2.2         Temporal filtering


            A  recursive  temporal  pre-filter  with  non-linear  transfer
       characteristics  is used in the coder to reduce noise in the signal
       and increase coding efficiency. The frame store used in this filter
       can  also  be  used  as the storage element of a frame interpolator
       with variable coefficients which is used to reduce the  transmitted
       frame  rate to a value less than that of the input video signal. In
       525-line to 525-line transmission, the transmitted frame  frequency
       is  locked  to  the  video  clock  and  is  approximately  29.67 Hz
       (29.97 Hz times 3057/3088) instead of the  nominal  video  rate  of
       29.97 Hz.  In  525-line  to  625-line transmission, the transmitted
       frame frequency is nominally 25 Hz and is  locked  to  the  channel
       clock.


            Because the (television) frames are leaving the coder more
       slowly than they are entering, the coding process is suspended for
       one frame in every N input frames. N is approximately 100 for
       525-line to 525-line operation and approximately 6 for 525-line to
       625-line operation.
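
            The approximate values of N follow from the two frame rates:
       if one frame in every N is skipped, the transmitted rate is
       (N - 1)/N of the input rate, so N = f_in / (f_in - f_tx). The
       sketch below checks the figures quoted above. It assumes the
       nominal video frame rate is exactly 30000/1001 Hz (the text gives
       only 29.97 Hz), and the function name is hypothetical.

```python
from fractions import Fraction

# Assumption: exact NTSC frame rate; the text quotes it as 29.97 Hz.
NOMINAL_FRAME_RATE_HZ = Fraction(30000, 1001)


def suspension_interval(input_rate, transmitted_rate):
    """Number of input frames per suspended frame: f_in / (f_in - f_tx)."""
    return input_rate / (input_rate - transmitted_rate)


# 525-line to 525-line: transmitted rate is 3057/3088 of the video rate.
tx_525 = NOMINAL_FRAME_RATE_HZ * Fraction(3057, 3088)
n_525 = suspension_interval(NOMINAL_FRAME_RATE_HZ, tx_525)

# 525-line to 625-line: transmitted rate is nominally 25 Hz.
n_625 = suspension_interval(NOMINAL_FRAME_RATE_HZ, Fraction(25))

print(f"transmitted rate {float(tx_525):.2f} Hz, N = {float(n_525):.1f}")
print(f"525 to 625 operation, N = {float(n_625):.2f}")
```

            For 525-line to 525-line operation the ratio reduces to
       3088/31, which is why N is close to, but not exactly, 100.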

            In the decoder, the digital post-filter incorporates  a  frame
       store  in  some  versions of the 625-line codec where it is used in
       the line interpolation process. In the 525-line version,  in  addi-
       tion  to  its  use for line interpolation, it is used as a temporal
       interpolator with variable coefficients to provide an extra  output
       frame  during  those  periods  when  the  decoding  is  temporarily
       suspended.


       2.5         Video multiplex coding



       2.5.1         Buffer store


            The size of the buffer store is defined at the transmitting
       end only and is 160 kbits. Of this, 96 kbits is used for smoothing
       the video data in the face-to-face mode and the remainder is used
       to accommodate the action of the frame interpolator (see § 2.5.1.1
       below) and the requirements of the graphics mode.

            At the receiving end, the buffer must be at least this  length
       but in some implementations of the decoder, it may be longer.









       2.5.1.1         Buffer control


            The amount to which the transmitting buffer is filled is  used
       to  control  various  coding  algorithms (subsampling, etc.) and is
       signalled to the decoder to enable it correctly  to  interpret  the
       received  signals.  In the 525-line codec, the transmission rate is
       less than the video input rate and hence the buffer tends  to  fill
       more  rapidly  than would be determined by the movement in the pic-
       ture, only to empty again when the interpolator suspends the coding
       process.

            To  avoid  incorrect  changes  in   coding   algorithms,   the
       buffer-state   signal  is  modified to take account of the progres-
       sively changing coefficients of the interpolator in the pre-filter.
       The  buffer then operates as though the data is coming from a video
       source whose frame rate is uniform and the same as the  transmitted
       frame rate.


       2.6         Transmission coding


            The transmission coder assembles the video, audio,  signalling
       and optional data channels into a 1544 kbit/s frame structure which
       is compatible with Recommendation G.704.


       2.6.1         Serial data


            See § 1.6.1.


       2.6.2         Audio


            See § 1.6.2.


       2.6.3         Transmission framing


            The frame structure, compatible with Recommendation G.704 and
       also compatible with that of the 625-line version in § 1, is given
       in § 2 of Recommendation H.130.


       2.6.3.1         General


            See § 1.6.3.1.


       2.6.3.2         Use of certain bits in each octet in the odd frames
       of time slot 2










            The use of certain of the bits in time slot 2 (odd) differs
       slightly from that given for the codec in § 1. The differences are
       as follows:

               Bit 1 - For clock justification

               This bit is disregarded in 525-line decoders.


               To permit interworking with the 625-line codecs of § 1, the
       525-line coders must transmit a fixed bit-pattern which is used to
       control the frequency of the video clock in 625-line decoders. The
       exact form of the repetitive pattern need not be specified but it
       must contain seven "ones" and four "zeros" in 11 bits, e.g.:

                              1 0 1 1 0 1 0 1 1 0 1


               Bit 2 - To signal buffer state

               The degree to which the encoder buffer is filled, after
       correction for the interpolator (see § 2.5.1.1), is measured in
       increments of 1 K (1 K = 1024 bits), and signalled using an 8-bit
       binary  code.  When working to a 525-line decoder, the buffer state
       is sampled every 3057 channel-clock  periods.  When  working  to  a
       625-line decoder, the buffer state is sampled 10 times during every
       525-line field period.  When the buffer input is  suspended  for  a
       frame period, the buffer sampling is stopped. The sampled values of
       the buffer state are stored prior to transmission.  The  store  may
       hold  between  zero  and 23 values which have been modified to take
       account of the interpolator coefficients at the times of  sampling.
       The  modified sample values are read out [as bit 2 of TS2 (odd)] at
       a uniform rate; the most significant bit (MSB) in  frame 1  of  the
       multiframe, the second MSB in frame 2, etc.

               Bit 3.7 - Fast update request

               On receipt of this bit set to 1, the transmitter buffer is
       forced to decrease its fill and stabilise to a modified state of
       less than 6 K by preventing coded picture elements from entering
       the buffer. Bit A is set to 1 in the next FST. The two following
       fields are treated as complete moving areas and the encoder uses an
       arrangement for control of the sub-sampling modes to make the
       buffer overflow condition unlikely.
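
            The bit 1 pattern rule and the bit 2 buffer-state code
       described above can be sketched as follows. This is an illustration
       under stated assumptions: the function names are hypothetical, the
       fill is rounded down to whole units of 1 K (the text does not state
       the rounding), and the correction for the interpolator coefficients
       is not modelled.

```python
K_BITS = 1024  # 1 K = 1024 bits, as defined in the text

# Example pattern quoted in the text for bit 1.
EXAMPLE_PATTERN = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1]


def is_valid_bit1_pattern(bits):
    """True if an 11-bit pattern holds seven ones and four zeros."""
    return (len(bits) == 11
            and all(b in (0, 1) for b in bits)
            and sum(bits) == 7)


def buffer_state_code(fill_bits):
    """Buffer fill in units of 1 K, checked to fit an 8-bit code."""
    units = fill_bits // K_BITS  # assumption: round down
    if not 0 <= units <= 0xFF:
        raise ValueError("buffer state does not fit in 8 bits")
    return units


def bit2_per_frame(code):
    """Bits of the 8-bit code, MSB first; element 0 goes in frame 1."""
    return [(code >> shift) & 1 for shift in range(7, -1, -1)]
```

            For example, a fill of 96 K would be sent as the bits
       0 1 1 0 0 0 0 0 over frames 1 to 8 of the multiframe.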


       3 A codec for 525-line, 60 fields/s and 1544 kbit/s transmission
       for intra-regional use



       3.1         Introduction


            A 1.5 Mbit/s interframe codec, described under § 3, is capable
       of transmitting and receiving a single NTSC video signal and audio
       signal using an adaptive predictive coding technique with
       motion-compensated prediction, background prediction and
       intraframe prediction.

            The aim of this codec is to effectively transmit video
       telephone and video conferencing signals which have relatively small
       movements. The video interface of the codec is a 525-line,
       60 fields/s standard analogue television signal corresponding to
       the "Class a" standard of Recommendation H.100.


       3.2         Outline of codec


            The essential parts of the codec block diagram  are  shown  in
       Figure 7/H.120.  The  coder  consists  of  three  basic  functional
       blocks, that is, pre-processing, video source coding and  transmis-
       sion coding.

            In the pre-processor, the input analogue NTSC video signal  is
       digitized  and  colour decoded into one luminance component and two
       chrominance components. These three components  are  time  division
       multiplexed  into a digital video form, whose noise and unnecessary
       signal components are removed by the pre-filter.

            In the video source coder, the digital video signal is fed  to
       the  predictive  coder  where  interframe and intraframe predictive
       coding techniques are  fully  utilized  for  minimizing  prediction
       errors   to  be  transmitted.  The  prediction error signal is next
       entropy-coded using its statistical properties to reduce  redundan-
       cies. Since the coded error information is generated in irregularly
       spaced bursts, a buffer is used. If the buffer  becomes  full,  the
       number  of  prediction  error quantizing levels and/or picture ele-
       ments to be coded is reduced to prevent any overflow.

            In the transmission coder, coded video and audio  signals  are
       first  encrypted  on  an  optional basis. The coded video signal is
       then forward error correction coded and scrambled. The  three  sig-
       nals, coded video, coded audio and optional data signals are multi-
       plexed into a 1544 kbit/s digital format with a frame structure  as
       defined in Recommendation H.130.

            The decoder carries out a reverse operation.



                                                        Figure 7/H.120



       3.3         Brief specification



       3.3.1         Video input/output


            NTSC signals are used for the video input/output signal, with
       monochrome signals being additionally applicable.


       3.3.2         Digital output/input


            The interface conditions for the digital  output/input  signal
       satisfy  Recommendation G.703  specifications. The signal transmis-
       sion rate is 1544 kbit/s.


       3.3.3         Sampling frequency


            The video sampling frequency is four times the colour
       sub-carrier frequency (fSC) and asynchronous with the 1544 kHz
       network clock.


       3.3.4         Time division multiplexed  (TDM) digital video format


            An NTSC signal is separated into a luminance component (Y) and
       two chrominance components (C1 and C2). A time division multiplexed
       signal composed of Y and time-compressed C1 and C2 is employed in the
       source coding as the standard digital video format.



       3.3.5         Coding algorithm


            Adaptive predictive coding supplemented by variable
       word-length coding is used to achieve low bit rate transmission.
       The following three predictions are carried out adaptively on a
       pel-by-pel basis:

               a)         motion-compensated interframe prediction  for  a
       still or slowly moving area,

               b)         background prediction  for  an  uncovered  back-
       ground area, and

               c)         intraframe prediction for a rapidly moving area.

            Prediction errors for video signals  and  motion  vectors  are
       both entropy-coded using the following two techniques:

               i)         variable word-length coding for non-zero errors,
       and

               ii)         run-length coding for zero errors.


       3.3.6         Audio channel











            An audio channel using 64 kbit/s is included. The audio coding
       algorithm complies with Recommendation G.722.


       3.3.7         Data channel


            An optional 64 kbit/s data channel is available, which is used
       for video if not required for data.


       3.3.8         Mode of operation


            The normal mode of operation is full duplex, with other modes,
       e.g. the  one-way  broadcasting  operation  mode,  also  taken into
       account.


       3.3.9         Transmission error protection


            A BCH error correcting  code  is  used  along  with  a  demand
       refreshing  method to prevent uncorrected errors from degrading the
       picture quality.


       3.3.10         Additional facilities


            Provision is made in  the  digital  frame  structure  for  the
       future  introduction  of  such  facilities  as encryption, graphics
       transmission and multipoint communication.


       3.3.11         Processing delay


            The coder plus decoder delay is about 165 ms, excluding the
       delay of the pre-filter and post-filter.


       3.4         Video interface


            The video input/output signal of the codec is an analogue NTSC
       signal (System M) in accordance with CCIR Report 624.


       3.5         Pre- and post-processing



       3.5.1         Analogue-to-digital and  digital-to-analogue  conver-
       sion











            An NTSC signal band-limited to 4.5 MHz is sampled at a rate of
       14.3 MHz, four times the colour sub-carrier frequency (fSC), and
       converted to an 8-bit linear PCM signal. The sampling clock is
       locked to the horizontal synchronization of the NTSC signal. Since
       the sampling frequency is asynchronous with the network clock,
       justification information is coded and transmitted from the coder
       to the decoder.

            The digital video data is expressed in two's complement  form.
       The input level to the A/D converter is defined as follows:

               -         sync tip level (-40 IRE) corresponds to -124
       (10000100);

               -         white level (100 IRE) corresponds to 72
       (01001000).

               (IRE: Institute of Radio Engineers)
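
            Since the PCM signal is linear, the two reference levels above
       fix the whole input scale (196 code steps over 140 IRE, i.e. 1.4
       steps per IRE). The sketch below illustrates this. Rounding to the
       nearest code and clipping to the 8-bit range are assumptions, and
       the function names are hypothetical.

```python
from fractions import Fraction

SYNC_TIP_IRE, SYNC_TIP_CODE = -40, -124
WHITE_IRE, WHITE_CODE = 100, 72

# Code steps per IRE unit, derived from the two reference levels.
STEPS_PER_IRE = Fraction(WHITE_CODE - SYNC_TIP_CODE,
                         WHITE_IRE - SYNC_TIP_IRE)


def ire_to_code(ire):
    """Map an integer IRE level to a signed 8-bit code (assumed rounding)."""
    code = round(SYNC_TIP_CODE + STEPS_PER_IRE * (ire - SYNC_TIP_IRE))
    return max(-128, min(127, code))  # assumption: clip to 8 bits


def twos_complement_bits(code):
    """8-bit two's complement representation of a signed code."""
    return format(code & 0xFF, "08b")
```

            Under this linear assumption, blanking level (0 IRE) falls at
       code -68.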


            As a national option, a pad can be inserted before the A/D
       converter if level fluctuations on the analogue transmission lines
       connecting the terminal equipment and the codec need to be taken
       into account.

            At the decoder, the NTSC signal is  reproduced  by  converting
       the 8-bit PCM signal to an analogue signal.


       3.5.2         Colour decoding and encoding


            The digitized NTSC signal is separated into the luminance
       component (Y) and the carrier band chrominance component (C) by
       digital filtering. The two baseband chrominance signals (C1 and C2)
       are obtained by digitally demodulating the separated carrier band
       chrominance component. The effective sampling frequency after
       colour decoding is converted to 7.2 MHz (2 fSC) and 1.2 MHz
       (1/3 fSC) for the luminance signal and chrominance signals
       respectively.
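
            The rounded frequencies quoted in this clause can be checked
       against the sub-carrier frequency. The sketch below assumes the
       standard NTSC values, which are not stated in the text: a colour
       sub-carrier of 315/88 MHz and 227.5 sub-carrier cycles per line.

```python
from fractions import Fraction

# Assumptions: standard NTSC (System M) colour sub-carrier and line relation.
F_SC_HZ = Fraction(315_000_000, 88)       # about 3.579545 MHz
F_LINE_HZ = F_SC_HZ / Fraction(455, 2)    # f_SC = 227.5 x line frequency

sampling_rate_hz = 4 * F_SC_HZ            # A/D sampling, about 14.3 MHz
luminance_rate_hz = 2 * F_SC_HZ           # about 7.16 MHz, quoted as 7.2 MHz
chrominance_rate_hz = F_SC_HZ / 3         # about 1.19 MHz, quoted as 1.2 MHz

# Luminance-rate samples in one complete line.
luminance_samples_per_line = luminance_rate_hz / F_LINE_HZ

for name, rate in [("4 fSC", sampling_rate_hz),
                   ("2 fSC", luminance_rate_hz),
                   ("fSC/3", chrominance_rate_hz)]:
    print(f"{name}: {float(rate) / 1e6:.3f} MHz")
```

            On these assumptions one complete line holds 455 samples at
       the 2 fSC rate, which matches the offset of 455 pels used for the
       previous line in § 3.6.2.2.2.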

            The replica of the NTSC signal is obtained at the decoder by
       digitally modulating the C1 and C2 signals and adding them to the
       Y signal.

            Filter characteristics for colour decoding  and  encoding  are
       left  to  each  hardware  implementation  since  they do not affect
       interworking between different design codecs.  Examples  of  recom-
       mended characteristics are described in Annex E.


       3.5.3         TDM signal


            A time division multiplexing (TDM) signal is constructed  from
       the separated component signals.










            First, the C1 and C2 signals are time-compressed to 1/6. Next,
       each of the time-compressed C1 and C2 signals, with their horizontal
       blanking parts removed, is inserted into the Y signal horizontal
       blanking interval on alternate lines. C1 is inserted on the first
       line of the first field and on every other line following
       throughout the frame, while C2 is inserted on the second line of
       the first field and on every other line following throughout the
       frame.

            Active samples are 384 samples/line for the Y signal and
       64 samples/line for the C1 and C2 signals. The TDM signal is
       constructed with these active samples and 7 colour burst samples (B),
       which are inserted into the top of the TDM signal.

            As shown in Figure 8/H.120, the C1 and C2 signal sampling points
       coincide with those of the Y signal on every sixth sample. The C1 and
       C2 signals of only the odd lines are transmitted to the decoder.

            At the decoder, each component signal is again demultiplexed
       from the TDM signal, and time-expansion processing of 6 times is
       carried out for the C1 and C2 signals.

            Note  - When a pad is inserted before the A/D converter as
       described in § 3.5.1, pre-emphasis (de-emphasis) with a compensating
       gain for the C1, C2 and colour burst signals is recommended at
       the source coder input (decoder output) to obtain better picture
       reproduction in coloured parts.
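
            The sample counts given in this clause add up to one TDM line
       of 7 + 64 + 384 = 455 samples. The sketch below illustrates the
       multiplexing and demultiplexing of one line. The ordering burst,
       chrominance, luminance within the line, the counting of lines from
       1 at the first line of the first field, and the function names are
       all assumptions made for illustration.

```python
BURST_SAMPLES = 7      # colour burst samples (B)
CHROMA_SAMPLES = 64    # time-compressed C1 or C2 samples
LUMA_SAMPLES = 384     # active Y samples
TDM_LINE_SAMPLES = BURST_SAMPLES + CHROMA_SAMPLES + LUMA_SAMPLES


def chroma_component(line_number):
    """'C1' on lines 1, 3, 5, ... and 'C2' on lines 2, 4, 6, ..."""
    return "C1" if line_number % 2 == 1 else "C2"


def build_tdm_line(burst, chroma, luma):
    """Concatenate B, C and Y samples into one TDM line (assumed order)."""
    sizes = (len(burst), len(chroma), len(luma))
    if sizes != (BURST_SAMPLES, CHROMA_SAMPLES, LUMA_SAMPLES):
        raise ValueError(f"unexpected component sizes {sizes}")
    return list(burst) + list(chroma) + list(luma)


def split_tdm_line(tdm):
    """Inverse of build_tdm_line: recover (burst, chroma, luma)."""
    if len(tdm) != TDM_LINE_SAMPLES:
        raise ValueError("a TDM line holds 455 samples")
    chroma_end = BURST_SAMPLES + CHROMA_SAMPLES
    return (list(tdm[:BURST_SAMPLES]),
            list(tdm[BURST_SAMPLES:chroma_end]),
            list(tdm[chroma_end:]))
```

            The ratio of 384 luminance samples to 64 chrominance samples
       corresponds to the time compression factor of 6.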


       3.5.4         Pre- and post-filtering


            In addition to conventional anti-aliasing filtering  prior  to
       analogue-to-digital   conversion,   the   following  two  filtering
       processes should be used as pre-filtering for source coding:

               a)          temporal  filtering  to  reduce  random   noise
       included in the input video signal;

               b)         spatial filtering to reduce aliasing  distortion
       in subsampling.

            At the decoder, the following three filtering processes should
       be  used  as  post-filtering  in  addition to conventional low pass
       filtering after digital-to-analogue conversion:

               i)         spatial filtering  to  interpolate  the  omitted
       picture elements in subsampling;

               ii)         spatio-temporal filtering  to  interpolate  the
       omitted fields in field repetition;

               iii)         temporal filtering to reduce  noise  generated
       in the course of source coding.

            Although these filtering processes are important for improving
       reproduced picture quality, their characteristics do not affect
       interworking between codecs of different design. Hence, pre- and
       post-filtering is left to each hardware implementation.



                                                        Figure 8/H.120



       3.6         Source coding



       3.6.1         Configuration of source coder and decoder


            The video source coder and decoder configuration of this codec
       is outlined in Figure 9/H.120.

            The predictive encoder converts the input video signal x into
       the prediction error signal e, using the motion vector v. This
       conversion is controlled by the coding mode m.

            The variable word-length (VWL) coder codes e and v into the
       compressed data C using the variable length coding method. The
       transmission buffer memory (BM) smooths out the irregularly spaced
       data C. The coding mode m is also coded.


            The frame memory parity information p is  used  to  check  the
       identity  of coder and decoder frame memory contents. If any parity
       error is detected, frame memories of both  coder  and  decoder  are
       reset by the demand refresh information (DR) and the demand refresh
       confirmation information (DDR).

            At the decoder, the variable word-length (VWL) decoder decodes
       e, v, m and p, and the predictive decoder reproduces the video
       signal x'.


                                                        Figure 9/H.120



       3.6.2         Predictive coding



       3.6.2.1         Coding modes


            Five coding modes, as summarized in Table 3/H.120, are provided.
       All of the samples are coded and transmitted in normal mode, while
       half of the samples are omitted in subsampling mode. In field
       repetition mode, one or more consecutive fields are omitted (called
       multi-field repetition, see Note 1). If field repetition mode and
       subsampling mode are used in combination, only a quarter or less of
       the original picture elements are coded and transmitted.


            Subsampling is carried  out  in  a  quincunx  way,  namely  by
       transmitting  only  odd-numbered  pels  on  odd-numbered  lines and
       even-numbered pels on even-numbered lines in each  block-line  (see
       Note 2).
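
            The quincunx rule can be expressed as a parity test on the
       line and pel numbers. The sketch below assumes that lines and pels
       are both numbered from 1 within a block-line, and the function
       names are hypothetical.

```python
def is_transmitted(line, pel):
    """Quincunx rule: odd pels on odd lines, even pels on even lines."""
    return line % 2 == pel % 2


def quincunx_mask(lines, pels):
    """Boolean mask of transmitted pels, rows and columns numbered from 1."""
    return [[is_transmitted(line, pel) for pel in range(1, pels + 1)]
            for line in range(1, lines + 1)]


# Print a small corner of the pattern: 'x' transmitted, '.' omitted.
for row in quincunx_mask(4, 8):
    print("".join("x" if sent else "." for sent in row))
```

            Exactly half of the pels are transmitted, and each omitted pel
       has transmitted neighbours to its left and right and above and
       below, which is what the interpolation in the prediction loop
       relies on.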

            In field repetition mode, either the odd or  even  fields  are
       omitted.  For  the  omitted fields, both the prediction error e and
       the motion vector  v are set to 0.

            Note 1  - If odd fields and even fields are mixed after  field
       omission, a severe picture degradation takes place. Hence, 1 out of
       2, 3 out of 4 or 5 out of 6 field omission is recommended.

            Note 2  - Each block-line consists of 8 lines as defined in
       § 3.6.2.5.
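
            The recommendation in Note 1 can be checked with a short
       sketch: when an odd number of fields is omitted in each group of
       even length, every coded field has the same parity. The model
       below assumes that fields alternate between odd and even and that
       the first field of each group is the one coded. The function names
       are hypothetical.

```python
def retained_fields(omitted, total_fields):
    """Indices (from 0) of coded fields when `omitted` fields are dropped
    after each coded field."""
    period = omitted + 1
    return list(range(0, total_fields, period))


def mixes_parity(fields):
    """True if the retained fields include both odd and even fields."""
    return len({field % 2 for field in fields}) > 1


for omitted in (1, 2, 3, 5):
    kept = retained_fields(omitted, 60)
    status = "mixed" if mixes_parity(kept) else "single parity"
    print(f"{omitted} out of {omitted + 1} omitted: {status}")
```

            Omitting 2 out of 3 fields, by contrast, would alternate
       between odd and even fields and so produce the mixing that Note 1
       warns against.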


                                  TABLE 3/H.120
                                  Coding modes

       _________________________________________________________________________
        No.   Coding mode          Abbreviation   Operation
       _________________________________________________________________________
         1    Normal               NRM            Full sampling
         2    Field repetition     FRP            Omission of one or more fields
         3    Subsampling          SBS            2:1 pel omission
         4    Stop                 STP            Suspension of coding
         5    Refresh              RFS            Renewal of frame memory
       _________________________________________________________________________





       3.6.2.2         Adaptive prediction


            Prediction functions are adaptively selected on  a  pel-by-pel
       basis  as shown in Figure 10/H.120. The selection is carried out so
       as to minimize probable prediction  errors.  This  is  accomplished
       using  the  two  prediction status signals, which are determined by
       prediction reference signals, for the preceding pels located on the
       previous and the present lines.

            When subsampling and/or field repetition are operated, omitted
       pels are interpolated in the prediction loop.










            The notations defined for the i-numbered pel are as follows:

               Xi         local decoder output,

               Yi         interpolator output,

               Mi         motion-compensated interframe prediction value,

               Bi         background prediction value,

               Ii         intraframe prediction value,

               *          logical product, and

               +          logical sum.



                                                       Figure 10/H.120



       3.6.2.2.1                   Motion-compensated           interframe
       prediction/background prediction


            Prediction status signal S1i for pel i is determined as

       where prediction reference signal R1(i) is

            Based on S1i, prediction signal X1i is given as

            If pel i is either omitted due to subsampling and/or field
       repetition, or forced intraframe-coded, or in burst B, its
       corresponding R1(i) is set to 0 regardless of equation (3-2).



       3.6.2.2.2         Interframe prediction/intraframe prediction


            Prediction status signal S2i for pel i is determined as

       where prediction reference signal R2(i) is

            Based on S2i, prediction signal X2i is given as

            If pel (i - 1) is omitted due to subsampling, R2(i - 2) is
       used instead of R2(i - 1). On the other hand, if pel (i - 455)
       is omitted, R2(i - 454) * R2(i - 456) is used instead of
       R2(i - 455). If pel i is forced intraframe-coded, its corresponding
       R2(i) is set to 1 regardless of equation (3-5).

            If pel i is omitted due to field repetition, its
       corresponding R2(i) is set to 0 regardless of equation (3-5).
       When pel i is not forced intraframe-coded, R2(i) in burst B is
       set to 0.
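
            The substitution rules for omitted pels stated above can be
       sketched as follows. The sketch covers only the choice of reference
       values and does not reproduce the equations for S2i or R2(i). The
       function names are hypothetical, R2 values are held as 0 or 1 in a
       list indexed by pel number, and pel (i - 455) is assumed to be the
       pel on the previous line.

```python
LINE_OFFSET = 455  # assumption: pel (i - 455) lies on the previous line


def r2_previous_pel(r2, omitted, i):
    """Reference for the preceding pel on the present line.

    R2(i - 2) replaces R2(i - 1) when pel (i - 1) was omitted.
    """
    return r2[i - 2] if omitted[i - 1] else r2[i - 1]


def r2_previous_line(r2, omitted, i):
    """Reference for the pel on the previous line.

    R2(i - 454) * R2(i - 456) replaces R2(i - 455) when pel (i - 455)
    was omitted; '*' is the logical product (AND).
    """
    above = i - LINE_OFFSET
    if omitted[above]:
        return r2[above + 1] & r2[above - 1]
    return r2[above]
```

            Both functions expect i to be large enough that the referenced
       indices are not negative; handling of the first line and first pels
       is left out of this sketch.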


       3.6.2.3         Background generation


            The background prediction value is generated scene  adaptively
       as

       where
       [Formula Deleted]