All drawings appearing in this Recommendation have been done in Autocad.
         Recommendation E.550
             GRADE-OF-SERVICE AND NEW PERFORMANCE CRITERIA UNDER FAILURE
                   CONDITIONS IN INTERNATIONAL TELEPHONE EXCHANGES
         1      Introduction
         1.1    This Recommendation is confined to failures in a single exchange and their
         impact on calls within that exchange - network impacts are not covered  in  these
         Recommendations.
         1.2    This Recommendation has been established from the viewpoint of exchange
         Grade of Service (GOS).
         1.3   In conformity with Recommendation E.543 for transit exchanges
         under normal operation, this Recommendation  applies  primarily  to
         international  digital  exchanges.  However,  Administrations   may
         consider these Recommendations for their national networks.
         1.4   The GOS seen  by  a  subscriber  (blocking  and/or  delay  in
         establishing calls) is not  only  affected  by  the  variations  in
         traffic loads but also by the partial or complete faults of network
         components. The concept of customer-perceived GOS is not restricted
         to specific fault and  restoration  conditions.  For  example,  the
         customer is usually not aware of the fact that  a  network  problem
         has occurred, and he is unable to distinguish a  failure  condition
         from a number of other conditions such as peak traffic  demands  or
         equipment shortages due to  routine  maintenance  activity.  It  is
         therefore necessary that  suitable  performance  criteria  and  GOS
         objectives for international telephone exchanges be formulated that
         take account of the impact of partial and  total  failures  of  the
         exchange. Further, appropriate definitions, models and  measurement
         and calculation methods need  to  be  developed  as  part  of  this
         activity.
         1.5   From the subscriber's point of view, the GOS should not  only
         be defined by the level of unsatisfactory service but also  by  the
         duration of the intervals in which the GOS is unsatisfactory and by
         the frequency with which it occurs. Thus, in its most general  form
         the performance criteria should take into account such factors  as:
         intensity of failures and duration  of  resulting  faults,  traffic
         demand at time of failures, number of subscribers affected  by  the
         failures and the distortions in  traffic  patterns  caused  by  the
         failures.
              However, from a practical viewpoint, it will be  desirable  to
         start with simpler criteria that could be  gradually  developed  to
         account for all the factors mentioned above.
         1.6   Total or partial failures within the international part of the 
         network have a much more severe effect than similar failures in the
         national networks because the failed  components  in  the  national
         networks can be isolated and affected traffic can be rerouted.
              Failures  in  the  international  part  of  the  network   may
         therefore lead to degraded service in terms of increased blocking,
         delays and even complete denial  of  service  for  some  time.  The
         purpose of this Recommendation is to set  some  service  objectives
         for international  exchanges  so  that  the  subscribers  demanding
         international connections are assured a certain level of service.
              It should be noted however that where there are  multi-gateway
         exchanges providing access to and from a country, with diversity of
         circuits and provision for restoration,  the  actual  GOS  will  be
         better than that for the single exchange.
         2      General considerations
         2.1    The new performance criteria being sought involve concepts from the  field
         of "availability" (intensity of failures and duration  of  faults)  and  "traffic
         congestion" (levels of blocking and/or delay). It is therefore necessary that the
         terminology, definitions and models considered  should  be  consistent  with  the
         appropriate CCITT Recommendations on terminology and vocabulary.
         2.2    During periods of heavy congestion, caused either by traffic peaks or  due
         to malfunction in the exchange, a significant increase in  repeated  attempts  is
         likely to occur. Further, it is expected that due to accumulated demands during a
         period of complete faults, the exchange will  experience  a  heavy  traffic  load
         immediately after a failure condition has been removed and service restored.  The



                                                        Fascicle II.3 - Rec. E.550   PAGE1

       potential effects of these phenomena on the proposed GOS under failure conditions
       should be taken into account (for further study).
       3      Exchange performance characteristics under fault situations
       3.1    The exchange is considered to be in a fault situation if  any  failure  in
       the exchange (hardware, software, human errors) reduces its throughput when it is
       needed to handle traffic. The following  four  classes  of  exchange  faults  are
       included in this Recommendation:
             a)  complete exchange faults;
             b)  partial faults resulting in capacity reduction in all traffic flows to
               the same extent;
             c)  partial faults in which traffic flows to or from a particular point are 
               restricted or totally isolated from their intended route;
             d)  intermittent fault affecting a certain proportion of calls.
       3.2    To the extent practical, an  exchange  should  be  designed  so  that  the
       failure of a unit (or units) within the exchange has as little adverse effect as
       possible on its throughput. In addition, the  exchange  should  be
       able to take measures  within  itself  to  lessen  the  impact  of  any  overload
       resulting from failure of any of  its  units.  Units  within  an  exchange  whose
       failure reduces the exchange throughput  by  greater  amounts  than  other  units
       should have proportionally higher availability (Recommendation Q.504, § 4).
       3.3    When a failure reduces exchange  throughput  and  congestion  occurs,  the
       exchange should be able to  initiate  congestion  control  indications  to  other
       exchanges and network management systems so as to help control the  offered  load
       to the exchange (Recommendations E.410 and Q.506).
       4      GOS and applicable models
       4.1    In this section, the terms "accessible" and "inaccessible" are used in the
       sense defined in Recommendation G.106 (Red Book). The  GOS  for  exchanges  under
       failure conditions can be formulated at the following two conceptual levels  from
       a subscriber's viewpoint:
       4.1.1  Instantaneous service accessibility (inaccessibility)
             At this  level,  one  focuses  on  the  probability  that  the  service  is
       accessible (not accessible) to the subscriber at the instant he places a demand.
       4.1.2  Mean service accessibility (inaccessibility)
             At this level, one extends the concept of "downtime" used  in  availability
       specifications for exchanges to include  the  effects  of  partial  failures  and
       traffic overloads over a long period of time.
       4.2    Based on the GOS concept  outlined  in  §  4.1,  the  GOS  parameters  for
       exchanges under failure conditions are defined as follows:
       4.2.1  instantaneous exchange inaccessibility is the probability that 
       the exchange in question cannot perform the required function (i.e.
       cannot successfully process calls) under stated conditions  at  the
       time a request for service is placed.
       4.2.2 mean exchange  service  inaccessibility  is  the  average  of
       instantaneous exchange service inaccessibility over a  prespecified
       observation period (e.g. one year).
       4.2.3 Note 1 - The GOS model in the case of instantaneous  exchange
       inaccessibility parallels the concept of  the  call  congestion  in
       traffic theory and  needs  to  be  extended  to  include  the  call
       congestion caused by exchange failures classified in § 3.1. The GOS
       value can then be assigned on a  basis  similar  to  Recommendation
       E.543 for transit exchanges under normal operation.
            Note  2  -  A  model  for   estimating   the   mean   exchange
       inaccessibility is provided in Annex A. Though the model provides a
       simple and hence attractive approach, some practical issues related
       to measurement and monitoring and the potential effects of  network
       management controls and  scheduled  maintenance  on  the  GOS  need
       further study.












         4.3   The model in Figure 1/E.550 outlines the change in the nature
         of traffic offered under failure conditions.
                                        Figure 1/E.550 - T0200870-87

               In normal conditions the congestion factor B is low  and  there  should  be
         few repeat attempts: as a consequence the traffic At approximates Ao.
               Under failure  conditions  there  is  a  reduction  in  resources  and  the
         congestion factor B increases. This provokes the phenomenon  of  repeat  attempts
         and hence the load At on the exchange becomes greater than the original Ao.
               Therefore it is necessary to evaluate the congestion with the new  load  At
         assuming system stability exists, which may not always be the case.
               Recommendation  E.501  furnishes  the  appropriate  models  to  detect  the
         traffic offered from the carried traffic taking into account the repeat attempts.
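
               The feedback between congestion and repeat attempts described above can
         be sketched numerically. The following is a minimal illustration, not the
         E.501 model: it assumes that the exchange behaves as an Erlang loss system
         with n circuits and that each blocked attempt is repeated with a fixed
         probability r, so that At = Ao + r . B . At. The names erlang_b,
         total_offered_load, n and r are hypothetical and do not appear in this
         Recommendation.

```python
def erlang_b(n, a):
    """Erlang loss formula E_n(a), computed with the usual stable recursion."""
    b = 1.0
    for j in range(1, n + 1):
        b = a * b / (j + a * b)
    return b


def total_offered_load(a_o, n, r, tol=1e-9, max_iter=10_000):
    """Return (A_t, B) for a first-attempt load a_o offered to n circuits.

    Assumption (not from the Recommendation): a blocked attempt is repeated
    with probability r, which gives the fixed point A_t = A_o / (1 - r * B(A_t)).
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1) for A_t to stay finite")
    a_t = a_o
    for _ in range(max_iter):
        a_next = a_o / (1.0 - r * erlang_b(n, a_t))
        if abs(a_next - a_t) < tol:
            return a_next, erlang_b(n, a_next)
        a_t = a_next
    raise RuntimeError("no convergence: the load may not be stable")


if __name__ == "__main__":
    # Same first-attempt load Ao, before and after losing half of the circuits.
    for circuits in (80, 40):
        a_t, b = total_offered_load(50.0, circuits, r=0.8)
        print(f"n={circuits}: At={a_t:.2f} erlangs, B={b:.4f}")
```

               Under these assumptions At stays close to Ao while B is low, and grows
         well beyond Ao once a failure removes resources, which is the behaviour
         the model of Figure 1/E.550 describes.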
         4.4    The impact on the GOS  for  each  of  the  exchange  fault  modes  can  be
         characterized by:
               -   load in Erlangs (At) and busy hour call attempts (BHCA);
               -    inaccessibility  (instantaneous  and  mean),  congestion  and   delay
                  parameters (call set-up, through-connection, etc.);
               -   fault duration;
               -   failure intensity.
         5      GOS standards and inaccessibility
         5.1    Exchange fault situations can create similar effects to  overload  traffic
         conditions applied to an exchange under fault free conditions.
               In general, digital exchanges operating in the network  should  be  capable
         of taking action to ensure maximum throughput when  they  encounter  an  overload
         condition, including any that have been caused by a fault  condition  within  the
         exchange.
               Calls that have  been  accepted  for  processing  by  the  exchange  should
         continue to be processed  as  expeditiously  as  possible,  consistent  with  the
         overload protection strategies recommended in § 3 of Recommendation Q.543.
         5.2    One of the actions the exchange  may  take  to  preserve  call  processing
         capacity is to initiate  congestion  controls  and/or  other  network  management
         actions, to control the load offered  to  the  exchange  (Recommendations  E.410,
         E.413 and Q.506). The most obvious impact from the caller's viewpoint  may  be  a
         lowering of the probability that the network as a whole will be able to  complete
         some portion of the call attempts that the exchange is unable  to  accept  during
         the failure condition.
         5.3    International exchanges occupy a prominent place in the network and it  is
         important that their processing capacity have high availability. There are likely
         to be many  variations  in  exchange  architectures  and  sizes  that  will  have
         different impacts in  the  categories  of  failure  and  the  resulting  loss  of
         capacity.
               In general, failures that cause large proportions of exchange  capacity  to
         be lost must have a low probability of occurring and  a  short  downtime.  It  is
         important  that  maintenance   procedures   to   achieve   appropriate   exchange
         availability performance be adopted.
         5.4     The  formal  expression  of  the  criterion  of  mean  exchange   service
         inaccessibility is as follows:
         Let:
               y(t):  Intensity of call attempts  gaining  access  through  the  exchange
                  assuming no failures.
               s(t):  Intensity of  call  attempts  actually  given  access  through  the
                  exchange, taking into account the fault conditions which occur  in  the
                  exchange.
         Then the mean exchange service inaccessibility during a period of time T is given
         by
                      P = \frac{1}{T} \int_{0}^{T} \frac{y(t) - s(t)}{y(t)} \, dt
               Annex A describes a practical implementation of this criterion.
               For periods in which the exchange experiences a complete fault,  i.e.  s(t)
         = 0, the expression:
                            \frac{y(t) - s(t)}{y(t)}     is equal to 1.
               The contribution of such periods to the  total  criterion  P  may  then  be
         expressed simply as the fraction Ptotal of the evaluation period T during  which
         complete exchange outage due to failure occurred.
               The objective for Ptotal is given as Ptotal not more than 0.4 hours per year.




               For periods of partial failure, it is convenient to express the objective
         also as equivalent hours per year. The term "equivalent" is used because  the
         duration of partial faults is weighted by the fraction
                                      \frac{y(t) - s(t)}{y(t)}
         of call attempts denied access. The objective for the contribution of periods of
         partial exchange faults to the total criterion P is:
               Ppartial not more than 1.0 equivalent hours per year.
               Note that by definition P = Ptotal + Ppartial
               The inaccessibility criterion does not cover:
               -   planned outages
               -   faults with duration of less than 10 seconds
               -   accidental damage to equipment during maintenance
               -   external failures such as power failures, etc.
               It does cover failures resulting from both hardware and software faults.
               In addition, the objectives relate to the exchange under  normal  operating
         conditions and do not include failures just after cutover of an exchange or those
         during the end of the period it is in service, i.e. the  well  known  "bath  tub"
         distribution.
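
               As an illustration of the criterion in § 5.4, the integral for P can be
         approximated by a sum over equal measurement intervals. The sketch below
         assumes that y(t) and s(t) are available as sampled values per interval;
         the function name mean_inaccessibility and the sample data are
         hypothetical, and the exclusions listed above (for example faults shorter
         than 10 seconds) are not modelled.

```python
def mean_inaccessibility(y, s, dt_hours):
    """Approximate P = (1/T) * integral of (y - s) / y dt by a Riemann sum.

    y[i], s[i]: call-attempt intensities in interval i, without and with
    faults; dt_hours: length of each interval in hours.
    Returns (P, P_total, P_partial) as fractions of the period T.
    """
    if len(y) != len(s):
        raise ValueError("y and s must have the same number of samples")
    t_hours = len(y) * dt_hours
    total = 0.0    # intervals of complete fault, s = 0
    partial = 0.0  # intervals of partial fault, 0 < s < y
    for y_i, s_i in zip(y, s):
        if y_i <= 0:
            continue  # assumption: no demand, so no contribution
        lost = (y_i - s_i) / y_i
        if s_i == 0:
            total += lost * dt_hours
        else:
            partial += lost * dt_hours
    return (total + partial) / t_hours, total / t_hours, partial / t_hours


if __name__ == "__main__":
    # One year in 0.1 hour intervals: 0.4 h of complete outage and
    # 5 h during which 20% of the call attempts are denied access.
    samples = 87600
    y = [1000.0] * samples
    s = list(y)
    s[0:4] = [0.0] * 4
    s[100:150] = [800.0] * 50
    p, p_total, p_partial = mean_inaccessibility(y, s, 0.1)
    print(p_total * 8760, p_partial * 8760)  # hours and equivalent hours per year
```

               With these invented samples, both contributions sit exactly at the
         objectives of 0.4 hours and 1.0 equivalent hours per year.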
         6      Performance monitoring
               Certain failure conditions [i.e. the type mentioned in § 3.1 b)]  usually
         will be reflected in the  normal  GOS  performance  measurements  called  for  in
         Recommendation E.543.
               Other failure conditions [i.e. the type mentioned in § 3.1 c)] can  result
         in a reduced performance for a portion of traffic flows but  with  little  or  no
         impact on measured exchange GOS. For example if  a  trunk  module  in  a  digital
         exchange fails, the traffic normally associated with that  module  is  completely
         blocked, but since the attempts are also not measured the failure does not change
         the monitoring of the exchange GOS.
               For this second situation,  the  mean  inaccessibility  can  be  calculated
         using direct measurement of unit outages to provide mi  and  ti  information  and
         estimates of bi together with  the  model  of  Annex  A.  (See  Annex  A  for  an
         explanation of these symbols.)
               The estimates of bi can incorporate both fixed factors  based  on  exchange
         architecture and variable factors based on traffic measurements just prior to the
         time of failure.
                                                   ANNEX A
                                     (to Recommendation E.550)
                       A model for mean exchange inaccessibility
         A.1    Let P be the probability that a call attempt is not  processed  due  to  a
         fault in the exchange, then:
                                     P = \sum_{i=1}^{N} p_i b_i                       (A-1)
         where:
               pi  is the probability of fault mode i. Each fault mode denotes a specific
                  combination of faulty exchange components
               N   is the number of fault modes
               bi  is the average proportion of traffic which cannot be processed due  to
                  the fault mode i. It is a function of the specific  fault  present  and
                  the offered traffic load at the time of the failure condition.
               During a period of time T, the fault probability pi may be estimated by:
                            p_i = \frac{m_i \cdot t_i}{T},   i = 1, 2, ..., N               (A-2)
         where:
               mi  is the number of occurrences of fault mode i during the period T
               ti  is the average duration of occurrences of fault mode i
               As a practical matter, one may wish to exclude from the calculation  faults
         of duration less than 15 seconds.
               Note 1 - A given fault mode causes the exchange to enter the  corresponding
         fault state, which is characterized by a given mean duration and  a  function  bi
         giving the proportion of offered traffic affected.  In  principle,  the  possible
         number of fault modes can be very large because of  the  number  of  combinations
         which can occur. In practice this number can be reduced by considering all  fault
         modes with the same bi and ti as equivalent.
               Note 2 - bi should take into account the distribution of traffic  during  a
         day and the probability of fault mode i occurring in a  given  time  period.  The
         value assigned in the above model should be the average bi value  for  all  hours
         considered in these distributions. For example, a partial fault affecting 20%  of




          the exchange traffic throughput in the busy hour and 2 similar  hours,  could  be
          evaluated to effect a 10% reduction in 4 other moderately busy hours and to  have
          negligible impact during all other hours. If  this  fault  is  considered  to  be
          equally probable in time, the average value of bi can be obtained as follows:
                       b_i = \sum \left( \frac{\text{Percentage of traffic affected} \times \text{number of relevant busy hours}}{24 \text{ hours}} \right)
                           = \frac{0.2 \times 3}{24} + \frac{0.1 \times 4}{24} + \frac{0.0 \times 17}{24} = 0.025 + 0.0167 = 0.0417
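
                The same averaging can be written as a short calculation. This is a
          sketch that follows Note 2 in assuming the fault is equally probable at
          any hour of the day; the function name average_b and the list layout are
          illustrative only.

```python
def average_b(profile, hours_per_day=24):
    """Time-averaged proportion of traffic affected by one fault mode.

    profile: list of (proportion_of_traffic_affected, number_of_hours)
    pairs that together cover the whole day.
    """
    if sum(hours for _, hours in profile) != hours_per_day:
        raise ValueError("the hour counts must add up to a full day")
    return sum(frac * hours for frac, hours in profile) / hours_per_day


if __name__ == "__main__":
    # 20% in the busy hour and 2 similar hours, 10% in 4 moderately
    # busy hours, negligible impact in the remaining 17 hours.
    b_i = average_b([(0.2, 3), (0.1, 4), (0.0, 17)])
    print(round(b_i, 4))
```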
                Note 3 - The probability that a call attempt is not  processed  relates  to
          the category of traffic affected by the fault. Other traffic  will  experience  a
          different GOS depending on system architecture which is not taken into account in
          this Recommendation. For example, partial faults which remove from service blocks
          of trunks connected to an exchange have the effect of reducing the total  traffic
          offered to the exchange. The traffic flows not using the failed trunks could thus
          have a slightly improved GOS.
          A.2    Example for calculating the inaccessibility, P
                See Table A-1/E.550.
                                                TABLE A-1/E.550
                      An example of using the model for calculating the inaccessibility P
                                           (T = 1 year = 8760 hours)

           bi                     mi                     ti                     pi . bi
           Average proportion     Number of failures     Average duration of    Probability that a
           of traffic which       of type i per year     failure type i         call attempt is not
           cannot be processed                           (hours)                processed (x 10^-5)
           -------------------    ------------------     -------------------    -------------------
           1.00                    2                     0.2                    4.56
           0.40                    3                     0.22                   3.01
           0.20                    4                     0.3                    2.74
           0.10                    6                     0.4                    2.74
           0.05                   10                     0.5                    2.85








































               The value of  P  is  the  sum  of  the  individual  pi.bi  terms  in  Table
         A-1/E.550. In this example P = 15.90 x 10^-5, which is equivalent to 1.39 hours of
         inaccessibility per year (1.39 = 15.90 x 10^-5 x 8760). P decomposes as follows:
               Ptotal   = 0.40 hours per year (4.56 x 10^-5 x 8760)
               Ppartial = 0.99 hours per year (the remaining part of P)
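
               The figures of Table A-1/E.550 can be reproduced from equations (A-1)
         and (A-2). The sketch below uses the table rows as input; the names
         fault_modes and inaccessibility are illustrative. Because it works with
         unrounded terms, the last digit of P may differ slightly from the sum of
         the rounded table entries.

```python
T_HOURS = 8760.0  # T = 1 year

# (b_i, m_i, t_i) for each fault mode, taken from Table A-1/E.550
fault_modes = [
    (1.00, 2, 0.2),
    (0.40, 3, 0.22),
    (0.20, 4, 0.3),
    (0.10, 6, 0.4),
    (0.05, 10, 0.5),
]


def inaccessibility(modes, t_hours=T_HOURS):
    """Return (P, P_total, P_partial) from equations (A-1) and (A-2)."""
    p_total = 0.0
    p_partial = 0.0
    for b_i, m_i, t_i in modes:
        p_i = m_i * t_i / t_hours   # equation (A-2)
        term = p_i * b_i            # one term of equation (A-1)
        if b_i == 1.0:              # complete exchange fault
            p_total += term
        else:
            p_partial += term
    return p_total + p_partial, p_total, p_partial


if __name__ == "__main__":
    p, p_total, p_partial = inaccessibility(fault_modes)
    for label, value in (("P", p), ("Ptotal", p_total), ("Ppartial", p_partial)):
        print(f"{label}: {value * T_HOURS:.2f} hours per year")
```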
         A.3    As a further example consider a circuit group where exchange failures  may
         occur which disable one or more circuits (see Figure A-1/E.550). It  is  possible
         to expand the formula (A-1).
                                       Figure A-1/E.550 - T0200880-87

               The average proportion of traffic b(n, k, A),  which  cannot  be  processed
         due to failures on circuits is now a function of:
               -   n, the size of the circuit group;
               -   k, number of circuits out of order because of the failure;
               -   A, the mean traffic offered to the circuit group, in  the  absence  of
                  faults.
         Let the throughput of a circuit group of size n with offered traffic A be Cn(A).
         When k circuits are out of order, the throughput of the same circuit group is
         Cn-k(A). Hence the average proportion of traffic b(n, k, A) which cannot be
         processed because of the failure is given by:
                          b(n, k, A) = \frac{C_n(A) - C_{n-k}(A)}{C_n(A)}              (A-3)
         Let
               f(k, A) be the probability for having k circuits in a fault  condition  and
         the mean offered traffic A. The probability, Pn,  that  a  call  attempt  is  not
         processed due to a failure on a circuit group of size n, is given by:
                      P_n = \sum_{k, A} f(k, A) \cdot b(n, k, A),      k = 1, 2, ..., n    (A-4)
               If k and A are independent then
                                     f (k, A) = f1(k) . f2(A)                         (A-5)
         where  f1  (k)  may  satisfy  a  binomial  distribution  and  f2(A)   a   Poisson
         distribution.
               Suppose the traffic follows an Erlang distribution. Then Cn(A) is
         proportional to A . (1 - En(A)), where En(A) is the blocking probability given
         by the Erlang loss formula. Hence:
                         b(n, k, A) = \frac{E_{n-k}(A) - E_n(A)}{1 - E_n(A)}            (A-6)
         which can be found by using the Erlang tables and then inserted into
         equation (A-4).
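
               Instead of Erlang tables, equation (A-6) can be evaluated numerically.
         The sketch below computes En(A) with the standard recursion and then
         evaluates equation (A-4) for a single fixed value of A, taking f1(k) as
         binomial with a per-circuit fault probability q. The fixed A, the
         parameter q and the function names are assumptions made for illustration
         only.

```python
from math import comb


def erlang_b(n, a):
    """Erlang loss formula E_n(a), computed with the usual stable recursion."""
    b = 1.0
    for j in range(1, n + 1):
        b = a * b / (j + a * b)
    return b


def lost_fraction(n, k, a):
    """b(n, k, A) of equation (A-6): share of carried traffic lost
    when k of the n circuits are out of order."""
    e_n = erlang_b(n, a)
    e_n_minus_k = erlang_b(n - k, a)
    return (e_n_minus_k - e_n) / (1.0 - e_n)


def p_n(n, a, q):
    """Equation (A-4) for fixed A, with k assumed Binomial(n, q)."""
    return sum(
        comb(n, k) * q**k * (1.0 - q) ** (n - k) * lost_fraction(n, k, a)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    n, a = 30, 20.0
    for k in (1, 5, 15, 30):
        print(k, round(lost_fraction(n, k, a), 4))
    print("Pn:", p_n(n, a, q=0.001))
```

               With all n circuits out of order, b(n, n, A) equals 1, which is
         consistent with the complete-fault case of § 5.4.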































