All drawings appearing in this Recommendation have been done in Autocad.

Recommendation E.550

GRADE-OF-SERVICE AND NEW PERFORMANCE CRITERIA UNDER FAILURE CONDITIONS IN INTERNATIONAL TELEPHONE EXCHANGES

1 Introduction

1.1 This Recommendation is confined to failures in a single exchange and their impact on calls within that exchange; network impacts are not covered in these Recommendations.

1.2 This Recommendation has been established from the viewpoint of exchange Grade of Service (GOS).

1.3 In conformity with Recommendation E.543 for transit exchanges under normal operation, this Recommendation applies primarily to international digital exchanges. However, Administrations may consider these Recommendations for their national networks.

1.4 The GOS seen by a subscriber (blocking and/or delay in establishing calls) is affected not only by the variations in traffic loads but also by the partial or complete faults of network components. The concept of customer-perceived GOS is not restricted to specific fault and restoration conditions. For example, the customer is usually not aware of the fact that a network problem has occurred, and he is unable to distinguish a failure condition from a number of other conditions such as peak traffic demands or equipment shortages due to routine maintenance activity. It is therefore necessary that suitable performance criteria and GOS objectives for international telephone exchanges be formulated that take account of the impact of partial and total failures of the exchange. Further, appropriate definitions, models and measurement and calculation methods need to be developed as part of this activity.

1.5 From the subscriber's point of view, the GOS should be defined not only by the level of unsatisfactory service but also by the duration of the intervals in which the GOS is unsatisfactory and by the frequency with which it occurs.
Thus, in its most general form the performance criteria should take into account such factors as: intensity of failures and duration of resulting faults, traffic demand at time of failures, number of subscribers affected by the failures and the distortions in traffic patterns caused by the failures. However, from a practical viewpoint, it will be desirable to start with simpler criteria that could be gradually developed to account for all the factors mentioned above.

1.6 Total or partial failures within the international part of the network have a much more severe effect than similar failures in the national networks because the failed components in the national networks can be isolated and affected traffic can be rerouted. Failures in the international part of the network may therefore lead to degraded service in terms of increased blocking, delays and even complete denial of service for some time. The purpose of this Recommendation is to set some service objectives for international exchanges so that the subscribers demanding international connections are assured a certain level of service. It should be noted however that where there are multi-gateway exchanges providing access to and from a country, with diversity of circuits and provision for restoration, the actual GOS will be better than that for the single exchange.

2 General considerations

2.1 The new performance criteria being sought involve concepts from the field of "availability" (intensity of failures and duration of faults) and "traffic congestion" (levels of blocking and/or delay). It is therefore necessary that the terminology, definitions and models considered should be consistent with the appropriate CCITT Recommendations on terminology and vocabulary.

2.2 During periods of heavy congestion, caused either by traffic peaks or by malfunction in the exchange, a significant increase in repeated attempts is likely to occur.
Further, it is expected that due to accumulated demands during a period of complete faults, the exchange will experience a heavy traffic load immediately after a failure condition has been removed and service restored. The potential effects of these phenomena on the proposed GOS under failure conditions should be taken into account (for further study).

Fascicle II.3 - Rec. E.550

3 Exchange performance characteristics under fault situations

3.1 The exchange is considered to be in a fault situation if any failure in the exchange (hardware, software, human errors) reduces its throughput when it is needed to handle traffic. The following four classes of exchange faults are included in this Recommendation:

a) complete exchange faults;
b) partial faults resulting in capacity reduction in all traffic flows to the same extent;
c) partial faults in which traffic flows to or from a particular point are restricted or totally isolated from their intended route;
d) intermittent faults affecting a certain proportion of calls.

3.2 To the extent practical, an exchange should be designed so that the failure of a unit (or units) within the exchange has as little adverse effect as possible on its throughput. In addition, the exchange should be able to take measures within itself to lessen the impact of any overload resulting from failure of any of its units. Units within an exchange whose failure reduces the exchange throughput by greater amounts than other units should have proportionally higher availability (Recommendation Q.504, § 4).

3.3 When a failure reduces exchange throughput and congestion occurs, the exchange should be able to initiate congestion control indications to other exchanges and network management systems so as to help control the offered load to the exchange (Recommendations E.410 and Q.506).

4 GOS and applicable models

4.1 In this section, the terms "accessible" and "inaccessible" are used in the sense defined in Recommendation G.106 (Red Book).
The GOS for exchanges under failure conditions can be formulated at the following two conceptual levels from a subscriber's viewpoint:

4.1.1 Instantaneous service accessibility (inaccessibility)

At this level, one focuses on the probability that the service is accessible (not accessible) to the subscriber at the instant he places a demand.

4.1.2 Mean service accessibility (inaccessibility)

At this level, one extends the concept of "downtime" used in availability specifications for exchanges to include the effects of partial failures and traffic overloads over a long period of time.

4.2 Based on the GOS concept outlined in § 4.1, the GOS parameters for exchanges under failure conditions are defined as follows:

4.2.1 instantaneous exchange inaccessibility is the probability that the exchange in question cannot perform the required function (i.e. cannot successfully process calls) under stated conditions at the time a request for service is placed.

4.2.2 mean exchange service inaccessibility is the average of instantaneous exchange service inaccessibility over a prespecified observation period (e.g. one year).

4.2.3 Note 1 - The GOS model in the case of instantaneous exchange inaccessibility parallels the concept of call congestion in traffic theory and needs to be extended to include the call congestion caused by the exchange failures classified in § 3.1. The GOS value can then be assigned on a basis similar to Recommendation E.543 for transit exchanges under normal operation.

Note 2 - A model for estimating the mean exchange inaccessibility is provided in Annex A. Though the model provides a simple and hence attractive approach, some practical issues related to measurement and monitoring and the potential effects of network management controls and scheduled maintenance on the GOS need further study.

4.3 The model in Figure 1/E.550 outlines the change in the nature of traffic offered under failure conditions.
Figure 1/E.550 - T0200870-87

In normal conditions the congestion factor B is low and there should be few repeat attempts: as a consequence the traffic $A_t$ approximates $A_o$. Under failure conditions there is a reduction in resources and the congestion factor B increases. This provokes the phenomenon of repeat attempts and hence the load $A_t$ on the exchange becomes greater than the original $A_o$. Therefore it is necessary to evaluate the congestion with the new load $A_t$, assuming system stability exists, which may not always be the case. Recommendation E.501 furnishes the appropriate models to estimate the traffic offered from the carried traffic, taking into account the repeat attempts.

4.4 The impact on the GOS for each of the exchange fault modes can be characterized by:

- load in Erlangs ($A_t$) and busy hour call attempts (BHCA);
- inaccessibility (instantaneous and mean), congestion and delay parameters (call set-up, through-connection, etc.);
- fault duration;
- failure intensity.

5 GOS standards and inaccessibility

5.1 Exchange fault situations can create effects similar to those of overload traffic conditions applied to an exchange under fault-free conditions. In general, digital exchanges operating in the network should be capable of taking action to ensure maximum throughput when they encounter an overload condition, including any that has been caused by a fault condition within the exchange. Calls that have been accepted for processing by the exchange should continue to be processed as expeditiously as possible, consistent with the overload protection strategies recommended in § 3 of Recommendation Q.543.

5.2 One of the actions the exchange may take to preserve call processing capacity is to initiate congestion controls and/or other network management actions, to control the load offered to the exchange (Recommendations E.410, E.413 and Q.506).
The most obvious impact from the caller's viewpoint may be a lowering of the probability that the network as a whole will be able to complete some portion of the call attempts that the exchange is unable to accept during the failure condition.

5.3 International exchanges occupy a prominent place in the network and it is important that their processing capacity have high availability. There are likely to be many variations in exchange architectures and sizes that will have different impacts in the categories of failure and the resulting loss of capacity. In general, failures that cause large proportions of exchange capacity to be lost must have a low probability of occurring and a short downtime. It is important that maintenance procedures to achieve appropriate exchange availability performance be adopted.

5.4 The formal expression of the criterion of mean exchange service inaccessibility is as follows:

Let:

y(t): Intensity of call attempts gaining access through the exchange assuming no failures.
s(t): Intensity of call attempts actually given access through the exchange, taking into account the fault conditions which occur in the exchange.

Then the mean exchange service inaccessibility during a period of time T is given by

$$P = \frac{1}{T} \int_0^T \frac{y(t) - s(t)}{y(t)} \, dt$$

Annex A describes a practical implementation of this criterion.

For periods in which the exchange experiences a complete fault, i.e. s(t) = 0, the expression

$$\frac{y(t) - s(t)}{y(t)}$$

is equal to 1. The contribution of such periods to the total criterion P may then be expressed simply as the fraction $P_{\text{total}}$ of the evaluation period T during which complete exchange outage due to failure occurred. The objective for $P_{\text{total}}$ is given as:

$P_{\text{total}}$ not more than 0.4 hours per year.

For the period of partial failure, it is convenient to also express the objective as equivalent hours per year. The term equivalent is used because the duration of partial faults is weighted by the fraction

$$\frac{y(t) - s(t)}{y(t)}$$

of call attempts denied access. The objective for the contribution of periods of partial exchange faults to the total criterion P is given by:

$P_{\text{partial}}$ not more than 1.0 equivalent hours per year.

Note that by definition

$$P = P_{\text{total}} + P_{\text{partial}}$$

The inaccessibility criterion does not cover:

- planned outages;
- faults with duration of less than 10 seconds;
- accidental damage to equipment during maintenance;
- external failures such as power failures, etc.

It does cover failures resulting from both hardware and software faults. In addition, the objectives relate to the exchange under normal operating conditions and do not include failures just after cutover of an exchange or those during the end of the period it is in service, i.e. the well known "bath tub" distribution.

6 Performance monitoring

Certain failure conditions [i.e. the type mentioned in § 3.1 b)] usually will be reflected in the normal GOS performance measurements called for in Recommendation E.543.

Other failure conditions [i.e. the type mentioned in § 3.1 c)] can result in a reduced performance for a portion of traffic flows but with little or no impact on measured exchange GOS. For example, if a trunk module in a digital exchange fails, the traffic normally associated with that module is completely blocked, but since the attempts are also not measured the failure does not change the monitoring of the exchange GOS.

For this second situation, the mean inaccessibility can be calculated using direct measurement of unit outages to provide $m_i$ and $t_i$ information and estimates of $b_i$ together with the model of Annex A. (See Annex A for an explanation of these symbols.)
The estimates of $b_i$ can incorporate both fixed factors based on exchange architecture and variable factors based on traffic measurements just prior to the time of failure.

ANNEX A
(to Recommendation E.550)

A model for mean exchange inaccessibility

A.1 Let P be the probability that a call attempt is not processed due to a fault in the exchange, then:

$$P = \sum_{i=1}^{N} p_i \, b_i \qquad \text{(A-1)}$$

where:

$p_i$ is the probability of fault mode i. Each fault mode denotes a specific combination of faulty exchange components;
N is the number of fault modes;
$b_i$ is the average proportion of traffic which cannot be processed due to the fault mode i. It is a function of the specific fault present and the offered traffic load at the time of the failure condition.

During a period of time T, the fault probability $p_i$ may be estimated by:

$$p_i = \frac{m_i \, t_i}{T}, \qquad i = 1, 2, \ldots, N \qquad \text{(A-2)}$$

where:

$m_i$ is the number of occurrences of fault mode i during the period T;
$t_i$ is the average duration of occurrences of fault mode i.

As a practical matter, one may wish to exclude from the calculation faults of duration less than 15 seconds.

Note 1 - A given fault mode causes the exchange to enter the corresponding fault state, which is characterized by a given mean duration and a function $b_i$ giving the proportion of offered traffic affected. In principle, the possible number of fault modes can be very large because of the number of combinations which can occur. In practice this number can be reduced by considering all fault modes with the same $b_i$ and $t_i$ as equivalent.

Note 2 - $b_i$ should take into account the distribution of traffic during a day and the probability of fault mode i occurring in a given time period. The value assigned in the above model should be the average $b_i$ value for all hours considered in these distributions. For example, a partial fault affecting 20% of the exchange traffic throughput in the busy hour and 2 similar hours could be evaluated to effect a 10% reduction in 4 other moderately busy hours and to have negligible impact during all other hours. If this fault is considered to be equally probable in time, the average value of $b_i$ can be obtained as follows:

$$b_i = \sum \frac{\text{proportion of traffic affected} \times \text{number of relevant hours}}{24 \text{ hours}} = \frac{0.2 \times 3}{24} + \frac{0.1 \times 4}{24} + \frac{0.0 \times 17}{24} = 0.025 + 0.0167 = 0.0417$$

Note 3 - The probability that a call attempt is not processed relates to the category of traffic affected by the fault. Other traffic will experience a different GOS depending on system architecture, which is not taken into account in this Recommendation. For example, partial faults which remove from service blocks of trunks connected to an exchange have the effect of reducing the total traffic offered to the exchange. The traffic flows not using the failed trunks could thus have a slightly improved GOS.

A.2 Example for calculating the inaccessibility, P

See Table A-1/E.550.

TABLE A-1/E.550
An example of using the model for calculating the inaccessibility P
(T = 1 year = 8760 hours)

b_i                    m_i                  t_i                   p_i . b_i
Average proportion     Number of failures   Average duration of   Probability that a
of traffic which       of type i            failure type i        call attempt is not
cannot be processed    per year             (hours)               processed (x 10^-5)

1.00                   2                    0.2                   4.56
0.40                   3                    0.22                  3.01
0.20                   4                    0.3                   2.74
0.10                   6                    0.4                   2.74
0.05                   10                   0.5                   2.85

The value of P is the sum of the individual $p_i \, b_i$ terms in Table A-1/E.550. In this example $P = 15.90 \times 10^{-5}$, which is equivalent to 1.39 hours of inaccessibility per year ($1.39 = 15.90 \times 10^{-5} \times 8760$).
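The arithmetic of equations (A-1) and (A-2) and of the Note 2 averaging example can be checked with a few lines of code. The following is a minimal sketch, not part of the Recommendation: the function and variable names are illustrative assumptions, and the input rows are simply the values of Table A-1/E.550.

```python
HOURS_PER_YEAR = 8760  # T = 1 year, as stated in Table A-1/E.550

# (b_i, m_i, t_i) rows copied from Table A-1/E.550.
# The container name FAULT_MODES is an illustrative assumption.
FAULT_MODES = [
    (1.00, 2, 0.2),
    (0.40, 3, 0.22),
    (0.20, 4, 0.3),
    (0.10, 6, 0.4),
    (0.05, 10, 0.5),
]


def mean_inaccessibility(fault_modes, period_hours=HOURS_PER_YEAR):
    """Equations (A-1) and (A-2): P = sum over i of b_i * (m_i * t_i / T)."""
    return sum(b * m * t / period_hours for b, m, t in fault_modes)


def average_b(daily_profile):
    """Note 2: average b_i over a 24-hour day.

    daily_profile is a list of (proportion_affected, number_of_hours)
    pairs whose hours add up to 24, assuming the fault is equally
    probable at any time of day.
    """
    return sum(share * hours for share, hours in daily_profile) / 24


p = mean_inaccessibility(FAULT_MODES)
print(f"P = {p * 1e5:.2f} x 10^-5")
print(f"equivalent hours per year = {p * HOURS_PER_YEAR:.2f}")
print(f"average b_i = {average_b([(0.2, 3), (0.1, 4), (0.0, 17)]):.4f}")
```

Because the table lists each $p_i \, b_i$ term rounded to two decimals, the unrounded sum computed here can differ from the tabulated total in the last digit.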
P decomposes as follows:

$P_{\text{total}}$ = 0.40 hours per year ($4.56 \times 10^{-5} \times 8760$)
$P_{\text{partial}}$ = 0.99 hours per year (the remaining part of P)

A.3 As a further example consider a circuit group where exchange failures may occur which disable one or more circuits (see Figure A-1/E.550). It is possible to expand the formula (A-1).

Figure A-1/E.550 - T0200880-87

The average proportion of traffic b(n, k, A) which cannot be processed due to failures on circuits is now a function of:

- n, the size of the circuit group;
- k, the number of circuits out of order because of the failure;
- A, the mean traffic offered to the circuit group, in the absence of faults.

Let the throughput of a circuit group of size n with a traffic offered A be $C_n(A)$. The throughput of the same circuit group when k circuits are out of order is then $C_{n-k}(A)$. Hence the average proportion of traffic b(n, k, A) which cannot be processed because of the failure is given by:

$$b(n, k, A) = \frac{C_n(A) - C_{n-k}(A)}{C_n(A)} \qquad \text{(A-3)}$$

Let f(k, A) be the probability of having k circuits in a fault condition and a mean offered traffic A. The probability $P_n$ that a call attempt is not processed due to a failure on a circuit group of size n is given by:

$$P_n = \sum_{k, A} f(k, A) \, b(n, k, A), \qquad k = 1, 2, \ldots, n \qquad \text{(A-4)}$$

If k and A are independent then

$$f(k, A) = f_1(k) \, f_2(A) \qquad \text{(A-5)}$$

where $f_1(k)$ may satisfy a binomial distribution and $f_2(A)$ a Poisson distribution.

If the traffic follows an Erlang distribution, $C_n(A)$ is proportional to $A \, (1 - E_n(A))$, where $E_n(A)$ is the blocking probability expressed by the Erlang loss formula. Hence:

$$b(n, k, A) = \frac{E_{n-k}(A) - E_n(A)}{1 - E_n(A)} \qquad \text{(A-6)}$$

which can be found by using the Erlang tables; the value is then inserted into equation (A-4).
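As an alternative to looking up Erlang tables, equation (A-6) can be evaluated numerically. The sketch below is an illustration under stated assumptions, not part of the Recommendation: it computes $E_n(A)$ with the commonly used recursive form of the Erlang loss formula, treats the offered traffic A as a single fixed value (so that the sum over A in (A-4) collapses to one term), and takes the fault distribution $f_1(k)$ as a caller-supplied dictionary. All function names and the example probabilities are hypothetical.

```python
def erlang_b(n, a):
    """Erlang loss formula E_n(A), via the recursion
    E_0 = 1, E_i = A * E_(i-1) / (i + A * E_(i-1))."""
    e = 1.0
    for i in range(1, n + 1):
        e = a * e / (i + a * e)
    return e


def blocked_fraction(n, k, a):
    """Equation (A-6): proportion of traffic lost because k of the
    n circuits are out of order, for offered traffic A (in Erlangs)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    e_n = erlang_b(n, a)
    return (erlang_b(n - k, a) - e_n) / (1.0 - e_n)


def circuit_group_inaccessibility(n, a, fault_probs):
    """Equation (A-4) for one fixed value of A (an assumption of this sketch).

    fault_probs maps k (number of failed circuits) to f1(k).
    """
    return sum(f * blocked_fraction(n, k, a) for k, f in fault_probs.items())


# Hypothetical example: 30 circuits, 20 Erlangs offered, made-up fault probabilities.
example = circuit_group_inaccessibility(30, 20.0, {1: 1e-3, 2: 1e-4})
print(f"P_n = {example:.3e}")
```

With k = 0 the function returns 0 (no circuits lost) and with k = n it returns 1 (the whole group is isolated), which matches the limiting cases of (A-3).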