Recommendation E.507 1)

MODELS FOR FORECASTING INTERNATIONAL TRAFFIC

1) The old Recommendation E.506, which appeared in the Red Book, was split into two Recommendations, revised E.506 and new E.507, and considerable new material was added to both.

1 Introduction

Econometric and time series model development and forecasting require familiarity with methods and techniques for dealing with a range of different situations. The purpose of this Recommendation is therefore to present some of the basic ideas and to leave the explanation of the details to the publications cited in the reference list. As such, this Recommendation is not intended to be a complete guide to econometric and time series modelling and forecasting.

The Recommendation also gives guidelines for building various forecasting models: identification of the model, inclusion of explanatory variables, adjustment for irregularities, estimation of parameters, diagnostic checks, etc. In addition, the Recommendation describes various methods for the evaluation of forecasting models and for the choice of model.

2 Building the forecasting model

The model building procedure can conveniently be described in four consecutive steps.

The first step consists in finding a useful class of models to describe the actual situation. Examples of such classes are simple models, smoothing models, autoregressive models, autoregressive integrated moving average (ARIMA) models and econometric models. Before choosing the class of models, the influence of external variables should be analyzed. If particular external variables have a significant impact on the traffic demand, they should be included in the forecasting model, provided enough historical data are available.

The next step is to identify one tentative model in the chosen class. If the class is too extensive to be conveniently fitted directly to data, rough methods for identifying subclasses can be used. Such methods of model identification employ data and knowledge of the system to suggest an appropriate parsimonious subclass of models. The identification procedure may also, on some occasions, be used to yield rough preliminary estimates of the parameters in the model.

Then the tentative model is fitted to data by estimating the parameters. Usually, maximum likelihood estimators or least squares estimators are used.

The next step is to check the model. This procedure is often called diagnostic checking. The object is to find out how well the model fits the data and, in case the discrepancy is judged to be too severe, to indicate possible remedies. The outcome of this step is acceptance of the model if the fit is acceptable. If, on the other hand, the model is inadequate, new tentative models may in turn be estimated and subjected to diagnostic checking.

The steps of the model building procedure are illustrated in Figure 1/E.507.

FIGURE 1/E.507 - Steps in the model building procedure

3 Various forecasting models

The objective of § 3 is to give a brief overview of the most important forecasting models. A more detailed description of the models is given in the GAS 10 Manual on planning data and forecasting methods [5].

3.1 Curve fitting models

In curve fitting models the traffic trend is extrapolated by calculating the values of the parameters of some function that is expected to characterize the growth of international traffic over time. The numerical calculations of some curve fitting models can be performed by using the least squares method.

The following are examples of common curve fitting models used for forecasting international traffic:

    Linear:      Y_t = a + bt                                (3-1)

    Parabolic:   Y_t = a + bt + ct^2                         (3-2)

    Exponential: Y_t = a e^{bt}                              (3-3)

    Logistic:    Y_t = \frac{M}{1 + a e^{bt}}                (3-4)

    Gompertz:    Y_t = M a^{b^t}                             (3-5)

where
    Y_t is the traffic at time t,
    a, b, c are parameters,
    M is a parameter describing the saturation level.

The various trend curves are shown in Figures 2/E.507 and 3/E.507. The logistic and Gompertz curves differ from the linear, parabolic and exponential curves by having a saturation or ceiling level. For further study see [10].

FIGURES 2/E.507 and 3/E.507 - Trend curves
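As a minimal illustration of the least-squares calculation referred to above, the following Python sketch fits the linear, parabolic and exponential forms (3-1) to (3-3) to a short series; all data values are invented for the example, and the exponential curve is fitted after a logarithmic transformation, as in § 5.1.

```python
import numpy as np

# Illustrative yearly traffic observations (hypothetical values).
t = np.arange(1, 11, dtype=float)
y = np.array([120, 131, 145, 160, 178, 196, 218, 241, 266, 295], dtype=float)

# Linear model (3-1): ordinary least squares on Y_t = a + b t.
b_lin, a_lin = np.polyfit(t, y, 1)

# Parabolic model (3-2): least squares on Y_t = a + b t + c t^2.
c_par, b_par, a_par = np.polyfit(t, y, 2)

# Exponential model (3-3): linearize as ln Y_t = ln a + b t, then fit.
b_exp, log_a = np.polyfit(t, np.log(y), 1)
a_exp = np.exp(log_a)

print(f"linear:      Y = {a_lin:.2f} + {b_lin:.2f} t")
print(f"exponential: Y = {a_exp:.2f} * exp({b_exp:.4f} t)")
```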
3.2 Smoothing models

Smoothing models weight recent observations heavily, so the fitted model tracks current data very well but not necessarily data from the distant past. The best known smoothing process is the moving average. The degree of smoothing is controlled by the number of most recent observations included in the average; all observations included in the average have the same weight.

In addition to moving average models, there is another group of smoothing models based on weighting the observations. The most common are:
- simple exponential smoothing,
- double exponential smoothing,
- discounted regression,
- Holt's method, and
- Holt-Winters' seasonal models.

For example, in the method of exponential smoothing the weight given to previous observations decreases geometrically with age according to the equation:

    \hat{m}_t = (1 - \alpha) Y_t + \alpha \hat{m}_{t-1}                    (3-6)

where
    Y_t is the measured traffic at time t,
    \hat{m}_t is the estimated level at time t, and
    \alpha is the discount factor [(1 - \alpha) is the smoothing parameter].

The impact of past observations on the forecasts is controlled by the magnitude of the discount factor. Smoothing models are especially appropriate for short-term forecasts. For further studies see [1], [5] and [9].

3.3 Autoregressive models

If the traffic demand X_t at time t can be expressed as a linear combination of earlier equidistant observations of the past traffic demand, the process is an autoregressive process. The model is then defined by the expression:

    X_t = \Phi_1 X_{t-1} + \Phi_2 X_{t-2} + \dots + \Phi_p X_{t-p} + a_t    (3-7)

where
    a_t is white noise at time t,
    \Phi_k, k = 1, \dots, p, are the autoregressive parameters.

The model is denoted AR(p), since the order of the model is p.

The parameter estimates can be found by regression analysis. Because of common trends, the lagged variables (X_{t-1}, X_{t-2}, \dots, X_{t-p}) are usually strongly correlated. Hence the parameter estimates will be correlated, and significance tests of the estimates are somewhat difficult to perform.

Another possibility is to compute the empirical autocorrelation coefficients and then use the Yule-Walker equations to estimate the parameters [\Phi_k]. This procedure can be performed when the time series [X_t] is stationary. If, on the other hand, the time series is non-stationary, it can often be transformed to stationarity, e.g., by differencing the series. The estimation procedure is given in Annex A, § A.1.
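The following numpy sketch shows the direct Yule-Walker solution referred to above; the recursive alternative is given in Annex A, § A.1. The example series is invented and is assumed to be stationary already (e.g., a differenced traffic series).

```python
import numpy as np

def yule_walker(x, p):
    """AR(p) estimates from the Yule-Walker equations (A-4), built from
    the empirical autocorrelations r_k of (A-1) to (A-3).
    The series is assumed stationary (difference it first if it is not)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    # Empirical autocovariances v_k and autocorrelations r_k = v_k / v_0.
    v = np.array([np.dot(d[:n - k], d[k:]) / (n - 1) for k in range(p + 1)])
    r = v / v[0]
    # Yule-Walker system: R phi = r, with R_{ij} = r_{|i-j|} and r_0 = 1.
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:]), r

# Hypothetical stationary series (e.g., first differences of traffic).
x = np.array([0.6, -0.2, 0.5, 0.1, -0.3, 0.4, 0.2, -0.1, 0.3, 0.0,
              0.5, -0.2, 0.1, 0.4, -0.1, 0.2, 0.3, -0.2, 0.4, 0.1])
phi, r = yule_walker(x, p=2)

# One-step forecast (A-7), applied to deviations from the mean.
d = x - x.mean()
x_next = x.mean() + phi[0] * d[-1] + phi[1] * d[-2]
print(phi, x_next)
```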
3.4 Autoregressive integrated moving average (ARIMA) models

An extension of the class of autoregressive models which includes moving average models is the class of autoregressive moving average (ARMA) models. A moving average model of order q is given by:

    X_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \dots - \theta_q a_{t-q}    (3-8)

where
    a_t is white noise at time t,
    [\theta_k] are the moving average parameters.

Assuming that the white noise term of the autoregressive models of § 3.3 is described by a moving average model, one obtains the so-called ARMA(p, q) model:

    X_t = \Phi_1 X_{t-1} + \Phi_2 X_{t-2} + \dots + \Phi_p X_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \dots - \theta_q a_{t-q}    (3-9)

The ARMA model describes a stationary time series. If the time series is non-stationary, it is necessary to difference the series. This is done as follows: let Y_t be the time series and B the backward shift operator; then

    X_t = (1 - B)^d Y_t                                      (3-10)

where d is the number of differences needed to reach stationarity. The ARIMA(p, d, q) model is obtained by inserting equation (3-10) into equation (3-9).

The method for analyzing such time series was developed by G. E. P. Box and G. M. Jenkins [3]. To analyze and forecast such time series it is usually necessary to use a time series program package.

As indicated in Figure 1/E.507, a tentative model is identified first. This is carried out by determining the necessary transformations and the number of autoregressive and moving average parameters. The identification is based on the structure of the autocorrelations and partial autocorrelations.

The next step, as indicated in Figure 1/E.507, is the estimation procedure. Maximum likelihood estimates are used. Unfortunately, these estimates are difficult to find, because a nonlinear system of equations must be solved; for practical purposes, a computer program is necessary for the calculations.

The forecasting model is based on equation (3-9), and the process of making forecasts l time units ahead is shown in § A.2.

The forecasting models described so far are univariate. It is also possible to introduce explanatory variables; in this case the system is described by a transfer function model. The methods for analyzing the time series in a transfer function model are rather similar to the methods described above.

Detailed descriptions of ARIMA models are given in [1], [2], [3], [5], [11], [15] and [17].
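In practice these calculations are left to a program package, as noted above. As one sketch of what such a package does, the following assumes the Python statsmodels package is available; the series and the ARIMA(1, 1, 1) order are invented for illustration, and in real use the order would come from the identification procedure of Figure 1/E.507.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly traffic series (invented values).
y = np.array([112., 118., 124., 131., 136., 144., 150., 157.,
              163., 172., 178., 185., 193., 200., 209., 216.])

# ARIMA(1, 1, 1): one autoregressive term, first differencing (d = 1)
# to obtain stationarity as in equation (3-10), one moving average term.
model = ARIMA(y, order=(1, 1, 1))
result = model.fit()

# Forecasts l = 1, ..., 6 time units ahead, as in § A.2.
print(result.forecast(steps=6))
```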
3.5 State space models with Kalman Filtering

State space models are a way to represent discrete-time processes by means of difference equations. The state space modelling approach allows the conversion of any general linear model into a form suitable for recursive estimation and forecasting. A more detailed description of ARIMA state space models can be found in [1].

For a stochastic process such a representation may be of the following form:

    X_{t+1} = \Phi X_t + Z_t + w_t                           (3-11)

and

    Y_t = H X_t + n_t                                        (3-12)

where
    X_t is an s-vector of state variables in period t,
    Z_t is an s-vector of deterministic events,
    \Phi is an s × s transition matrix that may, in general, depend on t,
    w_t is an s-vector of random modelling errors,
    Y_t is a d-vector of measurements in period t,
    H is a d × s matrix called the observation matrix, and
    n_t is a d-vector of measurement errors.

Both w_t in equation (3-11) and n_t in equation (3-12) are additive random sequences with known statistics. The expected value of each sequence is the zero vector, and w_t and n_t satisfy the conditions:

    E[w_t w_j^T] = Q_t \delta_{tj}    for all t, j,
    E[n_t n_j^T] = R_t \delta_{tj}    for all t, j,          (3-13)

where Q_t and R_t are nonnegative definite matrices 2) and \delta_{tj} is the Kronecker delta.

2) A matrix A is nonnegative definite if, and only if, z^T A z ≥ 0 for all vectors z.

Q_t is the covariance matrix of the modelling errors and R_t is the covariance matrix of the measurement errors; w_t and n_t are assumed to be uncorrelated and are referred to as white noise. In other words:

    E[n_t w_j^T] = 0    for all t, j,                        (3-14)

and

    E[n_t X_0^T] = 0    for all t.                           (3-15)

Under the assumptions formulated above, determine \hat{X}_{t,t} such that:

    E[(\hat{X}_{t,t} - X_t)^T (\hat{X}_{t,t} - X_t)] = minimum,    (3-16)

where \hat{X}_{t,t} is an estimate of the state vector at time t and X_t is the vector of true state variables.

The Kalman Filtering technique allows the estimation of state variables recursively for on-line applications. This is done in the following manner. Assuming that there is no explanatory variable Z_t, once a new data point becomes available it is used to update the model:

    \hat{X}_{t,t} = \hat{X}_{t,t-1} + K_t (Y_t - H \hat{X}_{t,t-1})    (3-17)

where K_t is the Kalman Gain matrix, which can be computed recursively [18]. Intuitively, the gain matrix determines how much relative weight is given to the last observed forecast error in order to correct it. To create a k-step-ahead projection the following formula is used:

    \hat{X}_{t+k,t} = \Phi^k \hat{X}_{t,t}                   (3-18)

where \hat{X}_{t+k,t} is an estimate of X_{t+k} given observations Y_1, Y_2, \dots, Y_t.

Equations (3-17) and (3-18) show that the Kalman Filtering technique leads to a convenient forecasting procedure that is recursive in nature and provides an unbiased, minimum variance estimate of the discrete-time process of interest. For further studies see [4], [5], [16], [18], [19] and [22].

Kalman Filtering also works well when the data under examination are seasonal. The seasonal traffic load data can be represented by a periodic time series, and a seasonal Kalman Filter can be obtained by superimposing a linear growth model on a seasonal model. For further discussion of seasonal Kalman Filter techniques see [6] and [20].

3.6 Regression models

Equations (3-1) and (3-2) are typical regression models. In these equations the traffic Y_t is the dependent (or explained) variable, while time t is the independent variable. A regression model describes a linear relation between the dependent and the independent variables. Given certain assumptions, ordinary least squares (OLS) can be used to estimate the parameters.

A model with several independent variables is called a multiple regression model. The model is given by:

    Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + u_t    (3-19)

where
    Y_t is the traffic at time t,
    \beta_i, i = 0, 1, \dots, k, are the parameters,
    X_{it}, i = 1, 2, \dots, k, is the value of the i-th independent variable at time t,
    u_t is the error term at time t.

Independent or explanatory variables which can be used in the regression model are, for instance, tariffs, exports, imports and degree of automation. Other explanatory variables are given in § 2, "Base data for forecasting", of Recommendation E.506.

Detailed descriptions of regression models are given in [1], [5], [7], [15] and [23].
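A minimal numpy sketch of OLS estimation of the multiple regression model (3-19) follows; the two regressors (a tariff index and a trade volume) and all data values are invented for the example.

```python
import numpy as np

# Hypothetical data: traffic Y explained by a tariff index X1 and
# bilateral trade X2, as in equation (3-19). Values are invented.
X1 = np.array([1.00, 0.98, 0.97, 0.95, 0.92, 0.90, 0.89, 0.87])
X2 = np.array([210., 225., 241., 255., 270., 288., 301., 317.])
Y  = np.array([55., 59., 64., 68., 74., 80., 85., 91.])

# Design matrix with an intercept column for beta_0.
X = np.column_stack([np.ones_like(Y), X1, X2])

# OLS estimates beta = (X'X)^{-1} X'Y, computed via lstsq for stability.
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ beta
print("estimates:", beta)
print("residual variance:", residuals @ residuals / (len(Y) - X.shape[1]))
```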
3.7 Econometric models

Econometric models involve equations which relate the variable to be forecast (the dependent or endogenous variable) to a number of socio-economic variables (called independent or explanatory variables). The form of the equations should reflect an expected causal relationship between the variables. Given an assumed model form, historical or cross-sectional data are used to estimate the coefficients of the equation. Assuming the model remains valid over time, estimates of future values of the independent variables can be used to produce forecasts of the variable of interest. An example of a typical econometric model is given in Annex C.

There is a wide spectrum of possible models and a number of methods of estimating the coefficients (e.g., least squares, varying parameter methods, nonlinear regression, etc.). In many respects the family of econometric models is far more flexible than other models. For example, lagged effects can be incorporated, observations weighted, ARIMA residual models subsumed, information from separate sections pooled and parameters allowed to vary in econometric models, to mention a few possibilities.

One of the major benefits of building an econometric model for forecasting is that the structure, or the process that generates the data, must be properly identified and appropriate causal paths must be determined. Explicit structure identification makes the source of errors in the forecast easier to identify in econometric models than in other types of models. Changes in structure can be detected through the use of econometric models, and outliers in the historical data are easily eliminated or their influence properly weighted. Also, changes in the factors affecting the variables in question can easily be incorporated in the forecast generated from an econometric model.

Often, fairly reliable econometric models may be constructed with fewer observations than are required for time series models. In the case of pooled regression models, just a few observations for several cross-sections are sufficient to support a model used for predictions.

However, care must be taken in estimating the model to satisfy the underlying assumptions of the techniques, which are described in many of the reference works listed at the end of this Recommendation. For example, the number of independent variables which can be used is limited by the amount of data available to estimate the model. Also, independent variables which are correlated with one another should be avoided. Sometimes correlation between the variables can be avoided by using differenced or detrended data or by transformation of the variables. For further studies see [8], [12], [13], [14] and [21].

4 Discontinuities in traffic growth

4.1 Examples of discontinuities

It may be difficult to assess in advance the magnitude of a discontinuity. Often the influence of the factors which cause discontinuities is spread over a transitional period, and the discontinuity is not so obvious. Furthermore, discontinuities arising, for example, from the introduction of international subscriber dialling are difficult to identify accurately, because changes in the method of working are usually associated with other changes (e.g., tariff reductions).

An illustration of the bearing of discontinuities on traffic growth can be observed in the graph of Figure 4/E.507. Discontinuities representing the doubling, and even more, of traffic flow are known.
It may also be noted that changes in the growth trend can occur after discontinuities. In short-term forecasts it may be desirable to use the trend of the traffic between discontinuities, but for long-term forecasts it may be desirable to use a trend estimate based on long-term observations, including previous discontinuities.

In addition to random fluctuations due to unpredictable traffic surges, faults, etc., traffic measurements are also subject to systematic fluctuations due to daily or weekly traffic flow cycles, the influence of time differences, etc.

4.2 Introduction of explanatory variables

Identification of explanatory variables for an econometric model is probably the most difficult aspect of econometric model building. The explanatory variables used in an econometric model identify the main sources of influence on the variable one is concerned with. A list of explanatory variables is given in Recommendation E.506, § 2.

FIGURE 4/E.507 - Discontinuity in traffic growth

Economic theory is the starting point for variable selection. More specifically, demand theory provides the basic framework for building the general model. However, the description of the structure or of the process generating the data often dictates what variables enter the set of explanatory variables. For instance, technological relationships may need to be incorporated in the model in order to define the structure appropriately.

Although there are criteria for selecting explanatory variables [e.g., the adjusted coefficient of determination \bar{R}^2, the Durbin-Watson (D-W) statistic, the root mean square error (RMSE) and ex-post forecast performance, explained in the references], statistical problems and/or the availability of data (either historical or forecasted) limit the set of potential explanatory variables, and one often has to revert to proxy variables. Unlike pure statistical models, econometric models admit explanatory variables not on the basis of statistical criteria alone but also on the premise that causality is indeed present.

A completely specified econometric model will capture turning points. Discontinuities in the dependent variable will not be present unless the parameters of the model change drastically in a very short time period. Discontinuities in the growth of telephone traffic are indications that the underlying market or technological structure has undergone large changes. Sustained changes in the growth of telephone demand can be captured either through varying parameter regression or through the introduction of a variable that appears to explain the discontinuity (e.g., an advertising variable if advertising is judged to be the cause of the structural change). Once-and-for-all, or step-wise, discontinuities cannot be handled by the introduction of explanatory variables: dummy variables can resolve this problem.

4.3 Introduction of dummy variables

In econometric models, qualitative variables are often relevant; to measure the impact of qualitative variables, dummy variables are used. The dummy variable technique uses the value 1 for the presence of the qualitative attribute that has an impact on the dependent variable and 0 for its absence. Thus, dummy variables are appropriate in the case where a discontinuity in the dependent variable has taken place. A dummy variable, for example, would take the value of zero during the historical period when calls were operator handled and one for the period for which direct dial service is available.
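A small numpy sketch of the technique just described follows; the series, the timing of the service change and the fitted equation are all invented for illustration.

```python
import numpy as np

# Dummy variable D_t = 0 while calls are operator handled,
# D_t = 1 once direct dialling is available (invented example).
t = np.arange(1, 13, dtype=float)
D = (t >= 7).astype(float)          # service change between periods 6 and 7
Y = np.array([20., 22., 23., 25., 27., 28., 36., 38., 40., 41., 43., 45.])

# Regression Y_t = b0 + b1 t + b2 D_t: b2 estimates the step discontinuity.
X = np.column_stack([np.ones_like(t), t, D])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("trend:", b[1], "estimated jump:", b[2])
```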
Dummy variables are often used to capture seasonal effects in the dependent variable, or when one needs to eliminate the effect of an outlier on the parameters of a model, such as a large jump in telephone demand due to a postal strike or a sharp decline due to facility outages associated with severe weather conditions.

Indiscriminate use of dummy variables should be discouraged for two reasons:
1) dummy variables tend to absorb all the explanatory power during discontinuities, and
2) they reduce the degrees of freedom.

5 Assessing model specification

5.1 General

This section presents methods for testing the significance of the parameters, and methods for calculating confidence intervals, for some of the forecasting models given in § 3. In particular, the methods relating to regression analysis and time series analysis are discussed.

All the econometric forecasting models presented here are described as regression models. The curve fitting models given in § 3.1 can also be described as regression models. An exponential model given by

    Z_t = a e^{bt} \cdot u_t                                 (5-1)

may be transformed to a linear form

    \ln Z_t = \ln a + bt + \ln u_t                           (5-2)

or

    Y_t = \beta_0 + \beta_1 X_t + a_t                        (5-3)

where
    Y_t = \ln Z_t,
    \beta_0 = \ln a,
    \beta_1 = b,
    X_t = t,
    a_t = \ln u_t (white noise).

5.2 Autocorrelation

A good forecasting model should leave residuals with small autocorrelations. If the residuals are significantly correlated, the estimated parameters, and also the forecasts, may be poor. To check whether the errors are correlated, the autocorrelation function r_k, k = 1, 2, \dots, is calculated, where r_k is the estimated autocorrelation of the residuals at lag k. A way to detect autocorrelation among the residuals is to plot the autocorrelation function and to perform a Durbin-Watson test. The Durbin-Watson statistic is:

    D\text{-}W = \frac{\sum_{t=2}^{N} (e_t - e_{t-1})^2}{\sum_{t=1}^{N} e_t^2}    (5-4)

where
    e_t is the estimated residual at time t,
    N is the number of observations.
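Both residual checks described above are simple to compute directly; a minimal numpy sketch with invented residuals:

```python
import numpy as np

def residual_diagnostics(e, max_lag=10):
    """Residual autocorrelations r_k and the Durbin-Watson
    statistic (5-4) for a vector of estimated residuals e."""
    e = np.asarray(e, dtype=float)
    d = e - e.mean()
    denom = np.dot(d, d)
    r = np.array([np.dot(d[:-k], d[k:]) / denom
                  for k in range(1, max_lag + 1)])
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # equation (5-4)
    return r, dw

# Invented residuals; D-W near 2 suggests no first-order autocorrelation.
e = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.4, -0.1, 0.0, 0.2, -0.2])
r, dw = residual_diagnostics(e, max_lag=5)
print("autocorrelations:", np.round(r, 3), "D-W:", round(dw, 3))
```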
5.3 Test of significance of the parameters

One way to evaluate the forecasting model is to analyse the impact of the different exogenous variables. After estimating the parameters of the regression model, the significance of the parameters has to be tested. In the example of an econometric model in Annex C, the estimated values of the parameters are given, with the estimated standard deviation of each parameter below it in parentheses. As a rule of thumb, a parameter is considered significant if the absolute value of its estimate exceeds twice its estimated standard deviation. A more accurate way of testing the significance of the parameters is to take into account the distributions of their estimators.

The multiple correlation coefficient (or coefficient of determination) may be used as a criterion for the fit of the equation. The multiple correlation coefficient R^2 is given by:

    R^2 = \frac{\sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}    (5-5)

If the multiple correlation coefficient is close to 1, the fit is satisfactory. However, a high R^2 does not imply an accurate forecast.

In time series analysis, the discussion of the model is carried out in another way. As pointed out in § 3.4, the number of autoregressive and moving average parameters in an ARIMA model is determined by an identification procedure based on the structure of the autocorrelation and partial autocorrelation functions. The estimation of the parameters and their standard deviations is performed by an iterative nonlinear estimation procedure. Hence, by using a time series analysis computer program, the estimates of the parameters can be evaluated by studying the estimated standard deviations in the same way as in regression analysis.

An overall test of the fit is based on the statistic

    Q_{N-d} = \sum_{i=1}^{N} r_i^2                           (5-6)

where r_i is the estimated autocorrelation at lag i and d is the number of parameters in the model. When the model is adequate, Q_{N-d} is approximately chi-square distributed with N - d degrees of freedom. To test the fit, the value of Q_{N-d} can be compared with the fractiles of the chi-square distribution.

5.4 Validity of exogenous variables

Econometric forecasting models are based on a set of exogenous variables which explain the development of the endogenous variable (the traffic demand). To make forecasts of the traffic demand, it is necessary to forecast each of the exogenous variables. It is very important to point out that an exogenous variable should not be included in the forecasting model if the prediction of that variable is less reliable than the prediction of the traffic demand itself.

Suppose that the exact development of the exogenous variable is known, which, for example, is the case for the simple models where time is the explanatory variable. If the model fit is good and the white noise is normally distributed with expectation equal to zero, it is possible to calculate confidence limits for the forecasts. This is easily done by a computer program.

On the other hand, the values of most explanatory variables cannot be predicted exactly, and the reliability of their prediction decreases with the number of forecast periods. Hence, the explanatory variables cause the confidence interval of the forecasts to widen with the number of forecast periods. In such situations it is difficult to calculate a confidence interval around the forecasted values.

If the traffic demand can be described by an autoregressive moving average model, no explanatory variables are included in the model. Hence, with no explanatory variables in the model, the confidence limits of the forecast values can be calculated. This is done by a time series analysis program package.

5.5 Confidence intervals

Confidence intervals, in the context of forecasts, refer to statistical constructs of forecast bounds or limits of prediction. Because statistical models have errors associated with them, parameter estimates have some variability associated with their values. In other words, even if one has identified the correct forecasting model, the influence of endogenous factors will cause errors in the parameter estimates and in the forecast. Confidence intervals take into account the uncertainty associated with the parameter estimates.

In causal models, another source of uncertainty in the forecast of the series under study is the prediction of the explanatory variables. This type of uncertainty cannot be handled by confidence intervals and is usually ignored, even though it may be more significant than the uncertainty associated with the coefficient estimates. Also, uncertainty due to possible outside shocks is not reflected in the confidence intervals.

For a linear, static regression model, the confidence interval of the forecast depends on the reliability of the regression coefficients, the size of the residual variance, and the values of the explanatory variables.
The 95% confidence interval for a forecasted value Y_{N+1} is given by:

    \hat{Y}_N(1) - 2\hat{s} \le Y_{N+1} \le \hat{Y}_N(1) + 2\hat{s}    (5-7)

where \hat{Y}_N(1) is the forecast one step ahead and \hat{s} is the standard error of the forecast. This says that we expect, with 95% probability, that the actual value of the series at time N + 1 will fall within the limits given by the confidence interval, assuming that there are no errors associated with the forecasts of the explanatory variables.

6 Comparison of alternative forecasting models

6.1 Diagnostic check - Model evaluation

Tests and diagnostic checks are important elements of the model building procedure. The quality of the model is characterized by the residuals. A good forecasting model should leave residuals with small autocorrelations, the variance of the residuals should neither decrease nor increase over time, and the expectation of the residuals should be zero or close to zero. The precision of the forecast is affected by the size of the residuals, which should be small. In addition, the confidence limits of the parameter estimates and of the forecasts should be relatively narrow, and the mean square error should be small compared with the results from other models.

6.2 Forecasts of levels versus forecasts of changes

Many econometric models are estimated using levels of the dependent and independent variables. Since economic variables move together over time, high coefficients of determination are obtained. The collinearity among the levels of the explanatory variables does not present a problem when a model is used for forecasting purposes alone, given that the collinearity pattern of the past continues to exist in the future. However, when one attempts to measure structural coefficients (e.g., price and income elasticities), the collinearity of the explanatory variables (known as multicollinearity) renders the estimated coefficients unreliable.

To avoid the multicollinearity problem and to generate benchmark coefficient estimates and forecasts, one may use changes of the variables (first differences, or first log differences, which are equivalent to percent changes) to estimate a model and to forecast from it. Using changes of variables to estimate a model tends to remove the effect of multicollinearity and to produce more reliable coefficient estimates, by removing the common effect of economic influences on the explanatory variables. By generating forecasts through levels of, and changes in, the explanatory variables, one may be able to produce a better forecast through a reconciliation process; that is, the models are adjusted so that the two sets of forecasts give equivalent results.

6.3 Ex-post forecasting

Ex-post forecasting is the generation of a forecast from a model estimated over a sub-sample of the data, beginning with the first observation and ending several periods prior to the last observation. In ex-post forecasting, actual values of the explanatory variables are used to generate the forecast. Also, if forecasted values of the explanatory variables are used to produce an ex-post forecast, one can then measure the error associated with incorrectly forecasted explanatory variables. The purpose of ex-post forecasting is to evaluate the forecasting performance of the model by comparing the forecasted values with the actual values observed between the end of the sub-sample and the last observation.
With ex-post forecasting one is able to assess forecast accuracy in terms of:
1) percent deviations of forecasted values from actual values,
2) turning point performance,
3) systematic behaviour of deviations.

Deviations of forecasted values from actual values give a general idea of the accuracy of the model. Systematic drifts in the deviations may provide information for either re-specifying the model or adjusting the forecast to account for the drift. Of equal importance in evaluating forecast accuracy is turning point performance, that is, how well the model is able to forecast changes in the movement of the dependent variable. Further criteria for evaluating forecast accuracy are discussed below.

6.4 Forecast performance criteria

A model might fit the historical data very well. However, when the forecasts are compared with future data that were not used for the estimation of the parameters, the fit might not be so good. Hence, comparison of forecasts with actual observations may give additional information about the quality of the model.

Suppose we have the time series Y_1, Y_2, \dots, Y_N, Y_{N+1}, \dots, Y_{N+M}. The last M observations are withheld from the model building procedure. The one-step-ahead forecasting error is given by:

    e_{N+t} = Y_{N+t} - \hat{Y}_{N+t-1}(1),    t = 1, 2, \dots, M    (6-1)

where \hat{Y}_{N+t-1}(1) is the one-step-ahead forecast.

Mean error

The mean error, ME, is defined by

    ME = \frac{1}{M} \sum_{t=1}^{M} e_{N+t}                  (6-2)

ME is a criterion for forecast bias. Since the expectation of the residuals should be zero, a large deviation from zero indicates bias in the forecasts.

Mean percent error

The mean percent error, MPE, is defined by

    MPE = \frac{100}{M} \sum_{t=1}^{M} \frac{e_{N+t}}{Y_{N+t}}    (6-3)

This statistic also indicates possible bias in the forecasts; it measures the bias as a percentage deviation. MPE is not recommended when the observed values are small.

Root mean square error

The root mean square error, RMSE, of the forecast is defined as

    RMSE = \left[ \frac{1}{M} \sum_{t=1}^{M} e_{N+t}^2 \right]^{1/2}    (6-4)

RMSE is the most commonly used measure of forecasting precision.

Mean absolute error

The mean absolute error, MAE, is given by

    MAE = \frac{1}{M} \sum_{t=1}^{M} |e_{N+t}|               (6-5)

Theil's inequality coefficient

Theil's inequality coefficient is defined as follows:

    U = \left[ \sum_{t=1}^{M} \frac{e_{N+t}^2}{Y_{N+t}^2} \right]^{1/2}    (6-6)

Theil's U is preferred as a measure of forecast accuracy because the error between forecasted and actual values can be decomposed into errors due to: 1) central tendency, 2) unequal variation between predicted and realized changes, and 3) incomplete covariation of predicted and actual changes. This decomposition of prediction errors can be used to adjust the model so that its accuracy can be improved.

Another quality that a forecasting model must possess is the ability to capture turning points; that is, a forecast must be able to change direction in the same time period in which the actual series under study changes direction. If a model is estimated over a long period of time which contains several turning points, ex-post forecast analysis can generally detect a model's inability to trace closely actuals that display turning points.
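The criteria (6-1) to (6-6) are straightforward to compute on a hold-out sample; a minimal numpy sketch with invented values:

```python
import numpy as np

def forecast_criteria(actual, forecast):
    """The accuracy measures of § 6.4: ME (6-2), MPE (6-3), RMSE (6-4),
    MAE (6-5) and Theil's U (6-6), computed from the hold-out errors
    e_{N+t} = Y_{N+t} - one-step-ahead forecast, as in (6-1)."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    e = a - f
    return {
        "ME":   e.mean(),
        "MPE":  100.0 * np.mean(e / a),
        "RMSE": np.sqrt(np.mean(e ** 2)),
        "MAE":  np.mean(np.abs(e)),
        "U":    np.sqrt(np.sum(e ** 2 / a ** 2)),
    }

# Invented hold-out sample: the last M = 4 observations were withheld.
print(forecast_criteria(actual=[105., 110., 118., 123.],
                        forecast=[103., 112., 115., 124.]))
```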
7 Choice of forecasting model

7.1 Forecasting performance

Although the choice of a forecasting model is usually guided by its forecasting performance, other considerations must receive attention. Thus, the length of the forecast period, the functional form, and the forecast accuracy of the explanatory variables of an econometric model must be considered.

The length of the forecast period affects the decision to use one type of model rather than another, along with historical data limitations and the purpose of the forecasting model. For instance, ARIMA models may be appropriate for short-term forecasts when stability is not an issue, when sufficient historical data are available, and when causality is not of interest. Also, when the structure that generates the data is difficult to identify, one has no choice but to use a forecasting model based only on the historical data of the variable of interest.

The functional form of the model must also be considered. While it is true that a more complex model may reduce the model specification error, it is also true that it will, in general, considerably increase the effect of data errors. The model form should be chosen to recognize the trade-off between these sources of error.

The availability of forecasts for the explanatory variables, and their reliability record, is another issue affecting the choice of a forecasting model. A superior model using explanatory variables which cannot be forecasted accurately can be inferior to an average model whose explanatory variables are forecasted accurately.

When market stability is an issue, econometric models which can handle structural changes should be used for forecasting. When causality matters, simple models or ARIMA models cannot be used as forecasting tools; nor can they be used when insufficient historical data exist. Finally, when the purpose of the model is to forecast the effects associated with changes in the factors that influence the variable in question, time series models may not be appropriate (with the exception, of course, of transfer function and multiple time series models).

7.2 Length of forecast period

For normal extensions of switching equipment and additions of circuits, a forecast period of about six years is necessary. However, a longer forecast period may be necessary for the planning of new cables or other transmission media, or for major plant installations. Estimates in the long term will necessarily be less accurate than short-term forecasts, but this is acceptable.

In forecasting with a statistical model, the length of the forecast period is entirely determined by:
a) the historical data available,
b) the purpose or use of the forecast,
c) the market structure that generates the data,
d) the forecasting model used,
e) the frequency of the data.

The historical data available depend upon the period over which they have been collected and the frequency of collection (or the length of the period over which the data are aggregated). A small historical data base can only support a short prediction interval. For example, with 10 or 20 observations a model can be used to forecast 4-5 periods past the sample (i.e., into the future). On the other hand, with 150-200 observations, potentially reliable forecasts can be obtained for 30 to 50 periods past the sample, other things being equal.

Certainly, the purpose of the forecast affects the number of predicted periods. Long-range facility planning requires forecasts extending 15-20 or more years into the future. Rate change evaluations may only require forecasts for 2-3 years.
Alteration of routing arrangements may require forecasts extending only a few months past the sample.

The stability of a market, or lack thereof, also affects the length of the forecast period. With a stable market structure one could conceivably extend the forecast period to equal the historical period. However, a volatile market does not afford the forecaster the same luxury: the forecast period can then only consist of a few periods into the future.

The forecasting models used to generate forecasts do, by their nature, influence the decision on how far into the future one can reasonably forecast. Structural models tend to perform better than other models in the long run, while for short-run predictions all models seem to perform equally well.

It should be noted that, while the purpose of the forecast and the forecasting model affect the length of the forecast, the number of periods to be forecasted plays a crucial role in the choice of the forecasting model and in the use to which a forecast is put.

ANNEX A
(to Recommendation E.507)

Description of forecasting procedures

A.1 Estimation of autoregressive parameters

The empirical autocorrelation at lag k is given by:

    r_k = \frac{v_k}{v_0}                                    (A-1)

where

    v_k = \frac{1}{N-1} \sum_{t=1}^{N-k} (X_t - \bar{X})(X_{t+k} - \bar{X})    (A-2)

and

    \bar{X} = \frac{1}{N} \sum_{t=1}^{N} X_t                 (A-3)

N being the total number of observations.

The relation between [r_k] and the estimates [\hat{\Phi}_k] of [\Phi_k] is given by the Yule-Walker equations:

    r_1 = \hat{\Phi}_1 + \hat{\Phi}_2 r_1 + \dots + \hat{\Phi}_p r_{p-1}
    r_2 = \hat{\Phi}_1 r_1 + \hat{\Phi}_2 + \dots + \hat{\Phi}_p r_{p-2}
    \vdots
    r_p = \hat{\Phi}_1 r_{p-1} + \hat{\Phi}_2 r_{p-2} + \dots + \hat{\Phi}_p    (A-4)

Hence the estimates [\hat{\Phi}_k] can be found by solving this system of equations. For computations, an alternative to solving the equations directly is the following recursive procedure. Let [\hat{\Phi}_{k,j}] be the estimators of the parameters at lag j = 1, 2, \dots, k, given that the total number of parameters is k. The estimators [\hat{\Phi}_{k+1,j}] are then found by:

    \hat{\Phi}_{k+1,k+1} = \frac{r_{k+1} - \sum_{j=1}^{k} \hat{\Phi}_{k,j} r_{k+1-j}}{1 - \sum_{j=1}^{k} \hat{\Phi}_{k,j} r_j}    (A-5)

    \hat{\Phi}_{k+1,j} = \hat{\Phi}_{k,j} - \hat{\Phi}_{k+1,k+1} \hat{\Phi}_{k,k-j+1},    j = 1, 2, \dots, k    (A-6)

Defining \hat{\Phi}_j = \hat{\Phi}_{p,j}, j = 1, 2, \dots, p, the forecast of the traffic demand at time t + 1 is expressed by:

    \hat{X}_{t+1} = \hat{\Phi}_1 X_t + \hat{\Phi}_2 X_{t-1} + \dots + \hat{\Phi}_p X_{t-p+1}    (A-7)

A.2 Forecasting with ARIMA models

The forecast l time units ahead is given by:

    \hat{X}_t(l) = \hat{\Phi}_1 [X_{t+l-1}] + \hat{\Phi}_2 [X_{t+l-2}] + \dots + \hat{\Phi}_p [X_{t+l-p}] + [a_{t+l}] - \hat{\theta}_1 [a_{t+l-1}] - \hat{\theta}_2 [a_{t+l-2}] - \dots - \hat{\theta}_q [a_{t+l-q}]    (A-8)

where

    [X_j] = \hat{X}_t(j - t)   if j > t,    X_j   if j \le t    (A-9)

    [a_j] = 0   if j > t,    X_j - \hat{X}_j   if j \le t       (A-10)

That is, [X_j] is defined as a forecast when j > t and as an actual observation otherwise, while [a_j] is defined as 0 when j > t, since white noise has expectation 0. If the observations are known (j \le t), then [a_j] is equal to the residual.
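A numpy sketch of the recursion (A-5), (A-6) follows (this is the Durbin-Levinson form of the Yule-Walker solution); the autocorrelation values in the example are invented.

```python
import numpy as np

def recursive_yule_walker(r, p):
    """Recursive solution (A-5), (A-6) of the Yule-Walker equations.
    r is the vector of empirical autocorrelations r_1, ..., r_p,
    so r[k] holds r_{k+1}."""
    r = np.asarray(r, dtype=float)
    phi = np.array([r[0]])                      # start: phi_{1,1} = r_1
    for k in range(1, p):
        num = r[k] - np.dot(phi, r[k - 1::-1])  # r_{k+1} - sum phi_{k,j} r_{k+1-j}
        den = 1.0 - np.dot(phi, r[:k])          # 1 - sum phi_{k,j} r_j
        phi_next = num / den                    # phi_{k+1,k+1}, equation (A-5)
        # phi_{k+1,j} = phi_{k,j} - phi_{k+1,k+1} phi_{k,k-j+1}, equation (A-6)
        phi = np.append(phi - phi_next * phi[::-1], phi_next)
    return phi

# Example with invented autocorrelations of a stationary series.
print(recursive_yule_walker([0.8, 0.6, 0.45], p=3))
```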
ANNEX B
(to Recommendation E.507)

Kalman Filter for a linear trend model

To model telephone traffic, it is assumed that there are no deterministic changes in the demand pattern. This situation can be modelled by setting the deterministic component Z_t to zero. The general state space model then becomes:

    X_{t+1} = \Phi X_t + w_t
    Y_t = H X_t + n_t                                        (B-1)

where
    X_t is an s-vector of state variables in period t,
    Y_t is a vector of measurements in year t,
    \Phi is an s × s transition matrix that may, in general, depend on t,
    w_t is an s-vector of random modelling errors, and
    n_t is the measurement error in year t.

For modelling telephone traffic demand, adopt a simple two-state, one-data-variable model defined by:

    X_{t+1} = \begin{bmatrix} x_{t+1} \\ \dot{x}_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_t \\ \dot{x}_t \end{bmatrix} + \begin{bmatrix} w_t \\ \dot{w}_t \end{bmatrix}    (B-2)

and

    y_t = x_t + n_t                                          (B-3)

where
    x_t is the true load in year t,
    \dot{x}_t is the true incremental growth in year t,
    y_t is the measured load in year t,
    n_t is the measurement error in year t.

Thus, in our model,

    \Phi = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}    and    H = [1 \quad 0].    (B-4)

The one-step-ahead projection is written as follows:

    \hat{X}_{t+1,t} = \begin{bmatrix} \hat{x}_{t+1,t} \\ \dot{\hat{x}}_{t+1,t} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \hat{x}_{t,t} \\ \dot{\hat{x}}_{t,t} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \hat{x}_{t,t-1} + \alpha_t (y_t - \hat{x}_{t,t-1}) \\ \dot{\hat{x}}_{t,t-1} + \beta_t (y_t - \hat{x}_{t,t-1}) \end{bmatrix}    (B-5)

where \hat{X}_{t+1,t} is the projection of the state variable in period t + 1 given observations through year t. The \alpha_t and \beta_t coefficients are the Kalman gains in year t. Rewriting the above equation yields:

    \hat{x}_{t,t} = (1 - \alpha_t) \hat{x}_{t,t-1} + \alpha_t y_t    (B-6)

and

    \dot{\hat{x}}_{t,t} = (1 - \beta_t) \dot{\hat{x}}_{t,t-1} + \beta_t (y_t - \hat{x}_{t-1,t-1})    (B-7)

The Kalman Filter creates a linear trend for each time series being forecast, based on the current observation or measurement of traffic demand and the previous year's forecast of that demand. The observed and forecasted traffic loads are combined to produce a smoothed load that corresponds to the level of the process, together with a smoothed growth increment. The Kalman gain values \alpha_t and \beta_t can be either fixed or adaptive. In [16], Moreland presents a method for selecting fixed, robust parameters that provide adequate performance independent of system noise, measurement error, and initial conditions. For further details on the proper selection of these parameters see [6], [20] and [22].
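A minimal Python sketch of the Annex B filter with fixed gains follows (the fixed-gain option is the one attributed to Moreland [16] above; the gain values, the initial conditions and the yearly loads here are invented for illustration).

```python
def linear_trend_filter(y, alpha=0.5, beta=0.3):
    """Annex B filter with fixed gains alpha_t = alpha, beta_t = beta:
    smoothed level x and growth increment g, updated by (B-6), (B-7);
    the k-step projection (3-18) reduces to x + k g."""
    x = y[0]                            # initial level (a simple choice)
    g = 0.0                             # initial growth increment
    for obs in y:
        pred = x + g                    # one-step prediction x_{t,t-1}
        err = obs - pred                # innovation y_t - x_{t,t-1}
        x = pred + alpha * err          # level update, equivalent to (B-6)
        g = g + beta * err              # growth update, equivalent to (B-7)
    return x, g

# Invented yearly loads; project three years past the sample.
y = [100., 108., 115., 124., 131., 140.]
level, growth = linear_trend_filter(y)
print([round(level + k * growth, 1) for k in (1, 2, 3)])
```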
ANNEX C
(to Recommendation E.507)

Example of an econometric model

To illustrate the workings of an econometric model, we have chosen the model of United States billed minutes to Brazil. This model was selected among alternative models for three reasons:
a) to demonstrate the introduction of explanatory variables,
b) to point out difficulties associated with models used both for the estimation of structure and for forecasting purposes, and
c) to show how transformations may affect the results.

The demand for United States billed minutes to Brazil (MIN) is estimated by a log-linear equation which includes United States billed messages to Brazil (MSG), a real telephone price index (RPI), United States personal income in 1972 prices (YP72), and real bilateral trade between the United States and Brazil (RTR) as explanatory variables. This model is represented as:

    \ln(MIN)_t = \beta_0 + \beta_1 \ln(MSG)_t + \beta_2 \ln(RPI)_t + \beta_3 \ln(YP72)_t + \beta_4 \ln(RTR)_t + u_t    (C-1)

where u_t is the error term of the regression and where \beta_1 > 0, \beta_2 < 0, \beta_3 > 0 and \beta_4 > 0 are the expected signs.

Using ridge regression to deal with severe multicollinearity problems, we estimate the equation over the 1971:1 (i.e., first quarter of 1971) to 1979:4 interval and obtain the following results:

    \ln(MIN)_t = -3.489 + 0.619 \ln(MSG)_t - 0.447 \ln(RPI)_t + 1.166 \ln(YP72)_t + 0.281 \ln(RTR)_t    (C-2)
                         (0.035)             (0.095)            (0.269)             (0.084)

    \bar{R}^2 = 0.985,   SER = 0.083,   D-W = 0.922,   k = 0.10    (C-3)

where \bar{R}^2 is the adjusted coefficient of determination, SER is the standard error of the regression, D-W is the Durbin-Watson statistic, and k is the ridge regression constant. The values in parentheses under the equation are the estimated standard deviations of the parameter estimates \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4.

The introduction of messages as an explanatory variable in this model was necessitated by the fact that, since the mid-seventies, transmission quality has improved and completion rates have risen while, at the same time, the strong growth in this market has begun to dissipate. Also, the growth rates for some periods could not have been explained by rate activity on either side or by real United States personal income. The behaviour of the message variable in the minute equation was able to account for all these factors.

Because the model serves a dual purpose, namely structure estimation and forecasting, at least one more variable is introduced than if the model were to be used for forecasting purposes alone. The introduction of additional explanatory variables results in severe multicollinearity and necessitates employing ridge regression, which lowers \bar{R}^2 and the Durbin-Watson statistic. Consequently, the predictive power of the model is reduced somewhat.

The effect of transforming the variables of a model is shown in the ex-post forecast analysis performed on the model of United States billed minutes to Brazil. The deviations using levels of the variables are larger than those using the logarithms of the variables, which were adopted to obtain a better fit (the estimated RMSE for the log-linear regression model is 0.119 827). The forecast results in level and logarithmic form are shown in Table C-1/E.507.

TABLE C-1/E.507

                    Logarithms                           Levels
          Forecast   Actual   % deviation   Forecast    Actual      % deviation
1980:1     14.858    14.938     -0.540      2 836 269   3 073 697      -7.725
1980:2     14.842    14.972     -0.872      2 791 250   3 180 334     -12.234
1980:3     14.916    15.111     -1.296      3 005 637   3 654 092     -17.746
1980:4     14.959    15.077     -0.778      3 137 698   3 529 016     -11.089
1981:1     15.022    15.102     -0.535      3 341 733   3 621 735      -7.731
1981:2     14.971    15.141     -1.123      3 175 577   3 762 592     -15.601
1981:3     15.395    15.261     -0.879      4 852 478   4 244 178      14.333
1981:4     15.405    15.302     -0.674      4 901 246   4 421 755     -10.844
1982:1     15.365    15.348     -0.110      4 709 065   4 630 238      -1.702
1982:2     15.326    15.386     -0.387      4 528 947   4 807 901      -5.802
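Annex C relies on ridge regression; the following numpy sketch shows the ridge estimator with a constant k as in equation (C-3). The data are invented stand-ins for the collinear regressors of (C-1), and the standardization step is a common convention for ridge rather than something specified in the annex.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator beta = (X'X + k I)^{-1} X'y, the device used in
    Annex C (with k = 0.10) to cope with severe multicollinearity.
    Regressors and dependent variable are standardized/centred first."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ ys)

# Two nearly collinear invented regressors and an invented response.
rng = np.random.default_rng(0)
x1 = np.linspace(0.0, 1.0, 40)
X = np.column_stack([x1, x1 + 0.01 * rng.standard_normal(40)])
y = 2.0 * x1 + 0.1 * rng.standard_normal(40)
print(ridge(X, y, k=0.10))
```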
References

[1] ABRAHAM (A.) and LEDOLTER (J.): Statistical Methods for Forecasting, John Wiley & Sons, New York, 1983.
[2] ANDERSON (O. D.): Time Series Analysis and Forecasting: The Box-Jenkins Approach, Butterworth, London, 1976.
[3] BOX (G. E. P.) and JENKINS (G. M.): Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1976.
[4] BROWN (R. G.): Introduction to Random Signal Analysis and Kalman Filtering, John Wiley & Sons, New York, 1983.
[5] CCITT: Manual on planning data and forecasting methods, Vols. I and II, ITU, Geneva, 1988.
[6] CHEMOUIL (P.) and GARNIER (B.): An Adaptive Short-Term Traffic Forecasting Procedure Using Kalman Filtering, ITC 11, Tokyo, 1985.
[7] DRAPER (N.) and SMITH (H.): Applied Regression Analysis, Second Edition, John Wiley & Sons, New York, 1981.
[8] DUTTA (M.): Econometric Methods, South-Western Publishing Co., Cincinnati, 1975.
[9] GARDNER (E. S., Jr.): Exponential smoothing: the state of the art, Journal of Forecasting, 4, pp. 1-28, 1985.
[10] GILCHRIST (W.): Statistical Forecasting, John Wiley & Sons, New York, 1976.
[11] GRANGER (C. W. J.) and NEWBOLD (P.): Forecasting Economic Time Series, Academic Press, New York, 1977.
[12] JOHNSTON (J.): Econometric Methods, Second Edition, McGraw-Hill, New York, 1972.
[13] JUDGE (G. G.) et al.: The Theory and Practice of Econometrics, John Wiley & Sons, New York, 1980.
[14] KMENTA (J.): Elements of Econometrics, Macmillan Publishing Company, New York, 1971.
[15] MAKRIDAKIS (S.), WHEELWRIGHT (S. C.) and McGEE (V. E.): Forecasting: Methods and Applications, Second Edition, John Wiley & Sons, New York, 1983.
[16] MORELAND (J. P.): A robust sequential projection algorithm for traffic load forecasting, The Bell System Technical Journal, Vol. 61, No. 1, 1982.
[17] NELSON (C. R.): Applied Time Series Analysis for Managerial Forecasting, Holden-Day, San Francisco, 1973.
[18] PACK (C. D.) and WHITAKER (B. A.): Kalman filter models for network forecasting, The Bell System Technical Journal, Vol. 61, No. 1, pp. 1-9, 1982.
[19] SORENSON (H. W.): Kalman filtering techniques, Advances in Control Systems Theory and Applications, Academic Press, Vol. 3, pp. 219-292, 1966.
[20] SZELAG (C. R.): A short-term forecasting algorithm for trunk demand servicing, The Bell System Technical Journal, Vol. 61, No. 1, pp. 67-96, 1982.
[21] THEIL (H.): Principles of Econometrics, John Wiley & Sons, New York, 1971.
[22] TOME (F. M.) and CUNHA (J. A.): Traffic forecasting with a state space model, ITC 11, Tokyo, 1985.
[23] WONNACOTT (T. H.) and WONNACOTT (R. J.): Regression, John Wiley & Sons, New York, 1981.

Bibliography

PINDYCK (R. S.) and RUBINFELD (D. L.): Econometric Models and Economic Forecasts, McGraw-Hill, New York, 1981.

SASTRI (T.): A state space modelling approach for time series forecasting, Management Science, Vol. 31, No. 11, pp. 1451-1470, 1985.