FOREWORD The CCITT (the International Telegraph and Telephone Consultative Committee) is the permanent organ of the International Telecommunication Union (ITU). CCITT is responsible for studying technical, operating and tariff questions and issuing recommendations on them with a view to standardizing telecommunications on a worldwide basis. The Plenary Assembly of CCITT which meets every four years, establishes the topics for study and approves Recommendations prepared by its Study Groups. The approval of Recommendations by the members of CCITT between Plenary Assemblies is covered by the procedure laid down in CCITT Resolution No. 2 (Melbourne, 1988). Recommendation V.42 bis was prepared by Study Group XVII and was approved under the Resolution No. 2 procedure on the 31st of January 1990. F ITU 1990 All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the ITU. Recommendation V.42 bis DATA COMPRESSION PROCEDURES FOR DATA CIRCUIT TERMINATING EQUIPMENT (DCE) USING ERROR CORRECTING PROCEDURES The CCITT, considering (a) that the use of V-Series DCEs for transmission of asynchronous data on the general switched telephone network (GSTN) is widespread; (b) that Recommendation V.42 [1] defines error correction procedures providing improved error performance; (c) that improved throughput is possible through the use of data compression procedures; (d) that there is a need to interwork with DCEs not providing data compression; declares the view that the data compression procedures to be followed by DCEs using the error correcting procedures defined in Recommendation V.42 be as specified in this Recommendation. 1 Scope 1.1 General This Recommendation describes a data compression procedure for use wi h V- Series DCEs. The principal characteristics of the data compression procedure are: a) a compression procedure based on an algorithm which encodes strings of characters received from data terminal equipment (DTE) b) a decoding procedure which recovers the strings of characters from received codewords; c) an automatic transparent mode of operation when uncompressible data is detected. An exploration of the parameters used in this Recommendation is given in S 10. 1.2 Requirements for error correcting procedures Recommendation V.42 bis PAGE7 For correct operation of the data compression function it is necessary that an error correcting procedure be implemented between the two entities using this Recommendation. In the case of V-series Recommendations, this requires that the LAPM (link access procedure for modems) error correcting procedures defined in Recommendation V.42 or the error correcting procedures in Recommendation V.120 [2] be implemented. Note - Undetected bit errors will cause misoperation of the data compression function. Use of a 32-bit frame check sequence (FCS) as defined in ISO 3309 [3] substantially reduces the possibility of such errors. It may therefore be desirable to use this 32-bit FCS (which is an option in V.42 LAPM) in environments with severe impairments. 1.3 A DCE employing data compression The data compression function may be used with an error-correcting DCE, as shown in Figure 1/V.42 bis. The elements of an error correcting V-series DCE are specified in Recommendation V.42. PAGE6 Recommendation V.42 bis FIGURE 1/V.42 bis DCE employing data compression and error control 2 Definitions 2.1 character Single data element, encoded using a predefined number of bits (N3 = 8). 2.2 start-stop or asynchronous format Start-stop or asynchronous format is defined in Recommendations V.7 [4] and V.14 [5]. 2.3 ordinal value The ordinal value of a character is the numerical equivalent of the binary encoding of the character. For example, the character "A" when encoded as 01000001 would have an ordinal value of 6510. 2.4 alphabet Set of all possible characters which may be sent or received across the DTE/DCE interface. It is assumed in this Recommendation that the ordinal values of the alphabet are contiguous from 0 to N4 - 1, where N4 is the number of characters. 2.5 codeword A codeword, within the context of this Recommendation, is a binary number in the range 0 to N2 - 1 which represents a string of characters in compressed form. A codeword is encoded using a number of bits C2, where C2 is initially 9 (i.e. N3 + 1) and increases to a maximum of N1 bits (see S 7). 2.6 control codeword A control codeword is reserved for use in DCE-to-DCE signalling of control information related to the compression function whilst in the compressed mode of operation (see S 9). 2.7 command code Octet which is used for DCE-to-DCE signalling of control information related to the compression function whilst in the transparent mode of operation. Command codes are distinguished from normal characters by a preceding escape character (see S 2.13). Recommendation V.42 bis PAGE7 2.8 tree structure Abstract data structure which is used in this Recommendation to represent a set of strings with the same initial character (see Figure 2/V.42 bis and S 6.1). 2.9 leaf node Point on a tree which represents, within the context of this Recommendation, the last character in a string (see S 6.1). 2.10 root node A root node is a point on a tree which represents, within the context of this Recommendation, the first character in a string (see Figure 2/V.42 bis and S 6.1). 2.11 compressed operation Compressed operation has two modes. Transitions between the modes may be automatic based on the content of the data received from the DTE (see S 7.1). 2.11.1 compressed mode A mode of operation in which data from the DTE is transmitted in codewords. 2.11.2 transparent mode A mode of operation in which compression has been selected but data is being transmitted in uncompressed form. Transparent mode command code sequences may be inserted into the data stream. 2.12 uncompressed operation A mode of operation in which compression has not been selected. The data compression function is inactive. 2.13 escape character Within the context of this Recommendation, the escape character is a character which, in transparent mode, indicates the beginning of a command code sequence. This has an initial value of zero, and is adjusted on each appearance of the escape character in the data stream from the DTE, whether in transparent mode or compressed mode (see S 9.2). 3 Abbreviations The abbreviations introduced in this Recommendation are: EID Escape in Data, a command code defined in S 9. ETM Enter Transparent Mode, a control codeword defined in S 9. ECM Enter Compressed Mode, a command code defined in S 9. 4 Overview of the operation of a DCE incorporating a data compression function 4.1 General A DCE employing data compression, as depicted in Figure 1/V.42 bis, contains the following components: a) DTE/DCE interchange circuits; b) a signal converter; c) a control function; PAGE6 Recommendation V.42 bis d) an error control function; and e) a data compression function. The control function shall have additional capabilities beyond those needed for an error correcting DCE as described in Recommendation V.42. The additional capabilities of the control function are described in S 5, and the operation of the data compression function is described in SS 6 to 9. The remainder of this section provides an overview of the control function and data compression function. 4.2 Overview of the control function The control function shall perform, in addition to the functions defined in S 6.2 of Recommendation V.42, the following aspects of operation; a) negotiation of the presence of the data compression function in the remote DCE, and of parameters associated with the operation of the data compression function; b) initialization or re-initialization of the data compression function; c) coordination of the establishment of an error controlled connection for use by the peer data compression functions; d) coordination of the delivery of data between the DTE/DCE interface and the data compression function, in accordance with the procedures defined in Recommendation V.42, SS 6.2 and 8.4, including the provision of the flow control procedures defined therein; e) coordination of the delivery of data between the data compression function and the error control function; f) action on detection of an exception condition. 4.3 Overview of the data compression function The data compression function shall implement the procedures defined in this Recommendation, which result in the efficient encoding of data prior to transmission over the error controlled connection, and shall have the following capabilities: a) initialization of the data compression function; b) data compression encoding and decoding; c) a mechanism for switching between compressed and transparent modes of operation. 4.4 Communication between the control function and the data compression function Communication between the control function and the data compression function is modelled as a set of abstract primitives of the form X-NAME-TYPE which represent the logical exchange of information and control to accomplish a task or service. In the context of this Recommendation the control function is viewed as the "service user" while the data compression function is viewed as the "service provider". The types of primitive are request, indication, response and confirm. The services expected by the control function are shown in Table 1/V.42 bis. 5 Operation of the control function 5.1 Negotiation of the data compression function The use of the data compression function and the associated parameters shall be negotiated at link establishment via a protocol (for example, using the XID procedure defined in Recommendation V.42), following which they remain unchanged for the duration of the error corrected connection. Recommendation V.42 bis PAGE7 TABLE 1/V.42 bis Services expected by the control function Service Primitive S Initialize the data compression function C-INIT 5.2, 5.6 Indicate an error to the control function C-ERROR 5.8 Transfer uncompressed data to/from the data C-DATA 5.4 compression function Transfer compressed data to/from the data C-TRANSFER 5.5 compression function Flush remaining untransmitted data from C-FLUSH 5.7 the encoder Parameter P0 specifies whether or not compression is to be used. This parameter also specifies the directions (transmit only, receive only, or both directions). The default value of P0 is 0, indicating no compression in either direction. If compression is proposed for only one direction, then the only valid response is for the proposed direction or no compression. If compression is proposed for both directions, then valid responses are for both directions, for either single direction, or for no compression. Parameter P1 represents a proposed value of N2 the total number of codewords. P1 shall have a default value of 512, which is its minimum value; a maximum value is not specified within this Recommendation. Any attempt to specify less than the minimum value shall be considered a procedural error and result in disconnection. When values of P1 a exchanged during the negotiation procedure in one or both directions of transmission, the lower value shall be selected and assigned to N2 in both DCEs. Note - See Appendix II for guidance on the choice of value of N2, and its effect on performance. Parameter P2 is the proposed value for N7, the maximum string length. The default value of P2 is 6, and the permitted range is from 6 to 250. The values outside this range are invalid; and attempt to specify such values shall be regarded as a procedural error and result in disconnection. When values of P2 are exchanged during the negotiation procedure, the lower value shall be selected and assigned to N7 in both DCEs. 5.2 Initialization of the data compression function Following successful negotiation of data compression parameters, the control function shall issue the C-INIT request primitive to the data compression function. The primitive shall indicate the values of the negotiated parameters. 5.3 Connection establishment On receipt of the C-INIT confirm primitive from the data compression function, the control function shall indicate to the DTE that data transfer may commence. data data compression function On completion of connection establishment, the control function shall request encoding of the data received on the DTE/DCE interface. PAGE6 Recommendation V.42 bis To encode data, the control function shall issue a C-DATA request primitive to the data compression function. This primitive shall indicate the data to be encoded. On receipt of a C-DATA indication primitive from the data compression function, the control function shall deliver the decoded data to the DTE/DCE interface. Flow control procedures will be necessary in order to avoid potential loss of data due to buffer overflow. When the procedures defined in this Recommendation are used in conjunction with those defined in Recommendation V.42, the flow control procedures defined in Recommendation V.42, SS 7.3.1 and 8.4.2, shall be applied. 5.5 Coordination of the transfer of data between the data compression function and error control function On receipt of a C-TRANSFER indication primitive from the data compression function, the control function shall issue an L-DATA request primitive to the error control function. On receipt of an L-DATA indication primitive from the error control function, the control function shall issue a C-TRANSFER request primitive to the data compression function. 5.6 Reinitialization of the data compression function The control function shall issue a C-INIT request to the data compression function on the following conditions: a) L-ESTABLISH indication or confirm; b) L-SIGNAL indication or confirm, where the primitive indicates a destructive form. It is the responsibility of the control functions to ensure that C-INIT request primitives are issued only when no data is in transit between the data compression functions (e.g. in the error control functions) to ensure synchronization between the encoders and decoders. 5.7 Expedited data transfer Certain conditions, the specification of which is outside the scope of this Recommendation, may arise which require that any partially encoded data is transferred immediately, for example if the error control function is in an idle condition. If such a condition arises, the control function shall issue a C-FLUSH request primitive to the data compression function, and shall then transfer remaining data in accordance with S 5.5. 5.8 Action on detection of C-ERROR The C-ERROR indication is used to inform the control function that an error (for example, a procedural error or loss of synchronization) has been detected by the data compression function. The control function shall take appropriate recovery action, including re-establishment of the error corrected connection. The following conditions recognized by the decoder result in the generation of a C-ERROR indication primitive: a) receipt of a STEPUP codeword when it would cause the value of C2 to exceed N1; b) receipt of a codeword, at any time, equal to C1; c) receipt of a codeword representing an empty dictionary entry; d) receipt of a reserved command code. 6 Procedures for dictionary use and maintenance 6.1 General The data compression function employs an algorithm in which a string of characters read from the DTE is encoded as a fixed length codeword. The process employs dictionaries, in which the strings are stored, and which are dynamically updated during normal operation. Recommendation V.42 bis PAGE7 The data compression function contains two dictionaries, one maintained by the data compression encoder for use in compression of data received from the DTE, and one maintained by the data compression decoder for use in the decoding of data received from the error control function. The dictionary functions are: a) string matching, in which a sequence of characters is read from the DTE, and the dictionary searched for the resulting string (see S 6.3); b) updating, in which a new string is added to the dictionary (see S 6.4); c) the deletion of infrequently used strings in order that storage capacity may be reused (S 6.5). The dictionary used to store strings for use in the encoding and decoding process may be logically represented using an abstract data structure. The dictionary can be considered to contain a set of trees, as shown in Figure 2/V.42 bis, each with a root corresponding to a character in the alphabet. Wi h the 8- bit character format there will be 256 trees. A tree represents the set of known strings beginning with one specific character, and each node or point in the tree represents one of this set of strings. The trees in Figure 2/V.42 bis represent the strings A, B, BA, BAG, BAR, BAT, BI, BIN, C, D, DE, DO and DOG. A node that has no dependant nodes, represented by the hierarchically lower level in the tree, is a leaf node. A leaf node represents the last character in a string. A node that has no parent, represented by the hierarchically higher level in the tree, is a root node. A root node represents the first character in a string. Associated with each node is a codeword used to uniquely identify the node. The assignment of codewords within the encoder dictionary of a data compression function, and the corresponding assignment of codewords within the decoder dictionary of the peer data compression function in the remote DCE are equivalent, and the codeword thus provides a reversible encoding of a string. 6.2 Dictionary initialization procedure On receipt of a C-INIT request primitive from the control function, the data compression function shall reset the encoder and decoder dictionaries to the initial condition. In the initial condition, each tree in the dictionary shall consist only of a root node. The codeword associated with each root node shall be N6 (the number of control codewords) plus the ordinal value of the character represented by the node. The counter C1, used in the allocation of new nodes (see S 6.5), shall be set to N5. 6.3 String matching procedure This procedure has the function of matching a sequence of characters (string) with a dictionary entry. The procedure shall commence with a single character representing the first character in the string. The following steps are then applied: a) a string shall be formed from the first character; b) if the string matches a dictionary entry, and the entry is not that entry created by the last invocation of the string matching procedure, then the next character shall be read and appended to the string and this step repeated; c) If the string does not match a dictionary entry or matches the entry created by the last invocation of the string matching procedure, the last character appended to the string shall be removed. The string thus shortened represents the longest matched string and the last character represents the unmatched character. This procedure will normally match the longest string of characters. However, there are two cases in which step b) shall be terminated before a longest match is found: i) if an exception condition occurs such as a C-INIT request primitive or C-FLUSH request primitive (while in compressed mode only); ii) when a transition between transparent and compressed modes of operation occurs. PAGE6 Recommendation V.42 bis FIGURE 2/V.42 bis Tree based representation of the dictionary When in transparent mode the encoder shall use only the criteria specified above for terminating the string matching procedure. When in compressed mode, however, the encoder may employ other criteria for terminating the procedure (e.g. a timeout). If the string matching procedure is terminated before a longest match is found, the next character from the DTE shall be considered to be the "unmatched character" for the purposes of updating the dictionary and restarting the string matching procedure. 6.4 Procedure for adding strings to the dictionary In order to maintain efficient compression, the dictionary is adapted by the addition of new strings. A new string shall be formed by appending a single character to an existing string, thereby adding a new node onto a tree. The single character shall be the unmatched character resulting from the string matching operation, or the prefix character resulting from the string decoding operation. Following this procedure, the single character required to restart the string matching procedure will be the unmatched character. There are two conditions under which a new string shall not be added: a) if this would result in the maximum string length, N7, being exceeded; b) if the string is already in the dictionary. Immediately after the creation of a dictionary entry, the procedure for recovering a dictionary entry shall be applied. 6.5 Procedure for recovering a dictionary entry This section defines a systematic procedure for recovering dictionary entries for re-use when all available entries have been filled. When the last available dictionary entry has been assigned, this procedure recovers a single entry, maintaining the association between the empty entry and its codeword. A counter C1 indicates the codeword associated with the next empty dictionary entry, and is maintained in the range N5 to N2 - 1. Counter C1 shall be set to N5 initially. Recommendation V.42 bis PAGE7 The procedure shall be applied only after the creation of a new dictionary entry, and shall consist of the following steps: a) counter C1 shall be incremented; b) if the value of C1 exceeds N2 - 1 then C1 shall be set to N5; c) if the node identified by the codeword with value C1 is in use and not a leaf node, then go to step a); d) if the node is a leaf node, then it shall be detached from its parent. 7 Operation of the encoding function 7.1 General The encoding function has five principal operations: a) string matching, in which a sequence of characters from the DTE is matched with a dictionary entry (see S 7.3); b) encoding, in which the codeword of the matched dictionary entry is represented as a binary value of length C2 bits (see S 7.4); c) transfer, in which either the codeword(s) in compressed mode or the characters in transparent mode are passed to the control function (see S 7.5); d) dictionary updating, in which a new dictionary entry is created, using the matched dictionary entry and the unmatched character (see S 7.6); e) nod recovery, in which a dictionary entry is recovered for use in the next dictionary update (see S 7.7). The encoding function operates in one of two modes, transparent mode and compressed mode, switching between these modes on the basis of the test applied in f) below. The sequence of operations, and the cycling of the escape character (see S 9) are identical in the two modes of operation. The encoder shall support two further operations, which shall be applied only during the string matching procedure in accordance with S 6.3: f) data compressibility testing, in which the efficiency of the encoding process is estimated and transparent mode or compressed mode selected to maximize efficiency (see S 7.8); g) flush, in which a C-FLUSH request from the control function indicates that all outstanding data shall be sent (see S 7.9). 7.2 Initial conditions On receipt of a C-INIT request the data compression function shall initialize the encoder to the following state: a) the dictionary shall be set to the initial condition described in S 6.2; b) the codeword size C2 shall be set to N3 + 1; c) the threshold C3 shall be set to N4 X 2; d) the function shall be set to transparent mode; e) the escape character shall be assigned the ordinal value 0. 7.3 String matching On receipt of a C-DATA request the data compression function shall apply the string matching procedure defined in S 6.3. The initial character required shall be the unmatched character resulting from the most recent invocation of this procedure. PAGE6 Recommendation V.42 bis 7.4 Encoding This procedure is used when in the compressed mode of operation. Its purposes is to represent the codeword as a sequence of C2 bits; the order and numbering of the bits is shown in Figure 3/V.42 bis. If the codeword corresponding to the matched dictionary entry is numerically equal to or greater than the threshold C3 then: a) the STEPUP control codeword shall be encoded and transferred using the current codeword size (C2); b) the codeword size C2, shall be increased by 1; c) multiply C3 by 2; d) if the codeword is still numerically greater than or equal to C3, steps a) to c) shall be repeated. The codeword is then transferred to the control function, in accordance with the procedures defined in S 7.5. FIGURE 3/V.42 bis Mapping of codewords into octets 7.5 Transfer In transparent mode, characters shall be passed to the control function for transmission in octet aligned form, using a C-TRANSFER indication. They may be transferred individually during the string matching procedure, or as a sequence following completion of the string matching procedure. In compressed mode, the matched string shall be encoded according to the procedure defined in S 7.4 and passed to the control function in packed form, with the least significant bit of a codeword immediately following the most significant bit of the preceding codeword. When the encoder changes state from transparent to compressed mode, the least significant bit of the first codeword to be transferred shall be bit 1 of the next octet position. Following transfer of a FLUSH control codeword, or when the encoder changes state from compressed to transparent mode following transfer of the ETM control codeword (see S 9) in the sequence, sufficient 0 bits shall be transmitted to ensure that the next transmitted character is octet aligned. Figure 3/V.42 bis provides an example of the data stream passed to the error control function during a transition from compressed to transparent mode. Two 11-bit codewords A and B are transmitted in compressed form followed by a transition to transparent mode. In this example, the transition requires insertion of seven 0 bits in order that the first uncompressed character, C sent in transparent mode, is octet aligned. 7.6 Dictionary updating A new dictionary node shall be created from the match string and corresponding unmatched character returned by the string matching procedure, using the procedures defined in S 6.4. Recommendation V.42 bis PAGE7 7.7 Node recovery Following the creation of a new dictionary node, the node recovery procedure defined in S 6.5 shall be applied. 7.8 Data compressibility test The data compression function shall periodically apply a test to determine the compressibility of the data. The nature of the test is not specified in this Recommendation; however it would consist of a comparison of the number of bits required to represent a segment of the data stream before and after compression. 7.8.1 Transition to compressed mode If the data compression function is in the transparent mode and determines that data compression would be effective, it shall: a) perform the dictionary update procedure using the current accumulated string and the next character to be processed by the string matching procedure (which will be the first character of the string represented by the first codeword transmitted in compressed mode); b) indicate to the peer data compression function that a transition to compressed mode is required, using the ECM transparent mode command sequence (see S 9.1); c) enter compressed mode. 7.8.2 Transition to transparent mode If the data compression function is in the compressed mode and determines that the data stream is currently not compressible, it shall: a) ensure that the codeword representing any partially encoded data has been transferred in accordance with the procedure given in SS 7.4 and 7.5; b) perform the dictionary update procedure using the current accumulated string and the next character to be processed by the string matching procedure (which will be the first character transmitted in transparent mode); c) indicate to the peer data compression function by transferring the ETM control codeword (see S 9) a transition to transparent mode; d) transmit sufficient 0 bits to recover octet alignment (see S 7.5); e) change the state to transparent mode. 7.8.3 RESET function In transparent mode the RESET command code may be used to indicate to the peer data compression function that the encoder dictionary is about to be re-initialized according to the procedures given in SS 6.2 and 7.2. The RESET command code is sent using the escape character value before re-initialization takes place. The circumstances under which the encoder requests a dictionary reset are not defined in this Recommendation, but would generally result from the encoder determining that some improvement in performance would result from resetting the dictionary. The procedures for requesting re-initialization of the dictionary on link establishment, or on detection by the control function of an error condition, are defined in SS 5.2 and 5.6. The RESET command code is not sent when a C-INIT request primitive is received from the control function. 7.9 Action on receipt of C-FLUSH request Upon receipt of a C-FLUSH request from the control function, if the encoder is in compressed mode and there is a partially-matched string being processed, the data compression function shall: PAGE6 Recommendation V.42 bis a) ensure that the codeword representing any partially-matched string is transferred in accordance with the procedures defined in SS 7.4 and 7.5; b) perform the dictionary update procedure using the current accumulated string and, when available, the next character to be processed by the string matching procedure; c) if step a) leaves extra bits pending for transmission (octet alignment not yet achieved), then: i) transfer the FLUSH codeword (see S 9); ii) if necessary, transfer sufficient 0 bits to recover octet alignment (see S 7.5). If the encoder is in transparent mode, on receipt of a C-FLUSH request from the control function, the data compression function shall transfer all outstanding data; the string matching procedure is not terminated, and the dictionary update procedure is not performed. 8 Operation of the decoding function The decoding function shall be capable of operation in both compressed and transparent modes, and shall operate in a manner consistent with that defined in SS 6, 7 and 9. On receipt of a C-INIT request from the control function or a RESET command code from the peer data compression function, the data compression function shall initialize the decoding function in accordance with the procedures defined in SS 6.2 and 7.2. In transparent mode, the decoding function shall apply the string matching procedure given in S 6.3, in order that the decoder dictionary may be maintained in a compatible state to the peer (remote) encoder dictionary. On receipt of the ECM or EID command code, the decoding function shall operate in a manner consistent with the encoder operations defined in SS 7.8.1 and 9.2. New dictionary entries shall be created in a manner consistent with the procedures defined in SS 6.4 and 7.3. In compressed mode the decoding function shall recover the encoded strings. On receipt of the ETM or FLUSH codewords the decoder shall operate in a manner consistent with the encoder operations defined in SS 7.8.2 and 7.9. New dictionary entries shall be created using the procedure defined in S 6.4, with the first (prefix) character of the most recently decoded string being appended to the previous decoded string. The decoder shall regard the STEPUP control codeword as an indication that the encoder has increased the codeword size in accordance with the procedures defined in S 7.4. 9 Communications between peer data compression functions 9.1 Control codewords and command codes The control codewords and command codes allocated for communication between peer data compression functions are given in Table 2/V.42 bis. 9.2 Procedures for use of the escape sequence A transparent mode command sequence shall consist of the escape character followed by one of the command codes listed in S 9.1 above. To reduce data expansion resulting from the escape mechanism defined below, if the current escape character is detected within the data stream from the DTE, the data compression function shall: a) if in transparent mode, transfer the detected escape character, and transmit the EID code, and then b) in both transparent and compressed modes, modify the value of the escape character by adding to it the decimal value 51, the addition to be performed modulo 256. Recommendation V.42 bis PAGE7 TABLE 2/V.42 bis Control code words (used in compressed mode) Code word Name Description 0 ETM Enter transparent mode 1 FLUSH Flush data 2 STEPUP Step up codeword size Command codes (used in transparent mode) Value Name Description 0 ECM Enter compression mode 1 EID Escape character in data 2 RESET Force reinitialization 3 to 255 Reserved 10 Parameters The following parameters are required by the data compression function. N1 to N7 and P0 to P2 apply to both directions of transmission, whilst a separate set of C1, C2, C3 variables must be provided in the encoder and decoder. N1 Maximum codeword size (bits); N2 Total number of codewords; N3 Character size (bits): N3 = 8; N4 Number of characters in the alphabet: eq N\s\do5(4) = 2\s\up5(N3); N5 index number of first dictionary entry used to store a string: N5 = N4 + N6; N6 Number of control codewords: N6 = 3; N7 Maximum string length; C1 Next empty dictionary entry; C2 Current codeword size; C3 Threshold for codeword size change; P0 V.42 bis data compression request; PAGE6 Recommendation V.42 bis P1 Number of codewords (negotiation parameter); P2 Maximum string size (negotiation parameter). Recommendation V.42 bis PAGE7 ANNEX A (to Recommendation V.42 bis) Procedures for negotiating V.42 bis when used with V.42 When using this data compression Recommendation with V.42 error control, the XID negotiation procedure shall be used (see SS 7.6, 8.10, 10 of Rec. V.42 and ISO 8885 - 1987 [6, 7].) A data link layer subfield in addition to those defined in Recommendation V.42 shall be used for this purpose. It shall appear in the XID frame immediately before the user data subfield and shall be encoded as in Table A-1/V.42 bis. During the protocol establishment phase, the presence of parameter P0 in the Private Parameter Set Data Link Layer Subfield of the XID frame shall indicate a request for data compression. Note - The incorporation of the contents of this Annex into Recommendation V.42 is under consideration by Study Group XVII. TABLE 1/V.42 bis Bit 8 . . . 1 Group ID 11110000 Private parameter ser (ISO 8885, Addendum 3) Group Length nnnnnnnn (MSB) Length of parameter field (excludes group ID and length) nnnnnnnn (LSB) Parameter ID 00000000 Parameter set identifier Parameter length 00000011 Length of string Parameter value 01010110 V 00110100 4 00110010 2 Parameter ID 00000001 Rec. V.42 bis - Data Compression request (P0) Parameter length 00000001 Length of field Parameter value 000000nn Request for compression in: 00 neither direction 01 negotiation initiator-responder direction only 10 negotiation responder-initiator direction only 11 both directions Parameter ID 00000010 Rec.V.42 bis - Number of codewords Parameter length 00000010 (P1) Parameter value nnnnnnnn 16-bit integer nnnnnnnn (MSB) Value of parameter P1 Parameter ID 00000011 (LSB) Parameter length 00000001 Rec. V.42 bis - Maximum string length Parameter value nnnnnnnn (P2) 8-bit integer Value of parameter P2 MSB:Most significant bit LSB: Least significant bit PAGE6 Recommendation V.42 bis APPENDIX I (to Recommendation V.42 bis) SDL description of encoder (Z.100 to Z.104) [8] Figure I-1/V.42 bis shows the set of SDL symbols used in the diagrams of this Appendix. The following diagrams provide an illustration of the operation of the encoder: a) Figure I-2/V.42 bis: Encoder (see SS 7.2, 7.3, 7.8, 7.9). This diagram illustrates the operation of the main elements of the encoder. b) Figure I-3/V.42 bis: Process of character (see SS 6.3, 6.4, 6.5). This diagram illustrates the operation of the string matching procedure, the conditions under which it is terminated and the action taken. c) Figure I-4/V.42 bis: Check codeword size (see S 7.4). This diagram illustrates the codeword size step up mechanism. d) Figure I-5/V.42 bis: Test compression (see S 7.8). This diagram illustrates the procedures for changing between transparent and compressed modes of operation and for use of RESET. e) Figure I-6/V.42 bis: Flush (see S 7.9). This diagram illustrates the action taken on receipt of a C-FLUSH request. f) Figure I-7/V.42 bis: Exception process next character (see SS 7.8.1 a, 7.8.2 b, 7.9 b). This diagram illustrates the means by which the compressed mode character processing is performed following a transition to compressed mode or a flush operation. g) Figure I-8/V.42 bis: Escape character procedure (see S 9.2). h) Figure I-9/V.42 bis: Signal reset procedure (S 7.8.3). i) Figure I-10/V.42 bis: Add string + character to dictionary procedure (SS 6.4 and 6.5). FIGURE I-1/V.42 bis SDL Symbols used Recommendation V.42 bis PAGE7 FIGURE I-2/V.42 bis Encoder PAGE6 Recommendation V.42 bis FIGURE I-3/V.42 bis Process character procedure Recommendation V.42 bis PAGE7 FIGURE I-4/V.42 bis Check codeword size procedure PAGE6 Recommendation V.42 bis FIGURE I-5/V.45 bis Test compression procedure Recommendation V.42 bis PAGE7 FIGURE I-6/V.42 bis Flush procedure PAGE6 Recommendation V.42 bis FIGURE I-7/V.42 bis Exception process next character procedure Recommendation V.42 bis PAGE7 FIGURE I-8/V.42 bis Escape character procedure FIGURE I-9/V.42 bis Signal "Reset" procedure PAGE6 Recommendation V.42 bis FIGURE I-10/V.42 bis Add "string + character" to dictionary procedure Recommendation V.42 bis PAGE7 APPENDIX II (to Recommendation V.42 bis) Guidance for implementors The following notes provide information on the implementation of the data compression scheme and on the selection of parameters. II.1 Selection of N2, the total number of codewords The dictionary size is equal to N2 - N6 (assuming that entries are not provided for the reserved codewords). The selection of a large value for N2 means that the number of strings available is large, but also that the value of N1 is larger. The gain in performance obtained from the selection of a larger dictionary may be offset by the larger codeword size needed, and for certain types of data, better performance may be obtained by using a smaller dictionary. If values of N2 in the range 2n + 1 (for integer n) to approximately 1.3 X 2n are selected, no performance improvement will be gained over the selection of the value 2n. A value for N2 of 2048 provides good compression performance across a wide range of data types. II.2 Data structures The data compression scheme described in this Recommendation is well suited to implementation using a tree based data structure. This type of data structure will provide good utilization of memory space and fast searching. II.3 Calculation of compression performance The calculation of compression performance may be expressed as the number of characters received by an encoder divided by the number of octets transferred from the encoder (to the error control function). The count of characters and octets should be set to zero on receipt of a C-INIT request. II.4 Examples of the operation of the encoder The following three examples illustrate the operation of the encoder. It is assumed that the dictionary is in the state shown in Figure 2.V.42 bis. II.4.1 Simple case: "BAY", compressed mode The first character "B" is read, and the dictionary searched for the string "B". As this string is present, the next character "A" is read and appended, forming the new string "BA". The dictionary is searched for the new string and when it is found, the next character "Y" is read and appended, forming the new string "BAY". The dictionary is searched for "BAY", which is not present. "Y" is removed, and the string matching procedure exits with "BA" as the matched string and "Y" as the unmatched character. The codeword for "BA" is encoded in C2 bits, packed into octets, and passed to the control function for transmission. The new string "BAY" is created by appending "Y" to "BA", assigning the codeword with value C1 to this new string. C1 is incremented, and the node (string) currently assigned this value is tested to see if it is empty or is a leaf node. If the node is empty, it will be used in the next dictionary update. If the node is used and is not a leaf node, i.e. is part of a longer string, then C1 is incremented again and the test repeated. If the node is a leaf node, then it is detached from its parent and will be re-used in the next dictionary update. The character "Y" will be used to restart the string match. II.4.2 Simple case: "BAY", transparent mode In transparent mode, the same sequence of operation described in S II.4.1 will occur, the only difference being that the characters "A", "Y" will be transmitted in place of the codeword for "BA". PAGE6 Recommendation V.42 bis II.4.3 Repeated characters or sequences: "CCCCC", compressed mode The aim of this example is to illustrate a particular aspect of the algorithm. As the encoder is able to update its dictionary on the basis f look- ahead, whilst the decoder is only able to update its dictionary on the basis of previously decoded data, it is necessary to ensure that the encoder does not use new dictionary entries before they are transmitted to the decoder. The first "C" is read and will be matched with the dictionary entry for "C". The second "C" is read, appended to the first, and the dictionary searched for "CC". As "CC" is not in the dictionary, the string matching procedure exits with the matched string as "C" and the unmatched character as "C". "CC" is added to the dictionary, the codeword for "C" sent, and string matching restarts with the second "C". The third "C" is read, appended to the second "C", forming "CC", and the dictionary searched for "CC". As this is in the dictionary but is, however, the entry created since the last string match, [see S 6.3 b)], the string matching procedure exits with the matched string as "C" and the unmatched character as "C". "CC" is not added to the dictionary as it is already present, the codeword for "C" sent, and string matching starts with the third "C". The fourth "C" is read, appended to the third "C", forming "CC", and the dictionary searched for "CC" As "CC" is found in the dictionary, and it does not match the entry created since the last string match (the update operation was inhibited), the fifth "C" is read and appended to the string. References [1] CCITT Recommendation Error Correcting Procedures for DCEs using Asynchronous-to-Synchronous Conversion, Vol. VIII, Rec. V.42. [2] CCITT Recommendation Support by a ISDN of Data Terminal Equipment with V- Series Type Interfaces with Provision for Statistical Multiplexing, Vol. VIII, Rec. V.120. [3] ISO 3309 - Data Communication - High Level Data Link Control Procedures - Frame Structure. [4] CCITT Recommendation Definitions of terms concerning data communication over the telephone network, Vol. VIII, Rec. V.7. [5] CCITT Recommendation Transmission of Start Stop Characters over Synchronous Bearer Channels, Vol. VIII, Rec. V.14. [6] ISO 8885 - 1987 Information Processing Systems - Data Communications - High Level Data Link Control Procedures - General Purpose XID Frame Information Field Content and Format. [7] ISO 8885 - 1987/ADD3 Information Processing Systems - Data Communications - High Level Data Link Control Procedures - General Purpose XID Frame Information Field Content and Format - Addendum 3: Definition of a private parameter data link layer subfield. [8] Serie Z.100 CCITT Recommendations Functional Specification and Description Language (SDL), Vol. X. Recommendation V.42 bis PAGE7