Calculating the XMODEM CRC
J. LeVan, Ph.D.
Introduction:
I have been in the process of developing a terminal emulator
program for my home computer. I was very unhappy with the Xmodem
transfer rate, especially at high modem speeds. The checksum
method is clearly faster, but cannot catch communication errors
as effectively. The CRC method has an upper end throughput of
500 bytes per second.
I place a help call on CompuServe and receive an intriquing
response from Bela Lubkin. The algorithm he suggested appeared
to be much quicker than the classical bit shift method. It was
not at all clear to me how the algorithm worked. The purpose of
this note is an explanation of how and why a table lookup
algorithm can work.
Throughout this note, I will use "effective throughput" to
mean:
(SizeOfPacket * NumberOfPackets)/(TimeOfEot - TimeOfInitialSOH)
All source code will be in C. These programs were written to run
on an Apple MacIntosh, which is equipped with a high speed serial
port.
Below is the standard bit shift and occasionally XOR
algorithm. The condition for doing the XOR is simply, if a 1 is
going to be shifted out of the sign bit then shift and XOR the
old CRC with the generator. Note that the inner loop is
performed exactly eight times.
________________________________________________________________
___Fig.1: Bit Shift CRC_________________________________________
unsigned short ComputeCrc(crc,bufptr,len)
register unsigned short crc; /* unsigned shorts are 16 bits */
register char *bufptr;
register short len;
{
register int i;
while (len--) { /* calculate CRC from end to start*/
crc ^= (unsigned short) (*bufptr++)<<8;
for ( i=0 ; i<8 ; ++i ) {
if (crc & 0x8000)
crc = (crc << 1) ^ 0x1021;
else
crc <<= 1;
}
}
return(crc);
}
________________________________________________________________
After exactly eight shifts, there will be no carries from
the low byte out of the top of the high byte. This leads us to a
critical observation:
Ob 1: Whether or not an XOR is done with the CRC does *not*
depend on the low byte of the CRC.
Let A, B, and C be sixteen bit unsigned quantities. We'll
use the standard C notation for XOR, a carat (^). The following
mathematical properties hold.
Ob 2: XOR is Associative and Commutative. I.e: For any choice
of A, B and C; these equations will always hold true:
A^B = B^A (1)
(A^B)^C = A^(B^C) (2)
Likewize, using the standard C notation for a left shift
( A<**> 8^(*bufptr++)] ^ crc<<8;
return(crc);
}
________________________________________________________________
Thus our third term is found by:
((P1<<7) ^ (P2<<6) ^ ... ^ P8) = CrcTable[crch>>8]
the final crc is caluclated by:
crc = crc_table[crch>>8] ^ (crcl<<8)
Substituting this relation into the algorithm in Fig. 1, we
easily obtain the algorithm in Fig. 2.
That much having been said, it only remains to be seen,
whether or not it has all been worthwhile. How much faster is
table lookup?
After building a 128 byte sector filled with random numbers,
I ran a benchmark in which the CRC and Checksum were each
calculated 100 times with the following results. Here each tick
represents 1/60th of a second.
________________________________________________________________
___Table.1: Comparison of methods for 100 sectors_______________
Slow CRC 120 ticks
Table CRC 25 ticks
Checksum 3 ticks
________________________________________________________________
The table driven method is an order of magnitude faster than
the older CRC method, and only an order of magnitude slower than
the Checksum method. This offers a much smoother trade off than
originally available with the CRC check. What do these
differences mean in terms of effective through put?
My communications program also uses the following
optimizations which can affect transfer speed:
All disk I/O is double buffered.
The transmitter sends the body of the Xmodem Packet while
simultaneously calculating the CRC. This leads to an almost zero
calculate time at low speeds.
The machine on the receiving end of the line only calculates
the CRC after a complete packet has arrived.
________________________________________________________________
___Table.2: Transfer measurements_______________________________
Effective Throughput: (bytes/sec)
Baud: 1200 2400 9600 19200 57600
Method: (128 byte packets)
Slow CRC 106.......201.......598.......770.......773
Table CRC 109.......210.......692.......1113......1172
Checksum 110.......213.......709.......1155......1263
Method: (1k byte packets)
Slow CRC ..........210.......628.......939.......994
Table CRC ..........220.......729.......1185......1185
Checksum ..........222.......745.......1226......1973
________________________________________________________________
One should take benchmarks with a grain of salt. However,
it is clear that the table lookup algorithm stays closer to the
checksum method than the standard bit shift algorithm. The
efficiency and speed of compiled code can vary greatly from
machine to machine. These tests were made with two computers
linked directly through their serial ports.
I have found that CompuServe typically yields and effective
throughput of 60-80 bytes per second at 1200 baud. Genie
typically yields 70-80 bytes per second, and my neighborhood Vax
785 lets me run with an effective throughput of 103-105 bytes per
second, all with Xmodem.
In my first attempt at Xmodem, I used a function call for
processing each character in the crc loop. In addition, I did
not overlap the CRC calculation with transmission of the packet.
This was a dreadful mistake. I could not get an effective
throughput of more than 500 characters per second even with 4k
buffers at any speed. The bottom line is that it take a number
of optimizations to speed up a complex process.
**