Author(s): Bruce Schneier

ISBN: 0471128457

Publication Date: 01/01/96

Previous | Table of Contents | Next |

Ideally, you only want to collect one random bit per keystroke. Collecting more may skew the results, depending on how good a typist is sitting at the keyboard. This technique is limited, though. While it’s easy to have someone type 100 words or so when it is time to generate a key, it isn’t reasonable to ask the typist to type a 100,000-word essay to generate a keystream for a one-time pad.

*Biases and Correlations*

A major problem with all these systems is that there could be nonrandomness in the generated sequence. The underlying physical processes might be random, but many kinds of measuring instruments are between the digital part of the computer and the physical process. Those instruments could easily introduce problems.

A way to eliminate **bias**, or skew, is to XOR several bits together. If a random bit is biased toward 0 by a factor *e*, then the probability of 0 can be written as:

- P(0) = .5 +
*e*

XORing two of these bits together yields:

- P(0) = (.5 +
*e*)^{2}+ (.5 -*e*)^{2}= .5 + 2*e*^{2}

By the same calculation, XORing 4 bits together yields:

- P(0) = .5 + 8
*e*^{4}

XORing *m* bits will exponentially converge to an equal probability of 0 and 1. If you know the maximum bias you are willing to accept for your application, you can calculate how many bits you need to XOR together to get random bits below that bias.

An even better method is to look at the bits in pairs. If the 2 bits are the same, discard them and look at the next pair. If the 2 bits are different, take the first bit as the output of the generator. This eliminates bias completely. Other techniques for reducing bias use transition mappings, compression, and fast Fourier transforms [511].

The potential problem with both methods is that if there is a **correlation** between adjacent bits, then these methods will increase the bias. One way to correct this is to use multiple random sources. Take four different random sources and XOR the bits together; or take two random sources, and look at those bits in pairs.

For example, take a radioactive source and hook a Geiger counter to your computer. Take a pair of noisy diodes and record as an event every time the noise exceeds a certain peak. Measure atmospheric noise. Get a random bit from each and XOR them together to produce the random bit. The possibilities are endless.

The mere fact that a random-number generator has a bias does not necessarily mean that it is unusable. It just means that it is less secure. For example, consider the problem of Alice generating a triple-DES 168-bit key. All she has is a random-bit generator with a bias toward 0: It produces 55 percent 0s and 45 percent 1s. This means that there are only 0.99277 bits of entropy per key bit, as opposed to 1 bit of entropy if the generator were perfect. Mallory, trying to break the key, can optimize his brute-force search to try the most probable key first (000...0), and work toward the least probable key (111...1). Because of the bias, Mallory can expect to find the key in 2^{109} attempts. If there were no bias, Mallory would expect to make 2^{111} attempts. The resultant key is less secure, but not appreciably so.

*Distilling Randomness*

In general, the best way to generate random numbers is to find a whole lot of seemingly random events and distill randomness from them. This randomness can then be stored in a pool or reservoir that applications can draw on as needed. One-way hash functions are ready-made for the job; they’re fast, so you can shovel quite a bit through them without worrying too much about performance or the actual randomness of each observation. Hash almost anything you can find that has at least some randomness. Try:

- — A copy of every keystroke
- — Mouse commands
- — The sector number, time of day, and seek latency for every disk operation
- — Actual mouse position
- — Number of current scanline of monitor
- — Contents of the actually displayed image
- — Contents of FATs, kernel tables, and so on
- — Access/modify times of /dev/tty
- — CPU load
- — Arrival times of network packets
- — Input from a microphone
- — /dev/audio without a microphone attached

If your system uses separate crystal oscillators for its CPU and time-of-day clocks, try reading the time of day in a tight loop. On some (but not all) systems this will reflect the random phase jitter between the two oscillators.

Since much of the randomness in these events is in their timing, use the most finely grained time-of-day clock you can find. A standard PC uses an Intel 8254 clock chip (or equivalent) driven at 1.1931818 megahertz, so reading the counter register directly gives you 838-nanosecond resolution. To avoid skewing the results, avoid taking your event samples on a timer interrupt.

Here is the process in C with MD5 (see Section 18.5) as the hash function:

char Randpool[16]; /* Call early and call often on a wide variety of random or semi- * random system events to churn the randomness pool. * The exact format and length of randevent doesn’t matter as long as * its contents are at least somewhat unpredictable. */ void churnrand(char *randevent,unsigned int randlen) { MD5_CTX md5; MD5Init(&md5); MD5Update(&md5,Randpool,sizeof(Randpool)); MD5Update(&md5,randevent,randlen); MD5Final(Randpool,&md5); }

After calling churnrand() enough to build up sufficient randomness in Randpool, you can now generate random bits from it. MD5 again comes in handy, this time as a counter-mode pseudo-random byte-stream generator.

long Randcnt; void genrand(char *buf,unsigned int buflen) { MD5_CTX md5; char tmp[16]; unsigned int n; while(buflen != 0) { /* Hash the pool with a counter */ MD5Init(&md5); MD5Update(&md5,Randpool,sizeof(Randpool)); MD5Update(&md5,(unsigned char *)&Randcnt,sizeof(Randcnt)); MD5Final(tmp,&md5); Randcnt++; /* Increment counter */ /* Copy 16 bytes or requested amount, whichever is less, * to the user’s buffer */ n = (buflen < 16) ? buflen : 16; memcpy(buf,tmp,n); buf += n; buflen -= n; } }

The hash function is crucial here for several reasons. First, it provides an easy way to generate an arbitrary amount of pseudo-random data without having to call churnrand() each time. In effect, the system degrades gracefully from perfect to practical randomness when the demand exceeds the supply. In this case it becomes *theoretically* possible to use the result from one genrand() call to determine a previous or subsequent result. But this requires inverting MD5, which is computationally infeasible.

This is important since the routine doesn’t know what each caller will do with the random data it returns. One call might generate a random number for a protocol that is sent in the clear, perhaps in response to a direct request by an attacker. The very next call might generate a secret key for an unrelated session that the attacker wishes to penetrate. Obviously, it is very important that an attacker not be able to deduce the secret key from the nonce.

One problem remains. There must be sufficient randomness in the Randpool[] array before the first call to genrand(). If the system has been running for a while with a local user typing on the keyboard, no problem. But what about a standalone system that reboots automatically without seeing any keyboard or mouse input?

This is a tough one. A partial solution would require the operator to type for a while after the very first reboot, and to create a seed file on disk before shutting down to carry the randomness in Randseed[] across reboots. But do not save the Randseed[] array directly. An attacker who steals this file could determine all of the results from genrand() after the last call to churnrand() prior to the file being created.

The fix to this problem is to hash the Randseed[] array before storing it, perhaps by just calling genrand(). When the system reboots, you read in the seed file, pass it to churnrand(), then promptly destroy it. Unfortunately, this does not deal with the threat of someone stealing the seed file between reboots and using it to guess future values of the genrand() function. I see no solution to this problem other than to wait until enough external random events have taken place after a reboot before allowing genrand() to produce results.

Previous | Table of Contents | Next |