PREVIOUS  TABLE OF CONTENTS  NEXT 

Randomness

Jon Orwant

One of the most common questions on comp.lang.perl.misc is "How come my random numbers aren't random?" Usually, a simple answer involving srand() is sufficient. But the simplicity of these answers belies just how hard it is to have a computer generate good random numbers.

In fact, it's downright impossible to generate completely random numbers. We live in a determinist universe, there's no free will, and those of you muttering about quantum uncertainty and souls should go back to your Penrose - tiled holes in the sand.

Anyway, while no source provides completely random numbers, some sources are more random than others. The PGP encryption package, for instance, has you type aimlessly at your keyboard, using the intervals between keystrokes as a source of random numbers. It measures tiny time differences, making it a good source of random numbers unless you possess a truly robotic sense of rhythym, in which case you probably have career options that don't involve the use of contraband cryptosystems.

The measurement of keystrokes is slow and intrusive, so some computers rely on rapidly changing areas of computer memory for their random numbers, or network packet arrival times. Others amplify ambient noise collected from disconnected audio or video inputs. Still others calculate disk seek times; minute variations in speed are caused by air turbulence, which is pretty hard to predict.

Computers that take their randomness seriously have physical hardware dedicated to random number generation: unstable free - running oscillators, a source of radioactive decay, or thermal noise from an amplified diode. That's about as random as you can get.

But the vast majority of computers generate random numbers the same way. They're more properly called pseudorandom numbers, because if you know what number the computer just picked, you can predict the next number.

This is why Las Vegas Mafia thugs break your kneecaps if you try to pry open a slot machine. All modern slot machines first determine what your payoff will be using a pseudorandom number generator, and then spin the wheels so that the appropriate fruits appear. That way, wear and tear on the moving parts of a slot machine won't affect the payoffs. (Casino slot machines are networked in a LAN of sorts; that's how million dollar progressive jackpots are possible. The Bally slot machines in Vegas are even in a WAN connecting many different casinos; that's how they can offer an $11 million jackpot. But I digress.)

Congruential Generators

Many computers use linear congruential generators to create random numbers. It's a lousy method. There are plenty of better ones, but they're slower, which accounts for the near - universality of this lame scheme, which works as follows:

That's all there is to it. Expressed as an equation,

nj+1 = anj + b (mod c)

where nj+1 is the new random number, nj is the last random number, and a, b, and c are all positive integers specific to a particular generator.

So if every random number is determined by the previous random number, where's the first random number come from? You. It's called the seed of the random number generator, and you provide it to Perl (and C) with the srand() function. This is why we see all those questions on comp.lang.perl.misc: people don't realize that without seeding their generators, they'll get the same sequence of random numbers each time. Less often, you see people who know that they're supposed to use srand(), but they aren't quite sure what it does, so they use it as often as possible, with disastrous (but predictable!) results.

Choosing the seed

Astute readers will notice that linear congruential generators produce only integers. And sure enough, if you use rand() in a C program, you'll get integers. But Perl's rand(), like Perl itself, is a little quirky and a little more useful: it divides the integer by the system constant RAND_MAX, yielding a fraction between 0 and 1. On systems which use these generators, RAND_MAX will simply be c - 1.

Linear congruential generators can be cracked, yielding the values of a, b, and c, with only moderate difficulty. There's a complete scheme in [Krawczyk]; to give you a taste (and part I of this issue's contest), I've created an imaginary generator by choosing an a, b, and c. Your task is to deduce their values. I'll give you four pairs of numbers, each containing a seed fed to srand() and the result from rand():

seed	result of rand()
1           26
2            3
3           12
4           21

I picked this generator out of thin air. Real computers wouldn't use such simple generators, of course, but I got a little lucky: my generator happens to be maximal - length, which means that it will only repeat itself after c calls to rand(). Had I picked bad values for a, b, and c, the result would have been a generator that repeats much more quickly. This is a recurring problem.

The ANSI C committee hasn't been much help, either. They published a mediocre generator ( a = 1103515245, b = 12345, and c = 32768) that was meant to demonstrate how pseudorandom number generation might be implemented. As [Press] points out, the ANSI generator was meant only to be an example. But sure enough, many major operating systems, even today, use it as their rand(). Run the program below to see if your computer falls victim to the ANSI plague.

The c = 32768 is especially troubling, since it means that your computer will repeat itself after 32768 calls to rand(). Many years ago, when developing an image processing application, I noticed that my 512x512 image had a discernible pattern repeated every eight lines. The reason? When I added a little white noise to the image, I was working on a computer that used the ANSI generator, and sure enough, (512 * 512) / 32768 is eight.

Press et al. reveal that the only thing worse than the ANSI generator is a system that tries to fix it:

...one popular 32 - bit PC - compatible computer provides a long generator that uses the above congruence, but swaps the high - order and low - order 16 bits of the returned value. Somebody probably thought that this extra flourish added randomness; in fact it ruins the generator.

Tampering with the result of a generator is like shuffling a deck of cards again to make them more random. Sometimes it does; sometimes it doesn't. (If you can shuffle a 52 - card deck perfectly eight times in a row, you'll return the deck to its original order.)

Linear congruential generators are pretty fast: one multiply, one add, one modulus, and bang, you're done. But for the speed freaks among us, it turns out that a mulplicative congruential generator:

nj+1 = anj mod c

(that is, a linear congruential generator with b = 0) can do just as well if you choose your a and c wisely.

Random Number Generator.

LFSRs

Linear Feedback Shift Registers, or LFSRs, are another popular way of producing random numbers. Like congruential generators, LFSRs are completely deterministic, but they can be embedded in electronic devices with less hardware.

An LFSR provides an endless procession of pseudorandom bits, as opposed to full - fledged pseudorandom integers. Don't fret about the fact that LFSRs only produce a bit at a time: by combining multiple bits, you can create a pseudorandom number in any range you want. Of course, if you're as lazy as Mike Stok said he was in the previous article, you might not like this, and I don't blame you: I'd rather roll a six - sided die than flip a coin three times (disregarding HHH and TTT, and then arbitrarily assigning the other outcomes to one through six).

(As an aside, cryptographers often talk about a coin flip as if it were the ideal generator of random bits. Skilled magicians, however, can manipulate coin flips. And because of the way coins are struck by dies at the mint, the edges are often not quite perpendicular to the sides. Don't believe me? Balance a penny on its edge and then bump the table. It'll fall heads more than half the time.)

You can think of every LFSR as containing a "window" of bits. In a 4 - bit LFSR, the system sees four bits at a time; when a user wants a random bit, the system extracts the rightmost bit, shifts the other three bits one place to the right, and calculates a new leftmost bit by XORing some of those four bits together.

For instance, a very simple LFSR might operate on four bits at a time, XORing the first and third bits. Let's say you seed this generator with 15, which is 1111 in binary. Then the first nine pseudorandom bits will be 1, 1, 1, 1, 0, 1, 0, 0, and 1, because they're the rightmost bits of this sequence:

1111
0111
1011
0101
0010
1001
1100
1110
0111

Consider the fourth state: 0101. We arrive at the fifth state by XORing the first and third bits, 0 and 0, yielding 0. That's our new leftmost bit; the other bits are the underlined leftover bits from the fourth state, shifted right.

As with congruential generators, you need to be very careful when choosing the parameters; you can't just XOR any bits you like. The four bit generator above is a "real" LFSR. It's maximal length; no matter what seed you provide, it will repeat after no less than seven iterations. As part II of this issue's contest, I present to you a nine bit generator; can you crack the generator by deducing which bits are XORed?

1 0 1 1 0 1 0 0 1
1 1 0 1 1 0 1 0 0
0 1 1 0 1 1 0 1 0
0 0 1 1 0 1 1 0 1
1 0 0 1 1 0 1 1 0
0 1 0 0 1 1 0 1 1
1 0 1 0 0 1 1 0 1
0 1 0 1 0 0 1 1 0
1 0 1 0 1 0 0 1 1
0 1 0 1 0 1 0 0 1
1 0 1 0 1 0 1 0 0
1 1 0 1 0 1 0 1 0
0 1 1 0 1 0 1 0 1
0 0 1 1 0 1 0 1 0
0 0 0 1 1 0 1 0 1
1 0 0 0 1 1 0 1 0
0 1 0 0 0 1 1 0 1
0 0 1 0 0 0 1 1 0
0 0 0 1 0 0 0 1 1
0 0 0 0 1 0 0 0 1
0 0 0 0 0 1 0 0 0
1 0 0 0 0 0 1 0 0
0 1 0 0 0 0 0 1 0

Unfortunately, I wasn't so lucky as I was when I made up my congruential generator; this LFSR repeats itself after 127 iterations, which I learned from the program on this page. I've placed both programs (and the data above) on the TPJ web site at http://www.tpj.com/contest.html.

If you don't think any of this stuff matters, consider a great cryptosystem. A superb cryptosystem, one that uses, say, hundred - thousand - bit keys. To generate those keys, it needs random numbers. Now suppose your computer uses an eight - bit random number generator. Guess how many attempts you need to crack the cryptosystem through exhaustive search. Not 265536, but 28 = 256.

And yes, my computer (a DEC Alpha running Digital Unix) still uses that stupid ANSI generator.

Implement Arbitrary-sized Linear Feedback Shift Registers.

References

D. Eastlake 3rd, S. Crocker, and J. Schiller. Randomness Recommendations for Security. RFC 1750.

Krawczyk, H. How to Predict Congruential Generators. Journal of Algorithms, 13(4), December 1992.

Press, W., S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge University Press, 1992.

Schneier, B. Applied Cryptography. John Wiley & Sons, 1994.

__END__


PREVIOUS  TABLE OF CONTENTS  NEXT