The Prisoner's Dilemma: Results

Jon Orwant

In the last issue of TPJ, readers were asked to submit strategies for a three-way Prisoner's Dilemma. The notion is simple: you and two friends have been arrested for some crime, locked up in separate rooms, and asked to testify against one another. If all three of you testify, everyone goes to jail for a long time. If all of you keep mum, everyone goes to jail for a couple of years. If one or two out of the three testify, they benefit at the others' expense.

Each strategy, encoded as a single Perl subroutine, was pitted against every other pair of strategies in a "duel" of 100 matches. The distinction between a duel and a match is important, since in a series of matches each strategy can observe the others and decide how to act based on their past behavior.

A total of 31 strategies were submitted, yielding

(31)     31!
     =  ---- = 4495
( 3)    3!28!

duels, or about 450,000 matches, in which each strategy returned a single letter: either an 'H' to Hold out (or cooperate, in game theory parlance) or a 'T' to Testify (or defect).

This particular contest was inspired by Profs. Mitch Resnick and Mike Eisenberg, who in 1987 conducted a three-way Prisoner's Dilemma contest at MIT. Your august editor entered that contest (in which the strategies were Scheme functions instead of Perl subroutines), noticed that the contest rules (there were four) didn't prohibit mutators, and wrote a function that changed its opponents' histories, yielding the highest possible score. I was disqualified, and a fifth rule excluding such strategies was added the next year.

I've been seething over this for the past nine years, and wanted some moral validation for what I thought was a clever insight. That's why I left the door wide open for similar strategies in this contest: there was nothing prohibiting subroutines from poking around the filesystem or the symbol table to locate, dissect, or modify other strategies, or even the judging program itself.

A few hackers, notably Felix Gallo and Randal Schwartz, spotted these possibilities, but alas, no one actually tried anything, whether from lack of time or surplus of scruples I don't know. Felix also pointed out another devious tactic: collude with other contestants so that their strategies could recognize one another by their behavior. Then a previously-elected "master" could defect while the other "slaves" all cooperated, sacrificing themselves for the good of the master.

That's exactly what Ken Albanowski, Earle Ake, and Dan Wisehart did: their "gestalt" entries identify whenever Ken's strategy is playing one of the other two. If so, the slave holds out while the master testifies, resulting in a bad score for Earle or Dan but a good score for Ken. It worked: Ken's strategy finished first, and martyrs Earle and Dan finished 27th and 28th.

The top ten, along with the number of years their strategy spent in jail:

Kenneth Albanowski
    (with sacrifices by Earle Ake
     and Dan Wisehart)                  135164      
Peter Frodin                            147341
Eric Le Saux                            147624
Francisco Azevedo                       147678
Dave Pavlick and Beirne Konarski        148527
Bill England                            149053
Peter Seibel                            149317
Steve Frost                             149328
Ravi Kolagotla                          149396
Giovanni Marzot                         149412

Peter Frodin's second-place strategy is a less tolerant version of the fool_me_twice() strategy explained in last issue: his fool_me_once() testifies if either of the opponents testified during the previous match. (An honorable mention goes to Brian Gough, whose subroutine is identical to Peter's, but didn't score in the top ten because of other strategies that behaved nondeterministically.) Eric Le Saux' elephant() placed third by noticing whether opposing strategies retaliate against testifying. Francisco Azevedo's strategy was nice but unforgiving: it holds out until someone testifies, and then testifies forever after. Dave Pavlich and Beirne Konarski collaborated on the most complex strategy submitted: their subroutine contains five additional subroutines, implementing a temporary "probation" state for opponents. It testifies if both opponents are on probation and either one violated his probation by testifying last round.

Then the backstabbing began: the bottom half of the top ten, and ten out of the 31 strategies overall, simply testified each and every time. (Again, differences in score were due to random behavior exhibited by other strategies.) Peter Seibel used genetic programming to breed winning strategies, and found nothing that performed better than a pure testifier. Georg Rehfeld's be_a_good_boy_all_the_time() was the exact opposite: it cooperated regardless of the other strategies' actions. In his description, Georg said,

I think this is the only strategy to save the world: don't do any harm to anybody, be helpful and cooperative all the time. This holds true in real life, I believe...

His strategy finished dead last.