PREVIOUS  TABLE OF CONTENTS  NEXT 

Perl, Politics, and Pairwise Voting: Perl as the Activist's Friend

Rob Lanphier

The U.S. Presidential election once again draws near, and once again we see a contest between two men, each representing one of the two major U.S. political parties. So it goes with the two-party system.

What is it that makes the two-party system a two-party system? It's a direct consequence of plurality voting, the predominant form of balloting used in the United States where the highest vote getter wins an election. This relationship between the two-party duopoly and plurality voting is known as "Duverger's Law," after the 20th century political scientist who had the guts to call it a "law" (Riker, 1982).

Duverger's Law has some disturbing consequences which leaves many voters dissatisfied with the status quo. Politicians will always claim to "feel our pain," but at least in the U.S., two-party skeptics abound. Recent polls have shown that nearly 60% of Americans would support the formation of a new major party (Barrett, 1996).

The main reason Duverger's Law rings so true is that we have a binary ballot that groups people into two categories: a winner and one or more losers. The resulting dilemmas that voters are faced with in siding with a winner manifest themselves several ways:

The bottom line is that the ballot doesn't let people state what they really feel. They can only make a crude approximation of their preference, and then hope that, somehow, the politicians will "get it."

Voters are often forced into a choice between the lesser of two evils. They might not like either candidate, but rather than make a principled stand by voting for none of the above, or a lesser known candidate with no chance of winning, they vote for the major candidate who displeases them least.

Have the strategy problems above ever demonstrably taken the electorate where it didn't want to go? Yes. While Abraham Lincoln is widely considered to be the best U.S. president in history, he owes his victory largely to the strategic error of his foes. He won the gnarled four-way 1860 election with the smallest plurality of any president - 39% of the vote - and the result of that election led to U.S. Civil War. (Of course, if slaves had had the right to vote, the numbers would have been substantially different.)

In our century, the most famous three-way strategy dilemma was when Theodore Roosevelt, angry about losing the Republican nomination, split the Republican Party vote for their 1912 Nominee (and incumbent president) Howard Taft by creating the Bull Moose Party. This allowed Woodrow Wilson to win handily with a mere 42% of the vote (as opposed to Roosevelt's 27% and Taft's 23%). This inspired many states to create "sore loser" laws that keep candidates who fail to win major party nominations from forming third parties, and by making third-party ballot access much more difficult.

Even recently, presidential politics were affected by the three-way split. In 1966, Thomas Finan and Carlton Sickles, two relatively liberal candidates from the left-of-center state of Maryland, split the liberal vote within the state Democratic party gubernatorial nominations. As a result, conservative George P. Mahoney won the Democratic nomination, only to be beaten by Spiro Agnew, who went on to become Richard Nixon's Vice President. When Agnew resigned in 1973, it opened the door for Gerald Ford to be appointed Vice President, and then later President. A tenuous connection to the presidency, but a very real one nonetheless (Anderson, 1994).

And then there are the candidacies that might have been, if only our system hadn't discouraged third parties so much. Rosenstone, Behr, and Lazarus (1984) state that few qualified candidates would run under a third-party label because of the disadvantages they face. They note the bias that third-party candidacies face in the media by quoting James M. Perry of The Wall Street Journal (Rosenstone et al., 1984):

We base [our decision] on the simple proposition that readers don't want to waste their time on someone who won't have a role in the campaign. We're not going to run a page-one spread on a fringe candidate. We don't have a multiparty system. Until we do, nobody's going to cover these candidates.

With such biases built into the system, it is little wonder that third-party candidates can't gain the critical mass of support necessary to become credible contenders. A pragmatic, intelligent potential candidate might look at the seemingly insurmountable odds and simply not run. Thus, the dearth of credible third-party candidates becomes a self-fulfilling prophecy, and the two-party duopoly maintains control of the system.

The Preference Ballot

The solution is simple: convince voters to vote for candidates regardless of their perceived odds of winning. To do this, we must expand the power of the ballot. This are many ways to do this; the method that I will discuss here is the ranked ballot, or "preference ballot," as shown below:

 
2		Fred Flintstone
		Wilma Flintstone
3		Barney Rubble
1		Betty Rubble

The great thing about preference votes is that they're much more expressive than a vote-for-one ballot, allowing them to bargain for a compromise should their top choice be unpopular. This allows people greater flexibility in casting protest votes, while not throwing the election to the most evil candidate (not that Wilma Flintstone is evil - this is just an example).

Ranked ballots are great at limiting voting "strategies" that encourage people to choose candidates based on poll results. But they can't eliminate strategies completely, no matter how the ballots are tallied. Political scientists have debated the relative merits of ranked ballots for years, and many of the discussions have involved Arrow's Impossibility Theorem.

Impossibility Theorems

Political scientists have been debating for some time now about whether or not it's even possible to come up with a way to tally preference votes. Most of the debate started with Arrow's Impossibility Theorem, which claims that any system where people are allowed to freely and exactly list their preferences must have some major defect. Arrow proves this by showing a series of conditions for fairness, not all of which can be satisfied simultaneously.

Arrow's criteria are a bit too complicated to summarize here, but other mathematicians have tweaked and fiddled with the conditions, and have come up their own sets of conditions. Fishburn and Brams (1983) came up with a particularly concise set, listed below:

No-Show Paradox: A voter helps his favored candidate most by not voting.

Thwarted-Majorities Paradox: A candidate who defeats all other candidates in direct-comparison majority votes still loses the election. Also known as the Condorcet criterion, named after the 18th century election theorist who popularized it.

Multiple-Districts Paradox: A candidate wins in every district, but loses the general election.

More-is-less Paradox: If the winner had been ranked higher by some voters, another candidate would have won.

Fishburn and Brams maintain in their 1983 paper that at least one of these four paradoxes will be possible in any election method with a ranked ballot. One may make the case that since all voting systems are vulnerable to at least one of these paradoxes, that a perfect system doesn't exist. Pragmatists counter that it's not necessary to eliminate all of them (Anderson, 1994).

In preference voting, as in anything else, you can't please all of the people all of the time. This means we are stuck with the task of merely minimizing the sticking points rather than pursuing the holy grail of a perfect system. There are many (myself included) who believe that we can relegate the flaws to rare circumstances.

The Borda Method

This is probably the best known method within the United States for tallying ranked ballots. It is used by the Associated Press and the United Press International to determine the champions in NCAA college sports. Sports writers or coaches are asked to rank the 25 best teams, and then the top team on each ballot gets 25 points, the second team gets 24, and so on. The top vote getters are ranked by points received.

This relatively simple method is easy to understand, hence its appeal. However, it discourages people from ranking anything but their top preference, thus making it difficult to derive compromise candidates from their vote. Consider a three way election between Joe Left, Sally Middle, and Martha Right. I'll use this example to describe an election where a reasonable compromise (Sally Middle) exists between two somewhat popular extremes. Given that the seat in question must go to one person and only one, it seems reasonable that the middle candidate be chosen. Suppose the sincere wishes of these voters are as shown below:

Figure 1: Three way Election

Figure 1: Three way Election

If Borda's method is used, where the first place candidate on the ballot receives two points per ballot, and the second place candidate receives one point per ballot, here's what happens:

Figure 2: Total points per Candidate

Figure 2: Total points per Candidate

The good news here is that Borda's method does indeed choose the compromise when everyone votes sincerely. But it's not strategy-free: if Martha Right supporters pay attention to the polls, they can (and should) drop Sally Middle off their ballots. If all Martha Right supporters do this, they trigger a 40 point drop in Sally Middle's Borda score, causing Ms. Right to win.

Even if Martha Right supporters don't do this, it's likely that many supporters of Joe Left will do the same thing if they think that Joe has a shot at winning. Thus Borda picks a compromise when voters naïvely list all of their preferences, but fails when they learn how to beat the system.

Borda's method fails to meet the Condorcet criterion, which is arguably the most important for determining the victor in a single-winner election.

The Hare Method

Dating from 1860, the Hare method is perhaps the best known method for tabulating preference ballots outside the United States. It's used in Australia and Ireland for single-office elections. Preference ballots are tabulated counting only the first-place candidate on each ballot. The candidate with the fewest number of first place votes is eliminated, and every ballot listing that candidate as its first choice has the vote transferred to its second choice.

The Hare method is a popular way of eliminating primaries and allowing people to vote for potentially unpopular alternatives to the two major candidates without fear of wasting one's vote. It does a pretty good job of eliminating strategy and in many ways is a substantial improvement over the American vote-for-only-one system.

Using Hare to tally the results from our election, we tally the first choices to find that Martha Right receives 40% of the vote, Joe Left receives 35%, and Sally Middle is eliminated with only 25% of the vote. The votes for Sally Middle are redistributed based on the second choice of those voters. Joe Left then wins with 51% of the vote.

Thus, Hare falls short when considering popular compromises, such as Sally Middle; like Borda's method, it also fails the Condorcet criterion.

Pairwise Election Methods

Under a class of election methods known as pairwise methods, the election above would result in a different winner. The relative election results of every possible combination of two candidates is tallied and the winner of key pairwise matchups is declared winner of the overall election.

In the above example, the results of the pairwise matchups would be as follows:

Joe Left     (51%) vs Martha Right (49%) 
Sally Middle (60%) vs Martha Right (40%) 
Sally Middle (65%) vs Joe Left     (35%)  

Sally Middle beats both Joe Left and Martha Right, and therefore wins the election overall.

What distinguishes the different pairwise election methods from one another is how they deal with circulaf preferences. A circular preference occurs when one candidate defeats another who in turn defeats our original winner: A beats B beats C beats A. This isn't necessarily a flaw in pairwise systems - one could say it's merely a sign that the electorate is ambivalent. Some theorists, such as Charles Dodgson (a.k.a. Lewis Carroll, author of Alice in Wonderland), claim that if a single winner can't be found, then the election should be called off (Levin and Nalebuff, 1995).

Nonetheless, many pairwise methods have been designed to arbitrate this situation. Three in particular, Condorcet's Method, Smith's Method, and Copeland's Method, deserve mention.

Condorcet's Method

Condorcet's method is probably the most well known. Each voter's list is used to simulate how that voter would have voted in pairwise matchups between each of the candidates on the ballot. Separate tallies of every possible two-way election are calculated, and the winner is the candidate who wins all two-way matchups. Circular preferences are resolved in Condorcet's method by choosing the candidate whose largest pairwise defeat is the smallest, as measured by how many voters explicitly voted for someone else over the candidate.

The reason why many election reformers prefer this method is that, under most plausible circumstances, it solves the "lesser of two evils" problem described above, which many consider to be the litmus test for determining a good pairwise method. However, as Anderson notes, it can produce unexpected results in certain rare circumstances.

Smith's Method

Smith's Method isn't so much a pairwise tie breaker as a method of determining which candidates should qualify for a tie-breaker. The "Smith Set" is the smallest non-zero set of candidates who beat all the candidates outside the set in all pairwise matchups. Not all pairwise methods will pick a member of the Smith Set (most notably, Condorcet's method) yet intuitively, one would hope that would be the case. Smith's method, therefore, makes a good precondition to a tie-breaker such as Condorcet's.

Copeland's Method

Copeland's Method computes the winner of the election by counting the number of pairwise wins, losses, and ties for each candidate. The candidate with the best record wins the election, much in the same way that a sports team with the best record gets the top seed in that sport's playoffs.

One problem with Copeland's Method is that, like Smith's method, it is prone to ties, and so is often paired with another tie breaker. It's also vulnerable when there are three parties locked in a three-way tie. In all likelihood, winning candidates will belong to the party that has the most candidates on the ballot, because they'll win the most pairwise contests, even though many of those victories might come from intraparty matchups. This would encourage parties with sufficient funds to support multiple, similar candidates in order to skew the election in their favor.

There are several other methods that exist for choosing a winner in a preference balloted election, many of which provide a defensible set of criteria. For those of us trying to educate people on alternative election methods, our goal has been to choose the most important criteria and find the election method which best meets those criteria.

So what's all this got to do with Perl?

For many of us who aren't mathematicians by trade, it becomes difficult to debate the relative merits of the different methods without a way of visualizing some examples. The solution was to write a program which illustrates the data in a comprehensible way.

Now it's time to do a little preaching to the choir. I chose to write this program in Perl for several reasons, many of which are all too familiar to Perl aficionados. However, they bear repeating in the context of programs for elections:

I relied heavily on Perl 5 for my program. This is because Perl 5, unlike Perl 4, supports true two-dimensional arrays, helpful for storing pairwise election results.

The Algorithms

Before I talk about Perl specifically, I'll explain the algorithm involved in condorcet.pl.

Consider a sample field of six candidates:

A - John Anderson 
B - Jerry Brown 
C - Bill Clinton 
D - Bob Dole 
E - Dwight Eisenhower 
F - Steve Forbes  

condorcet.pl first creates a 6x6 matrix. The matrix entry at location [x, y] contains the number of votes x received over y. So [A, B] is the number of votes John Anderson received over Jerry Brown, and [B, A] is the number of votes Jerry Brown received over John Anderson.

Each ballot is tallied by determining the pairwise results: who beats whom. So if someone ranks their ballot

A, B, C, E, D, F 

then my program increments [A,B], [A,C], [A,E], [A,D], [A,F], [B,C], [B,E], [B,D], [B,F], [C,E], [C,D], [C,F], [E,D], [E,F], and [D,F], since this is how the voter would have voted in each pairwise election. This assumes that the voter's preferences are transitive, e.g. if they prefer A over B and B over C, that they will necessarily prefer A over C. This lets us simplify the voting process, and ensure a certain consistency among the ballots.

Next, condorcet.pl uses the matrix to determine the pairwise winners. Each complementary matchup is evaluated, and the winner receives one point in the "win" column, and the loser receives one point in the "loss" column. If the simulated pairwise election is a tie, both receive one point in the "tie" column.

Here is one possible outcome:

   Wins Losses Ties 
E    5     0         (Eisenhower beats everyone in
                      separate pairwise elections.)
A    4     1         (A loses to E, but beats everyone
                      else) 
B    3     2         (B loses to A and E) 
C    1     3     1   (C loses to A, E and B, and ties D) 
D    1     3     1   (D loses to A, E and B, and ties C) 
F    0     5         (F loses in all elections)  

This is a clean pairwise victory for Eisenhower. If no candidate emerges unscathed by a pairwise defeat or tie, an alternative method of calculating the winner involves finding the candidate whose worst pairwise defeat was the smallest. For instance, let's modify the table above:

 
         Wins   Losses        Ties 
E       3       3         (E loses to C, D, and F) 
A       4       2         (A loses to E and D) 
B       3       3         (B loses to F, A, and E) 
C       3       3         (C loses to A, E and B) 
D       3       3         (D loses to A, F and B) 
F       2       4         (F loses to A, B, C, and D)  

This is where pairwise methods start to differ. Copeland's method would select A (John Anderson) as the winner, since he has the most wins. In order to calculate the winner in a Condorcet election, we need to look at the matchups where each candidate was defeated. Let's say the election has 1000 votes. The table below shows how the losses for each candidate might tally:

  
E  (495, 505)  (492, 508)  (474, 526)* 
A  (491, 509)  (482, 518)* 
B  (482, 518)  (476, 524)* (492, 508) 
C  (474, 526)* (488, 512)  (490, 510) 
D  (497, 503)  (491, 509)* (493, 507) 
F  (482, 518)  (481, 519)  (477, 523)*  (498, 502) 

* = worst defeat  

In this election, D (Bob Dole) has the smallest "worst defeat" with 509 votes against him, so he'd be the winner using Condorcet's method. This is true in spite of the fact that A lost fewer matchups, and in fact beat D in a pairwise matchup. The theory behind this is that Marquis de Condorcet thought it appropriate to ask the question, "Given that there is no candidate who a majority of the electorate would pick over any other candidate, who is the candidate that a plurality chooses over any other candidate?" No solution to this quandary is going to be particularly satisfying, but many would argue that Condorcet's tie-breaker works about as well as any.

On the Election Methods List (a mailing list where election methods are discussed; see the end of the article for more information), an ASCII notation evolved that works pretty well as shorthand for expressing a bundle of ballots. I've extended that shorthand to make it easily implemented in Perl. We start off associating candidates with an integer by creating a two-column, comma separated list of candidate numbers and names:

1,Joe Left 
2,Sally Middle 
3,Martha Right 
4,Bertha Up 
5,George Down  

Parsing that is trivial. What becomes a bit more interesting is the next portion: a list of candidate numbers separated by '>' when there is a preference, and '=' when there isn't. The ballot below is an example:

Figure 3: This Ballot

Figure 3: This Ballot

Figure 4: Total points per Candidate

Figure 4: 40 Ballots

This ballot would be encoded as 3 > 4 = 5 > 2. That is, 3 is preferred to 4 and 5 is preferred to 2, or Martha preferred to Bertha and George preferred to Sally (and Joe Left an implied last). This value can optionally be prepended by a quantity of voters who voted in that way. For example, coding the example (Three Way Election), we arrive at the following:

  
40: 3 > 2 
9:  2 > 3 
16: 2 > 1 
35: 1 > 2 

Now for some Perl. I'm sure I'll hear of ways to condense this down to one short line, but even the snippet below isn't too bad:

my (@votelist)=();
foreach $tier ( split(/>/, $ballotstring) ) { 
    my (@foo) = split(/=/, $tier); 
    push(@votelist, \@foo);     
}  

This uses the Perl 5 ability to create lists of lists, and creates a structure of an ordered list of tiers, with each tier consisting of equally-ranked candidates. This structure is stored in an object, along with the number of votes (if required by the voting scheme).

The Pairwise Engine

The code at the top of the next page calculates the pairwise tally. The results are stored in the array $self->{tally}; because I don't have to pre-declare its size, it can handle as many candidates as necessary, without predeclaring a ridiculously large array or creating a dynamically allocated structure. This keeps the code relatively simple, although my quasi-object oriented approach makes it harder to read a chunk out of context.

The next stage involves figuring out the winners. For this I create a makeshift voting database, with separate fields for the number of wins, losses, and ties for each candidate, as well as his worst defeat, as measured by the total number of votes against that candidate. Sadly, I wasn't feeling particularly programmer friendly the day I was writing this, and so I made a goofy two-dimensional array rather than an associative array or one-dimensional array with better names. Fortunately, the code isn't too difficult to understand.

Here is the meaning of each of the fields, where $i is the candidate number:

$edata->{results}[$i][0] # defeats for $i
$edata->{results}[$i][1] # ties for $i
$edata->{results}[$i][2] # victories for $i
$edata->{results}[$i][3] # $i's worst defeat
This structure contains all the pairwise tallies, giving us the data we need to calculate the Condorcet, Smith, and Copeland winners. I'll leave it to you to fetch the code if you want to see how all of the methods are calculated, but to give you a taste, here's the code implementing the Copeland method:

sub rank_copeland {     
    my ($self, $edata)=@_;  
    # A Copeland score is computed by doubling
    # the number of victories and adding the
    # number of ties a candidate received.  
    my (@copeland_ranks) = sort {
            -(($self->{results}[$a][2]*2 				
              + $self->{results}[$a][1]) <=>
             ($self->{results}[$b][2]*2 			
              + $self->{results}[$b][1]))
            }     
        $edata->candnum_array;     
    $self->{copeland_ranks} = \@copeland_ranks; 
}  

This passes an anonymous function to Perl's sort() routine, which then returns the candidates' Copeland scores.

Using CGI To Spit It All Out

The best thing about Perl is that , in combination with CGI, it's easy to geneerate nice looking output, even with copious amounts of complicated data. Given the ballots from Total Points per Candidate (Figure 2), I'm able to generate the HTML table below the next section of code (Figure 5).

You needn't run the code yourself to try this out. Just visit:

http://www.eskimo.com/~robla/politics/condorcet.html

sub pairwise_tally {
    my ($self, $votelist) = @_; 

    # $self is an object containing all pairwise election data.
    # @self->{tally} is a 2D array storing the pairwise tally results.
    # $self->{tally}[$candx][$candy] is the number of votes that $candx
    #     received over $candy. 
    # $votelist is a string containing all of the ballots.

    for ( split(/\n/, $votelist) ) {

        # $loservec is a boolean vector with a flag set for all losers,
        # reset with every new ballot.  All are losers until
        # they're listed on a ballot.

        my ($loservec) = $self->{candvec};

        # Parse ballot.  Skip if no ballot is returned.

        (!(my($ballot) = new ballot_obj($self, $_))) && next;

        # @{$ballot->{rankings}} is an array of integers representing the
        # candidates the voter(s) voted for, in order of preference.

        # In addition, $ballot->{quantity} is the number of
        # identical ballots we're considering at this time.

        my (@votelist) = @{$ballot->{rankings}};

        foreach $tier (@votelist) {        # For each preference listed...
            # Remove the chosen candidate(s) from the loser vector.
            foreach $peer (@{$tier}) {
                vec($loservec, $peer, 1) = 0;
            } 

            # For all candidates...
            for ($i = 0; $i <= $#{$self->{candidate}}; $i++) {

                # If said candidate hasn't been listed yet...
                if (vec($loservec, $i, 1)) {

                    # ...they've been beaten by the chosen candidate.
                    # Increment their "votes for the other guy" counter
                    # by the appropriate number of ballots.
                    foreach $peer (@{$tier}) {
                        if (defined($self->{tally}[$peer][$i])) {
                            $self->{tally}[$peer][$i] += $ballot->{quantity};
                        } else {
                            $self->{tally}[$peer][$i] =  $ballot->{quantity};
                        }
                    }
                }
            }
        }
        $self->{total_vote} += $ballot->{quantity};
    }
}

Figure 5: Pairwise Election Results

Figure 5: Pairwise Election Results

Result

The winner is Sally Middle

There's still plenty of work to be done:

Random Thoughts

Flexible and robust election systems have applications beyond their traditional role in government. Local elections, decision making, and shareholder elections are all obvious applications. There's a role for election methods in computer science - one could build a genetic algorithm that generates a pool of voters who submit preference ballots to make decisions).

Sadly, many Americans have been indoctrinated into believing that the American system is the finest in the world and need not be questioned. The vote-for-only-one ballot would be okay if there were only one issue and two points of view, but society, alas, is a bit more complex.

Many pundits and armchair activists talk about how the American system of politics is broken, and then only offer up ways of restricting "the bad guys" as a solution, whether the bad guys are big business, labor unions, or special-interest coalitions. Yet few people are actually really putting the system itself under scrutiny. Which is a shame, considering the consequences of such simplistic feedback mechanisms in how we select our leaders.

References

Amy, Douglas. Real Choices, New Voices. Columbia University Press, 1993.

Anderson, Lowell Bruce. Voting Theory. Handbooks in OR & MS, Vol.6, editors S.M. Pollock, et al. Elsevier Science B.V., 1994.

Barrett, Laurence I. Give Me Your Tired Parties. AllPolitics, March 1996. (http://allpolitics.com/news/email/9603/01/index.shtml)

Fishburn, Peter C. and Steven Brams. Paradoxes of Preferential Voting. Mathematics Magazine, Vol. 56, No. 4, Sept. 1983, pp. 207-214.

Levin, Jonathan and Barry Nalebuff. An Introduction to Vote-Counting Schemes. The Journal of Economic Perspectives, Vol. 9, No. 1, pp. 3-26, Winter 1995.

Niemi, Richard G. and William H. Riker. The Choice of Voting Systems. Scientific American, Vol. 234, No. 6, pp. 21-27.

Riker, W.H. The Two-party System and Duverger's Law: An Essay on the History of Political Science. The American Political Science Review, Vol. 76, pp. 753-766, 1982.

Sites

__END__


Rob Lanphier works at Progressive Networks (home of RealAudio). He spends what little spare time he has promoting alternative voting methods and conning his wife Margaret into creating magazine article graphics (and thanking her for them).
PREVIOUS  TABLE OF CONTENTS  NEXT