Hacking the Perl Core - The Perl Journal, Winter 1999

Nathan Torkington

This document (available as perlhack.pod in the Perl distribution) explains how Perl development takes place, ending with some suggestions for people wanting to become bona fide porters.

The perl5-porters mailing list is where the Perl standard distribution is maintained and developed. The list gets anywhere from 10 to 150 messages a day, depending on the heatedness of the debate. Most days there are two or three patches, extensions, features, or bugs being discussed at a time. A searchable archive of the list is at:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

The list is also archived under the Usenet group name perl.porters-gw at:

    http://www.deja.com/

List subscribers (the porters themselves) come in several flavors. Some are quiet curious lurkers, who rarely pitch in and instead watch the ongoing development to ensure they're forewarned of new changes or features in Perl. Some represent vendors, who are there to make sure that Perl continues to compile and work on their platforms. Some patch any reported bug that they know how to fix, some are actively patching their pet area (threads, Win32, the regex engine), while others seem to do nothing but complain. In other words, it's your usual mix of technical people.

Over this group of porters presides Larry Wall. He has the final word in what does and does not change in the Perl language. Various releases of Perl are shepherded by a pumpking, a porter responsible for gathering patches and deciding on a patch-by-patch and feature-by-feature basis what goes into the release. For instance, Gurusamy Sarathy is the pumpking for the 5.6 release of Perl.

In addition, various people are pumpkings for different areas. For instance, Andy Dougherty and Jarkko Hietaniemi share the Configure pumpkin, and Tom Christiansen is the documentation pumpking.

Larry sees Perl development as emulating the U.S. government: there's the Legislature (the porters), the Executive branch (the pumpkings), and the Supreme Court (Larry). The legislature can discuss and submit patches to the executive branch all they like, but the executive branch is free to veto them. Rarely, the Supreme Court will side with the executive branch over the legislature, or the legislature over the executive branch. Mostly, the legislature and the executive branch are supposed to get along and work out their differences without impeachments or lawsuits.

You might sometimes see people talk about Rule 1 or Rule 2. Larry's power as Supreme Court is expressed in The Rules:

1. Larry is always by definition right about how Perl should behave. This means he has final veto power on the core functionality.
2. Larry is allowed to change his mind about any matter at a later date, regardless of whether he previously invoked Rule 1.

Got that? Larry is always right, even when he was wrong. It's rare to see either Rule exercised, but they are often alluded to.

New features and extensions to the language are contentious, because the criteria used by the pumpkings, Larry, and other porters are not codified in a few small design goals, as with some other languages. Instead, the heuristics are flexible and often difficult to fathom. Here is one person's list, roughly in decreasing order of importance, of heuristics that new features have to be weighed against:

• Does the concept match the general goals of Perl?

These haven't been written anywhere in stone, but one approximation is:

Keep it fast, simple, and useful.
Keep features/concepts as orthogonal as possible.
No arbitrary limits (platforms, data sizes, cultures).
Keep it open and exciting to use/patch/advocate Perl everywhere.
Either assimilate new technologies, or build bridges to them.

• Where is the implementation?

All the talk in the world is useless without an implementation. In almost every case, the person or people who argue for a new feature will be expected to implement it. Porters capable of coding new features have their own responsibilities, and won't necessarily be available to implement your idea, no matter how good.

• Backwards compatibility

It's a cardinal sin to break existing Perl programs. New warnings are contentious — some say that a program that emits warnings is not broken, while others say it is. Adding keywords has the potential to break programs, and changing the meaning of existing token sequences or functions might break programs.

• Could it be a module instead?

Perl 5 has extension mechanisms — modules and XS — specifically to avoid the need to keep changing the Perl interpreter. You can write modules that export functions, you can give those functions prototypes so they can be called like built-in functions, you can even write XS code to mess with the runtime data structures of the Perl interpreter if you want to implement really complicated things. If it can be done in a module instead of in the core, it's highly unlikely to be added.

• Is the feature generic enough?

Is this something that only the submitter wants added to the language, or would it be broadly useful? Sometimes, instead of adding a feature with a tight focus, the porters might decide to wait until someone implements a more generalized feature. For instance, instead of implementing a "delayed evaluation" feature, the porters are waiting for a macro system that would permit delayed evaluation and much more.

• Does it potentially introduce new bugs?

Radical rewrites of large chunks of the Perl interpreter have the potential to introduce new bugs. The smaller and more localized the change, the better.

• Does it preclude other desirable features?

A patch is likely to be rejected if it closes off future avenues of development. For instance, a patch that placed a true and final interpretation on prototypes is likely to be rejected because there are still options for future prototypes that haven't been addressed.

• Is the implementation robust?

Good patches (tight, complete, correct code) stand a better chance of going in. Sloppy or incorrect patches might be placed on the back burner until the pumpking has time to fix them, or might be discarded altogether without further notice.

• Is the implementation generic enough to be portable?

The worst patches make use of a system-specific feature. It's highly unlikely that nonportable additions to the Perl language will be accepted.

• Is there enough documentation?

Patches without documentation are probably poorly thought out or incomplete. Nothing can be added without documentation, so submitting a patch for the appropriate pod pages as well as the source code is always a good idea. If appropriate, patches should add to Perl's test suite as well.

• Is there another way to do it?

Larry has said, "Although the Perl Slogan is There's More Than One Way to Do It, I hesitate to make 10 ways to do something." This is a tricky heuristic to navigate, though: one man's essential addition is another man's pointless cruft.

• Does it create too much work?

This can mean work for the pumpking, work for Perl programmers, or work for module authors. Perl is supposed to be easy.

• Patches speak louder than words

Working code is always preferred to pie-in-the-sky ideas. A patch to add a feature stands a much higher chance of making it into the language than a random feature request, no matter how fervently argued the request might be. This ties into the question of whether the feature will be useful, as the fact that someone took the time to make the patch demonstrates a strong desire for the feature.

If you're on the list, you might hear the word "core" bandied around. It refers to the standard distribution. "Hacking on the core" means you're changing the C source code of the Perl interpreter. "A core module" is one that ships with Perl.

The source code to the Perl interpreter, in its different versions, is kept in a repository managed by a revision control system (currently the Perforce program, see http://perforce.com/). The pumpkings and a few others have access to the repository to check in changes. Periodically the pumpking for the development version of Perl will release a new version, so the rest of the porters can see what's changed. Plans are underway for a repository viewer, and for anonymous CVS access to the latest versions.

Always submit patches to perl5-porters@perl.org. This lets other porters review your patch, which catches a surprising number of errors in patches. Either use the diff program (available in source code form from ftp://ftp.gnu.org/pub/gnu/), or use Johan Vromans' makepatch utility (described in TPJ #15 and available from CPAN/authors/id/JV/). Unified diffs are preferred, but context diffs are accepted. Do not send RCS-style diffs or diffs without context lines. More information is given in the Porting/patching.pod file in the Perl source distribution. Please patch against the latest development version (e.g., if you're fixing a bug in the 5.005 track, patch against the latest 5.005_5x version). Only patches that survive the heat of the development branch get applied to maintenance versions.

Your patch should also update the documentation and test suite.
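As a rough illustration of the preferred format, the following sketch builds a unified diff between two small source trees. The directory names perl-orig and perl-new and the file hello.c are hypothetical stand-ins for a pristine source tree and your modified copy; the only things assumed about diff are its standard -u (unified) and -r (recursive) options.

```shell
#!/bin/sh
# Sketch: produce a unified diff between a pristine tree and a modified copy.
# perl-orig, perl-new, and hello.c are hypothetical names used for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir perl-orig perl-new
printf 'int answer(void)\n{\n    return 41;\n}\n' > perl-orig/hello.c
printf 'int answer(void)\n{\n    return 42;\n}\n' > perl-new/hello.c

# -u asks for unified format (with context lines); -r recurses into directories.
# diff exits with status 1 when the trees differ, which is expected here.
diff -ur perl-orig perl-new > fix.patch
status=$?

# Show the patch that would be mailed to the list.
cat fix.patch
```

The resulting fix.patch marks removed lines with a leading "-" and added lines with a leading "+", surrounded by unchanged context. Porting/patching.pod remains the authority on what the pumpkings actually expect.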

To report a bug in Perl, use the perlbug program bundled with Perl (if you can't get Perl to work, send mail to the address perlbug@perl.com or perlbug@perl.org). Reporting bugs through perlbug feeds into our automated bug-tracking system, access to which is provided through the web at http://bugs.perl.org/. It often pays to check the archives of the perl5-porters mailing list to see whether the bug you're reporting has been reported before, and if so whether it was considered a bug.

The CPAN testers (http://testers.cpan.org/) are a group of volunteers who test CPAN modules on a variety of platforms. Perl Labs (http://labs.perl.org/) automatically tests Perl source releases on various platforms and gives feedback to the CPAN testers mailing list. Both efforts welcome volunteers.

To become an active and patching Perl porter, you'll need to learn how Perl works on the inside. Chip Salzenberg, a pumpking, has written articles on Perl internals for TPJ explaining how various parts of the Perl interpreter work. The perlguts documentation explains the internal data structures. And, of course, the C source code (sometimes sparsely commented, sometimes commented well) is a great place to start (begin with perl.c and take it from there). A lot of the style of the Perl source is explained in the Porting/pumpkin.pod file in the source distribution.

It is essential that you be comfortable using a good debugger (such as gdb or dbx) before you can patch perl. Stepping through Perl as it executes a script is perhaps the best (if sometimes tedious) way to gain a precise understanding of the overall architecture of the language.

If you build a version of the Perl interpreter with -DDEBUGGING, Perl's -D command line flag emits copious debugging information (see the perlrun manpage). If you build a version of Perl with compiler debugging information (typically with your C compiler's -g option instead of -O), you can step through the execution of the interpreter with your favorite C symbolic debugger, setting breakpoints on particular functions.
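A minimal sketch of the first approach, assuming a perl binary on your PATH: it checks whether the build's compiler flags mention -DDEBUGGING and, if so, runs a one-line program under -Dt (the "trace execution" flag listed in perlrun). The variable debug_build is just an illustrative name, and looking in ccflags is an assumption about how your build records the option.

```shell
#!/bin/sh
# Sketch: see whether this perl was compiled with -DDEBUGGING, and if so
# trace a tiny program. Looking in ccflags is an assumption; some builds
# may record the option elsewhere in the output of perl -V.
if ! command -v perl >/dev/null 2>&1; then
    debug_build=unknown
    echo "no perl found on PATH"
elif perl -V:ccflags | grep -q -e '-DDEBUGGING'; then
    debug_build=yes
    # -Dt traces execution; the trace is written to standard error.
    perl -Dt -e 'print 1 + 1, "\n"'
else
    debug_build=no
    echo "this perl was not built with -DDEBUGGING, so -D produces no trace"
fi
echo "debugging build: $debug_build"
```

Other -D letters select different kinds of output and can be combined; perlrun lists them all.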

It's a good idea to read and lurk for a while before chipping in. That way you'll get to see the dynamic of the conversations, learn the personalities of the players, and hopefully be better prepared to make a useful contribution when you do speak up.

If after all this you still think you want to join the perl5-porters mailing list, send mail to perl5-porters-subscribe@perl.org. To unsubscribe, send mail to perl5-porters-unsubscribe@perl.org.

__END__

Nathan Torkington is a miserable husk of a man. Although most people don't know about his intellectual shortcomings and speech impediment, most have noticed the ravages caused by his desperate search for a "miracle cure" to his ongoing glandular problem. With so many personal issues, it's no wonder the poor guy said "Author bio? Can't you write it, Jon?"
