Guts: Knee Deep in the Code

Chip Salzenberg

My first exploration of Perl's internals was more of a toe-dip than a high gainer. Back in 1988, I wanted to use Perl 2.0 on a Xenix/286 system, so I ported it. Over the next few years I contributed some minor patches, and one major patch: support for System V interprocess communication (the msg*, sem*, and shm* operators).

Cut to October 1996. Occupied with other matters, Larry Wall left active development to other interested people. Perl 5.003 was the current public version. Andy Dougherty released seven development "subversions" (5.003_01 through 5.003_07), but was unable to continue. Patches started piling up, with no one to collect and order them. Finally, seeing an opportunity to help Perl development move forward, I volunteered to collect patches, issue a few more subversions, and slap a "Perl 5.004" label on the result. I figured it would be a quick (if not easy) job.

Seven months and 45 subversions later, I finally put Perl 5.004 to bed. It wasn't quick, and it certainly wasn't easy, but it was educational. I learned about Perl's internals the hard way, working backwards and forwards through the code, discovering how it worked so that my patches would actually fix bugs instead of making new ones. I'd like to share what I learned.

Parts is Parts

Perl is a complex programming system. Just as humans are single individuals, but can be analyzed usefully in parts, so we can dissect Perl for analysis. What follows is a description of Perl's major organs.

Core. The Perl core is the minimum portion of the Perl distribution that must be compiled and installed for you to run any Perl program. The Perl core is written almost entirely in C.

Since the core contains most of the little-understood guts of Perl, it will be the focus of this column in the future.

Standard Library. The standard library consists of the standard modules, the standard extensions, and pragmas. There are also a few vestigial files left over from Perls of yesteryear.

Perl 5.004 includes eleven standard extensions. Among them are DynaLoader, which performs dynamic loading if your operating system supports it; IO, the recommended interface for file and socket input and output in Perl 5.004; and POSIX, which provides direct access to POSIX system calls. (POSIX is a set of standards for operating systems.)
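
To make two of those extensions concrete, here is a minimal sketch that uses IO for file handling and POSIX for a library function. The temporary file and the sample values are invented for illustration; only the module names come from the list above.

```perl
use strict;
use IO::File;              # object-oriented file handles from the IO extension
use POSIX qw(floor);       # import one function from the POSIX extension

# Open an anonymous temporary file for reading and writing.
my $fh = IO::File->new_tmpfile
    or die "cannot create temporary file: $!";

# Write a line, rewind to the start, and read it back.
print $fh "hello from IO::File\n";
$fh->seek(0, 0);           # offset 0, whence 0 (start of file)
my $line = <$fh>;
$fh->close;

# floor() comes from POSIX; Perl has no built-in with that name.
my $rounded = floor(3.7);

print "read back: $line";
print "floor(3.7) = $rounded\n";
```

Both modules are loaded with an ordinary use statement, exactly like a module written in pure Perl; the C code behind them is found and loaded behind the scenes.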

Configuration and Installation. Perl is an amazingly portable system. It runs on virtually all Unix variants, as well as VMS, OS/2, Windows NT, Windows 95, Plan 9, AmigaOS, and a few others that you've probably never heard of. A significant portion of the Perl distribution is devoted to adapting to the environment in which it is built and installed.

Test Suite. The Perl distribution includes an extensive test suite. It exercises a large fraction of the language and pragmas, a fair fraction of the standard extensions, and a few of the standard modules. (If you know Perl fairly well and you have some free time, the Perl development team would love to have your help extending the test suite to cover more of the Perl distribution.)

Utilities. Perl comes with some auxiliary utility programs that help people make more effective use of Perl. A partial listing:

In general, Perl utilities are useful for developing Perl code, but are never required simply to run Perl programs.

And The Rest... There's more in the Perl distribution, but most of it is documentation of one kind or another, so it isn't a subject of this column.

Nevertheless, for the good of my readers, I must mention the Frequently Asked Questions (FAQ) document, written and maintained by Tom Christiansen and Nathan Torkington. If you read it, I promise you will learn something. (I know I did.) To read it, run the command perldoc perlfaq on any system with Perl 5.004.

Anatomy of a Distribution

Now that you know what goes into a Perl distribution, you're ready to look at the files in a distribution and understand their basic roles. (The wildcard ** is taken from the zsh shell; it means to search all subdirectories, in the style of the Unix find program.)

Files                                        Description
*.h, *.c, *.y, *.pl (but not lib/*.pl!)      Core
os2/*, plan9/*, vms/*, cygwin32/*, win32/*   Core support for special environments
Configure, hints/*, **/*.SH, installperl     Configuration and installation
ext/**/*                                     Standard extensions
lib/**/*                                     Remainder of standard library
utils/*, x2p/*, h2pl/*                       Utilities
t/**/*                                       Test suite
INSTALL, README*, pod/*                      Documentation
eg/**/*                                      Bits of sample code (mostly old)
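
For readers without zsh, the ** wildcard in the listing above can be approximated in Perl itself with the standard File::Find module. What follows is only a sketch: the files_under helper and the tiny directory tree it searches are made up for the example, and it assumes a Perl newer than 5.004, since File::Temp and lexical filehandles arrived later.

```perl
use strict;
use File::Find;
use File::Path qw(mkpath);
use File::Spec;
use File::Temp qw(tempdir);

# Return every plain file under $dir, at any depth. This is roughly what
# the zsh pattern "$dir/**/*" selects, minus the directories themselves.
sub files_under {
    my ($dir) = @_;
    my @found;
    find(sub { push @found, $File::Find::name if -f $_ }, $dir);
    my @sorted = sort @found;
    return @sorted;
}

# Build a small, hypothetical tree so the example runs anywhere.
my $root = tempdir(CLEANUP => 1);
mkpath(File::Spec->catdir($root, 't', 'op'));
for my $rel ('t/TEST', 't/op/array.t') {
    my $path = File::Spec->catfile($root, split m{/}, $rel);
    open my $fh, '>', $path or die "cannot write $path: $!";
    close $fh;
}

# Equivalent in spirit to listing t/**/* from the top of the tree.
my @files = files_under(File::Spec->catdir($root, 't'));
print "$_\n" for @files;
```

The callback runs once for every entry that find visits, so files in t itself and files in any subdirectory of t are collected alike.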


Perl is more than just a language; it's a programming system. What we may think of as "the real Perl" is only a part of it. It is important to remember that the standard library - especially the set of pragmas - is just as much a standard part of Perl as the print operator.
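
As a small illustration of how a pragma changes the language itself, consider the integer pragma. This is a sketch with invented variable names, not an excerpt from the distribution.

```perl
use strict;                # a pragma: enforce variable declarations

# Ordinary Perl arithmetic is floating point.
my $plain = 10 / 3;

my $whole;
{
    use integer;           # a pragma: integer arithmetic within this block
    $whole = 10 / 3;
}

print "without the pragma: $plain\n";
print "with the pragma:    $whole\n";
```

The same expression yields a different result inside the block, even though nothing but a library file was loaded. The effect ends at the closing brace.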

Coming Up

Next time, we'll delve into the organization of the core and some of the fundamental data structures that lie at the, um, core of Perl. Share and enjoy!

__END__

Chip Salzenberg has been programming for almost twenty years. For most of those years he has promoted free software. He was coordinator and primary programmer for Perl 5.004. His major solo project was Deliver, the free email local delivery agent that allows flexible, safe, and reliable message handling. Chip's hobbies include patching Perl, tending to his six parrots, and memorizing Mystery Science Theater 3000 episodes. (Hikeeba!)