
Perfect Programming

Nathan Torkington

Imagine a world ten years from now. Programmers know everything there is to know about their language, algorithms and requirements. They apply this knowledge to produce flawless programs, which work correctly the first and every time. Users read the manuals, never provide false or misleading input, and always know what to do next. Clients never change their minds and maintenance is unnecessary.

You can wake up now. We both know this won't happen so long as boneheads like us keep programming, morons like our customers keep giving us incomplete and perpetually changing requirements, and the prerequisite for being a user is that you demonstrate zero ability to read, think, or act without tech support or a programmer holding your hand. Everyone in the programmer-client-user world is a weak link, and programmers must be prepared for each person to make mistakes.

User mistakes. When users are to blame, it's typically because they do something like providing incorrect input to your program, or calling your program in an unexpected way. Paranoid programmers check everything provided by the users (and use the taint mechanism to help them). This has the side benefit of making their programs more secure against exploitation by The Bad Guys. The Bad Guys like to mess with a program's environment, input, and configuration files, in the hope they can trick it into displaying /etc/master.passwd, or changing the permissions of /bin/sh to 4755, making it setuid.
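
For instance, a paranoid CGI script refuses to act on anything it doesn't recognize. A minimal sketch (the "username" field and the one-to-eight character limit are inventions for illustration):

use CGI;

my $query    = CGI->new;
my $username = $query->param('username');

# Accept only short, word-character usernames; reject anything else
# rather than passing mystery data to a shell or a filename.
unless (defined $username && $username =~ /^\w{1,8}$/) {
    print $query->header, "<p>Sorry, that doesn't look like a username.</p>\n";
    exit;
}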

Client mistakes. Customers are fickle. Sometimes they request minor changes ("we want to sort the addresses by zipcode"); sometimes the changes are major ("the CEO just bought an Oracle database. Use it."). Changes run the risk of breaking software which previously worked. The programmer must write code in such a way that substantial changes in behavior can be implemented with minimum risk.

Programmer mistakes. Finally, as unwilling as we all are to accept it, programmers make mistakes. They're typically things like using variables which don't yet have a value, giving incorrect values to a function, and language misunderstandings like $#array vs @array. Programmers who believe in their own fallibility (does the Pope program in Perl?) write code that checks its values, checks return values from system calls, and uses tools like -w and use strict. These humble programmers also know how to debug when all else goes wrong.

"I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."

Maurice Wilkes, 1949

What follows is a list of techniques that I've found useful in real programs. The larger the program you want to write, the more desirable these techniques become. One-liners, or even five-pagers, are short and uncomplicated enough that debugging them is easy. That can't be said for some of the 10,000 line multi-module nightmares that I've given birth to. Consider these techniques your armory for the fight.

Warnings with -w

This is the programmer's most useful debugging aid. As Larry says, Perl's biggest bug is that -w is optional. Some of the things that a hashbang line of #!/usr/bin/perl -w will catch are: use of undefined values (typically a sign that you're expecting a variable to have a value when it doesn't), non-numeric arguments (a string was given instead of a number, which probably means it would be interpreted as 0 instead of being flagged as an error), = instead of ==, and much more.
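
For instance, each line of this contrived snippet runs silently without -w, but draws the complaint shown in its comment with it:

#!/usr/bin/perl -w
my $total;
print $total + 1, "\n";      # Use of uninitialized value

my $count = "lots";
print $count * 2, "\n";      # Argument "lots" isn't numeric

my $n = 1;
if ($n = 2) { }              # Found = in conditional, should be ==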

Sometimes you want -w checks in some places but not others. If there's a chunk of code you just know will work even though -w complains about it, you can disable warnings as follows:

{ 
    local($^W) = 0;	  	# disable warnings... 
    # ... your code here ...
}                               # warnings back on now

This traps only run-time warnings. Disabling compile-time warnings won't be possible until later versions of Perl.

There has been a vigorous debate on the subject of -w in production programs. New versions of Perl have created new warnings, which show up as "errors" (broken web pages, strange cron mailings, STDERR sent to users' screens) in programs which previously worked. Tracking these down can be a non-trivial task. I like to keep my code -w clean for all versions, because it makes future changes easier to test with -w. Your mileage may vary.

The strict Pragma

If you're using references or trying to write maintainable or reusable code, you probably want to use strict. This is a shorthand for use strict 'refs', 'subs', 'vars', which catches the following things:

use strict 'refs' prevents suspicious dereferences. If a subroutine expects a hard reference to a value (the kind of reference you get with \), but you supply it the wrong arguments or the right arguments in the wrong order, you can cause a string or a number to be inadvertently dereferenced. Consider this code:

  sub setref { 
      my $string_ref = shift; 
      my $string     = shift;
      $$string_ref   = $string; 
  }

  # wrong argument order
  setref("Googol", $plexref);	

Here, the setref() subroutine is passed "Googol" where it expects a reference to a string. Without use strict 'refs', Perl assumes you meant $Googol. This is called a soft, or symbolic, reference. When you use that pragma, however, Perl whines and dies. Because soft references are almost never needed, use strict 'refs' catches a lot of errors that would otherwise silently cause bizarre behavior.

use strict 'vars' catches stray variables. It expects you to either qualify every variable completely ($Package::Var) or to declare them with my(). In almost every case, you really want to use my() to scope your variable so that code outside the file or block can't perturb its value. Using my() to predeclare all variables (or using cumbersome fully-qualified variable names) will predispose you to document your variables for the hapless fool who must modify your program in a year's time. Don't laugh. It might be you.

if ($core->active) { 
    my $rems;		# active radiation in rems 
    my $rod_volume;	# volume of carbon rod remaining
    # ... your code here ...
}

use strict 'subs' forbids stray barewords. When it's in effect, you can't use the bareword style of calling subroutines with no arguments (e.g. $result = mysub;) unless the subroutine was declared before its use, either with a prototype or with the subroutine definition itself. If you don't want to predeclare, you must preface the subroutine call with & or append () so that it looks like a subroutine call. This doesn't affect the use of barewords in hashes in curly braces (e.g. $hash{key}) or on the left side of the => symbol (e.g. %hash = (key => value)).

use strict 'subs';

print count;	  # an error with use strict 'subs'

sub count;  	  # prototyping count() is sufficient

# Not an error because Perl now knows about count()
print count;

Note that simply saying sub mysub; before using the bareword mysub is enough to keep use strict 'subs' quiet.

Tainting And Safe

When Perl encounters a variable whose value hasn't been hard-coded into the program, it marks the variable as tainted if the program is running under the -T flag, or if the program is setuid (meaning that it assumes the identity of its owner rather than of whoever runs it). Using a tainted value in exec() or similar calls, or opening a file for writing with a tainted filename, causes a fatal error. To untaint data, extract the safe portion (for a filename, that might be /^([\w.\@-]+)$/) with a regular expression and use $1, $2, ... to access the parts of the tainted variable that are guaranteed to be safe. Full details can be found in the perlsec manual page.
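
Here's what that looks like in practice: a minimal sketch that untaints a filename taken from the command line, using the pattern suggested above.

#!/usr/bin/perl -T
my $file = $ARGV[0];               # tainted: it came from outside

# Extract the safe portion; $1 is automatically untainted.
if ($file =~ /^([\w.\@-]+)$/) {
    $file = $1;
} else {
    die "Suspicious filename: $file\n";
}

# Without the untainting above, -T would make this open() fatal.
open(FILE, "> $file") or die "Can't write $file: $!\n";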

Running with -T is almost always a good idea when you're programming defensively. It forces you to validate every piece of user-supplied data with regular expressions before you use them. Not only does this guard against potentially security-compromising errors, it also lets you catch situations where the user gives the wrong type of data (a string instead of a number, for instance).

A different approach is to use the Safe module, which traps certain operations. You can run code that uses untrustworthy data inside a Safe "compartment," knowing that it can't unlink() files, fork() processes, or do other nasty things.
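
A minimal sketch (the string being evaluated is a stand-in for genuinely untrusted input):

use Safe;

my $compartment = Safe->new;
my $untrusted   = 'unlink "/etc/passwd"';   # stand-in for untrusted code

# reval() runs the code inside the compartment; forbidden operations
# like unlink() abort it and set $@ instead of executing.
my $result = $compartment->reval($untrusted);
print "Trapped: $@" if $@;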

Checking Return Values

Not every fork() will succeed, not every file can be opened, not every child process terminates without error. The return values from system calls contain valuable information on the success or failure of those calls - check them!

The most important things to check are the return values of open(), fork(), and exec(), and the contents of $? ($CHILD_ERROR if you use English - and why aren't you? One good reason: use English slows your program down, because it mentions $& and $', whose very presence retards every regular expression in your program).
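
In practice, that means attaching an or die (or something gentler) to every such call. A sketch, with the log filename and child command as placeholders:

open(LOG, ">> /var/log/myapp") or die "Can't append to log: $!\n";

my $pid = fork;
die "Can't fork: $!\n" unless defined $pid;

if ($pid == 0) {                # child
    exec('some_command') or die "Can't exec some_command: $!\n";
}

waitpid($pid, 0);
die "Child exited with status ", $? >> 8, "\n" if $?;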

The same wisdom applies to CPAN or library modules, and to your own modules. Your modules should perform sanity checks and return 0 or undef if something went wrong.
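
For instance, a routine in a hypothetical module might verify its argument before doing any real work:

package MyModule;                # a hypothetical module

my %Users = (gnat => 'Nathan Torkington');

sub lookup_user {
    my $name = shift;

    # Sanity-check the argument before touching the data.
    return undef unless defined $name && $name =~ /^\w+$/;

    return $Users{$name};        # undef if the user isn't known
}

1;

Callers can then write my $who = MyModule::lookup_user($input) or complain(); and trust that garbage arguments come back as undef instead of causing mayhem deeper down.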

Planning For Failure

Part of catching errors is deciding what to do when they occur. Even before I begin programming, I enumerate the various ways my code can fail, and then decide what to do for each possibility. With some errors it's okay to tell the user exactly what went wrong ("you gave me the name of a user who isn't in the database") but others shouldn't be made so public ("the database doesn't exist", or "I couldn't fork."). User errors typically warrant a message that pats their hand and gives them a chance to try again. System errors should be logged to a file, the administrators notified, and the user told that "the system is down" and they should try again later.
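
One way to keep the two kinds of error straight is to give each its own handler. A sketch (the log path is invented, and the administrator notification is left as a comment):

# User error: pat their hand and invite another try.
sub user_error {
    my $message = shift;
    print "Sorry: $message. Please try again.\n";
}

# System error: log the gory details, stay vague with the user.
sub system_error {
    my $details = shift;
    if (open(LOG, ">> /var/log/myapp.errors")) {
        print LOG scalar(localtime), ": $details\n";
        close LOG;
    }
    # notify the administrators here (mail, pager, ...)
    print "The system is down. Please try again later.\n";
}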

The Perl Debugger

There is only so much that stack traces and strategically placed print() statements can do. Even when you've located the problem, it can still be difficult to infer the cause. My next step is to write a small program that exhibits the bug, and then to step through it with Perl's symbolic debugger (perl -d mysmallprogram). Of course, you can always invoke the debugger directly with perl -de 0 to initiate an interactive session.

Debugging will be most comfortable if you've installed the Term::ReadLine module, or if you use Ilya Zakharevich's nice Emacs interface, cperl-mode.el. Even without these whizzy utilities, the debugger is still useful. You can step through your code and set breakpoints - locations in your program at which execution stops, giving you a chance to inspect or change variables, thus letting you discover the particular states that trigger the bug you're trying to fix. Consult the perldebug documentation for more information.
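
A session might look something like this (the line number and variable are from a hypothetical program):

$ perl -d mysmallprogram
  DB<1> b 12          # set a breakpoint at line 12
  DB<2> c             # continue until we reach it
  DB<3> x $total      # examine a variable
  DB<4> s             # single-step the next statement
  DB<5> q             # quit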

The Perl Profiler

When your program works, but runs as slow as a dog, Dean Roehrich's Devel::DProf module (available on the CPAN) will help you determine why. perl -d:DProf myprogram runs your program and creates a file called tmon.out in your current working directory. You then run the dprofpp program to analyze that file and display the fifteen subroutines occupying the most time.
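
The whole procedure is two commands, run from the same directory:

$ perl -d:DProf myprogram      # writes tmon.out as it runs
$ dprofpp                      # reads tmon.out, reports the top 15 subs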

There are other features of the profiler (see the dprofpp documentation for more information), but this list of the most time-consuming subroutines is probably the most important. It pinpoints the parts of your program that use the most time, and hence those best suited to optimizing, rewriting, inlining, or avoiding.

Stack Traces


The terse little warnings and die() messages that you're provided are often not sufficient when it comes to working out where things went wrong. For that you need the awesome power of Jack Shirazi's Devel::DumpStack. When I'm debugging a CGI script that refuses to play ball, I'll use code along the lines of the listing below (the original is on the TPJ web site, and at http://www.frii.com/~gnat/perl/articles/tpj7/stack.cgi), which traps warnings and fatal errors, displaying them in an HTML document instead of burying them in a web server error log. I wouldn't recommend leaving this code in your final product, however. The sight of a stack dump can mentally scar a user for life.
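
Since the original listing ("Generating a Stack Trace in a CGI Script") isn't reproduced here, this is a minimal sketch of the same idea. It uses the standard Carp module's longmess() to capture the trace where the original used Devel::DumpStack, and the header-tracking variable is my own invention:

#!/usr/bin/perl -w
use Carp;
use vars qw($header_sent);

# Trap warnings and fatal errors, displaying them as HTML instead of
# burying them in the web server's error log.
sub html_trace {
    my ($label, $message) = @_;
    print "Content-type: text/html\n\n" unless $header_sent++;
    print "<h1>$label</h1>\n<pre>", Carp::longmess($message), "</pre>\n";
}

$SIG{__WARN__} = sub { html_trace("Warning", $_[0]) };
$SIG{__DIE__}  = sub { html_trace("Fatal error", $_[0]); exit 1 };

# ... the rest of the CGI script goes here ...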

__END__


Nathan Torkington (gnat@frii.com) cowrote the latest Perl FAQ with Tom Christiansen, is cowriting "The Perl Cookbook" with Tom Christiansen, and is currently doing system administration and Perl programming for Front Range Internet, an ISP in Fort Collins, Colorado (without Tom Christiansen).