
Perl Heresies

Jon Drukman

Packages Used
Just Perl

Perl has a lot of slogans. Probably the most popular is "There's More than One Way To Do It" (TMTOWTDI). Despite this, some ways of doing things are frowned upon in the Perl community. In this article, I present four heresies for those pariahs who dare go against the established grain.

I have two reasons for disliking the Perl Orthodoxy. First, it discourages the idea that if you know enough Perl to get your job done then you know enough Perl. Second, programming is an artistic as well as a technical discipline, so just as "wrong notes" can make a good piece of music better, programmers should be free to use the "forbidden constructs" if doing so makes their programs more aesthetically pleasing. However, this comes with a caveat: as with music, if you don't know why you're breaking the rules, you probably shouldn't be.

Heresy #1: Don't use -w.

According to my reading of the 5.004 source, Perl has around 60 optional warnings. If you are a moderately skilled Perl programmer, you will see at most one or two of them on a regular basis.

I rarely use -w. For whatever reason, the kinds of mistakes I make are not the ones it catches. The -w warnings that appear most often in my programs come from DBI queries that return rows with empty columns. Since there's no way to predict in advance which columns will be empty, I'm treated to a slew of Use of uninitialized value at line ... messages.
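One way to quiet these particular warnings while leaving -w on is to replace undefined columns with empty strings before using the row. This is a minimal sketch, assuming the row arrives as a plain list such as the one DBI's fetchrow_array returns. The helper name blank_undef and the sample row are made up for illustration.

```perl
use strict;

# Hypothetical helper: return a copy of a row in which every undefined
# column has been replaced by an empty string, so later string
# operations do not trigger "Use of uninitialized value" under -w.
sub blank_undef {
    my @row = @_;
    return map { defined $_ ? $_ : '' } @row;
}

# Stand-in for a fetched row; undef marks an empty (NULL) column.
my @row   = ('jsd', undef, 'San Francisco');
my @clean = blank_undef(@row);
print join('|', @clean), "\n";
```

The trade-off is that an empty column and a column holding an empty string become indistinguishable, which may or may not matter for your data.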

Since -w generates its warnings based on conditions that change at run-time, it makes the behavior of your programs unpredictable. If your program is sending data to something that merges STDOUT and STDERR like a Netscape web server, this could spell death. If you're doing something interactively, a user unfamiliar with Perl might be unnecessarily alarmed by its warning messages. In these situations I recommend developing with -w and then removing it for production use. Sometimes it's nice to have Perl hold your hand, but sometimes it's impractical.

Don't forget that warnings can be turned on and off within a program.

# ... with no -w, code here will run without warnings ...

{
  local $^W=1;
  # ... code here will run with warnings ...
}

After the curly-brace-delimited block, $^W reverts to its previous value, and warnings will no longer be on.
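To watch the switch take effect, you can collect warnings in a __WARN__ handler instead of letting them print. The following is a rough sketch, not a recommended production setup. The sub names are invented for this example, and it assumes the code is not compiled under the lexical "use warnings" pragma, which takes precedence over $^W.

```perl
use strict;

# Collect warnings instead of printing them, so the effect is visible.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

# Concatenating an undefined value warns only when warnings are on.
sub label {
    my ($value) = @_;
    return "value: " . $value;
}

# local gives $^W a temporary value that is restored when the sub returns.
sub label_quietly { local $^W = 0; return label(undef); }
sub label_loudly  { local $^W = 1; return label(undef); }

label_quietly();
my $after_quiet = scalar @caught;
label_loudly();
my $after_loud = scalar @caught;
print "caught after quiet call: $after_quiet, after loud call: $after_loud\n";
```

Because local is dynamically scoped, the setting also applies to any subroutine called from inside the block, which is why label itself needs no changes.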

Heresy #2: Don't always use regular expressions just because they're cool.

Regular expressions are an extremely powerful tool. They can also quickly become difficult to decipher and maintain. Perl has a rich palette of string handling functions, many of which have nothing to do with regular expressions. One of my favorite techniques is using substr as an lvalue. To change the first three characters on a line, you could do:

$string =~ s/^.../abc/;

or you could say:

substr($string,0,3) = 'abc';

The second one is a few characters longer but, to me, much clearer. Unfortunately, clarity sometimes comes at the cost of speed. I benchmarked this script:

use Benchmark;

$string = 'mary had a little lamb.';

timethese (800000, {
  regex  => sub { $string =~ s:^...:abc: },
  substr => sub { substr($string,0,3) = 'abc'; },
});

However, the results were inconclusive. On my FreeBSD box, substr had the slight edge. On a Sun Ultra Enterprise 450, the regex was a hair faster. Try it yourself and see.
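substr is not the only built-in that can stand in for a simple pattern. Here is a rough sketch of three regex-free helpers built on substr, rindex, and tr. The helper names are invented for this example.

```perl
use strict;

# True if $string begins with $prefix. No metacharacters to escape.
sub starts_with {
    my ($string, $prefix) = @_;
    return substr($string, 0, length($prefix)) eq $prefix;
}

# Text after the last occurrence of $sep, or the whole string if
# $sep does not occur at all.
sub after_last {
    my ($string, $sep) = @_;
    my $pos = rindex($string, $sep);
    return $pos < 0 ? $string : substr($string, $pos + length($sep));
}

# Count spaces. tr returns the number of characters it matched.
sub count_spaces {
    my ($string) = @_;
    return ($string =~ tr/ //);
}

my $string = 'mary had a little lamb.';
print "starts with 'mar'\n" if starts_with($string, 'mar');
print "last word: ", after_last($string, ' '), "\n";
print "spaces: ", count_spaces($string), "\n";
```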

Heresy #3: Don't always use modules.

Modules are great – they save time and save you from common mistakes. However, they also create dependencies, add to loading time, and sometimes keep you from exploring a subject on your own. One of the first medium-size programs I ever wrote (in BASIC/PLUS on a PDP-11) was a mail program. Of course I didn't write the world's best mailer right out of the gate, but it did work and I learned a lot in the process. Sometimes I feel that Perl's module-oriented culture prevents people from exploring for the sake of exploring. True, if you're programming on someone else's dime you owe it to them to do the job efficiently, but if you're just messing around, reinventing the wheel can be rewarding.

For instance, the seventh field of localtime (index 6, counting from zero) contains the current day of the week. If you know this simple fact, you can do a number of useful calculations with simple arithmetic and a little cleverness. What day of the week was yesterday? The answer is:

$yesterday = ( (localtime)[6] - 1 ) % 7;

This returns a number in the range 0-6, with 0 denoting Sunday. One line of code, using one built-in Perl function.
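The one-liner depends on how Perl's % operator treats a negative left operand, which is easy to overlook. The sketch below wraps the same arithmetic in a sub so the Sunday case can be checked directly. The names day_before and @day_name are made up for this example.

```perl
use strict;

my @day_name = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# Day-of-week index (0 = Sunday) for the day before $wday.
# With a positive right operand, Perl's % gives a non-negative result
# (unless "use integer" is in effect), so Sunday (0) wraps to Saturday (6).
sub day_before {
    my ($wday) = @_;
    return ($wday - 1) % 7;
}

my $today     = (localtime)[6];      # element 6 of localtime's list
my $yesterday = day_before($today);
print "Today is ", $day_name[$today],
      "; yesterday was ", $day_name[$yesterday], ".\n";
```

In languages where the remainder takes the sign of the left operand, the same expression would give -1 on Sundays, so the trick does not port unchanged.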

Compare to Date::Manip, weighing in at 5790 lines. Even the author admits there are many situations in which Date::Manip is not practical. On the other hand, if you are trying to handle dates of the form "1st Thursday in June 1992", you're better off using Date::Manip than writing your own parser from scratch.

Sometimes you might want to avoid modules just to get a little practice. HTML::Parser can handle a wide range of possibilities, but if your HTML is machine-generated, you might not need all of its power. If I want to build up a concordance of title tags, I can rely on the regular expression <TITLE>(.*?)</TITLE> to get the job done. Most of the time, I prefer to handle our files by creating regular expressions – since regexps are such a powerful tool, I like to get as much practice with them as possible.
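A title tally along these lines might look like the sketch below. It assumes machine-generated pages whose title tags carry no attributes. The sub names and the sample pages are invented for illustration.

```perl
use strict;

# Return every title found in $html. /i accepts <title> as well as
# <TITLE>, /s lets a title span lines, and .*? stops at the first
# closing tag rather than the last.
sub extract_titles {
    my ($html) = @_;
    my @titles = $html =~ m{<TITLE>(.*?)</TITLE>}gis;
    return @titles;
}

# Tally how often each title occurs across a list of pages.
sub title_counts {
    my %count;
    $count{$_}++ for map { extract_titles($_) } @_;
    return %count;
}

# Hypothetical sample input standing in for files read from disk.
my @pages = (
    "<HTML><HEAD><TITLE>Perl Heresies</TITLE></HEAD></HTML>",
    "<html><head><title>Just Perl</title></head></html>",
    "<HTML><HEAD><TITLE>Perl Heresies</TITLE></HEAD></HTML>",
);
my %seen = title_counts(@pages);
print "$_: $seen{$_}\n" for sort keys %seen;
```

Hand-written HTML with comments, attributes on the tag, or unclosed titles would defeat this pattern, which is the point at which HTML::Parser earns its keep.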

This leads us to a mini-heresy:

Heresy #4: Partial solutions are OK.

You don't always have to create a generic solution that will solve every possible case. This is more about software design than Perl, but as Perl lends itself to rapid application development so well, there's no reason to fear the rapid part. For one thing, you'll undoubtedly discover new and better ways to solve the problem as you gain experience, and trying to rewrite a huge program that has a fundamental design flaw is harder than expanding a simple program that only requires a few tweaks.

Also, partial solutions can sometimes help you avoid difficult tasks. Consider the ever-popular task of validating an email address. While it's impossible to make sure there's a human on the other end, it is possible to determine if a given address is syntactically invalid. For example, the address jsd@.bud.com is obviously undeliverable, and this is easy to verify with a specific regular expression. We can avoid the complete solution with an instantaneous partial solution that catches a lot of simple typing errors.
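Such a partial check might look like the sketch below. It is deliberately incomplete: it assumes Internet-style user@domain addresses and flags only a few certain errors. The sub name obviously_invalid is made up, and a false result does not mean the address is valid.

```perl
use strict;

# Hypothetical partial check: returns 1 for addresses that are
# certainly malformed, 0 for everything else.
sub obviously_invalid {
    my ($address) = @_;
    my $at = rindex($address, '@');
    return 1 if $at < 1;                  # no @ at all, or nothing before it
    my $domain = substr($address, $at + 1);
    return 1 if $domain eq '';            # nothing after the @
    return 1 if $domain =~ /\s/;          # whitespace in the domain
    return 1 if $domain =~ /^\.|\.\./;    # empty label: leading or doubled dot
    return 0;
}

for my $address ('jsd@.bud.com', 'jsd@bud.com') {
    print "$address: ",
          (obviously_invalid($address) ? "rejected" : "not obviously wrong"),
          "\n";
}
```

Each rule catches a common slip of the fingers, and new rules can be added one line at a time as other typos turn up.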

__END__


Jon Drukman is a system administrator and techno artist (http://www.cyborganic.com/bass-kittens/). He lives in San Francisco with his wife and cat.