
Perl Heresies: No, That's Wrong

Jeff Pinyan

This article is a whirlwind tour of some common beginner mistakes, which I've divided into four broad classes: general syntax; confusion between Perl, C, and the shell; storing command output; and idiomatic Perl.

General Syntax

Let's look at three aspects of Perl's syntax that often confuse beginners: slices, stringification, and octal file permissions.

Slices. Let's say you want to access a value in an array or a hash. After skimming Chapter 2 of O'Reilly's Programming Perl, you see that arrays start with a @ and hashes start with a %. So you write a snippet of code that looks something like this:

@names = ("Jeff", "Jon", "Andrea", "Chuck");
$dad = @names[3];

No, that's wrong. Perl's -w switch catches this problem: "Scalar value @names[3] better written as $names[3]...". The problem is that @names[3] denotes an array slice -- you can tell by the @. What you really want is $names[3], which is a true scalar -- you can tell by the $. @array[$index] is a list of one scalar, which is almost never what you want. It only makes sense if you're assigning elements to another list:

@names = ("Jeff", "Jon", "Andrea", "Chuck");
@brothers = @names[1, 3];

This assigns ("Jon", "Chuck") to @brothers.

Likewise, to refer to many hash elements at once, use @hash{$key1, $key2, $key3} and not %hash{$key1, $key2, $key3}. To refer to a single hash element, use a $ to indicate a scalar: $hash{$key1}. You can learn more about these behaviors by typing perldoc perldata; the perldata documentation, which covers slices, is bundled with your Perl distribution.
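Here is a minimal sketch of the three forms side by side. The %score hash and its values are made up for illustration; only the names come from the example above.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my %score = (Jeff => 17, Jon => 45, Andrea => 44, Chuck => 12);

# One element: scalar sigil.
my $one = $score{Jeff};

# Several elements at once: array sigil with curly braces (a hash slice).
my @some = @score{"Jon", "Chuck"};

# A slice can also be assigned to.
@score{"Jeff", "Jon"} = (18, 46);

print "$one\n";                        # 17
print "@some\n";                       # 45 12
print "$score{Jeff} $score{Jon}\n";    # 18 46
```

The sigil says what you get back ($ for one value, @ for a list), and the brackets say what you are indexing ([] for an array, {} for a hash).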

Stringification. A common mistake among inexperienced programmers is putting a variable in double quotes for no reason at all, which turns it into a string.

print "$var";
$foo = "$bar";
function("$value");

All of these force the scalars into strings, even if they were numbers or references to begin with. This causes big problems if the variable is a reference, because once you convert a reference into a string, you will never be able to treat it as a real reference again. If you print a reference $ref, you might see something like HASH(0xb5a3c), which tells you that the hash referred to by $ref can be found at memory location 0xb5a3c. But if you say something like $ref = "$ref", then $ref becomes the string "HASH(0xb5a3c)", and there's no way to get from that to the hash contents.
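A short sketch of the difference between copying a reference and quoting it. The variable names are mine, and the address Perl prints will differ on your machine.

```perl
use strict;
use warnings;

my %h    = (name => "Jeff");
my $ref  = \%h;
my $copy = $ref;      # plain assignment: still a reference
my $str  = "$ref";    # quoted: now just a string like HASH(0x...)

my $copy_type = ref($copy);   # "HASH"
my $str_type  = ref($str);    # "" (empty string), so not a reference

print "copy still works: $copy->{name}\n";
print "string is only text: $str\n";
print "ref() on the string gives '$str_type'\n";
```

The built-in ref() function is a handy check: it returns the type name for a real reference and an empty string for anything else.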

The autoincrement operator ++ does different things to strings and numbers. C programmers aren't surprised to learn that $x++ adds one to $x if $x is a number, but they're surprised to learn that it works on strings, too: it turns a into b, b into c, and z into aa.
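To watch the string form of ++ at work, here is a small sketch; the starting values are arbitrary ones I picked.

```perl
use strict;
use warnings;

my @results;
for my $start ("a", "z", "Az", "zz") {
    my $s = $start;    # work on a copy; the loop variable aliases a constant
    $s++;              # "magical" string increment
    push @results, "$start -> $s";
}

print "$_\n" for @results;
# a -> b
# z -> aa
# Az -> Ba
# zz -> aaa
```

Note that the carry preserves case and adds a new leading character when it runs out of room, much like an odometer.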

As a twist, consider what happens when you try to autoincrement a string that begins with digits. Perl has to treat the string as a number, and it does that by disregarding everything after those initial digits. For our example, we'll use 0x123456, which is a hexadecimal number (base 16).

# The wrong way
$val = "0x123456";   # oops, meant it to be a hex number
$val++;              # strips everything after the '0'
print $val;          # Now $val is 1.


# The right way
$val = 0x123456;     # now THAT'S hexadecimal
print ++$val;        # prints the decimal representation
1193047

Octal File Permissions. Several Perl functions (chmod, mkdir, and umask) expect a file permission as an argument. These need to be in octal -- base 8 -- which means that you generally want to provide a leading 0. The problem is that many people don't, because when they use chmod in Unix shells, they don't have to. Even worse, some people don't know what the bits in 0754 signify. See your Unix system's chmod documentation for details: in this case, the 7 lets you read, write, and execute the file; the 5 lets people in your Unix group read and execute the file, and the 4 lets everyone else merely read the file.

Again, this is one of those situations where the -w switch helps. The following code produces a warning if -w is on:

chmod(644, $file);
chmod: mode argument is missing initial 0
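To see why the missing 0 matters, here is a sketch that prints the permission bits Perl actually sees. The mode_string helper is a hypothetical function of mine, not something that ships with Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: render the low nine permission bits as rwx triplets.
sub mode_string {
    my ($mode) = @_;
    my @bits = qw(--- --x -w- -wx r-- r-x rw- rwx);
    my $out  = "";
    for my $shift (6, 3, 0) {              # user, group, other
        $out .= $bits[ ($mode >> $shift) & 7 ];
    }
    return $out;
}

my $right    = mode_string(0644);          # octal literal
my $wrong    = mode_string(644);           # decimal literal
my $as_octal = sprintf("%o", 644);         # what decimal 644 is in octal

print "$right\n";       # rw-r--r--
print "$wrong\n";       # -w----r--
print "$as_octal\n";    # 1204
```

Decimal 644 is octal 1204, so without the leading 0 you would be asking for a very different set of permissions.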

You can turn 644 into 0644 very easily: just use the oct() function. These two lines do the same thing:

chmod(0644, $file);
chmod(oct(644), $file);

As an aside, the oct() function assumes its argument is octal (or hexadecimal, if it starts with a 0x) and returns the corresponding decimal value. hex() assumes its argument is hexadecimal, and returns the corresponding decimal number.
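A quick sketch of both functions, using values I chose for illustration:

```perl
use strict;
use warnings;

my $from_octal = oct("644");     # octal 644 is decimal 420
my $from_0x    = oct("0x1F");    # oct() also understands a 0x prefix
my $from_hex   = hex("1F");      # hex() needs no prefix

printf "%d %d %d\n", $from_octal, $from_0x, $from_hex;    # 420 31 31
printf "%o\n", $from_octal;                               # 644
```

This is most useful when the permission arrives as a string, say from a configuration file or the command line, where a leading 0 in your source code can't help you.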

The File::chmod module, available on CPAN, lets you use letters instead of numbers to specify permissions. Instead of extracting the permissions of a file and modifying them as a number, you can change the permissions more intuitively, using r, w, and x to specify readability, writability, and executability.

Confusing Perl, C, and Unix Shells

A common mistake made by C and shell programmers is that they don't partition their brains correctly, and some of their C or shell knowledge leaks into the Perl portion of their grey matter. This is especially notable with command-line arguments, $0, and control structures.

Command-Line Arguments. Unless you're using the -s flag or an options parsing module like Getopt::Std or Getopt::Long, Perl stores its command-line arguments in the @ARGV array. @ARGV is a true global variable; that is, no matter what package you're in, @ARGV is the same as @main::ARGV. The first argument is index 0, the last is index $#ARGV: if your program is invoked as myprog foo bar baz bletch, then $ARGV[0] is foo and $ARGV[$#ARGV] is the same as $ARGV[3]: that is, bletch.

In C, the arguments are stored in argv[], but the first element in the list is the name of the program. C stores the number of elements, program name included, in argc, so argv[1] through argv[argc-1] hold the arguments to the program. In shells, arguments are stored in $1, $2, $3, and so on. Perl uses those variables to store whatever matched the parentheses in the last successful regular expression match.
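The sketch below simulates the myprog invocation by assigning to @ARGV directly (a real program would just read it), and then shows what $1 and $2 mean in Perl. The version string is an arbitrary example of mine.

```perl
use strict;
use warnings;

# Simulating: myprog foo bar baz bletch
@ARGV = qw(foo bar baz bletch);

my $first = $ARGV[0];          # foo, not the program name
my $last  = $ARGV[$#ARGV];     # bletch
my $count = @ARGV;             # 4: an array in scalar context gives its size

# In Perl, $1 and $2 are regex captures, not arguments.
my ($major, $minor);
if ("version 5.6" =~ /(\d+)\.(\d+)/) {
    ($major, $minor) = ($1, $2);
}

print "$first $last $count\n";    # foo bletch 4
print "$major $minor\n";          # 5 6
```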

$0. $0 holds the name of the currently running Perl program. For those of you accustomed to using the English module, it's called $PROGRAM_NAME. Most shells use $0 in the same way. However, C uses the first element in the argv array to store the program name. Often I see Perl programmers mistakenly use $ARGV[0] when they really mean $0.
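A typical use of $0 is building a usage message. This is only a sketch: the message wording is mine, and File::Basename (a standard module) is used to trim any leading directories.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $me    = basename($0);              # program name without its path
my $usage = "Usage: $me file ...\n";

print $usage;
```

Had this used $ARGV[0] instead, the message would show the first argument the user typed, or nothing at all if there were no arguments.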

Control Structures. You can always tell when a person was just programming C, because they'll ask how to do a switch statement. There is information in the perlsyn documentation, and Tom Christiansen has a response to the question at http://mox.perl.com/misc/fmswitch . There are multiple ways of creating a switch-like control structure, such as using a for loop or a series of if-elsif-else statements.
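One of those ways, sketched roughly in the style shown in perlsyn, uses a labeled single-pass for loop. The classify function and its categories are hypothetical names of mine.

```perl
use strict;
use warnings;

# Hypothetical example: classify a string, switch-style.
sub classify {
    my ($input) = @_;
    my $kind = "other";                  # the default case
    SWITCH: for ($input) {               # aliases $_ to $input for one pass
        /^\d+$/     and do { $kind = "number"; last SWITCH; };
        /^[a-z]+$/i and do { $kind = "word";   last SWITCH; };
        /^\s*$/     and do { $kind = "blank";  last SWITCH; };
    }
    return $kind;
}

print classify("42"),   "\n";    # number
print classify("perl"), "\n";    # word
print classify(""),     "\n";    # blank
print classify("a1"),   "\n";    # other
```

Here last SWITCH plays the role of C's break, and anything that falls through every test keeps the default.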

Perl, C, and the various shells all have different ways of spelling elsif, too. In Perl, the word is elsif; in C, else if; and in the sh shell, the word is elif ("file" spelled backwards).

Storing Command Output

Perl has three primary ways to call system commands: system, backquotes, and exec. These are often sources of confusion for inexperienced programmers. The system() function executes a command, which prints to STDOUT whatever it would print had you typed it from the command line. system() does not return what the command prints, as many beginners expect, but the command's exit status: typically zero for success, non-zero for failure. This snippet shows you how not to get the date from your system:

$date = system("/usr/bin/date");

What that just did was set $date to 0. A better way to get the date from the system is to use backquotes:

chomp($date = `/usr/bin/date`);

(The best way of all is to use Perl's localtime and gmtime functions, or one of the various time modules, or even the lowly ctime.pl bundled with the Perl distribution. That way you don't depend on your system having a date program in /usr/bin.)
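As a sketch of the pure-Perl route, here are two ways to get a date string without running any external program. The strftime format is just one I picked; POSIX is a standard module.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# localtime in scalar context gives a ready-made string,
# something like "Thu Sep  9 12:00:00 1999".
my $date = localtime;

# For a custom layout, pass localtime's list to strftime.
my $stamp = strftime("%Y-%m-%d", localtime);

print "$date\n";
print "$stamp\n";
```

Swap in gmtime for localtime if you want UTC instead of the local time zone.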

The backquotes return the command's standard output (with newlines included -- that's why we chomp). The qx() operator behaves the same way. Backquotes behave a little differently depending on whether you use them in scalar context or list context. In scalar context ($date = `date`), multiline output is stored as a single string of text, with newlines at the end of each line. In list context (@date = `date`), backquotes yield a list of lines, split according to the $/ (or $INPUT_RECORD_SEPARATOR) variable.
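The sketch below contrasts the two contexts and, for comparison, shows what system() hands back. To avoid depending on any particular system command, it runs the Perl interpreter itself (its path is in $^X) as the child program; the one-liners are arbitrary examples of mine.

```perl
use strict;
use warnings;

# A child command that prints two lines: "one" and "two".
my $cmd = qq{"$^X" -le "print for qw(one two)"};

my $text  = `$cmd`;    # scalar context: one string, newlines included
my @lines = `$cmd`;    # list context: one element per line

# system() returns the wait status; the exit value is in the high byte.
my $status = system($^X, "-e", "exit 3");
my $exit   = $status >> 8;

print "scalar: ", length($text), " characters\n";
print "list:   ", scalar(@lines), " lines\n";    # 2
print "exit:   $exit\n";                          # 3
```

Passing system() a list, as above, runs the program directly without involving the shell.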

Finally, there's exec(). I see this used much too often. It ends your program and replaces it with something else. Below, the print statement will never be reached (unless the exec itself fails).

$date = exec "/usr/bin/date";
print "Today's date is $date";

Idiomatic Perl

In this last section, I'll talk about two Perl idioms: operating on arrays, and handling nested quotes.

Operating on Arrays. Another way to tell if someone's been programming in C is to look at how they process array elements. In C, you'll often see code that looks like this:

for (int i = 0; i < sizeof(array); i++) {
   char c = array[i];
   /* et cetera */
}

Then they write Perl the same way, and you get something like this:

$size = @array;
for ($i = 0; $i < $size; $i++){
   $element = $array[$i];
   # et cetera
}

That works, but Perl has a nicer way: foreach. It's actually synonymous with for, but is usually used to loop directly over arrays without an index variable ($i, above). These all visit the same elements; the first two put each element in $_, and the last two put it in $element:

for (@array) { ... }
foreach (@array) { ... }
for $element (@array) { ... }
foreach $element (@array) { ... }
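Here is a small runnable sketch of both styles, with made-up data. It also shows a detail worth knowing: the loop variable is an alias for the array element, so changing it changes the array.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

# Named loop variable: read each element.
my @times_ten;
foreach my $element (@array) {
    push @times_ten, $element * 10;
}

# Default variable $_: modify each element in place.
for (@array) {
    $_ *= 2;
}

print "@times_ten\n";    # 10 20 30
print "@array\n";        # 2 4 6
```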

__END__


Jeff Pinyan, a pre-frosh at Rensselaer Polytechnic Institute, was never near the grassy knoll.

