Just the FAQs: Coping with Scoping

Mark-Jason Dominus

Packages Used
Just Perl

In the Beginning, some time around 1960, every part of your program had access to all the variables in every other part of the program. This caused a lot of bugs when people forgot where their variables were used, so language designers invented local variables, which were visible in only a small part of the program. That way, programmers who used a variable x could be sure that nobody was able to tamper with the contents of x behind their back. They could also be sure that by using x they weren't tampering with someone else's variable by mistake.

Every programming language has a philosophy, and these days most of these philosophies have to do with the way the names of variables are managed. Details of which variables are visible to which parts of the program, and what names mean what, and when, are of prime importance. The details vary from somewhat baroque, in languages like Lisp, to extremely baroque, in languages like C++. Perl unfortunately is on the rococo end of this scale.

The problem with Perl isn't that it has no clearly-defined system of name management, but rather that it has two systems, both working at once. Here's the Big Secret about Perl variables that most people learn too late: Perl has two completely separate, independent sets of variables. One is left over from Perl 4, the way your vermiform appendix and midbrain are left over from a previous geologic era, and the other set is new. The two sets of variables are called 'package variables' and 'lexical variables', and they have nothing to do with each other.

Package variables came first, so we'll talk about them first. Then we'll see some problems with package variables, and how lexical variables were introduced in Perl 5 to avoid these problems. Finally, we'll see how to get Perl to automatically diagnose places where you might not be getting the variable you meant. That often detects mistakes before they turn into bugs.

PACKAGE VARIABLES

$x = 1

Here, $x is a package variable. There are two important things to know about package variables:

  1. Package variables are what you get if you don't say otherwise.
  2. Package variables are always global.

'Global' means that package variables are always visible everywhere in every program. After you do $x = 1, any other part of the program, even another subroutine defined in another file, can inspect and modify the value of $x. There's no exception to this; package variables are always global. Don't say 'But...'; there are no buts.

Package variables are divided into families, called packages. Every package variable has a name with two parts. The two parts are like the variable's given name and family name. You can call the Vice-President of the United States 'Al', if you want, but that's really short for his full name, which is 'Al Gore'. Similarly, $x has a full name, which is something like $main::x. The main part is the package qualifier, analogous to the 'Gore' part of 'Al Gore'. Al Gore and Al Capone are different people even though they're both named 'Al'. In the same way, $Gore::Al and $Capone::Al are different variables, and $main::x and $DBI::x are different variables.
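Here is a small sketch of the full-name idea that you can run. The package names Gore and Capone are just the ones from the analogy above; nothing declares them, because mentioning a full name is enough to bring a package variable into being.

```perl
# Two variables with the same given name, in different families.
$Gore::Al   = 'vice-president';
$Capone::Al = 'gangster';

print "Gore::Al   = $Gore::Al\n";
print "Capone::Al = $Capone::Al\n";

# With no package declaration we are in package main, so the short
# name $x and the full name $main::x are one and the same variable.
$x = 1;
$main::x++;
print "x = $x\n";    # x = 2
```

Changing one Al never disturbs the other, while $x and $main::x always move together.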

You're always allowed to include the package part of the variable's name, and if you do, Perl will know exactly which variable you mean. But for brevity, you usually like to leave the package qualifier off. What happens if you do?

THE CURRENT PACKAGE

If you just say $x, Perl assumes that you mean the variable $x in the current package. What's the current package? It's normally main, but you can change the current package by writing

package Mypackage;

in your program; from that point on, the current package is Mypackage. The only thing the current package does is affect the interpretation of package variables that you wrote without package names. If the current package is Mypackage, then $x really means $Mypackage::x. If the current package is main, then $x really means $main::x.
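To watch the current package change what a short name means, try something like this sketch. It assumes nothing beyond the Mypackage name used above.

```perl
$x = 'set in main';            # no package yet, so this is $main::x

package Mypackage;
$x = 'set in Mypackage';       # now the short name means $Mypackage::x

package main;
print "main::x      = $main::x\n";
print "Mypackage::x = $Mypackage::x\n";
print "plain x      = $x\n";   # back in main, so $main::x again
```

The two assignments look identical, but they land in different variables because the current package changed in between.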

If you were writing a module, let's say the MyModule module, you would probably put a line like this at the top of MyModule.pm:

package MyModule;

Thereafter, all the package variables you used in the module file would be in package MyModule, and you could be pretty sure that those variables wouldn't conflict with the variables in the rest of the program. It wouldn't matter if both you and the author of DBI were to use a variable named $x, because one would be $MyModule::x and the other would be $DBI::x.

Remember that package variables are always global. Even if you're not in package DBI, even if you've never heard of package DBI, nothing can stop you from reading from or writing to $DBI::errstr. You don't have to do anything special. $DBI::errstr, like all package variables, is a global variable, and it's available globally; all you have to do is mention its full name to get it. You could even say

package DBI;
$errstr = 'Ha ha Tim!';

in your own file, and that would modify $DBI::errstr.
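The same thing can be tried without involving DBI at all. In this sketch, Inventory is a made-up package standing in for somebody else's module, and report() is a hypothetical function that reads the package's own variable.

```perl
package Inventory;                 # hypothetical module-like package
$count = 10;                       # really $Inventory::count
sub report { return "count is $count" }

package main;
print Inventory::report(), "\n";   # count is 10

# Nothing stops outside code from using the full name...
$Inventory::count = 99;
print Inventory::report(), "\n";   # count is 99

# ...or from stepping into the package and using the short name.
{
    package Inventory;
    $count = -1;
}
print Inventory::report(), "\n";   # count is -1
```

A package declaration inside a block lasts only until the end of that block, which is why the last print is back in main.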

PACKAGE VARIABLE TRIVIA

There are only three other things to know about package variables, and you might want to skip them on the first reading:

  1. The package with the empty name is the same as main. So $::x is the same as $main::x for any x.

  2. Some variables are always forced to be in package main. For example, if you mention %ENV, Perl assumes that you mean %main::ENV, even if the current package isn't main. If you want %Fred::ENV, you have to say so explicitly, even if the current package is Fred. Other names that are special this way include INC, all the one-punctuation character names like $_ and $$, @ARGV, and STDIN, STDOUT, and STDERR.

  3. Package names, but not variable names, can contain ::. You can have a variable named $DBD::Oracle::x. This means the variable x in the package DBD::Oracle; it has nothing at all to do with the package DBD, which is unrelated. Isaac Newton is not related to Olivia Newton-John, and $Newton::Isaac is not related to $Newton::John::Olivia. Even though they both begin with Newton, the appearance is deceptive. $Newton::John::Olivia is in package Newton::John, not package Newton. The slogan is "Packages do not nest."
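All three bits of trivia can be checked with a short script. This is only a sketch; SCOPING_DEMO is an invented environment variable name, and the Fred and Newton packages are the ones from the list above.

```perl
# 1. The empty package name means main.
$main::x = 'main x';
print '$::x is ', $::x, "\n";               # $::x is main x

# 2. %ENV is forced into main, even when the current package is Fred.
package Fred;
$ENV{SCOPING_DEMO}       = 'set from Fred';    # really %main::ENV
$Fred::ENV{SCOPING_DEMO} = 'private to Fred';  # must be spelled out

package main;
print "main ENV: $ENV{SCOPING_DEMO}\n";        # set from Fred
print "Fred ENV: $Fred::ENV{SCOPING_DEMO}\n";  # private to Fred

# 3. Packages do not nest.
$Newton::John::Olivia = 'singer';
package Newton;
# Here the short name $Olivia would mean $Newton::Olivia, which was never set.
print defined($Olivia) ? "nested\n" : "Packages do not nest\n";

package main;
```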

That's all there is to know about package variables.

Package variables are global, which is dangerous, because you can never be sure that another part of your program isn't tampering with them behind your back. Up through Perl 4, all variables were package variables, which was worrisome. So Perl 5 added new variables that aren't global.

LEXICAL VARIABLES

Perl's other set of variables are called 'lexical variables' (we'll see why later) or 'private variables' because they're private. They're also sometimes called 'my variables' because they're always declared with my. It's tempting to call them 'local variables', because their effect is confined to a small part of the program, but don't do that, because people might think you're talking about Perl's local operator, which we'll see later. When you want a local variable, think my, not local.

The declaration

my $x;

creates a new variable, named x, which is totally inaccessible to most parts of the program: everything outside the block in which the variable was declared. This block is called the scope of the variable. If the variable wasn't declared in any block, its scope is the rest of the file, beginning at the place it was declared. You can also declare and initialize a my variable by writing something like this:

my $x = 119;

You can even declare and initialize several at once:

my ($x, $y, $z, @args) = (5, 23, @_);
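A quick sketch of both points, the scope and the list form. The function name demo_list is invented for the example.

```perl
use strict;

my $outer = 'file scope';
{
    my $inner = 'block scope';
    print "inside the block:  $outer, $inner\n";
}
# $inner's scope ended at the closing brace. Mentioning it here
# would not even compile under strict.
print "outside the block: $outer\n";

# The list form fills the scalars first; the array soaks up the rest.
sub demo_list {
    my ($x, $y, $z, @args) = (5, 23, @_);
    return "x=$x y=$y z=$z args=@args";
}
print demo_list('a', 'b', 'c'), "\n";   # x=5 y=23 z=a args=b c
```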

Let's see an example of where some private variables will be useful. Consider this subroutine:

sub print_report {
    @employee_list = @_;
    foreach $employee (@employee_list) {
       $salary = lookup_salary($employee);
       print_partial_report($employee, $salary);
    }
}

If lookup_salary() happens to also use a variable named $employee, that's going to be the same variable as the one used in print_report(), and the works might get gummed up. The two programmers responsible for print_report() and lookup_salary() will have to coordinate to make sure they don't use the same variables. That's a pain. In fact, in even a medium-sized project, it's an intolerable pain.

The solution: Use my variables:

sub print_report {
    my @employee_list = @_;
    foreach my $employee (@employee_list) {
        my $salary = lookup_salary($employee);
        print_partial_report($employee, $salary);
    }
}

my @employee_list creates a new array variable which is totally inaccessible outside the print_report() function. foreach my $employee creates a new scalar variable which is totally inaccessible outside the foreach loop, as does my $salary. You don't have to worry that the other functions in the program are tampering with these variables, because they can't; they don't know where to find them, because the names have different meanings outside the scope of the my declarations.
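To convince yourself, here is a runnable sketch. The bodies of lookup_salary() and print_partial_report() are invented stand-ins (the article never shows them), and lookup_salary() deliberately scribbles on the package variable $employee the way a careless colleague's code might.

```perl
my @lines;    # collects the report for this demo only

# Hypothetical stand-in: clobbers the package variable $main::employee.
sub lookup_salary {
    my ($name) = @_;
    $employee = 'CLOBBERED';
    return 1000 * length $name;     # made-up salary rule
}

# Hypothetical stand-in: just records one line of the report.
sub print_partial_report {
    my ($employee, $salary) = @_;
    push @lines, "$employee earns $salary";
}

sub print_report {
    my @employee_list = @_;
    foreach my $employee (@employee_list) {
        my $salary = lookup_salary($employee);
        print_partial_report($employee, $salary);
    }
}

print_report('Ann', 'Bob');
print "$_\n" for @lines;            # Ann earns 3000, then Bob earns 3000
print "package variable: $main::employee\n";   # CLOBBERED
```

The package variable did get clobbered, but the report is fine, because print_report() never looked at it.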

These my variables are sometimes called 'lexical' because their scope depends only on the program text itself, and not on details of execution, such as what gets executed in what order. You can determine the scope by inspecting the source code without knowing what it does. Whenever you see a variable, look for a my declaration higher up in the same block. If you find one, you can be sure that the variable is inaccessible outside that block. If you don't find a declaration in the smallest block, look at the next larger block that contains it, and so on, until you do find one. If there is no my declaration anywhere, it's a package variable.

my variables are not package variables. They're not part of a package, and they don't have package qualifiers. The current package has no effect on the way they're interpreted. Here's an example:

my $x = 17;

package A;
$x = 12;

package B;
$x = 20;

# $x is now 20.
# $A::x and $B::x remain undefined

The declaration my $x = 17 at the top creates a new lexical variable named x whose scope continues to the end of the file. This new meaning of $x overrides the default meaning, which was that $x meant the package variable $x in the current package.

package A changes the current package, but because $x refers to the lexical variable, not to the package variable, $x=12 has no effect on $A::x. Similarly, after package B, $x=20 modifies the lexical variable, and not any of the package variables.

At the end of the file, the lexical variable $x holds 20, and the package variables $main::x, $A::x, and $B::x are still undefined. If you had wanted them, you could still have accessed them explicitly by using their full names.
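Here is a variation you can run to check this, as a sketch. The only change from the example above is that it also sets $A::x by its full name, to show that the package variable is still reachable while the lexical $x is in scope.

```perl
my $x = 17;

package A;
$x = 12;                   # the lexical, not $A::x
$A::x = 'package A';       # the full name always reaches the package variable

package main;
print "lexical x = $x\n";        # lexical x = 12
print "A::x      = $A::x\n";     # A::x      = package A
print "main::x is ", (defined $main::x ? 'defined' : 'undefined'), "\n";
```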

The maxim you must remember is:

Package variables are global variables.
For private variables, you must use 'my'.

LOCAL AND MY

Almost everyone already knows that there's a local function and imagines that it has something to do with local variables. What is it, and how does it relate to my? The answer is simple, but bizarre:

my creates a local variable.
local doesn't.

First, here's what local $x really does: It saves the current value of the package variable $x in a safe place, and replaces it with a new value, or with undef if no new value was specified. It also arranges for the old value to be restored when control leaves the current block. The variables that it affects are package variables. But package variables are always global, and a local package variable is no exception. To see the difference, try this:

$lo = 'global';
$m = 'global';
A();

sub A {
    local $lo = 'string';
    my $m = 'string';
    B();
}

sub B {
    print "B ",
      ($lo eq 'string' ? 'can' : 'cannot'),
      " see the value of lo set by A.\n";
    print "B ",
      ($m eq 'string' ? 'can' : 'cannot'),
      " see the value of m set by A.\n";
}

This prints

B can see the value of lo set by A.
B cannot see the value of m set by A.

What happened here? The local declaration in A() saved a new temporary value, string, in the package variable $lo. The old value, global, will be restored when A() returns, but before that happens, A() calls B(). B() has no problem accessing the contents of $lo, because $lo is a package variable and package variables are always available everywhere, and so it can detect the local value, string, that was set in A().

In contrast, the my declaration created a new, lexically scoped variable named $m, which is only visible inside of function A(). Outside of A(), $m retains its old meaning: it refers to the package variable $m, still set to global. This is the variable that B() sees. It can't see the value string, because the variable with the value string is a lexical variable, and only exists inside A().
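The save-and-restore half of local is easy to watch, too. In this sketch the function names outer() and current_lo() are invented for the demonstration.

```perl
$lo = 'global';

sub current_lo { return $lo }      # reads the package variable $main::lo

sub outer {
    local $lo = 'string';          # old value is saved away...
    return current_lo();           # ...and the callee sees the new one
}                                  # ...until the block exits

print "during outer(): ", outer(), "\n";       # string
print "after  outer(): ", current_lo(), "\n";  # global
```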

WHAT GOOD IS LOCAL?

Because local does not actually create local variables, it is not of very much use. If, in the example above, B() happened to modify the value of $lo, then the value set by A() would be overwritten. That is exactly what we don't want to happen. We want each function to have its own variables that are untouchable by the others. This is what my does.
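A sketch of that failure, with invented function names. meddle() plays the part of a function that assigns to $lo and $m without declaring them, so it is touching package variables.

```perl
# Assigns to the package variables $main::lo and $main::m.
sub meddle { $lo = 'overwritten'; $m = 'overwritten'; return }

sub with_local { local $lo = 'string'; meddle(); return $lo }
sub with_my    { my    $m  = 'string'; meddle(); return $m  }

print "local: ", with_local(), "\n";   # local: overwritten
print "my:    ", with_my(),    "\n";   # my:    string
```

The localized value was fair game for meddle(); the my variable was out of its reach.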

Why have local at all? The answer is 90% history. Early versions of Perl only had global variables. local was very easy to implement, and was added to Perl 4 as a partial solution to the local variable problem. Later, in Perl 5, more work was done, and real local variables were put into the language. But the name 'local' was already taken, so the new feature was invoked with the word my. my was chosen because it suggests privacy, and also because it's very short; the shortness is supposed to encourage you to use it instead of local. my is also faster than local.

WHEN TO USE 'MY' AND WHEN TO USE 'LOCAL'

Always use my; never use local.

Wasn't that easy?

OTHER PROPERTIES OF 'MY' VARIABLES

Whenever Perl reaches a my declaration, it creates a new, fresh variable. For example, this code prints x=1 fifty times:

for (1 .. 50) {
    my $x;
    $x++;
    print "x=$x\n";
}

You get a new $x, initialized to undef, every time through the loop.
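If you want proof that these really are separate variables and not one variable being reset, keep a reference to each one. This is just a sketch; collect() is a made-up name.

```perl
sub collect {
    my @refs;
    for my $n (1 .. 3) {
        my $x = $n * 10;       # a fresh $x each time around
        push @refs, \$x;       # the reference keeps that $x alive
    }
    return @refs;
}

my @refs = collect();
print join(', ', map { $$_ } @refs), "\n";   # 10, 20, 30
```

If there were only one $x, all three references would point at the same place and show the same value.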

If the my were outside the loop, control would only pass it once, so there would only be one variable:

{
    my $x;
    for (1 .. 50) {
        $x++;
        print "x=$x\n";
    }
}

This prints x=1, x=2, ..., x=50.

You can use this to play a useful trick. Suppose you have a function that needs to remember a value from one call to the next. For example, consider a random number generator. A typical random number generator (like Perl's rand function) has a 'seed' in it. The seed is just a number. When you ask the random number generator for a random number, the function performs some arithmetic operation that scrambles the seed, and it returns the result. It also saves the result and uses it as the seed for the next call.

Here's typical code: (I stole it from the ANSI C standard, but it behaves poorly, so don't use it for anything important.)

$seed = 1;
sub my_rand {
    $seed = int(($seed * 1103515245 + 12345) / 65536) % 32768;
    return $seed;
}

And typical output:

16838
14666
10953
11665
7451
26316
27974
27550

There's a problem here, which is that $seed is a global variable, and that means we have to worry that someone might inadvertently tamper with it. Or they might tamper with it on purpose, which could affect the rest of the program. What if the function were used in a gambling program, and someone tampered with the random number generator?

But we can't declare $seed as a my variable in the function:

sub my_rand {
    my $seed;
    $seed = int(($seed*1103515245 + 12345) / 65536) % 32768;
    return $seed;
}

If we did, it would be initialized to undef every time we called my_rand(). We need it to retain its value between calls.

Here's the solution:

BEGIN {
    my $seed = 1;
    sub my_rand {
      $seed = int(($seed*1103515245 + 12345) / 65536) % 32768;
      return $seed;
    }
}

The declaration is outside the function, so it only happens once, at the time the program is compiled, not every time the function is called. But it's a my variable, and it's in a block, so it's only accessible to code inside the block. my_rand() is the only other thing in the block, so $seed is only accessible to the my_rand() function. We wrapped the whole thing in a BEGIN block to make sure that $seed was properly initialized during compilation.

$seed here is sometimes called a 'static' variable, because it stays the same in between calls to the function. (And because there's a similar feature in the C language that is activated by the static keyword.)
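The same trick works for anything that has to remember something between calls. Here is a sketch with a simple counter; next_id() and next_ticket() are invented names. The second version assumes Perl 5.10 or later, which added state variables for this purpose.

```perl
BEGIN {
    my $count = 0;                 # private to this block
    sub next_id { return ++$count }
}

print next_id(), ' ', next_id(), ' ', next_id(), "\n";   # 1 2 3

# Assumes Perl 5.10+: a state variable is initialized only once
# and keeps its value from one call to the next.
use feature 'state';
sub next_ticket {
    state $n = 0;
    return ++$n;
}

print next_ticket(), ' ', next_ticket(), "\n";           # 1 2
```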

MY VARIABLE TRIVIA

  1. You can't declare a variable with my if its name is a punctuation character, like $_, @_, or $$. You can't declare the backreference variables $1, $2, ... as my. The authors of my thought that that would be too confusing.

  2. Obviously, you can't say my $DBI::errstr, because that's contradictory: it says that the package variable $DBI::errstr is now a lexical variable. But you can say local $DBI::errstr; it saves the current value of $DBI::errstr and arranges for it to be restored at the end of the block.

  3. New in Perl 5.004, you can write

foreach my $i (@list) {

to confine $i to the scope of the loop.

Similarly,

for (my $i=0; $i<100; $i++) {

confines the scope of $i to the for loop.
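What difference does the my make in a foreach loop? One way to see it is to call a function from inside the loop. In this sketch, peek(), plain_loop(), and my_loop() are invented names, and peek() reads the package variable $i.

```perl
sub peek { return defined $i ? $i : 'undef' }   # this is $main::i

sub plain_loop {
    my @seen;
    foreach $i (1 .. 2) { push @seen, peek() }      # loops over package $i
    return "@seen";
}

sub my_loop {
    my @seen;
    foreach my $i (1 .. 2) { push @seen, peek() }   # loops over a private $i
    return "@seen";
}

print "plain: ", plain_loop(), "\n";   # plain: 1 2
print "my:    ", my_loop(),    "\n";   # my:    undef undef
```

Without my, the loop variable is the global one, so peek() can see it (and could change it). With my, it's private to the loop body.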

DECLARATIONS

If you're writing a function, and you want it to have private variables, you need to declare the variables with my. What happens if you forget?

sub function {
    $x = 42; # Oops, should have been 'my $x = 42'.
}

In this case, your function modifies the global package variable $x. If you were using that variable for something else, it could be a disaster. Recent versions of Perl have an optional protection against this that you can enable if you want. If you put

use strict 'vars';

at the top of your program, Perl insists that all package variables have an explicit package qualifier. The $x in $x=42 has no such qualifier, so the program won't even compile; instead, the compiler will abort and deliver this error message:

Global symbol "$x" requires explicit package name at ...

If you wanted $x to be a private variable, you could go back and add the my. If you really wanted to use the global package variable, you could go back and change it to $main::x = 42; or whatever is appropriate.

Just saying use strict turns on strict vars, and several other checks besides. See perldoc strict for more details.
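You can watch strict vars at work without crashing your own program by compiling little snippets with eval. This is a sketch; try_compile() is a hypothetical helper written for the demonstration, and the exact wording of the error message varies a little between Perl versions.

```perl
# Compile a snippet in isolation and say whether it got through.
sub try_compile {
    my ($source) = @_;
    my $ok = eval "$source; 1";
    return $ok ? 'compiles' : "fails: $@";
}

# Forgot the my: rejected at compile time.
print try_compile(q{ use strict 'vars'; sub f1 { $x = 42 } }), "\n";

# Private variable: fine.
print try_compile(q{ use strict 'vars'; sub f2 { my $x = 42 } }), "\n";

# Explicit package qualifier: also fine.
print try_compile(q{ use strict 'vars'; sub f3 { $main::x = 42 } }), "\n";
```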

Now suppose you're writing the Algorithms::KnuthBendix module, and you want the protections of strict vars. But you're afraid that you won't be able to finish the module because your fingers are starting to fall off from typing $Algorithms::KnuthBendix::Error all the time.

You can save your fingers and tell strict vars to allow that variable without the full package qualification:

package Algorithms::KnuthBendix;
use vars '$Error';

This exempts $Algorithms::KnuthBendix::Error from triggering a strict vars failure when you refer to it by its short name, $Error.
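Here it is in action, as a sketch; the error text is invented. Perl 5.6 and later also let you write our $Error; for much the same effect, if your Perl is new enough.

```perl
package Algorithms::KnuthBendix;
use strict 'vars';
use vars '$Error';

# The short name is allowed now, and it still means the package variable.
$Error = 'no rewrite rule applies';      # made-up message

package main;
print "$Algorithms::KnuthBendix::Error\n";   # no rewrite rule applies
```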

You can also turn strict vars off for the scope of one block by writing

{ no strict 'vars';
# strict vars is off for the rest of the block.
}
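A complete version that you can run might look like this sketch; $scratch is an invented variable name.

```perl
use strict 'vars';

{
    no strict 'vars';
    # Unqualified, undeclared, and allowed: this is $main::scratch.
    $scratch = 'unqualified but allowed';
}

# Outside the block strict vars is back on, so use the full name.
print "$main::scratch\n";
```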

SUMMARY

Package variables are always global. They have a name and a package qualifier. You can omit the package qualifier, in which case Perl uses a default, and you can set that with the package declaration. For private variables, use my. Don't use local; it's obsolete.

You should avoid using global variables because it can be hard to be sure that no two parts of the program are using one another's variables by mistake.

To avoid using global variables by accident, add use strict 'vars' to your program. It checks to make sure that all variables are either declared private, are explicitly qualified with package qualifiers, or are explicitly declared with use vars.

__END__


Mark-Jason Dominus lives in Philadelphia, where he has worked as a programmer and consultant long enough to habitually add or die; to the end of every statement. His new manual page, 'perlreftut', will appear soon in an upcoming version of Perl if it hasn't already, and his article on 'Seven Useful Uses for local' will appear in this space in three months. Sometime in the next year he will probably write a book about something; watch this space for updates as they happen. He likes to get mail, so send him some at mjd-tpj@plover.com...or die!