Threadsafing a Module - The Perl Journal, Fall 1999

Dan Sugalski

If you've played around much with threaded Perl, you've probably already discovered that many of the modules available, including some that ship with the base Perl distribution, aren't thread safe. And if you've written any modules that have been released to CPAN, you've probably already gotten mail from someone asking "Is your module thread safe?" This, of course, raises the question "How do I make my module thread safe?", a question that can seem pretty overwhelming, especially if you've got no experience with threads.

What we're going to do in this article is show you what you need to do to make your Perl module thread safe. If you correctly implement everything we cover here, your module should be fine. Do be aware that we're talking strictly about making a module thread safe with a minimum of fuss. This is just the first, and easiest, step in taking full advantage of threads.

Locking Your Internal Data

The first step to making your module thread safe is to use lock() to coordinate access to variables that might be accessed from more than one thread simultaneously. This includes package variables, lexicals declared outside the scope of a subroutine, and variables you access by reference.

If your module will be used with Perl 5.005 or higher, internal locking is simple. The lock() function locks a variable if your Perl was built with threads, and is a no-op otherwise. That's the easy part.
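Here's a minimal sketch of what that looks like for a file-scoped lexical. The package, the variable $next_id, and the subroutine new_id() are hypothetical names chosen for illustration. On a Perl where lock() is a no-op, the code runs unchanged.

```perl
package MyPackage;
use strict;

# File-scoped lexical visible to every subroutine in this file
# ($next_id is a hypothetical name used only for illustration).
my $next_id = 0;

sub new_id {
    lock($next_id);        # released when the subroutine's block exits
    return ++$next_id;
}

package main;

for (1 .. 3) {
    my $id = MyPackage::new_id();
    print "$id\n";
}
```

The lock covers both the increment and the read of the new value, so two threads can never be handed the same ID.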

If your module is going to run on versions of Perl 5.004 and below, things get a bit trickier. The easiest thing to do is put this piece of code at the beginning of the module:

BEGIN {
    sub fakelock {};
    if ($] < 5.005) {   # $] holds the version number of your Perl
        *lock = \&fakelock;
    }
}

This code will create a lock() subroutine that does nothing if you're running on a version of Perl below 5.005.

Once you're set with a lock() subroutine, just scatter calls to it throughout your code wherever you need to lock things. The standard locking rules apply, of course, so if you're going to have several blocks that lock multiple variables, you'll want to make sure you lock them in the same order in each block.
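As a rough illustration of the same-order rule, the sketch below has two subroutines that both need %cache and @history, and both take the locks in that order. All of the names are hypothetical.

```perl
use strict;

# Two file-scoped variables that are always updated together
# (hypothetical names, for illustration only).
my %cache;
my @history;

sub remember {
    my ($key, $value) = @_;
    lock(%cache);          # first %cache ...
    lock(@history);        # ... then @history
    $cache{$key} = $value;
    push @history, $key;
    return;
}

sub forget_all {
    lock(%cache);          # same order as remember()
    lock(@history);
    my $count = @history;
    %cache   = ();
    @history = ();
    return $count;
}

remember(a => 1);
remember(b => 2);
print forget_all(), " entries cleared\n";
```

If forget_all() took @history first, a thread inside it and a thread inside remember() could each end up holding one lock while waiting on the other.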

Locking Your External Data

Purely internal locking is relatively simple, since you've got full control over the source and can make whatever changes you need. Things get a bit trickier if your module uses package variables as part of its documented public interface. When that's the case, you're in the unenviable position of making sure code that's not under your control synchronizes access to shared resources.

Luckily, Perl provides you with a way to fix this. The answer is to tie your globals. This slows down your code, but as you're probably not accessing the globals that much, the safety tradeoff is worth it. The following code chunk demonstrates one way to do this, tying the two variables $DEBUG and $BEHAVIOR.

# This should be your real package name
package MyPackage;
use Config;

# Predeclare the variables you want to protect
use vars qw($DEBUG $BEHAVIOR);

BEGIN {
# This only needs to be different from your main package name
# if your main package can be tied to things.
    package MyPackage::ThrSafe;
    sub TIESCALAR {
       my $var;
       my $class = shift;
       return bless \$var, $class;
    }
    sub FETCH {
        my $var = shift;
        lock $var; # Lock goes up one level of reference
        return $$var;
    }
    sub STORE {
        my ($var, $val) = @_;
        lock $var;
        $$var = $val;
        return $val;
    }

# Tie the global variables to our threadsafing package
    if ($Config{usethreads}) {
        tie $DEBUG, 'MyPackage::ThrSafe';
        tie $BEHAVIOR, 'MyPackage::ThrSafe';
    }
}

As you can see from the example, the tie code is very simple; just enough to wrap a lock around the variable access. We also only tie $DEBUG and $BEHAVIOR if we're actually running on a threaded Perl (that is, if $Config{usethreads} is true). And, since the locks are only held for the duration of the subroutines, we don't even need any DESTROY code to clean things up.
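To see that callers need no changes, here's a condensed, standalone sketch of the same idea. It assumes a hypothetical class name, Demo::ThrSafe, and ties $DEBUG unconditionally (skipping the $Config{usethreads} check) so the wrappers are exercised on any Perl.

```perl
package Demo::ThrSafe;     # hypothetical class name, for illustration only
use strict;

sub TIESCALAR {
    my $class = shift;
    my $var;
    return bless \$var, $class;
}

sub FETCH {
    my $self = shift;
    lock $self;            # lock goes up one level of reference
    return $$self;
}

sub STORE {
    my ($self, $val) = @_;
    lock $self;
    $$self = $val;
    return $val;
}

package main;
use vars qw($DEBUG);

# Tied unconditionally here; this is an assumption made for the demo.
tie $DEBUG, 'Demo::ThrSafe';

$DEBUG = 2;                            # goes through STORE
print "debugging is on\n" if $DEBUG;   # goes through FETCH
```

Code outside the module keeps reading and assigning $DEBUG as an ordinary scalar; the locking happens inside FETCH and STORE.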

You may be tempted to do the tying only if the Thread module has actually been used. That's not a safe thing to do, though - our module might have been used before the Thread module, or the Thread module might get loaded in at runtime via do, require, or eval.

Locking Your Code

Sometimes it's more appropriate to lock code rather than data. You might, for example, have a subroutine that updates a configuration file, and the last thing that you want is to have multiple threads running it at once. And it's often much simpler to lock a single subroutine rather than lock dozens of variables.

Locking a subroutine is simple. If you're running with Perl 5.005 or higher, make the first line of your subroutine

    use attrs qw(locked);

and Perl will ensure that only one thread is in the subroutine at any one time. If your code might run on older versions of Perl, though, you don't want to do that. Instead, make the first line of the subroutine

    lock(\&subname);

where subname is the name of the subroutine being locked. The use attrs method is slightly faster, but the speed difference isn't that noticeable unless you're doing a lot of subroutine locking.

Once the subroutine is locked, you can be sure that no other thread can enter it until the lock is released. Subroutine locks, by the way, are the only mandatory locks in Perl - when a thread locks a subroutine, Perl enforces that lock and will not let any other thread into that subroutine until the lock is released. While this isn't that big a deal if the subroutine lock is inside the subroutine (like we're talking about here), it can be an issue if you lock the subroutine someplace else.
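A minimal sketch of the lock(\&subname) form follows. The subroutine update_config() and the array @pending (standing in for the configuration file) are hypothetical. Where lock() is a no-op, the script simply runs without any locking.

```perl
use strict;

# Stands in for a shared resource such as a configuration file
# (hypothetical, for illustration only).
my @pending;

sub update_config {
    lock(\&update_config);     # held until this subroutine returns
    my ($key, $value) = @_;
    push @pending, "$key=$value";
    return scalar @pending;
}

update_config(color => 'blue');
update_config(size  => 'large');
print join(", ", @pending), "\n";
```

Because the lock is the first statement, everything after it runs with at most one thread inside the subroutine.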

Locking Your Methods

If your code is object oriented, it's more useful to use method locks instead of subroutine locks. This allows multiple objects to be running the same subroutine simultaneously, but only one thread at a time can be inside a locked method of any given object.

Once again, Perl 5.005 and higher provide this functionality. All you need to do to get Perl to use method locking rather than subroutine locking is to make this the first line of your subroutine:

     use attrs qw(locked method);

and Perl will automatically use method locking instead of subroutine locking. If this subroutine is called as a method on an object, Perl will lock the object. If called as a static method, Perl locks the whole stash. (The stash, for those not familiar with Perl's guts, is a hash that holds a package's global variables and subroutines.)

This makes duplicating the method locking behavior a bit trickier. The code to do so looks like this:

   package MyPackage::SubPackage;
   sub locked_method {
       my $obj = shift;
       # Lock the object if we got one
       lock $obj                                if ref($obj);
       # Lock the stash if we didn't
       lock $::{'MyPackage::'}{'SubPackage::'}  unless ref($obj);
       # Do your stuff here while the locks are still in scope
   }

You'll need to update the stash lock line depending on what package the subroutine is actually in. While it's possible to determine this at runtime, it's pretty expensive, and Perl's method calls hurt enough as it is.

One thing you'll notice here is that we're getting a lock just on the object or stash. Nothing special is done to match up the subroutine and object, or subroutine and stash. This is consistent with Perl's behavior - entering a locked method for an object or package prevents any other thread from entering a locked method for that object or package.
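Here's a small sketch of that behavior from the object side, using a hypothetical Counter class. Both methods lock the object itself, so for any one object they exclude each other, while separate objects never contend. The class-method (stash) branch is left out to keep the example short.

```perl
package Counter;           # hypothetical class, for illustration only
use strict;

sub new {
    my $class = shift;
    return bless { count => 0 }, $class;
}

sub increment {
    my $self = shift;
    lock $self;            # locks the object, not the subroutine
    return ++$self->{count};
}

sub clear {
    my $self = shift;
    lock $self;            # same lock as increment() for this object
    $self->{count} = 0;
    return;
}

package main;

my $first  = Counter->new;
my $second = Counter->new;
$first->increment;
$first->increment;
$second->increment;
print "first=$first->{count} second=$second->{count}\n";
```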

Coarser Locking

Locking individual variables or subroutines is fine, but there are times when you need a coarser locking scheme. You may, for example, have groups of variables that are always locked en masse, or something such as a filehandle that can't be locked. While the best way to do this is with semaphores from Thread::Semaphore, that method has the disadvantage of not working on non-threaded Perl. For cross-version compatibility you're best off using a file-scoped lexical or two and coordinating your locks with them. For example:

      package MyPackage;
      my $package_lock;

      sub foo {
        lock $package_lock;
        # Do stuff
      }

      sub bar {
        lock $package_lock;
        # More stuff
      }

      sub baz {
        # Just do stuff without locks
      }

Deadlocks

A deadlock occurs when two threads have each locked a resource, and then each blocks trying to lock the resource the other owns. Deadlocks with threads are nasty because there's no way to get out of one. Once two threads are deadlocked, they will never recover. And unfortunately there's no way to check and see if something is already locked or would block if you tried.

While these shortcomings may be fixed in future releases of Perl (we are, to some extent, limited to the facilities provided by different platforms' threading libraries), right now the only defense against deadlock is careful programming. To avoid deadlocks, follow these rules:

Always obtain locks in the same order. Alphabetical order is the standard.
Hold locks for as short a period as possible.
If locks are nested, lock the outer locks first.
Try not to call subroutines, especially subroutines whose source you don't control, while holding locks.

A Note About Performance

Now that we've covered how to lock things down for safety, we need to talk a bit about performance. It's very important that your code hold locks for as short a period of time as possible, and the locks need to be as specific as possible.

Actually acquiring a lock isn't really a performance killer. What can bite you is when a thread blocks trying to acquire a lock. Blocking and later waking up cost a little bit of time, but more importantly it creates a bottleneck - what folks doing threads call a critical path.

Critical paths are usually bad, especially on multiprocessor machines, since they reduce the level of concurrency in your program. The more threads stuck trying to get into a critical path, the lower the level of concurrency. The lower the level of concurrency, the less well your CPU resources are used. In particularly bad cases, adding an extra CPU can actually decrease performance. Concurrency is your friend!

Don't think that critical paths are only an issue on multiprocessor machines, though. You can cause yourself similar problems on a uniprocessor machine by holding a lock across a blocking system call, so just don't do that.
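One way to keep a lock short, sketched here with hypothetical names (%stats and report()), is to copy what you need inside a bare block and do the slow work after the block ends and the lock has been released.

```perl
use strict;

# Hypothetical shared data, for illustration only.
my %stats = (hits => 3, misses => 1);

sub report {
    my %snapshot;
    {
        lock(%stats);          # hold the lock only long enough to copy
        %snapshot = %stats;
    }                          # lock released at the end of the block

    # Formatting, I/O, or anything else that might block happens here,
    # without the lock held.
    my @parts;
    foreach my $name (sort keys %snapshot) {
        push @parts, "$name=$snapshot{$name}";
    }
    return join(", ", @parts);
}

print report(), "\n";
```

The copy costs a little memory, but other threads only wait for the duration of the copy rather than for the whole report.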

So What Have We Learned?

Hopefully, that proper locking is both good and reasonably simple.

__END__

Dan Sugalski is the VMS Systems Administrator for the Oregon University System's ITS department. He's been involved in the VMS Perl port for a few years, likes threading, mail, doing obscure things in XS, and tilting at windmills. To reach him you can either leave your message and a plate of cookies under the bushes by your front door or send mail to dan@sidhe.org.
