PREVIOUS  TABLE OF CONTENTS  NEXT 

Patterns in Perl

Sean Hunter

Most programmers in professional development know someone who speaks in awed tones of the "gang of four", and has a copy of Addison-Wesley's Design Patterns that is almost as well-thumbed as your average blue camel. Someone who starts tedious one-sided conversations over lunch about the intimate details of the "visitor pattern", and wonders why no one else understands them and they just don't seem to be able to attract members of the opposite sex. We all know a pattern junkie when we see one.

Those of you who are sufficiently curious and have not simply moved on to the next article at the mere mention of "patterns" may wonder why anyone would ever be foolish enough to consider that sort of thing in Perl. Patterns describe commonly occurring problems and how they should be solved, and they're too rigid, too "buzzword-compliant", too abstract and dull for Perl.

Well, I agree. So why use them? In general I don't. However, I've recently gotten pretty good results from the judicious application of two of these patterns, and found that they can solve some problems that are a little tricky or inefficient to do by other means. In actual practice, these patterns aren't really anything new, but are just a way of naming common techniques that most people use all the time to solve various problems. The unfortunate thing about the movement is that the people behind it all have proven very dogmatic and unflexible. "There's only one way to do it" is not an attitude many Perl programmers agree with.

The Factory Pattern

This is a fancy name for having an object that builds other objects for you. If you've used the DBI interface, you've used a Factory pattern whenever you call DBI->connect(), and actually get back an object that uses special magic to talk to your database.

Factories let you abstract away the complexity of creating objects so that the calling code can achieve a lot, while remaining simple and understandable. (That's the theory, anyway.)

The Iterator Pattern

An iterator is an object that loops over a collection, returning one at a time. foreach (@my_list) is pretty much all there is to the classic iterator.

So why use one? Well, in other languages, you need them to iterate over complex data structures like hashes. In Perl, however, that's already taken care of by the facilities of the language. You just say for (keys %my_hash) or while (my ($key, $value) = each %my_hash) or some variation on those themes. As such, this pattern adds almost nothing of value to Perl - but we'll see a useful iterator soon.

If you know that the folks using your collection are going to be retrieving the items in a structured way, you can use that knowlege to delay processing items in the collection until the caller specifically asks for them. This is called lazy evaluation, and it's useful for big collections because it lets us return quickly, rather than performing all the processing in one big chunk before your program can proceed. I do this by using what I call a Chained Iterator. We'll see how this works in a minute.

Enough theory, let's see some code

Our code will demonstrate both a Factory and an Iterator, using a base object: a simple blessed hash. The constructor of our BaseObj class simply takes whatever arguments are passed to it, munges them into a hash ($self = {@_}), and blesses that hash into a Perl object. I always use the two-argument form of bless so it's clear what class the object is being blessed into, which will sometimes be a Factory and sometimes an Iterator. This source code can be found in the file BaseObj.pm at http://tpj.com/tpj/programs.

use strict;
package BaseObj;

sub new {
    my $class  = shift;

    my $self = {@_};
    bless($self, $class);

    return $self;
}

1;

Our first Factory class is also pretty straightforward. It has one method, getBassists(), which simply provides a short list of some of the great jazz bassists and pathetically attempts to raise their status by uppercasing their names.

Our Factory does this by returning an Iterator object. The Iterator takes a "chain" argument, a list of code references. The idea is that when someone calls next() to retrieve the next item from the Iterator, it secretly runs all of the functions in the chain, passing the output from one function into the input of the next. Typically, these functions retrieve a single item and perform various transformations on it, such as blessing it into an object, adding extra data, or converting data from one form to another. In this example we build two small closures (subroutines whose behavior is partially determined as your program runs) to run in the Iterator chain. If we wanted to transform the data into HTML for our "great bassists" CGI program, we'd just throw another closure into the chain.

Note that we don't need a constructor for our Factory or Iterator objects because we're inheriting from the BaseObj class, which provides one for us. Done right, Perl OO can be a great aid to the lazy. This code comes from Factory1.pm.

use strict;
package Factory1;

use BaseObj;
use Iterator;

use vars qw(@ISA);
sub getBassists {
    my @great_bassists = (
			  'Jimmy Blanton', 'Jimmy Garrison',	
			  'Ray Brown',     'Charlie Haden',
			  'Steve Swallow', 'Dave Holland');

    my $get_bassists = sub { shift @great_bassists };
    my $raise_status = sub { uc(shift) };

    return new Iterator(chain=>[$get_bassists, 
$raise_status]);
}

1;

At the moment the getBassists() function simply uses static data, but soon we'll make it more interesting by having it read from a database.

Now we get to the meat of our design - the Iterator object, defined in Iterator.pm. It has one method, next(), which runs all of the closures in the chain in turn, passing the output of each to the input of the next. Because of the power of Perl, the expression is elegant and concise.

use strict;
package Iterator;
use BaseObj;

use vars qw(@ISA);
sub next {
    my $self = shift;

    my $row;
    for (@{$self->{chain}}) { $row = &$_($row) if defined $_ }

    return $row;
}

1;

Now here's our first test program, which I call fact1.pl. Here we see the primary benefit of the Factory: It can accomplish a lot while keeping the calling code very simple.

#!/usr/bin/perl -w

use strict;
use Factory1;

my $fact = new Factory();

my $it = $fact->getBassists();

while (my $bass = $it->next()) {
    print "$bass\n";
}

Here lies one of the great gotchas of OO programming. It's very easy to become carried away by how much is achieved behind the scenes, resulting in code that hides where individual operations are performed. Simple code becomes a maintenance nightmare.

Now, why did we go to the bother of creating these objects when all we needed to do was to run for over the list and convert each one to upper case right there? It wouldn't make much sense if this was all you wanted to do; this is just an example to demonstrate how patterns work. In the case of larger collections with more complex transformations, this idea really comes into its own.

Say we decide we want to spread the word about jazz, and publish the gory details of our (very) extensive CD collection online. We back this up with a SQL database to allow the hordes of jazz-loving web surfers to search our CD collection by bassist. For this example we're just going to consider one part of the above scenario - how to get the list of bassists and their recordings.

Let's say we've created a set of database tables to represent all the data we need for our fabulous web solution. The only parts of relevance here are the BASSIST and RECORDING tables. (Although we are creating the database on MySQL, I am using uppercase for database objects so that once my CD collection site takes off and I go for an IPO, it's easier to port Oracle for good investor relations. It's always a good idea to keep possible future rquirements in mind when designing.)

Here are our table definitions. If you don't know SQL, don't worry. All that matters is that we have two databases: one for bassists and one for recordings, and each consists of a few fields of data.

create table BASSIST (
	ID	int	(10) unsigned not null auto_increment,
	NAME	varchar	(50),
	primary	key	(ID)
);


create table RECORDING (
	ID	int	(10) unsigned not null auto_increment,
	TITLE	varchar	(50),
	BASSIST_ID 	int	(10) unsigned not null,

	key IDX_BASSIST_ID	(BASSIST_ID),
	primary key	(ID)
);

RECORDING.BASSIST_ID is a "foreign" key referencing ID from the BASSIST table. This lets us record which bassist was on any particular recording. (Since MySQL doesn't actually support foreign keys, I've just made it an index, and would make it a proper foreign key if we migrated to another database engine.) The RECORDING table supports many recordings for each bassist. For some sample data along with the above table definitions, see the file bassists_dump.sql bundled with this article's source code at http://tpj.com.

Here's a query so you can see some of the sample data:

mysql> select B.*, R.* from RECORDING R, BASSIST B where B.ID = R.BASSIST_ID;
+----+----------------+----+------------------------------+------------+
| ID | NAME           | ID | TITLE                        | BASSIST_ID |
+----+----------------+----+------------------------------+------------+
|  1 | Ray Brown      |  1 | Night Train                  |          1 |
|  3 | Jimmy Garrison |  2 | Crescent                     |          3 |
|  3 | Jimmy Garrison |  3 | A Love Supreme               |          3 |
|  3 | Jimmy Garrison |  4 | Ballads                      |          3 |
|  4 | Charlie Haden  |  5 | Liberation Music Orchestra   |          4 |
|  4 | Charlie Haden  |  6 | Dreamweaver                  |          4 |
|  4 | Charlie Haden  |  7 | Shades                       |          4 |
|  5 | Paul Chambers  |  8 | Milestones                   |          5 |
|  6 | Steve Swallow  |  9 | Real Book                    |          6 |
|  7 | Dave Holland   | 10 | Conference of the Birds      |          7 |
|  7 | Dave Holland   | 11 | Gnu High                     |          7 |
+----+----------------+----+------------------------------+------------+
11 rows in set (0.00 sec)

As you can see, each bassist in the table has one or more recordings. The first ID column is BASSIST.ID, and the second is RECORDING.ID. We actually have another bassist who hasn't shown up in the above query: Jimmy Blanton has no entries in the sample data in the RECORDING table. Here's his entry in the BASSIST table:

mysql> select * from BASSIST where ID = 2;
+----+---------------+
| ID | NAME          |
+----+---------------+
|  2 | Jimmy Blanton |
+----+---------------+

Now, what we want to be able to do is to query the database dynamically, producing a list of bassists with their respective recordings that reflects the "one to many" structure of the above data. In order to take away some of the drudgery of using DBI, we use a small wrapper that makes the calling and error checking easier and more consistent. This wrapper function is in the file SqlTools.pm.

package SqlTools;
use strict;
use Carp;

sub sql_hash {
    my ($sql, @rest) = @_;
    if (not defined $sql or $sql eq '') {
	    carp("doing nothing, just like you asked");
	    return;
    }
    my $sth = $dbh->prepare($sql);
    if (!$sth) {
	    croak("Unable to prepare sql statement '$sql' " .
	        "in sql_tools:\n\t " . $dbh->errstr);
    }
    unless ($sth->execute(@rest)) {
	croak("Unable to execute sql statement '$sql' " .
	    "in sql_tools:\n\t " . $dbh->errstr); 
    }
    my $hashref;
    my @retval;
    while ($hashref = $sth->fetchrow_hashref) {
	    push @retval, $hashref;
    }
    return \@retval;
}

1;

The sql_hash() function takes a SQL query and returns the results as a reference to a list of hashes. The @rest parameter lets us support placeholders (see the DBI documentation for more details). Note that just because we are using objects for some of our design doesn't mean that everything we do has to fall under that model. The above functions work perfectly well as they are, and any further objectification would be gratuitous and pedantic. One of the most important design skills is not overapplying a favored technique.

Here's the new version of our Factory class. I've called it Factory2, and the code can be found in Factory2.pm. The getBassists() function is slightly more involved now, because we query the database to return a collection of Bassist objects. These have an internal structure representing the BASSIST and RECORDING tables. The Bassist object is a blessed hash which has a key and value for each column of the BASSIST table, and an extra key, called RECORDINGS, holding a list of the bassist's recordings.

For example, if we defined the desired data structure for Jimmy Garrison statically, it would look something like this:

my $jimmy = {
   	ID => 3,
   	NAME => 'Jimmy Garrison',
   	RECORDINGS => [
	        {BASSIST_ID => 3,
                ID => 2,
             TITLE => 'Crescent'},

	        {BASSIST_ID => 3,
                ID => 3,
             TITLE => 'A Love Supreme'},

	        {BASSIST_ID => 3,
                ID => 4,
                TITLE => 'Ballads'} ]
};

As you may have expected, our new getBassists() function returns an Iterator that creates these on the fly. If you'd like a closer look at these data structures (or any others), you can display them using the Data::Dumper module or the Perl debugger.

use strict;

package Factory2;

use BaseObj;
use Iterator;
use SqlTools;
use Bassist;
use vars qw(@ISA);
sub getBassists {
    my $bassists   = sql_hash("select * from BASSIST");
    my $recordings = sql_hash("select * from RECORDING");

    # turn $recordings into a hash keyed by BASSIST_ID
    my %recording_by_bassist;
    for (@$recordings) {
	    push @{$recording_by_bassist{$_->{BASSIST_ID}}}, $_;
    }

    # get a bassist
    my $get_bassist = sub { shift @$bassists }

    # add the recordings for this bassist
    my $get_recordings = sub {
	  my $bassist = shift;
	  return unless defined $bassist;
	  $bassist->{RECORDINGS} = $recording_by_bassist{$bassist->{ID}};
	  return $bassist;
    };

    # create a Bassist object
    my $make_object = sub {
	  my $data = shift;
	  return new Bassist(%$data)
	      if defined $data;
      };
    
    return new Iterator(chain=>[$get_bassist,
				      $get_recordings,
				      $make_object]);
}

1;

Here are the contents of Bassist.pm. At the moment, it has a single method, getString(), which returns a string containing the name of the bassist and the titles of any corresponding recordings. We could easilly extend this object to include the ability to display itself as an HTML snippet.

package Bassist;
use strict;
use BaseObj;

use vars qw(@ISA);
sub getString {
    my $self = shift;

    my $str = "$self->{NAME}:\n";
    for (@{$self->{RECORDINGS}}) { 
         $str .= "\t$_->{TITLE}\n" 
    }
    $str .= "\n";

    return $str;
}

1;

Finally, here is fact2.pl, our test program for Factory2. It calls getBassists(), using the getString() method on each Bassist object to print out each name and the corresponding recordings.

#!/usr/bin/perl -w
use strict;
use Factory2;

my $fact = new Factory2();

my $it = $fact->getBassists();

while (my $bass = $it->next()) {
    print $bass->getString();
}

I have found these techniques to be a good way of returning collections of objects to clients as the result of database queries. When dealing with large result sets, it is often beneficial to be able to return quickly, performing row processing "just in time" before giving each row back to the caller. I hope that these examples have given you some idea of the possible uses of Factories and Iterators in general, and Chained Iterators in particular. In the spirit of "There's More Than One Way To Do It", these can form a useful weapon in the Perl programmer's problem-solving arsenal - bearing in mind that no technique solves all possible problems.

References

_ _END_ _


Sean Hunter (sean@uncarved.co.uk) is a consulting software guy who works for an e-commerce company that is unusual because it is actually making money. He lives in London, England, and can also occasionally be found playing bass in various jazz clubs.


PREVIOUS  TABLE OF CONTENTS  NEXT