
CGI Programming

Lincoln Stein

In this first installment of my Web CGI column, I'll introduce you to the elements of CGI scripting using the basic CGI module, CGI.pm. In later installments I'll cover the more advanced CGI::* library, a collection of modules providing an object-oriented class hierarchy which gives you more control over the behavior of CGI scripts.

CGI stands for Common Gateway Interface; it's the standard way to attach a piece of software to a World Wide Web URL. The majority of URLs refer to static files. When a remote user requests the file's URL, the Web server translates the request into a physical file path and returns the file's contents. However, URLs can also refer to executable files known as CGI scripts. When the server accesses this type of URL, it executes the script, sending the script's output to the browser. This mechanism lets you create dynamic pages, questionnaires, database query screens, order forms, and other interactive documents. It's not limited to text: CGI scripts can generate on-the-fly pictures, sounds, animations, applets or anything else.

The basic CGI script is simple:

#!/usr/bin/perl

print "Content-type: text/html\r\n";
print "\r\n";
chomp($time = `date`);
print<<EOF;
<HTML><HEAD>
<TITLE>Virtual Clock</TITLE>
</HEAD>
<BODY>
<H1>Virtual Clock</H1>
At the tone, the time will be 
<STRONG>$time</STRONG>.
</BODY></HTML>
EOF

This script begins by printing out an HTTP header. HTTP headers consist of a series of e-mail style header fields separated by carriage-return/newline pairs - in Perl, "\r\n".

After the last field, the header is terminated by a blank line - another "\r\n" sequence. Although HTTP recognizes many different field names, the only one you usually need is "Content-type", which tells the browser the document's MIME (Multipurpose Internet Mail Extension) type, determining how it will be displayed. You'll often want to specify "text/html" for the value of this field, but any MIME type, including graphics and audio, is acceptable.
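As a rough illustration of the header layout just described, the fields and the terminating blank line could be assembled by a small helper. The name http_header is hypothetical and is not part of any library; this is only a sketch of the format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an HTTP header from name/value pairs.
# Each field ends in "\r\n"; one more "\r\n" (the blank line) ends the header.
sub http_header {
    my @pairs = @_;
    my $header = '';
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        $header .= "$name: $value\r\n";
    }
    return $header . "\r\n";
}

# A plain-text document instead of HTML
print http_header('Content-type' => 'text/plain');
print "Hello from a plain-text CGI script.\n";
```

Passing the fields as a list rather than a hash keeps them in the order they were given.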

Next, the script uses the Unix date command to place the current time in the Perl variable $time. It then proceeds to print a short HTML document, incorporating the timestamp directly into the text.

The output will look something like this on a browser:

Figure 4: Virtual Clock Script

Each time you reload this script you'll see a different time and date.

Things get trickier when you need to process information passed to your script from the remote user. If you've spent any time on the Web, URLs invoking CGI scripts will look familiar. CGI scripts can be invoked without any parameters:

http://some.site/cgi-bin/hello_world.pl

To send parameters to a script, add a question mark to the script name, followed by whatever parameters you want to send:

http://some.site/cgi-bin/index_search.pl?CGI+perl

http://some.site/cgi-bin/order.pl?cat_no=3921&quantity=2

The examples above show the two most commonly used styles for parameter passing. The first shows the keyword list style, in which the parameters are a series of keywords separated by + signs. This style is traditionally used for various types of index searches. The second shows a named parameter list: a series of "parameter=value" pairs with "&"s in between. This style is used internally by browsers to transmit the contents of a fill-out form.
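To make the two styles concrete, here is a minimal sketch that splits the text after the question mark by hand. The helper name split_query is an assumption made for illustration. It leaves URL escapes untouched and keeps only the last value when a name is repeated, so it is far less thorough than a real parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the part of a URL after the "?" into
# either a keyword list (array ref) or named parameters (hash ref).
sub split_query {
    my ($query) = @_;
    if ($query =~ /=/) {
        # Named parameter style: name=value pairs joined by "&"
        my %params;
        for my $pair (split /&/, $query) {
            my ($name, $value) = split /=/, $pair, 2;
            $params{$name} = defined $value ? $value : '';
        }
        return \%params;
    }
    # Keyword list style: keywords joined by "+"
    return [ split /\+/, $query ];
}

my $keywords = split_query('CGI+perl');
print "keywords: @$keywords\n";

my $params = split_query('cat_no=3921&quantity=2');
print "$_ = $params->{$_}\n" for sort keys %$params;
```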

Both the script's URL and its parameters are subject to URL escaping rules. Whitespace, control characters, and most punctuation characters are replaced by a percent sign and the hexadecimal code for the character. For example, the space between the words "John Doe" should be passed to a CGI script like this:

http://some.site/cgi-bin/find_address.pl?name=John%20Doe

since spaces are ASCII 32, and 32 is hexadecimal 20.
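The escaping rule can be sketched in a few lines of Perl. The helpers url_escape and url_unescape are hypothetical names and are not taken from CGI.pm. The set of characters left unescaped is a conservative assumption, and the sketch handles single-byte characters only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every character outside a conservative "safe" set
# with a percent sign and its two-digit hexadecimal code.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Turn each %XX sequence back into the character it stands for.
sub url_unescape {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print url_escape('John Doe'), "\n";      # prints John%20Doe
print url_unescape('John%20Doe'), "\n";  # prints John Doe
```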

The problem with processing script parameters is that, for various historical reasons, the rules for fetching and translating the parameters are annoyingly complex. Sometimes the script parameters are found in an environment variable. But they can also be accessed via the command-line array (@ARGV). Or, they can be passed via standard input. Usually you'll have to recognize the URL escape sequences and translate them, but in some circumstances the server will do that for you. Which rules apply depends on whether your script was invoked by a GET or POST request (the former is usually generated when a user selects a hypertext link; the latter when a browser submits the contents of a fill-out form), whether the parameters are formatted using the keyword list or named parameter styles, and whether the browser takes advantage of the Netscape 2.0 file upload feature.
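For a sense of what fetching the parameters by hand involves, the sketch below covers only the two most common cases: a GET request, where the server places the parameters in the QUERY_STRING environment variable, and a simple POST request, where it passes CONTENT_LENGTH bytes on standard input. The helper name raw_parameters is hypothetical, and the @ARGV and file upload cases are deliberately left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-escaped parameter string.
# It takes the environment as a hash ref and an input filehandle so
# that it can be exercised outside a Web server.
sub raw_parameters {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        # POST: read exactly CONTENT_LENGTH bytes from the input handle
        my $body = '';
        read($in, $body, $env->{CONTENT_LENGTH} || 0);
        return $body;
    }
    # GET: everything after the "?" arrives in QUERY_STRING
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

print raw_parameters(\%ENV, \*STDIN), "\n";
```

The string that comes back still has to be split into parameters and unescaped, which is the work CGI.pm takes over.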

Fortunately CGI.pm (and the CGI::* modules discussed in later columns) knows the rules. It takes care of the details so that you can concentrate on your application.

CGI.pm combines several functions: it parses script parameters, creates HTTP headers, and provides shortcuts for generating HTML.

CGI.pm requires Perl 5.001 or higher. Its home base is:

http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html

It's also widely distributed via the CPAN. It installs like any other Perl module. You can either copy it directly to your Perl library directory, or you can use the Perl5 MakeMaker program to locate the library directory and install CGI.pm for you.

Using CGI.pm, we can enhance the simple virtual clock script to allow the remote user some control over the time format. This script allows the user to control whether the time, day, month, and year are displayed, and toggle between displaying the time in 12-hour or 24-hour format.

#!/usr/bin/perl

use CGI;

$q = new CGI;
if ($q->param) {
    if ($q->param('time')) {
        $format = ($q->param('type') eq '12-hour') ? 
                              '%r ' : '%T ';
    }
    $format .= '%A ' if $q->param('day');
    $format .= '%B ' if $q->param('month');
    $format .= '%d ' if $q->param('day-of-month');
    $format .= '%Y ' if $q->param('year');
} else { $format = '%r %A %B %d %Y' }

chomp($time = `date '+$format'`);
 
# print the HTTP header and the HTML document
print $q->header;
print $q->start_html('Virtual Clock');

print "<H1>Virtual Clock</H1>At the tone, 
       the time will be <STRONG>$time</STRONG>.";
print "<HR><H2>Set Clock Format</H2>";

# create the clock settings form
print $q->start_form, "Show: ";
print $q->checkbox(-name=>'time', -checked=>1), 
          $q->checkbox(-name=>'day',-checked=>1);
print $q->checkbox(-name=>'month',-checked=>1),
          $q->checkbox(-name=>'day-of-month',-checked=>1);
print $q->checkbox(-name=>'year', -checked=>1), 
                          "<P>";
print "Time style: ", $q->radio_group(-name=>'type',
           -values=>['12-hour','24-hour']),"<P>";

print $q->reset(-name => 'Reset'), 
            $q->submit(-name => 'Set'); 
print $q->end_form;
print $q->end_html;

Before I explain how this program works, let's see what it does:

Figure 5: Regenerating Virtual Clock Script

Let's walk through this script step by step:

  1. We load the CGI module and send a new() message to the CGI class. This creates a new CGI object, which we store in the Perl variable $q. Parameter parsing takes place during the new() call, so you don't have to do it explicitly.

  2. Next, using specifications determined by the script parameters, we create a format string to pass to the Unix date command. The key to accessing script parameters is the CGI param() call. param() is designed for the named parameter list style of script argument (another method call, keywords(), is used to access keyword lists). Called without arguments, param() returns an array of all the named parameters. Called with the name of a parameter, param() returns its value, or an array of values if the parameter appears more than once in the script parameter list. In this case, we look for parameters named time, day, month, day-of-month, year and type. Using their values, we build up a time format specifier to pass to the date command (see its manual page for details). If no parameters are present - for instance, if the script is being called for the very first time - we create a default format specifier. Then we call the date command and save its value in $time as before.

  3. We create the HTTP header using the CGI header() method. This method returns a string containing a fully-formed HTTP header, which the program immediately prints out. Called without any parameters, header() returns a string declaring that the document is of the content type "text/html". To create documents of other MIME types, you can call header() with the MIME type of your choice, e.g.
    print $q->header('image/gif');
    

    You can also use the named-parameter style of calling to produce headers containing any of the fields defined in the HTTP protocol:

    print $q->header(-Status => 200,
                        -Type => 'image/gif',
                        -Pragma => 'no-cache',
                        '-Content-length' => 8457);
    

    You don't have to remember to write that blank line after the HTTP header. header() does it for you.

  4. We start the HTML document by printing out the string returned by start_html(). Called with just one argument, this method returns an HTML <HEAD> section and the opening tag for the HTML <BODY>. The argument becomes the title of the document. As with header(), you can call start_html() with named parameters to specify such things as the author's e-mail address, or the background color (a Netscape extension):
    print $q->start_html(
                   -Title => 'Virtual Document',
                   -Author => 'andy@putamen.com',
                   -BGCOLOR => '#00A0A0');
    

  5. The program then spits out a few lines of HTML, including the formatted time string.

  6. This is followed by a horizontal line and a fill-out form that allows the user to adjust the format of the displayed time. CGI.pm has a whole series of HTML shortcuts for generating fill-out form elements. We start the form by printing out the string returned by the start_form() method, and then create a series of checkboxes (using the checkbox() method), a pair of radio buttons (using the radio_group() method), and the standard Reset and Submit buttons (using the reset() and submit() methods). There are similar methods for creating text input fields, popup menus, scrolling lists and clickable image maps. One of the features of these methods is that if a named parameter is defined from an earlier invocation of the script, its value is "sticky": a checkbox that was previously turned on will remain on. This feature makes it possible to keep track of a series of user interactions in order to create multipart questionnaires, shopping-cart scripts, and progressively more complex database queries. Each of these methods accepts optional arguments that adjust the appearance and behavior; for example, you can adjust the height of a scrolling list with the -size parameter. After we finish the form, we close it with a call to end_form().

  7. We end the virtual document by printing the string returned by end_html(), which returns the </BODY> and </HTML> tags.
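The param() calls from step 2 can be tried from the command line by handing new() a literal parameter string instead of a live request. This is a small sketch that assumes CGI.pm is installed; the parameter string is made up to resemble what the clock form might submit.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Build a query object from a literal parameter string rather than
# from a Web request (the string itself is an invented example).
my $q = CGI->new('time=on&type=24-hour&year=on');

# Without arguments, param() lists the parameter names.
my @names = $q->param;
print "names: @names\n";

# With a name, param() returns that parameter's value.
my $type = $q->param('type');
print "type: $type\n";

# A parameter that was not sent is simply undefined.
print "no day parameter\n" unless defined $q->param('day');
```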

In addition to its basic parameter-parsing, HTTP header-creating, and HTML shortcut-generating abilities, CGI.pm contains functions for saving and restoring the script's state to files and pipes, generating browser redirection instructions, and accessing useful information about the transaction, such as the type of browser, the machine it's running on, and the list of MIME types it can accept.

The next column will discuss how to handle errors generated by CGI scripts, and additional techniques for maintaining state in CGI transactions.

__END__


Lincoln Stein wrote CGI.pm.

