
Parsing Command Line Options

Johan Vromans

PACKAGES USED
Package                        Version
Perl                           5
Getopt::Std and Getopt::Long   both bundled with Perl 5; recent versions of Getopt::Long are on CPAN

Controlling a computer by typing commands into a shell is still the preferred way of working for most programmers. Despite the capabilities of modern window systems, working from a shell is much faster and less complicated than sequences of mouse movements and button clicks - once you know the names of the commands and how they work.

The expressiveness of a command-line program depends on what options it supports, and how they're parsed - converted into a form that your program can understand. When you execute ls -l /tmp on Unix, or dir /w c:\windows on MS-DOS, or your_program -height=80, the -l, /w, and -height=80 are options. Sometimes the shell handles the parsing; that's what happens with the /w in DOS. More often, the program named by the command (ls and your_program) must handle the parsing itself. In this article, I'll show you how your Perl programs can parse their own options.

OPTION PARSING CONVENTIONS

Under modern command shells, including those on Unix and Windows, options can be either letters or words. Programs that accept single letters might be invoked like program -a -b -c. Or they might be invoked as program -abc, meaning the same thing. If the options take values, you can bundle those too: -aw80L24x is equivalent to -a -w 80 -L 24 -x. With option words, you sacrifice brevity for clarity: program --all --width=80 --length 24 --extend.

In either case, options precede other program arguments (the /tmp in ls -l /tmp) and the parsing stops as soon as a non-option argument is encountered. A double dash by itself immediately terminates option parsing.
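To make these conventions concrete, here is a minimal hand-written sketch of the kind of parser they imply. The parse_flags() subroutine is a hypothetical helper invented for this illustration; it handles only bundled single-letter flags without values.

```perl
use strict;
use warnings;

# parse_flags() is a hypothetical helper, not part of any module.
# It accepts bundled single-letter flags and returns the flags seen
# plus the arguments left over once option parsing stops.
sub parse_flags {
    my @args = @_;
    my %flag;
    while (@args) {
        my $arg = $args[0];
        if ($arg eq '--') {              # explicit end of options
            shift @args;
            last;
        }
        last unless $arg =~ /^-(\w+)$/;  # first non-option ends parsing
        $flag{$_} = 1 for split //, $1;  # -abc means -a -b -c
        shift @args;
    }
    return (\%flag, \@args);
}

my ($flag, $rest) = parse_flags('-ab', '-c', '--', '-x', 'file.txt');
print join(' ', sort keys %$flag), "\n";   # flags that were set
print "@$rest\n";                          # arguments left untouched
```

Because the double dash ends option parsing, the -x that follows it is left alone and treated as an ordinary argument.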

These conventions aren't universal. Some programs accept option words with a single dash (e.g. -height); some let you mix option letters and option words; some let you mix options and regular program arguments. Options can be mandatory or optional, case-sensitive or case-insensitive, and can expect an argument afterward - or not.

Parsing options in Perl isn't very hard, but after writing eight subroutines for eight programs, you might wonder whether there's a better way. There is. In fact, there are several ways.

THE SIMPLEST WAY

Perl directly supports the single-dash style of options with the -s switch. If you start Perl as

perl -s script.pl -foo -bar myfile.dat

Perl will remove anything that looks like an option (-foo and -bar) from the command line and set the variables ($foo and $bar) to true. Note that the options are words preceded with a single dash. When Perl encounters an argument without the dash, it stops looking for options.
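As a small sketch of a script written for -s: the switch can also be placed on the #! line, and an option given as -name=value sets the variable to that value instead of 1. The interpreter path and the output format below are assumptions made for the example.

```perl
#!/usr/bin/perl -s
# Perl reads the -s switch from the #! line, so options that follow
# the script name on the command line become package variables.
use strict;
use warnings;

our ($foo, $bar);    # set by -s when -foo or -bar is given

my $summary = sprintf "foo=%s bar=%s args=%s",
    defined $foo ? $foo : 'off',
    defined $bar ? $bar : 'off',
    "@ARGV";
print "$summary\n";
```

Run with the arguments -foo -bar=7 myfile.dat, this prints foo=1 bar=7 args=myfile.dat. Under use strict the variables have to be declared with our, since -s sets package variables.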

THE EASY WAY

Perl comes with two modules that handle command line options: Getopt::Std and Getopt::Long.

Getopt::Std provides two subroutines, getopt() and getopts(). Each expects a single dash before option letters and stops processing options when the first non-option is detected.

getopt() takes one argument, a string containing all the option letters that expect values. For example, getopt ('lw') lets your program be invoked as program -l24 -w 80 (or program -l 24 -w80), and it will set $opt_l to 24 and $opt_w to 80. Other option letters are also accepted; for example, program -l24 -ab will also set both $opt_a and $opt_b to 1.
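The following sketch replays that invocation. Assigning to @ARGV simulates a command line, and the file name myfile.dat is made up for the example.

```perl
use strict;
use warnings;
use Getopt::Std;

# Under "use strict" the $opt_* globals must be declared.
our ($opt_l, $opt_w, $opt_a, $opt_b);

# Simulate: program -l24 -w 80 -ab myfile.dat
@ARGV = ('-l24', '-w', '80', '-ab', 'myfile.dat');

getopt('lw');    # -l and -w expect values; other letters are flags

print "length=$opt_l width=$opt_w a=$opt_a b=$opt_b\n";
print "remaining: @ARGV\n";
```

This prints length=24 width=80 a=1 b=1, and only myfile.dat remains in @ARGV.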

When you don't want global variables defined in this way, you can pass a hash reference as a second argument to getopt(). The option letters become the keys of the hash, and each entry holds the option's value (or 1 if the option doesn't take a value).

getopts() allows a little bit more control. Its argument is a string containing the option letters of all recognized options. Options that take values are followed by colons. For example, getopts ('abl:w:') makes your program accept -a and -b without a value, and -l and -w with a value. Any other arguments beginning with a dash result in an error. As with getopt(), a hash reference can be passed as an optional second argument.
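A short sketch of getopts() with a hash reference, which avoids the global variables altogether. The usage message and the simulated command line are invented for the example.

```perl
use strict;
use warnings;
use Getopt::Std;

# Simulate: program -a -l 24 -w80 myfile.dat
@ARGV = ('-a', '-l', '24', '-w80', 'myfile.dat');

my %opt;
getopts('abl:w:', \%opt)    # false if an unknown option was seen
    or die "usage: program [-a] [-b] [-l length] [-w width] file\n";

print "a=$opt{a} l=$opt{l} w=$opt{w}\n";
print "b was not given\n" unless exists $opt{b};
print "remaining: @ARGV\n";
```

Options that were not given simply have no key in the hash, so exists is a convenient test.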

THE ADVANCED WAY

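Getopt::Long provides the GetOptions() function, which gives you ultimate control over command line options. It supports option words, single-letter options (with or without bundling), and a mix of the two. Other features include optional and typed option values, aliases and abbreviations, and flexible destinations for the option values, all described below.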

This article describes version 2.17 of the Getopt::Long module.

Option Words. In its standard configuration, GetOptions() handles option words, ignoring case. Options may be abbreviated, as long as the abbreviations are unambiguous. Options and other command line arguments can be mixed; options will be processed first, and the other arguments will remain in @ARGV.

This call to GetOptions() allows a single option, -foo.

GetOptions ('foo' => \$doit);

When the user provides -foo on the command line, $doit is set to 1. In this call, 'foo' is called the option control string, and \$doit is called the option destination. Multiple pairs of control strings and destinations can be provided. GetOptions() returns true if processing was successful; otherwise it displays an error message with warn() and returns false.
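A complete, minimal program built around this call might look like the sketch below. The usage message and the simulated command line are assumptions for illustration.

```perl
use strict;
use warnings;
use Getopt::Long;

# Simulate: program -foo myfile.dat
@ARGV = ('-foo', 'myfile.dat');

my $doit = 0;    # default when -foo is absent
GetOptions('foo' => \$doit)
    or die "usage: program [-foo] file ...\n";

print "doit=$doit remaining=@ARGV\n";
```

Checking the return value is what turns a mistyped option into a usage message instead of silently ignored input.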

The option word may have aliases, alternative option words that refer to the same option:

GetOptions ('foo|bar|quux' => \$doit);
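The sketch below exercises the aliases together with the default abbreviation and case handling. The parse() wrapper is a hypothetical helper that localizes @ARGV so the same call can be tried with several command lines.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical helper: run GetOptions() on the given arguments.
sub parse {
    local @ARGV = @_;
    my $doit = 0;
    GetOptions('foo|bar|quux' => \$doit) or return;
    return $doit;
}

# Full name, alias, unambiguous abbreviation, different case.
for my $arg ('-foo', '-quux', '-qu', '-BAR') {
    print "$arg sets doit to ", parse($arg), "\n";
}
```

All four spellings set the same variable to 1.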

If you want to specify that an option takes a string, append =s to the option control string:

GetOptions ('foo=s' => \$thevalue);

When you use a colon instead of an equal sign, the option takes a value only when one is present:

GetOptions ('foo:s' => \$thevalue, 'bar' => \$doit);

Calling this program with arguments -foo bar blech places the string 'bar' in $thevalue, but when called with -foo -bar blech, something different happens: $thevalue is set to an empty string, and $doit is set to 1.
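Both invocations can be tried side by side with a small sketch. The parse() wrapper is a hypothetical helper that localizes @ARGV for each attempt.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical helper: returns the option value, the flag bound to
# -bar, and whatever is left in @ARGV.
sub parse {
    local @ARGV = @_;
    my ($thevalue, $doit);
    GetOptions('foo:s' => \$thevalue, 'bar' => \$doit);
    return ($thevalue, $doit, "@ARGV");
}

my @with    = parse('-foo', 'bar', 'blech');    # value present
my @without = parse('-foo', '-bar', 'blech');   # value omitted

printf "with:    value='%s' flag=%s rest=%s\n",
    $with[0], defined $with[1] ? $with[1] : 'unset', $with[2];
printf "without: value='%s' flag=%s rest=%s\n",
    $without[0], defined $without[1] ? $without[1] : 'unset', $without[2];
```

In the first call the flag stays undefined; in the second the value is the empty string and the flag is 1. In both cases blech is left in @ARGV.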

These options can also take numeric values; you can use =i or :i for integer values, and =f or :f for floating point values.
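As an illustration of typed values, the sketch below uses two made-up options, -width (integer) and -scale (floating point), and shows that a value of the wrong type makes GetOptions() return false.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical helper: parse the given arguments and return the
# success flag plus the resulting settings.
sub parse {
    local @ARGV = @_;
    my %opt = (width => 80, scale => 1.0);    # defaults
    my $ok = GetOptions('width=i' => \$opt{width},
                        'scale=f' => \$opt{scale});
    return ($ok, \%opt);
}

my ($ok, $opt) = parse('-width', '132', '-scale', '1.5');
print "ok: width=$opt->{width} scale=$opt->{scale}\n" if $ok;

# 'wide' is not an integer, so GetOptions() warns and returns false.
my ($bad) = parse('-width', 'wide');
print "second call rejected\n" unless $bad;
```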

Using and Bundling Single-Letter Options. Using single-letter options is trivial; bundling them is a little trickier. Getopt::Long has a Configure() subroutine that you can use to fine-tune your option parsing. For bundling single-letter options, you would use Getopt::Long::Configure ('bundling'). Now GetOptions() will happily accept bundled single-letter options:

Getopt::Long::Configure ('bundling');
GetOptions ('a' => \$all,
            'l=i' => \$length,
            'w=i' => \$width);

This allows options of the form -a -l 24 -w 80 as well as bundled forms: -al24w80. You can mix these with option words:

GetOptions ('a|all' => \$all,
            'l|length=i' => \$length,
            'w|width=i' => \$width);

However, the option words require a double dash: --width 24 is acceptable, but -width 24 is not. (That causes the leading w to be interpreted as -w, and results in an error because idth isn't a valid integer value.)
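Here is a runnable sketch that mixes a bundle of single-letter options with an option word. The simulated command line and file name are made up for the example.

```perl
use strict;
use warnings;
use Getopt::Long;

Getopt::Long::Configure('bundling');

# Simulate: program -al24 --width 80 myfile.dat
@ARGV = ('-al24', '--width', '80', 'myfile.dat');

my ($all, $length, $width);
GetOptions('a|all'      => \$all,
           'l|length=i' => \$length,
           'w|width=i'  => \$width)
    or die "usage: program [-a] [-l n] [-w n] file\n";

print "all=$all length=$length width=$width\n";
print "remaining: @ARGV\n";
```

This prints all=1 length=24 width=80 and leaves myfile.dat in @ARGV.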

Getopt::Long::Configure('bundling_override') allows option words with a single dash, where the words take precedence over bundled single-letter options. For example:

Getopt::Long::Configure ('bundling_override');
GetOptions ('a' => \$a, 'v' => \$v,
            'x' => \$x, 'vax' => \$vax);

This treats -axv as -a -x -v, but treats -vax as a single option word.
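The difference can be checked with a small sketch. The parse() wrapper and the variable names are hypothetical; $a_flag is used instead of $a because $a is special to sort.

```perl
use strict;
use warnings;
use Getopt::Long;

Getopt::Long::Configure('bundling_override');

# Hypothetical helper: report which options a command line sets.
sub parse {
    local @ARGV = @_;
    my ($a_flag, $v_flag, $x_flag, $vax) = (0, 0, 0, 0);
    GetOptions('a' => \$a_flag, 'v' => \$v_flag,
               'x' => \$x_flag, 'vax' => \$vax);
    return "a=$a_flag v=$v_flag x=$x_flag vax=$vax";
}

print "-axv: ", parse('-axv'), "\n";    # three bundled letters
print "-vax: ", parse('-vax'), "\n";    # one option word
```

The first line reports a=1 v=1 x=1 vax=0, the second a=0 v=0 x=0 vax=1.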

Advanced destinations. You don't need to specify the option destination. If you don't, GetOptions() defines variables $opt_xxx (where xxx is the name of the option), just like getopt() and getopts(). Similarly, GetOptions() also accepts a reference to a hash (as its first argument) and places the option values in it.
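A minimal sketch of the hash form, using two made-up options, -verbose and -size. With a hash reference as the first argument, the remaining arguments are just the option control strings.

```perl
use strict;
use warnings;
use Getopt::Long;

# Simulate: program -verbose -size 24 myfile.dat
@ARGV = ('-verbose', '-size', '24', 'myfile.dat');

my %opt;
GetOptions(\%opt, 'verbose', 'size=i')
    or die "usage: program [-verbose] [-size n] file\n";

print "verbose=$opt{verbose} size=$opt{size}\n";
print "remaining: @ARGV\n";
```

This keeps all option values in one place without declaring a variable per option.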

If you do specify the option destination, it needn't be a scalar. If you specify an array reference, option values are pushed into this array:

GetOptions ('foo=i' => \@values);

Calling this program with arguments -foo 1 -foo 2 -foo 3 sets @values to (1,2,3).
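As a runnable sketch of that call, with the command line simulated by assigning to @ARGV:

```perl
use strict;
use warnings;
use Getopt::Long;

# Simulate: program -foo 1 -foo 2 -foo 3
@ARGV = ('-foo', '1', '-foo', '2', '-foo', '3');

my @values;
GetOptions('foo=i' => \@values)
    or die "usage: program [-foo n] ...\n";

print "values: @values\n";    # in command line order
```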

The option destination can also be a hash reference:

my %values;
GetOptions ('define=s' => \%values);

If you call this program as program -define EOF=-1 -define bool=int, the %values hash will have the keys EOF and bool, set to -1 and 'int' respectively.
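The same invocation as a self-contained sketch, again simulating the command line through @ARGV:

```perl
use strict;
use warnings;
use Getopt::Long;

# Simulate: program -define EOF=-1 -define bool=int
@ARGV = ('-define', 'EOF=-1', '-define', 'bool=int');

my %values;
GetOptions('define=s' => \%values)
    or die "usage: program [-define name=value] ...\n";

# Each option value is split into key and value at the first '='.
print "$_ => $values{$_}\n" for sort keys %values;
```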

Finally, the destination can be a reference to a subroutine. This subroutine will be called when the option is handled. It expects two arguments: the name of the option and the value.

The special option control string '<>' can be used in this case to have a subroutine process arguments that aren't options. This subroutine is then called with the name of the non-option argument. Consider:

GetOptions ('x=i' => \$x, '<>' => \&doit);

When you execute this program with -x 1 foo -x 2 bar, this invokes doit() with argument 'foo' (and $x equal to 1), and then calls doit() with argument 'bar' (and $x equal to 2).
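The sketch below combines both kinds of subroutine destination. The -verbose option and the bookkeeping variables are invented for the example; the names passed to the handlers are interpolated into strings because recent versions of Getopt::Long pass the option name as an object that stringifies to the name.

```perl
use strict;
use warnings;
use Getopt::Long;

my $x = 0;
my @calls;               # records each non-option argument
my $verbose_seen = '';   # records the option handler's arguments

# Called for every argument that is not an option.
sub doit {
    my ($arg) = @_;
    push @calls, "$arg(x=$x)";
}

# Simulate: program -x 1 foo -verbose -x 2 bar
@ARGV = ('-x', '1', 'foo', '-verbose', '-x', '2', 'bar');

GetOptions('x=i'     => \$x,
           'verbose' => sub {
               my ($name, $value) = @_;
               $verbose_seen = "$name=$value";
           },
           '<>'      => \&doit)
    or die "usage: program [-x n] [-verbose] file ...\n";

print "@calls\n";
print "$verbose_seen\n";
```

This prints foo(x=1) bar(x=2) followed by verbose=1, showing that each non-option argument sees the option values that preceded it.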

Other Configurations. GetOptions() supports several other configuration characteristics. For a complete list, see the Getopt::Long documentation.

Getopt::Long::Configure ('no_ignore_case') makes option words case-sensitive, so -Foo and -foo are different options.

Getopt::Long::Configure ('no_auto_abbrev') prevents abbreviations for option words.

Getopt::Long::Configure ('require_order') stops detecting options after the first non-option command line argument.
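A brief sketch of two of these settings in action. Here no_ignore_case turns off the default case-insensitive matching, and the parse() wrapper and the -verbose option are hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long;

Getopt::Long::Configure('require_order', 'no_ignore_case');

# Hypothetical helper: returns success, the flag, and the leftovers.
sub parse {
    local @ARGV = @_;
    my $verbose = 0;
    my $ok = GetOptions('verbose' => \$verbose);
    return ($ok ? 1 : 0, $verbose, "@ARGV");
}

# require_order: parsing stops at myfile.dat, so -debug is left alone.
my @first = parse('-verbose', 'myfile.dat', '-debug');
print "ok=$first[0] verbose=$first[1] rest=$first[2]\n";

# no_ignore_case: -VERBOSE no longer matches 'verbose'.
my @second = parse('-VERBOSE');
print "ok=$second[0] verbose=$second[1]\n";
```

The first call succeeds and leaves myfile.dat -debug in @ARGV; the second warns about an unknown option and reports failure.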

Help messages. People often ask me why GetOptions() doesn't provide facilities for help messages. There are two reasons. The first reason is that while command line options adhere to conventions, help messages don't. Any style of message would necessarily please some people and annoy others, and would make calls to GetOptions() much lengthier and more confusing.

The second reason is that Perl allows a program to contain its own documentation in POD (Plain Old Documentation) format, and there are already modules that extract this information to supply help messages. The following subroutine uses Pod::Usage for this purpose (and demonstrates how Pod::Usage can be loaded on demand):

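use strict;
use warnings;
use Getopt::Long;

# Option values shared with the rest of the program.
my ($verbose, $trace, $debug) = (0, 0, 0);

sub options {
    my $help = 0;       # handled locally
    my $man  = 0;       # handled locally

    # Process options, remembering whether parsing succeeded.
    my $ok = 1;
    if ( @ARGV > 0 ) {
        $ok = GetOptions('verbose' => \$verbose,
                         'trace'   => \$trace,
                         'help|?'  => \$help,
                         'manual'  => \$man,
                         'debug'   => \$debug);
    }
    if ( !$ok or $man or $help ) {
        # Load Pod::Usage only if needed.
        require Pod::Usage;
        Pod::Usage->import;
        pod2usage(2) unless $ok;            # bad options: usage, exit 2
        pod2usage(1) if $help;              # brief help, exit 1
        pod2usage(-verbose => 2) if $man;   # full manual page
    }
}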

Pod::Usage is available at http://www.perl.com/CPAN/modules/authors/Brad_Appleton. The latest version of Getopt::Long (2.17 as of this writing) can be found in authors/Johan_Vromans. This kit also contains a script template that uses both Getopt::Long and Pod::Usage.

OTHER OPTION HANDLING MODULES

A few other option handling modules can be found on the CPAN. The following modules can be downloaded from http://www.perl.com/CPAN/modules/by-category/12_Option_Argument_Parameter_Processing.

Getopt::Mixed handles both option words and option letters. It was developed a couple of years ago, when Getopt::Std only handled option letters and Getopt::Long only handled option words. It's obsolete now.

Getopt::Regex is an option handler that uses regular expressions to identify the options, and closures to deliver the option values.

Getopt::EvaP uses a table-driven option handler that provides help messages in addition to most Getopt::Long features.

Getopt::Tabular is another table-driven option handler loosely inspired by Tcl/Tk. Powerful, but very complex to set up.

__END__


Johan Vromans (jvromans@squirrel.nl) has been engaged in software engineering since 1975. He has been a Perl user since version 2 and participated actively in its development. Besides being the author of Getopt::Long, he wrote the Perl 5 Desktop Reference and co-authored The Webmaster's Handbook.
