Extending Perl with Inline.pm - The Perl Journal, Fall 2000

Brian Ingerson

Packages Used:

Inline              CPAN
Inline::Config      CPAN
Parse::RecDescent   CPAN
Digest::MD5         CPAN

Everybody. Get In Line!

Making Flippy Floppy, Talking Heads

I started out my career as an IBM assembly language programmer. You know, hexadecimal arithmetic, bit-level operations, debugging 500-page core dumps printed on greenbar paper. The cool thing about assembly language is that you can do anything. You could write a nice menu-based hyper-linking user interface that stores its data on your own homemade mass-storage device. What sucks about assembly is that you have to do everything. Programming A = B + C takes more than one punch card.

After a while, I spent most of my time developing programming tools and language extensions. Any hacker worth his salt cannot code something the same way more than three times without writing an abstraction to do it instead. I would write things to turn concepts like memory allocation, I/O, and database access into assembly language one-liners with some object-oriented behavior.

When I switched to Perl about three years ago, it was a very natural transition. Assembly to Perl? Natural? Definitely! I soon found that although I could accomplish almost anything I wanted to, I didn't have to do all the work. Perl has so many powerful built-in features and extensions: regular expressions, run-time evaluation, LWP, and CGI, to name a few. If I needed to write my own protocol or device-level stuff, I could generally do that too.

Everything was proceeding swimmingly. Then one day, I needed to make my Perl code work with someone else's C code. I had heard that Perl has facilities for doing such things. I assumed that since Perl was so awesome, it must be really easy to do. Something along the lines of:

    $question = "How soon is now?";
    print "And the answer is: ", &ask_Mr_Wizard($question);
    exit;

    BEGIN :C {
        char* ask_Mr_Wizard(char* q) {
            /* omniscient C code omitted */
            return a;
        }
    }

Unfortunately, it turned out that I needed to create a separate module, a separate C file, a "glue code" file in a language called XS, a type mapping file, and a Makefile generating file. (Actually, the h2xs utility creates all of these for you. But it's up to you to modify and maintain them.)

Then I needed to absorb the content of over a half dozen lengthy Perl man pages, read a couple of books, and muck about in the Perl source code for examples. All very interesting stuff, I assure you, but all I wanted to do was ask Mr. Wizard a question. If Perl is supposed to make simple things simple, and hard things possible, this was bordering on the impossible.

Introducing Inline.pm

Inspired by Damian Conway's many presentations at this past summer's Perl Conference 4.0, I decided to create a module that would let me include other programming languages directly in my Perl code in much the same manner shown above. What impressed me about Dr. Conway's modules was that he paired problems of immense magnitude with remarkably simple solutions, or in Damian's words, "DWIMity" (Do What I Mean). I decided to call this module Inline.pm. It is fitting that much of the work done by this module is accomplished with Damian's Parse::RecDescent.

Inline works on all flavors of Unix and Microsoft Windows, provided you have the proper development environment. Read the Inline documentation for more information.

Enough talk. Let's check this thing out. Here's a simple but complete program:

    # rithmatick.pl
    print "9 + 16 = ", add(9, 16), "\n";
    print "9 - 16 = ", subtract(9, 16), "\n";

    use Inline C => <<'END_OF_C_CODE';

    int add(int x, int y) {
      return x + y;
    }

    int subtract(int x, int y) {
      return x - y;
    }

    END_OF_C_CODE

That's it! Just run it like any other Perl program and it will print:

    9 + 16 = 25
    9 - 16 = -7

I've just managed to accomplish something in ten lines that used to take two. But it's just an example; the point is that you can now jump painlessly from Perl-space to C-space and back. Once you're in C-space, you can do whatever floats your boat, like write a super speedy algorithm, invoke legacy code through an API, or access the entire internals of perl.

But how is this possible? Don't you need to compile and link the C code? Wouldn't that make the program extremely slow? How do the Perl variables get converted to C variables and back? How can C functions be called like Perl subroutines?

That's the DWIMity kicking in: all the hairy details are handled for you by the module. You just say what you need to say and let Inline do the rest. Here's how it works.

The first time you run this program, Inline does everything the hard way. It analyzes your C code, creates all those different files, compiles it, links it, and finally loads the executable object. On my Linux box, this causes a 3-4 second delay in execution time. The second time you run it, it's lightning fast. That's because Inline caches the executable object on disk. You can change your program as much as you like, and as long as you don't touch the C code, Inline will use the cached version. As soon as you do change the C code, Inline will recompile it on the next run.

I'd Like To Buy A Vowel

Let's look at a slightly more complex example. The program, vowels.pl, takes a filename from the command line and prints the ratio of vowels to letters in that file. vowels.pl uses an Inline C function called vowel_scan that takes a string argument and returns the fraction of its letters that are vowels, as a floating-point number between zero and one. It handles upper and lower case letters, and (true to my IBM roots) both ASCII and EBCDIC. (It is also quite fast; check out the benchmarks at the end of the article.)

Listing 1: Using C within Perl in vowels.pl.

Here's how to count the vowels in the Unix word list:

    % perl vowels.pl /usr/dict/words
    The letters in /usr/dict/words are 37.5% vowels.

Although this is just another example of calling a C function as if it were a Perl subroutine, it introduces a couple of new concepts.

First, notice that the syntax for invoking Inline is different. The C source code is stored after the __END__ token, which means that it is accessible to the program through the DATA filehandle. Unfortunately, you can only read from the DATA filehandle at run time, and use is a compile-time directive. Fortunately, in Perl, TMTOWTDI.

Since

    use Foo(LIST);

is just another way of saying

    BEGIN { require Foo; Foo->import(LIST) }

we can invoke Inline at run time by calling import ourselves, once the source code is readable through DATA. This gives us a very clean way to organize our Inline source code.
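The difference between the two timings is easy to see with a stand-in module. In this sketch, Recorder is a hypothetical package, not part of Inline, whose import() only records its arguments. Calling import from a BEGIN block, which is what use does, runs before the run-time assignment and sees an undefined source string. Calling it at run time sees the real code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Inline: import() just records what it receives.
package Recorder {
    our @received;
    sub import {
        my ($class, @args) = @_;
        push @received, join '|', map { defined $_ ? $_ : 'undef' } @args;
    }
}

package main;

my $source;

# This is what "use Recorder C => $source;" would do. It runs at compile
# time, before the assignment below has happened, so $source is undef.
BEGIN { Recorder->import(C => $source) }

# Run time: the source is now available (vowels.pl would read it from
# DATA), so a manual import() call sees it.
$source = 'int add(int x, int y) { return x + y; }';
Recorder->import(C => $source);

print "$_\n" for @Recorder::received;
```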

Second, there are two new data types in our C program: double and char*. Luckily, those are two of the five data types Inline supports:

    int
    long
    double
    char*
    SV*

Those five are all you need! int and long are for integer scalars, double is for floating point scalars, and char* (usually pronounced "Char Star" in social settings) is for strings. SV* is a generic Perl type that covers "anything else", such as hash references. It is covered in detail in the following sections. These types provide a very simple interface that can be expanded to handle the most complex situations. (Just like Perl itself.)

At this point, your optimism for having Inline solve your real life needs is probably inversely proportional to your knowledge of C, XS, and Perl internals. "XS provides a lot more type mapping and functionality," you say. If you're skeptical, that's good. Stick with me.

TAFWTDI

There are four ways to do it, where "it" is calling C functions from Perl. C functions typically take a fixed number of arguments as input and produce one or zero return values. When a C function needs to return multiple values, the caller passes in pointers to variables that will receive them ("by reference"). Perl, on the other hand, almost always returns multiple values as a list. This gives us four different situations for Inline:

1. int foo(int i, double n, char* str) {

This is the simplest case. The function, foo(), takes a fixed number of input arguments and returns one value, an integer. All of the Perl to C conversions happen automatically. The examples shown earlier in the article were like this.

2. void foo(int i, double n, char* str) {

In C, void normally means that the function doesn't return anything. Inline gives special meaning to a void declaration: it's used to indicate that the function can return any number of values (including zero), so this is how you return a list. It is also less automatic, because you'll need to manage the Perl internal stack yourself. Read on.

3. int foo(SV*, ...) {

Just like in C, the ellipsis indicates that an unknown number of arguments will be passed in. Again you will need to access Perl's internal stack manually. Inline provides a bunch of C macros to make this easier.

4. void foo(SV*, ...) {

This is just a combination of cases 2 and 3 above. It's another way of saying, "I can handle everything myself, thank you".

Chip the Glasses and Crack the Plates, That's what Bilbo Baggins Hates

Internally, Perl is centered around a stack, commonly referred to as the Stack. A stack is just an array that you only access from one end. Computer scientists like to compare it to a spring-loaded stack of dinner plates in a cafeteria: you can push plates onto the stack or pop them off, and that's all you can do. Perl uses the Stack to pass scalar arguments to a subroutine. When the subroutine takes control, it pops the plates from the Stack. Before the subroutine returns control, it pushes the return values back onto the Stack.

You do this all the time in Perl without knowing it, using @_ and return. With Inline, you need to delve a bit into Perl's internals. If you ever look into the Perl source code itself, you'll undoubtedly find references to wizards, elves, and hobbits. Fear not, for Inline can help you slay the dragons.

Inline provides the following C macros for dealing with the Stack:

• Inline_Stack_Vars

You'll need to use this macro if you want to use the others. It sets up a few local variables: sp, items, ax, and mark, for use by the other macros. It's not important to know what they do; I'm mentioning them so you can avoid naming conflicts.

• Inline_Stack_Items

This macro returns the number of arguments passed in on the stack.

• Inline_Stack_Item(i)

This macro refers to a particular SV* in the stack, where i is an index number starting from zero. It can be used to get or set the value.

• Inline_Stack_Reset

Use this macro before pushing anything back onto the Stack. It resets the internal Stack pointer to the beginning of the Stack.

• Inline_Stack_Push(sv)

This macro pushes a return value back onto the stack. The value must be of type SV*.

• Inline_Stack_Done

After you have pushed all of your return values, you must call this macro.

• Inline_Stack_Return(n)

This macro returns n items on the Stack.

• Inline_Stack_Void

This is a special macro that indicates you really don't want to return anything. It's the same as Inline_Stack_Return(0).
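The macros are easier to keep straight with a model in hand. What follows is a toy, pure-Perl model of the protocol they wrap. The ToyStack package and its method names are hypothetical and only mirror the macros named in the comments; the real macros manipulate perl's C-level argument stack. The model illustrates one ordering rule: read your arguments before the Reset step, because pushed return values overwrite the same slots.

```perl
use strict;
use warnings;

# Toy model of the argument stack (hypothetical; illustration only).
package ToyStack {
    sub new {
        my ($class, @args) = @_;
        return bless { slots => [@args], items => scalar(@args), sp => scalar(@args) }, $class;
    }
    sub items    { $_[0]{items} }                 # ~ Inline_Stack_Items
    sub item     { $_[0]{slots}[ $_[1] ] }        # ~ Inline_Stack_Item(i)
    sub reset_sp { $_[0]{sp} = 0 }                # ~ Inline_Stack_Reset
    sub push_sv {                                 # ~ Inline_Stack_Push(sv)
        my ($self, $value) = @_;
        $self->{slots}[ $self->{sp}++ ] = $value;
    }
    # What the caller sees once the function is done: slots 0 .. sp-1.
    sub returned { my $self = shift; @{ $self->{slots} }[ 0 .. $self->{sp} - 1 ] }
}

package main;

# A "void foo(SV*, ...)"-style function: any number in, any number out.
sub doubled {
    my ($stack) = @_;
    # Copy the arguments first: pushing after the reset overwrites them.
    my @args = map { $stack->item($_) } 0 .. $stack->items() - 1;
    $stack->reset_sp;
    $stack->push_sv($_ * 2) for @args;
}

my $stack = ToyStack->new(1, 2, 3);
doubled($stack);
print join(',', $stack->returned), "\n";    # prints 2,4,6
```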

The C type SV* deserves an explanation. SV, which stands for "scalar value", is simply the name of the internal structure that Perl uses to hold scalars. The Stack, therefore, is an array of pointers to SVs. Perl provides a slew of helper macros for getting data in and out of SVs (and AVs, HVs, RVs, GVs, and so on). See the perlapi and perlguts documentation bundled with Perl for all the details. You are using Perl 5.6, aren't you? (Inline works with Perl 5.005 and above, but the perlapi documentation is only available with 5.6 or later versions.)

Another example should help clear the fog. The get_scalars() function takes a list of names of Perl global scalars and returns the values of the ones that actually exist and contain a string.

Here's what we get for output:

    % perl scalars.pl
    scissors/paper
    scissors/Inline/42

The first time we call get_scalars(), it omits $scalar4 because it is not defined, and $scalar3 because it's not a string. On the second call, $scalar4 is defined (and thus returned), $scalar1 is undefined (and thus ignored), and $scalar3 is returned because it is now a string.

More importantly, we can handle list input and list output with relative ease. You'll notice that I snuck in a few Perl internal macro calls. SvPVX returns the string (char*) from an SV. SvPOK indicates whether an SV has a string component, and perl_get_sv (get_sv in Perl 5.6) returns an SV from Perl's internal symbol table. You can read about these and many, many more in the perlapi documentation.

Listing 2: Retrieving scalar values via C in scalars.pl.
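For comparison, here is a pure-Perl sketch of what get_scalars() does. It is not the code from Listing 2. The function name get_scalars_pp is hypothetical, a symbolic reference plays the part of perl_get_sv, and the POK bit reported by the core B module's FLAGS method plays the part of SvPOK.

```perl
use strict;
use warnings;
use B ();

# Pure-Perl sketch of get_scalars(): return the values of the named
# package-main scalars that are defined and currently hold a string.
sub get_scalars_pp {
    my @found;
    no strict 'refs';    # symbolic refs play the role of perl_get_sv()
    for my $name (@_) {
        my $ref = \${"main::$name"};
        next unless defined $$ref;
        # Test the SV's POK flag, the same bit the SvPOK() macro checks.
        push @found, $$ref if B::svref_2object($ref)->FLAGS & B::SVf_POK;
    }
    return @found;
}

our ($scalar1, $scalar2, $scalar3) = ('scissors', 'paper', 42);
print join('/', get_scalars_pp(qw(scalar1 scalar2 scalar3 scalar4))), "\n";
```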

The Inline Outline

Let's take a break from using Inline and examine exactly how it does its magic. The module lets you take C source code and effectively eval it into Perl at run time. What, exactly, is going on under the hood to make all of this work? Here is a basic outline of what happens when you invoke Inline:

1. Receive the source code

Inline gets the source code from your program or module with a statement like the following:

    use Inline C => 'source code';

where "C" is the programming language used, and 'source code' is the actual source code itself in the form of a string. 'source code' can also be a filename, a reference to a subroutine, or anything else that returns source code. Inline then prepends the following header includes to your source code:

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #include "INLINE.h"

This should be all the headers you need for regular situations. (The perl.h file includes all the standard C header files like stdio.h.)

2. Check if the source code has been compiled

Inline only needs to compile the source code if it has not yet been compiled. But how can it tell whether the source code has changed? It accomplishes this seemingly magical task by running the source text through the Digest::MD5 module to produce a virtually unique 128-bit "fingerprint" of the source code, written as 32 hexadecimal digits. The fingerprint is mangled together with the current package name and the name of the programming language. If the package is "main", the program name is added; otherwise, the module's version number is used. This forms a unique name for the executable object. For instance, the vowels.pl example produces a cached executable object called (on a Unix system):

    main_C_vowels_pl_bcc13cd1d188b32fc216cea883239ee3.so

If an object with that name already exists, Inline skips to step 8, because no compilation is necessary.
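The naming scheme can be sketched with Digest::MD5's md5_hex. The functions object_name and needs_build are hypothetical. They follow the description above and the example names in this article, but the exact text Inline hashes (for instance, whether the added #include lines are part of it) and its exact mangling rules are assumptions, so the digests here will not match Inline's.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Basename qw(basename);

# Hypothetical sketch of the cached-object name described in step 2.
sub object_name {
    my (%arg) = @_;    # package, language, source, and program or version
    my $tag = $arg{package} eq 'main'
        ? basename($arg{program})      # e.g. vowels.pl
        : $arg{version};               # e.g. 1.23
    $tag =~ s/\W/_/g;                  # vowels.pl -> vowels_pl, 1.23 -> 1_23
    # Modules install under auto/<Outer>/..., so keep only the last part of
    # the package name (an assumption based on the Math::Simple example).
    my ($pkg) = $arg{package} =~ /(\w+)$/;
    return join('_', $pkg, $arg{language}, $tag, md5_hex($arg{source})) . '.so';
}

# Compile only if no object with this fingerprint exists yet.
sub needs_build {
    my ($dir, $name) = @_;
    return !-e "$dir/$name";
}

my $source = "int add(int x, int y) { return x + y; }\n";
my $name = object_name(package => 'main', language => 'C',
                       program => '/home/ingy/vowels.pl', source => $source);
print "$name\n";    # main_C_vowels_pl_<32 hex digits>.so
```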

3. Find a place to build and install

At this point Inline knows it needs to compile the source code. The first thing to figure out is where to create the great big mess of files associated with compilation, and where to put the object when it's done.

By default Inline will try to build and install under the first of the following places that is a writable directory:

a. $ENV{PERL_INLINE_BLIB}. The PERL_INLINE_BLIB environment variable overrides all else.

b. ./blib_I/. (Inside the current directory, unless you're in your home directory.)

c. $bin/blib_I/. (Where $bin is the directory the program is in.)

d. $ENV{HOME}/blib_I/. (Under your home directory.)

e. $ENV{HOME}/.blib_I/. (Same as above, but more discreet.)

(blib stands for "build library" in Perl-speak. It is a temporary staging directory created when you install a Perl module on your system. blib_I is the Inline version of the same concept.)

If none of those directories exists, Inline will attempt to create and use $bin/blib_I/ or ./blib_I/, in that order. Optionally, you can configure Inline to build and install exactly where you want, using Inline::Config. In the unlikely event that Inline cannot find a place to build, it will croak.
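The search order above translates into a short routine. This is a hypothetical sketch; find_build_dir is not part of Inline's interface. It takes the environment as arguments rather than reading %ENV directly, so the order is easy to follow and to test.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical sketch of step 3's search order. Arguments:
#   env_blib => $ENV{PERL_INLINE_BLIB}, cwd => current directory,
#   bin => directory containing the program, home => $ENV{HOME}
sub find_build_dir {
    my (%p) = @_;
    my @candidates;
    push @candidates, $p{env_blib} if defined $p{env_blib};        # a.
    push @candidates, File::Spec->catdir($p{cwd}, 'blib_I')        # b.
        unless defined $p{home} && $p{cwd} eq $p{home};
    push @candidates, File::Spec->catdir($p{bin}, 'blib_I');       # c.
    push @candidates, File::Spec->catdir($p{home}, 'blib_I'),      # d.
                      File::Spec->catdir($p{home}, '.blib_I')      # e.
        if defined $p{home};

    # Use the first candidate that is already a writable directory.
    for my $dir (@candidates) {
        return $dir if -d $dir && -w _;
    }
    # Otherwise try to create $bin/blib_I, then ./blib_I.
    for my $dir (File::Spec->catdir($p{bin}, 'blib_I'),
                 File::Spec->catdir($p{cwd}, 'blib_I')) {
        return $dir if mkdir($dir) && -w $dir;
    }
    die "No writable directory for building Inline code\n";   # Inline croaks here
}
```

In a program you would call it as find_build_dir(env_blib => $ENV{PERL_INLINE_BLIB}, cwd => Cwd::getcwd(), bin => $FindBin::Bin, home => $ENV{HOME}).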

4. Parse the source for semantic cues

Inline uses the Parse::RecDescent module to parse your chunks of source code and identify things that need run-time bindings. For instance, in C it looks for all of the function definitions and breaks them down into names and data types. These elements are used to bind the C function to a Perl subroutine.
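Inline does this with a Parse::RecDescent grammar. As a rough stand-in, a regular expression can pull out the same pieces for simple definitions like the ones in this article. This is deliberately not Inline's grammar, and c_signatures is a hypothetical name. It ignores comments, multi-line declarations, nested parentheses, and function pointers.

```perl
use strict;
use warnings;

# Regex stand-in (NOT Inline's Parse::RecDescent grammar) that finds simple
# C function definitions and splits them into return type, name, and args.
sub c_signatures {
    my ($src) = @_;
    my @sigs;
    while ($src =~ /^[ \t]*(\w+[ \t]*\**)[ \t]*\b(\w+)[ \t]*\(([^)]*)\)[ \t]*\{/mg) {
        my ($ret, $name, $arglist) = ($1, $2, $3);
        next if $ret =~ /^else\b/;            # crude guard: "else if (...) {"
        $ret =~ s/\s*\*/*/g;                  # "char *" -> "char*"
        $ret =~ s/\s+$//;                     # "int " -> "int"
        my @args;
        for my $arg (split /,/, $arglist) {
            $arg =~ s/^\s+|\s+$//g;
            next if $arg eq '' || $arg eq 'void';
            # "char *str" -> ["char*", "str"]; "SV*" or "..." -> [$arg, undef]
            my ($type, $var) = $arg =~ /^(.+?)\s*\b(\w+)$/ ? ($1, $2) : ($arg, undef);
            $type =~ s/\s*\*/*/g;
            push @args, [$type, $var];
        }
        push @sigs, { name => $name, return => $ret, args => \@args };
    }
    return @sigs;
}

my $code = <<'END_C';
int add(int x, int y) {
  return x + y;
}

char* greet(char *who, SV* extra, ...) {
  return who;
}
END_C

for my $sig (c_signatures($code)) {
    printf "%s %s(%s)\n", $sig->{return}, $sig->{name},
        join(', ', map { defined $_->[1] ? "$_->[0] $_->[1]" : $_->[0] } @{ $sig->{args} });
}
```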

5. Create the build environment

Now Inline takes all of the gathered information and creates an environment to build your source code into an executable object, creating all the appropriate directories and source files.

6. Compile the code and install the executable

The planets are in alignment, and all that's left is the easy part. Inline just does what users normally do to install a module on Unix systems:

    % perl Makefile.PL
    % make
    % make test     # (Inline skips this one)
    % make install

If something goes awry, Inline croaks with a message indicating where to look for more info.

7. Tidy up

By default, Inline removes all of the mess created by the build process, assuming that everything worked. If the compile fails, Inline leaves everything intact so you can debug your program. Running something like this:

    % perl -MInline=NOCLEAN example.pl

prevents Inline from cleaning up, in case you want to poke around in the blib_I directory.

8. DynaLoad the Executable

Inline uses Perl's DynaLoader module to pull your external object into Perl-space. Now you can call all of your C functions like Perl subroutines!

CPAN run. Run PAN, run.

So far, all the examples have been Perl programs, but Inline can create Perl modules as well, just like the ones found on CPAN. Modules that use C code as well as Perl are called "extension modules". This section describes how to create an extension module that can be uploaded to CPAN.

Let's create a module called Math::Simple that provides four functions: add, subtract, multiply, and divide. We'll assume you're using some kind of Unix. Execute the following commands:

    % h2xs -PAXn Math::Simple
    Writing Math/Simple/Simple.pm
    Writing Math/Simple/Makefile.PL
    Writing Math/Simple/test.pl
    Writing Math/Simple/Changes
    Writing Math/Simple/MANIFEST
    % cd Math/Simple
    % ls
    Changes  MANIFEST  Makefile.PL  Simple.pm  test.pl

The h2xs program is useful even if you're not using XS; it generates all of the files you'll need to distribute your module. The -X and -A switches prevent it from generating a lot of XS-specific stuff that you won't need. The -P switch prevents the generation of sample pod documentation. Documentation is very important for a distributed module, but it gets in the way of the Inline code: everything after the __DATA__ token, pod included, is read as C source. Put your documentation in a separate file called Simple.pod and add an entry for it in the MANIFEST file, or use pod normally but put the C source code inside a string instead of after the __DATA__ token.

Now edit Simple.pm to look something like what's shown in Listing 3.

Listing 3: The Simple.pm module.

This should be pretty familiar stuff. The important thing is that you define $VERSION before invoking Inline. Since Inline is often invoked at compile time, it is best to put the $VERSION line inside a BEGIN block. Also notice the croak statement inside divide. This is the correct way to die from inlined C code.

Now add this line to the top of your test.pl file:

    use Inline SITE_INSTALL;

You must do this to distribute the module properly, because it ensures that the module will get installed in the proper place by the recipient. It also requires the person installing Math::Simple to use the make test command. (People sometimes skip this part of the install process, unfortunately.)

If you add the following line to Makefile.PL, it will verify that the proper version of Inline.pm is already installed on the user's system.

    PREREQ_PM => {Inline => 0.25},

Finally, run these commands:

    % perl Makefile.PL
    % make
    % make test
    % make install  # Optional
    % make dist

The make install command will install the module on your local system. When it's all working, the make dist command will produce the file Math-Simple-1.23.tar.gz. This is your complete distribution package, ready to go to the CPAN.

When the Going Gets Tough...

...the tough use Inline::Config!

Inline tries to do the right thing as often as possible. But sometimes you may need to override the default actions, and that's where Inline::Config comes in handy. It gives you fine-grained control over the entire process.

An important point to remember is that configuration settings must be made before Inline receives the source code. Since use happens at compile time, you may need to do something like this to use Inline::Config:

    BEGIN {
        use Inline;
        $Inline::Config::PRINT_INFO = 1;
        Inline::Config::Force_Build(1);
        Inline::Config->makefile('LIBS' => ['-lm']);
    }
    
    use Inline C => "C code goes here...";

This demonstrates the three different syntaxes for setting options. You can also set options on the command line; to cut down on typing, several options have terse (and case-insensitive) command-line versions. Some examples:

    % perl -MInline=Info program.pl
    % perl -MInline=Force,Noclean,Info program.pl
    % perl -MInline=Clean program.pl

Info tells Inline to print a small report about the status of the Inlined code. Force forces a build to happen even if the cached object is up to date, and Noclean leaves the build mess intact so that you can inspect it. The Clean option tells Inline to clean up all previous messes that it knows about. (Remember, everything is under one blib_I directory, so it's a manageable mess.)

You can even get information about any installed module that uses Inline with a one-liner like this:

    % perl -MInline=Info -MMath::Simple -e 42
    <------------Information Section------------>

    Information about the processing of your Inline C code:

    Your module is already compiled. It is located at:
    /usr/local/lib/perl5/site_perl/5.6.0/i686-linux/auto/Math/Simple_C_1_23_9cddc5e3bf29ec8e1b4218f2de670c59/Simple_C_1_23_9cddc5e3bf29ec8e1b4218f2de670c59.so

    The following Inline C function(s) have been successfully bound to Perl:
            double add(double x, double y)
            double divide(double x, double y)
            double multiply(double x, double y)
            double subtract(double x, double y)

    <------------End of Information Section------------>

There is a special option called Reportbug. When you run into a problem and suspect that it is the fault of Inline, just issue the following command:

    % perl -MInline=Reportbug program.pl

Explicit instructions will be displayed telling you how to report the problem.

For more information about configuration issues, see the Inline::Config documentation.

XS and SWIG

This is my opinionated rant on why Inline is better than XS and SWIG. If you're already convinced that Inline is the best way to extend Perl, feel free to skip this section. SWIG (Simplified Wrapper and Interface Generator) is more or less a generic version of XS that supports other scripting languages as well. Since this rant applies equally to both methods, I will only talk about XS.

XS (External Subroutines) is a mini "glue" language that works together with the h2xs template generating tool and the xsubpp translating compiler. The basic idea is that you run h2xs against some existing C library's header files. This creates a Perl module, an XS interface file, and a Makefile.PL. Then you run the normal Perl install commands and presto, you have a Perl module that gives you full access to that library's API.

If you can get it to work that easily, then by all means use XS.

The first problem that you will undoubtedly run into is that you need to tweak each of the generated files. A lot. That means you'll need to read a lot of documentation about the format of those files. You'll do most of the tweaking in the Foo.xs file. XS gives you a dozen or so special keywords to help you tweak. Keywords like INIT, PREINIT, CODE, and PPCODE allow you to sprinkle bits of C code around the calling of the function. Knowing how all of these bits get pasted together at compile time is the stuff of legends.

Another problem is typemaps, which translate Perl data types to C and vice versa. XS provides a lot of defaults, but some of them actually update the input arguments themselves. That's good in C, but horrible in Perl. If you use these literally mapped function calls, you'll end up providing a very confusing interface from the perspective of a Perl programmer. Also, if your existing library uses any but the simplest types and typedefs, you'll have to write your own typemaps in yet another file, called typemap.

To make Inline use an existing API, you'll need to write your own wrapper function for each function you want to expose. If this seems crummy at first, consider that all of your code will be in your module, and that it will all be laid out in the true order of execution, instead of being masked by a lot of extra syntax. And you don't have to run make every time you tweak.

If you're not using an existing API, choosing Inline should be a no-brainer. One of the best things about Inline is that you can use it from a program. With XS and SWIG you always need to create a full-blown module.

On the Inside, Looking Out

As you journey beyond the examples and into more complex C programming, you may find yourself clicking your heels from time to time. "I'm not in Perl anymore!", you might say. But if you think about it, you never really left. You're merely on the dark side now. Use the force.

The full power of Perl is still at your fingertips. For example, in Perl, memory is automatically allocated each time you mention a new variable. If you add text to a string variable, Perl automatically allocates more memory. When the variable goes out of scope, all the memory is automatically freed. But in C you need to use malloc and free to manage buffers. Right?

Why not just use the power of Perl? You can ask Perl for a new anonymous scalar (SV) at any time. You can ask Perl to extend it for you, and you can even tell Perl to free it at some point after your C function returns.

Here's a simple example using a function that takes a hash reference and returns its values as a comma-separated string. Of course, we'll need to build the return value in a buffer of unknown size.

Listing 4: Accessing Perl hashes from C.

If you run the code in Listing 4, you'll get:

    % perl ./hash_keys.pl
    Perl,Loves,Ingy

I've just presented you with a dozen or so new calls. I leave it to you to find out how they all work.

The Future

The primary goal of Inline is to make it as easy as possible to extend Perl. I'll continue to add features for debugging and other real-life situations, and I'm also considering creating Inline::C::Typemaps, which would provide a library of useful typemaps and support for adding your own.

From the start, Inline was intended to allow for programming languages other than C. Other languages I would like to support include C++, Fortran, Pascal, and Python.

On Your Mark

I did some benchmark testing on the vowels.pl program. The vowel_scan subroutine was called 1000 times with the contents of /usr/dict/words as its input string. This is a huge string (409093 bytes). It took 16.0 secs to run. That's 0.0160 secs/call.

A similar subroutine written in Perl took 2.96 secs/call, 186 times slower than C. An optimized version of this routine, which used only numeric comparisons, took 2.54 secs/call. Better, but not much.

If you think this is an argument against Perl, think again. The algorithm was then coded as a Perl one-liner with creative use of the tr operator.

    sub vowel_scan { $_[0]=~tr/aeiouAEIOU// / $_[0]=~tr/a-zA-Z// }

Pretty? Maybe not. Fast? This ran at 0.0169 secs/call. Less than a millisecond slower than the C function. And it still works in EBCDIC. TMTOWTDI!
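The article does not show the two slower Perl versions. The sketch below sets up a comparable experiment with the core Benchmark module. vowel_scan_loop is a hypothetical character-by-character version, not the author's code. vowel_scan_tr is the tr one-liner with a guard added for text that contains no letters, which would otherwise divide by zero. Run it as perl vowel_bench.pl /usr/dict/words (the file name is arbitrary); timings will depend on your machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical character-by-character version, for comparison only.
sub vowel_scan_loop {
    my ($str) = @_;
    my ($vowels, $letters) = (0, 0);
    for my $ch (split //, $str) {
        next unless $ch =~ /[A-Za-z]/;
        $letters++;
        $vowels++ if $ch =~ /[aeiouAEIOU]/;
    }
    return $letters ? $vowels / $letters : 0;
}

# The tr one-liner, plus a guard against text with no letters at all.
# tr/// with an empty replacement list only counts; it leaves $_[0] alone.
sub vowel_scan_tr {
    my $letters = ($_[0] =~ tr/a-zA-Z//);
    return $letters ? ($_[0] =~ tr/aeiouAEIOU//) / $letters : 0;
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    my $text = do { local $/; <$fh> };
    printf "vowel ratio: %.3f\n", vowel_scan_tr($text);
    # Run each version for at least 3 CPU seconds and print a comparison chart.
    cmpthese(-3, {
        char_loop => sub { vowel_scan_loop($text) },
        tr_count  => sub { vowel_scan_tr($text) },
    });
}
```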

__END__

Brian Ingerson (INGY@cpan.org) is a devoted member of the Seattle Perl Users Group. http://www.halcyon.com/spug/. He is also a brand new employee of ActiveState Tool Corp in Vancouver BC. In addition to Perl, he now knows Dick.
