Implementing the X Protocol in Perl

Mark Eichin

The X Window System, developed by MIT's Project Athena, provides networked graphics on workstations. It has become the de facto standard for Unix graphics, and is available in numerous other environments. Back in 1991, I began implementing the X protocol in Perl 4. I dropped it for a while, and then picked it up again a few months ago as a chance to try some of the new Perl 5 features. The code currently covers more than half of the protocol requests, with several demonstrations: a graphical "Hello, World", Perl implementations of xwininfo and xdpyinfo, and programs to set a background image and enable backing store.

Note that this is not Perl/Tk, nor is it a version of the X library written in C and interfaced to Perl via XS. At its heart, X is a binary network protocol, and Perl provides useful tools for speaking it directly. This lets you create X applications in Perl with more power than Perl/Tk while still being extremely portable: the code should behave on any Perl for which use Socket works, allowing client code to run on VMS, NT, OS/2, or any of the other non-Unix platforms to which Perl has been ported.

X has two components: a server that controls the actual graphics hardware and listens for connections, and client applications that send requests and interpret responses from the server. There are 120 different requests - dealing with windows, pixmaps, drawing, fonts and text, keys and keyboards, extensions, and connection management. Many of these requests trigger replies; the OpenFont request, for example, returns an identifier for later use. In addition to replies and errors, "events" are generated by the server to tell the client that something interesting has happened, such as a keypress or mouse motion.

What's important to remember is that the client doesn't have to perform "graphics" per se; it merely has to speak a network stream protocol. The server handles the dirty work of allocating windows, colors, bitmaps, and fonts.
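To make the "network stream protocol" point concrete, here is a minimal sketch of how one small request could be laid out as bytes with pack(). The MapWindow layout (opcode 8, one unused byte, a length counted in 4-byte units, then the window ID) follows the core protocol. The subroutine name and the window ID are made up for illustration and are not taken from my system. Native byte order is assumed here, since a client declares its byte order when it connects.

```perl
use strict;
use warnings;

# Build the 8 bytes of a MapWindow request.
# C = opcode, x = unused pad byte, S = request length in 4-byte units,
# L = window ID. "S" and "L" use the machine's native byte order.
sub pack_map_window {
    my ($window_id) = @_;
    return pack("C x S L", 8, 2, $window_id);
}

my $request = pack_map_window(0x00400001);    # hypothetical window ID
printf "%d bytes: %s\n", length($request), unpack("H*", $request);
```

Everything a client sends is built up from small fixed layouts like this one, which is why pack() and unpack() do so much of the work.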

Why do this?

While most graphical applications want a more complete toolkit, there are some small applications that can benefit from Perl's portability and platform independence. X is known for requiring a lot of code to accomplish the smallest task - but the clickable "hello world" is around 40 lines of Perl, most of which is setting specific parameters for the initial window. The "backing store" program is around 85. xwininfo, weighing in at 1000 lines of C code in the X distribution, is emulated in Perl below. While this brevity is due in part to some simplifying assumptions, it's also due to Perl's capacity for expressing things at broad and specific levels simultaneously.

Volume Zero of the O'Reilly X Window System series specifies the X11 protocol. It's a convenient and well-organized reference, but it would be a lot of work to type it all in. As we all know, laziness is a cardinal virtue, and luckily all of the constants and protocol structures are also contained (albeit in a more haphazard manner) in the header files that come with the X Library - in particular, <X11/Xproto.h>. It's already online, just not in the form we want (namely Perl code), so I wrote parse-xheader.pl, a 450-line Perl script that scans the X headers looking for useful things. It does not have to actually parse C code - it merely has to recognize the stylized #define and typedef lines used in the X headers. Rather than handling obscure syntax robustly, it need only employ a few rules for handling the rarer constructs. For example, the only case of a struct containing a union is the typedef struct _xEvent in Xproto.h; instead of trying to convert this to some general type in Perl, we just pick apart the pieces and treat them separately.
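The idea of recognizing stylized lines rather than parsing C can be sketched in a few lines. This is not the code from parse-xheader.pl. The subroutine name, the set of value forms it accepts, and the sample text are assumptions for illustration, and the last #define in the sample is invented to show a line the rules cannot translate.

```perl
use strict;
use warnings;

# Collect simple "#define NAME value" lines into a hash. Values the
# rules do not understand are reported rather than guessed at.
sub scan_defines {
    my ($text) = @_;
    my (%defines, @skipped);
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*#\s*define\s+(\w+)\s+(.*?)\s*$/;
        my ($name, $value) = ($1, $2);
        $value =~ s{\s*/\*.*?\*/\s*$}{};           # drop a trailing C comment
        if ($value =~ /^0[xX][0-9a-fA-F]+$/) {
            $defines{$name} = hex $value;
        } elsif ($value =~ /^-?\d+$/) {
            $defines{$name} = $value + 0;
        } elsif ($value =~ /^\(\s*1L?\s*<<\s*(\d+)\s*\)$/) {
            $defines{$name} = 1 << $1;             # e.g. (1L<<15)
        } else {
            push @skipped, $name;
        }
    }
    return (\%defines, \@skipped);
}

my $sample = <<'END';
#define X_CreateWindow 1
#define KeyPressMask (1L<<0)
#define ExposureMask (1L<<15)
#define sz_xEvent 32   /* size in bytes */
#define LastThing (FirstThing + 3)
END

my ($defs, $skipped) = scan_defines($sample);
print "$_ = $defs->{$_}\n" for sort keys %$defs;
print "could not translate: @$skipped\n";
```

Returning the untranslated names separately is what makes it possible to see which constructs are worth teaching the scanner next.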

The Perl distribution comes with h2ph, a simple program that serves a similar purpose. However, h2ph assumes too much about how close C syntax and macros are to legitimate Perl. It also doesn't flag code that it is unsure about - it simply uses eval() to protect questionable translations. parse-xheader provides diagnostics when translation fails, which allowed me to improve the most important features first and postpone the less critical translations.

parse-xheader has an easy time identifying constants. They all end up in a hash called %xlib::defines, although a few also proceed directly to the X:: namespace - values beginning with X_ in C are more commonly used and therefore merit their own namespace in Perl. The parser extracts three kinds of data from any struct: size, pack() format, and type. The pack() format is used to split the bytes into an array.
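As a rough illustration of how a size, a pack() format, and a list of field names work together, the sketch below decodes a few bytes into named fields. The table layout, the "eventHeader" entry, and decode_struct() are hypothetical stand-ins, not the data structures parse-xheader actually generates.

```perl
use strict;
use warnings;

# Hypothetical table: each struct has a size in bytes, a pack() format,
# and an ordered list of field names.
my %struct = (
    eventHeader => {
        size   => 4,
        format => "C C S",
        fields => [qw(type detail sequenceNumber)],
    },
);

# Split raw bytes into a hash of named fields.
sub decode_struct {
    my ($name, $bytes) = @_;
    my $s = $struct{$name} or die "unknown struct $name";
    die "short read for $name" if length($bytes) < $s->{size};
    my %rec;
    @rec{ @{ $s->{fields} } } = unpack($s->{format}, $bytes);
    return \%rec;
}

my $raw = pack("C C S", 4, 1, 6);    # stand-in for bytes off the wire
my $rec = decode_struct("eventHeader", $raw);
print "$_: $rec->{$_}\n" for qw(type detail sequenceNumber);
```

The hash slice assignment pairs each unpacked value with its field name in order, so adding a struct means adding a table entry rather than writing new code.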

Originally, parse-xheader built up the arrays in memory, and then opened a connection; this was quite slow, and I soon adopted the optimization of parsing once and writing all the arrays out to disk in Perl form. I learned several lessons from attempting to optimize that further. The most important: always profile! Intuition about Perl performance is very hard to come by; some of the obvious speedups aren't actually speedups at all. I used the simplest of profiling methods: write a short program that covers the use of the feature that you care about, and run it a few times using the Unix "time" command. There are more sophisticated techniques available for finding bottlenecks in a system - and Perl itself provides the Devel::DProf and Benchmark modules.
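For comparing two candidate implementations, the Benchmark module is often enough. The sketch below times two ways of building the same string. The two subroutines and the iteration count are arbitrary examples and have nothing to do with parse-xheader itself.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my @fields = map { "field$_" } 1 .. 200;

# Two ways to build the same comma-separated string. Which one is
# faster is for the timings to say, not intuition.
sub with_join {
    return join(",", @fields);
}

sub with_concat {
    my $s = "";
    for my $f (@fields) {
        $s .= "," if $s ne "";
        $s .= $f;
    }
    return $s;
}

timethese(2000, {
    join   => \&with_join,
    concat => \&with_concat,
});
```

The numbers vary from machine to machine and from one Perl version to the next, which is exactly why measuring beats guessing.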

The first step in any X application is connecting to the server. This means picking apart the $DISPLAY environment variable into a hostname and a "display number". Normally, a TCP connection is made to the host at port 6000 plus the display number. If the hostname is left out, a local Unix domain socket (/tmp/.X11-unix/X0 for display 0) is used instead. In either case, use Socket gives us access to the right values for PF_INET, PF_UNIX, and SOCK_STREAM without having to rely on h2ph (which is inaccurate on some platforms) or hardcoded values (which aren't portable; SOCK_STREAM differs between Linux and Solaris, for example).
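The parsing step can be sketched as follows. parse_display() and the hash it returns are hypothetical names, not part of my system, and the optional ".screen" suffix is handled only by remembering the number. The sketch also ignores less common address forms.

```perl
use strict;
use warnings;

# Split a DISPLAY string such as "host:1" or ":0.0" into what is needed
# to connect. Returns undef (or an empty list) if the string is malformed.
sub parse_display {
    my ($display) = @_;
    return unless defined $display;
    my ($host, $number, $screen) =
        $display =~ /^([^:]*):(\d+)(?:\.(\d+))?$/ or return;
    $screen = 0 unless defined $screen;
    if ($host eq "") {
        # No hostname: use the local Unix domain socket.
        return { transport => "unix",
                 path      => "/tmp/.X11-unix/X$number",
                 screen    => $screen };
    }
    return { transport => "tcp",
             host      => $host,
             port      => 6000 + $number,
             screen    => $screen };
}

for my $d ("bigiron:1", ":0.0") {
    my $where = parse_display($d);
    my $target = $where->{transport} eq "tcp"
        ? "$where->{host} port $where->{port}"
        : $where->{path};
    print "$d -> $where->{transport} $target\n";
}
```

The result says which address family to hand to socket(): PF_INET with the host and port, or PF_UNIX with the path.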

Once we've created the socket, we treat it like any other Perl filehandle. In Perl 4, filehandles were somewhat hard to pass around; standard practice was to pass a string containing the name of the filehandle. This left the filehandles in the global namespace, which creates the possibility of a namespace collision. Perl 5 gives us "globrefs", which let us use a reference to the whole entry, string and filehandle alike. Furthermore, the new IO package helps us manage the namespace better. In my system, the mkport() subroutine, which is what actually handles creating the socket, simply accepts a reference to a filehandle and works with that, because it evolved from Perl 4 code that did the same with just the name.
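Here is a small sketch of passing a handle by reference instead of by name. send_line() is a made-up helper, not mkport(). The second half relies on lexical and in-memory filehandles, which assume a Perl newer than the one this article was written against (5.8 or later).

```perl
use strict;
use warnings;

# Write a line to whatever handle we are given. The caller passes a
# reference, so no global handle name has to be agreed on in advance.
sub send_line {
    my ($fh, $text) = @_;
    print {$fh} $text, "\n";
}

send_line(\*STDOUT, "via a glob reference");

# A lexical handle works the same way (here, an in-memory file).
my $buffer = "";
open(my $mem, ">", \$buffer) or die "cannot open in-memory file: $!";
send_line($mem, "via a lexical handle");
close($mem);
print "buffer holds: $buffer";
```

A socket handle can be passed around in exactly the same manner, since the subroutine never needs to know what kind of handle it received.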

Hello, World

Let's look at a simple X application: a "Hello, World" program called xhello.pl that speaks the X protocol. It's a lot longer than its Perl/Tk equivalent, but it does something Perl/Tk doesn't let you do: get under the hood.

First, we'll open the display...

#!/usr/bin/perl
require "./xbase.pl";
$xopen = &xlib::x_open_display($ENV{"DISPLAY"});
die $xlib::status if not defined $xopen;

...and choose a font. "fixed" is guaranteed to exist, so we'll use that.

$fnt = xlib::x_openfont($xopen, "fixed");

Now we'll create the window: 8 bits per pixel, at x=100 y=200, 300 pixels wide and 400 tall, with a border 5 pixels thick. It's an "InputOutput" window with a background color (BackPixel) of black, and it accepts a set of events (button presses and releases, expose events, and keypresses). This statement won't make it appear - that won't happen until we explicitly "map" the window.

$win = xlib::x_create_window($xopen, 8, xlib::defaultroot($xopen),
			100, 200, 300, 400, 5,
			$X::defines{"InputOutput"},
			$X::defines{"CopyFromParent"},
			$X::defines{"CWBackPixel"}
			| $X::defines{"CWEventMask"},
			$xlib::root_black,
			$X::defines{"ButtonPressMask"}
			| $X::defines{"ButtonReleaseMask"}
			| $X::defines{"ExposureMask"}
			| $X::defines{"KeyPressMask"}
		);

Given the window, we then create a graphics context, or GC. This is sort of a "paintbrush" for the window - it's in the GC that many of the common features of drawing such as colors, thickness, and font are set. Here, we set the foreground to white, the background to black, and the font to the "fixed" font we opened earlier.

$gc = xlib::x_create_gc($xopen, $win,
		$X::defines{"GCForeground"}
		| $X::defines{"GCBackground"}
		| $X::defines{"GCFont"},
		$xlib::root_white, $xlib::root_black, $fnt);

Now we clear the window, which just paints it with the background color. A width and height of 0 are special: they dictate that the full remaining width and height will be used for the operation.

$st = xlib::x_clear_area($xopen, $win, 0,0,0,0,0);

Next, we set up the event handlers. First, an "Expose" handler that actually does the work of drawing the "hello world" text. In the X model, the server tells you when part of a window has become visible. You (the client application) are responsible for updating it. Your notification comes via an Expose event, caught by this handler:

$xlib::handler{"Expose"} = sub {
	xlib::x_imagetext8($xopen, $win, $gc,
					50, 75, "hello world");
};

If the user presses a key (a KeyPress event), we use the function xlibconvert() to extract the details from the event and debugxlib() to print its contents. This can easily be expanded to do particular things based on which key the user pressed.


$xlib::handler{"KeyPress"} = sub {
	my $rep = xlib::xlibconvert("keyButtonPointer",
					$xopen->{"readqueue"});
	xlib::debugxlib($rep);
};

Now we map the window, which causes the server to generate an Expose event...

xlib::x_mapwindow($xopen, $win);

...and now we wait for events to occur. There is a default handler that just prints the raw event structure, which in this case will trigger on mouse clicks.

while(defined($st = xlib::handle_event($xopen))) {
	print "event: $st\n";
}

A left mouse click (which really consists of two events: a ButtonPress, type 4, and a ButtonRelease, type 5) results in the following being printed to the terminal from which xhello.pl was launched:

u:
		type: 4
		detail: 1
		sequenceNumber: 6
event: 0
u:
		type: 5
		detail: 1
		sequenceNumber: 6
event: 0
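The handler table and its default can be pictured with a small dispatch sketch. This is not the handle_event() from my system. The names here are hypothetical, the fake events are built with pack() rather than read from a server, and only the first four bytes of each 32-byte event are decoded. The event codes themselves (2, 4, 5, and 12) are the core protocol's.

```perl
use strict;
use warnings;

# Core protocol event codes for the events xhello.pl asks for.
my %event_name = (2 => "KeyPress",      4  => "ButtonPress",
                  5 => "ButtonRelease", 12 => "Expose");

my @log;
my %handler = (
    Expose => sub { push @log, "redraw" },
);

# Decode the start of a 32-byte event and run its handler, falling
# back to a default that just records the raw fields.
sub dispatch_event {
    my ($bytes) = @_;
    my ($type, $detail, $seq) = unpack("C C S", $bytes);
    $type &= 0x7f;                    # the high bit marks SendEvent events
    my $name = $event_name{$type} || "type $type";
    my $code = $handler{$name}
        || sub { push @log, "$name detail=$detail seq=$seq" };
    $code->();
    return $name;
}

# Fake events standing in for bytes read from the server.
dispatch_event(pack("C C S x28", 4, 1, 6));
dispatch_event(pack("C C S x28", 12, 0, 7));
print "$_\n" for @log;
```

Because no ButtonPress handler is installed, the first event falls through to the default, much as the mouse clicks do in the output above.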

Currently nothing will cause the program to exit (other than the user killing it) but for good form, we close the display. X automatically cleans up everything on the server side when the associated connection goes away, but this function will also clean up the contents of the hash.


xlib::x_closedisplay($xopen);

That's it! The brevity of this program affords flexibility: We can exert precise control over how the X protocol is manipulated, or hide it behind abstractions.

In the process of developing my system, I used two Perl features that have little to do with X but might prove educational: the scalar() function and closures.

Why I use `scalar()`

Perl has plenty of semi-sophisticated list-manipulating features, such as splice(), map(), grep(), and foreach. Still, there are times when you simply want to know the length of a list. As always, "there's more than one way to do it" - but some ways are worse than others.

One method derives from the shell notation $#. The scalar $#array is the index of the last element of the @array. If array indexing starts from 0 (as it should; the ability to modify that with $[ is deprecated in Perl 5) then the number of elements in the array is given by $#array+1.

Another method derives from the distinction in Perl between array context and scalar context. An array, when evaluated in an array context, yields the array itself; in a scalar context, it yields the length. For example, $len = @name sets $len to the number of elements in @name. This is compact, but a reader has to notice the context to see what it does, which makes the code harder to read. And while it works in that particular assignment, consider a user subroutine: routine(@name) always passes all the elements to the routine. You can force scalar context with routine(0+@name) or routine($len=@name), but it's a little ugly.

An easy mistake is the use of length(). If @name has 251 elements, length(@name) is 3. Why? Because length provides a scalar context, so @name evaluates to 251; length(251) is 3, because 251 has three digits. Unless you're trying to calculate logarithms, this is not what you want.

Perl provides a way to get the right answer and make it clear: the scalar() function, which provides a scalar context to its argument. Thus, $len = scalar(@name) will do the exact same thing as $len = @name. That's why I use it in my code: it's both correct and unambiguous.
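The different spellings can be compared side by side in a short sketch. count_args() is a throwaway helper invented for the illustration.

```perl
use strict;
use warnings;

my @name = (1 .. 251);

sub count_args { return scalar(@_) }    # how many arguments arrived?

my $last_plus_one = $#name + 1;          # index of the last element, plus one
my $implicit      = @name;               # scalar assignment yields the count
my $explicit      = scalar(@name);       # same value, stated outright

my $flattened = count_args(@name);           # all 251 elements are passed
my $one_arg   = count_args(scalar(@name));   # only the count is passed
my $digits    = length(scalar(@name));       # digits in "251", not elements

print "$last_plus_one $implicit $explicit $flattened $one_arg $digits\n";
# prints "251 251 251 251 1 3"
```

Only the scalar() versions say what they mean without the reader having to work out the context.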

How I use closures

In X, the getxwd() routine operates on an in-memory string containing an "X Window Dump" format image, splitting it apart into size and pixel values. This format is designed for saving screen dumps, rather than for image processing; the header simply describes how the series of bytes is padded and ordered into pixels. This makes it easy to redraw the screen.

In my system I also needed to be able to read a file containing that data, which the X getxwd() doesn't do. For ease of maintenance, I decided to abstract out the difference between the original getxwd() and my own.

Rather than have two completely different routines, I made the original into a getxwd_core() function which takes one argument, a "reader" function. getxwd_core() doesn't need to know where the data come from - only that it can call the &$readfn(length) function to get it.

In order to do this from "outside" the original function, I rely on a feature that is more widely known in languages like Scheme and Lisp, called closures. Closures are tricky to explain because they "just happen" as a side effect of the way lexical scoping (the my() declaration) works. Think of closures as function templates: you define the template when you write your program, and complete the function definition at runtime.

The getxwd_str() subroutine, shown below, creates a closure and hands it to getxwd_core(). Here we simply define an anonymous subroutine, stored in $xsubstr, which performs the extraction with substr() and keeps track of the offset with $ptr. The subroutine can "see" $xwdbits and $ptr when it is defined, so it can see them later when it gets invoked. That's the beauty of closures.

sub getxwd_str {
	my($xwdbits) = @_;
	my $ptr = 0;

	# An anonymous sub captures this call's $xwdbits and $ptr.
	# (A named sub nested here would bind only to the variables
	# of the first call, so later calls would read stale data.)
	my $xsubstr = sub {
		my $sz = shift;
		my $v = substr($xwdbits, $ptr, $sz);
		$ptr += $sz;
		return $v;
	};

	getxwd_core($xsubstr);
}

Note that $xwdbits and $ptr are now tied to the subroutine in $xsubstr; as long as a reference to it exists, $xwdbits and $ptr will survive even though the names go out of scope at the end of getxwd_str(). Closures aren't a feature you'll need very often, but when you do you'll be glad Perl supports them.
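To show why the "reader" abstraction pays off, here is a sketch with two interchangeable readers, one over a string and one over an open filehandle. All three subroutine names are hypothetical, and first_two_fields() is only a stand-in for getxwd_core(), which is not reproduced here.

```perl
use strict;
use warnings;

# Return a reader that hands out successive chunks of a string.
sub make_string_reader {
    my ($data) = @_;
    my $ptr = 0;
    return sub {
        my ($sz) = @_;
        my $chunk = substr($data, $ptr, $sz);
        $ptr += length($chunk);
        return $chunk;
    };
}

# Return a reader that hands out successive chunks from a filehandle.
sub make_file_reader {
    my ($fh) = @_;
    return sub {
        my ($sz) = @_;
        my $chunk = "";
        read($fh, $chunk, $sz);
        return $chunk;
    };
}

# Stand-in for getxwd_core(): it only ever calls the reader.
sub first_two_fields {
    my ($readfn) = @_;
    my $first  = $readfn->(4);
    my $second = $readfn->(2);
    return ($first, $second);
}

my @a = first_two_fields(make_string_reader("ABCDEFGH"));
my @b = first_two_fields(make_string_reader("12345678"));
print "@a / @b\n";    # prints "ABCD EF / 1234 56"
```

Each call to make_string_reader() gets its own $data and $ptr, so the two readers advance independently. A reader from make_file_reader() can be passed to the same consumer unchanged.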

In Conclusion...

Perl may not be the best language for X programming, but even at the lowest level it certainly is possible. By starting with the network protocol itself, we can get complete control, without a special Perl installation or shoehorning things into C-style interfaces. Yet we don't need much additional code to do some useful things. Above all, Perl 5 provides a wealth of new features that enable new abstraction techniques, leading to better, or at least better organized, programs.

My system, including all code in this article as well as several additional client applications, can be found at http://tpj.com and at http://www.mit.edu/people/eichin/x-perl.html.

__END__

Mark Eichin works on computer and network security for Cygnus Solutions, where some roll their eyes, and some cheer, at the things he does with Perl.
