
Signals, Sockets, and Pipes

Steve Lidie

I'm relatively new to the Unix world; it wasn't until 1991 that our monolithic mainframe was replaced by several large Unix servers. When meeting these new beasts, I was surprised to find that the things known as "filesystems" had a maximum size of two gigabytes - I was used to 64-bit words and maximum filesystem sizes measured in terabytes. These days two gigs can be gobbled up quite easily; for instance, a high-quality color scan of a photograph might produce a 42 megabyte file. A finite element analysis might require several hundred megabytes of scratch space, and an hour of digital video hogs 20 gigabytes of disk. Wow.

In short order I grew tired of manually monitoring these newfangled workstations and their filesystems, so I threw together a Perl TCP/IP client-server to do the dirty work. Once a minute the client connected to a server (often called a daemon) running on a monitored machine, and the server executed a Unix df command and returned the output. The client took this data, reformatted it, and wrote it to a log file which could be dynamically viewed with tail -f. This scheme sufficed for several years since it provided ample lead-time to take corrective action before a filesystem became critically low on space.

When Tk for Perl arrived I immediately dreamt of colored bars on an X display that expanded and contracted as disk usage varied over time. Blending Tk with the original code resulted in a package collectively known as monds, short for "monitor disk space." The issues involved have more to do with signals, sockets, and pipes than regular Perl/Tk, so my column will be a little different this time. Anyway, here's what monds looks like:

The conversion was slightly painful because, as we saw in TPJ #3, mixing socket events with Perl/Tk X events can make the application hang. That's why the simple-minded client was completely redesigned to separate the Tk code from the network code. The solution: spawn a child process to perform the client socket duties, and use a pipe to pass information between the client and the Tk parent. The Tk code never blocks when reading the pipe because it uses Perl's select() function to ensure that df data is available.

The Tk parent and child client employ a simple communications protocol that describes how the processes exchange information. This means that the monds system requires bi-directional pipes. One direction is used when the parent speaks to a child; the other is used when the child speaks to its parent.

When the child client is activated it immediately requests df data from its associated server, writes the filesystem data to its output pipe, and then waits for an ACK (acknowledge) by attempting to read from its input pipe. Meanwhile, the Tk parent periodically polls its input pipes (there's one for each monitored machine), reads the filesystem data, and acknowledges a child's response by sending a newline. When the child receives that ACK, the cycle repeats. This simple scheme synchronizes the entire application while still allowing X events to flow.

[Figure: monitor disk space]

The monds protocol also provides for rudimentary error reporting. If there are network problems, or a child client cannot connect to its server, the child sends one of several canned messages to the parent, who then posts the alert message on the X display.

The Monitor Disk Space TCP Daemon

The simplest part of monds is the remote server. No doubt many of you have already written your own client-server pairs, based on examples from Perl books, but this server is even more concise because inetd, the Unix Internet daemon, does all the work. Here is a good example of my inherent laziness:


#!/usr/bin/perl -w
#
# monitor_disk_space_daemon - transmit 'df'
# output to the asynchronous TCP/IP task
# spun-off by monitor_disk_space.

print `/bin/df 2>&1`;
print "END_OF_DF\n";


A single backticked df written to STDOUT, followed by a marker line, and that's it - inetd automatically connects Perl's filehandles to the network sockets! Filesystem data is now outward bound for the client who initiated this connection.

The Monitor Disk Space TCP Client

Although monds comes with its own client, Unix already provides us with an alternative: telnet. All we need to do is specify the monds TCP port number on the command line. Typing this:


% telnet Turkey.CC.Lehigh.EDU 10346

produces output similar to this:


Trying...
Connected to turkey.CC.Lehigh.EDU.
Escape character is '^]'.
Filesystem   Total KB     free %used iused %iused  Mounted on
/dev/hd4        12288     8052   34%   785    25%  /
/dev/hd9var     12288     7620   37%   144     3%  /var
/dev/hd2       303104     1612   99% 18496    24%  /usr
/dev/hd3        12288     1336   89%   412    10%  /tmp
/dev/lv00       20180     2808   86%  1718    27%  /vice/cache
AFS          72000000 72000000    0%     0     0%  /afs
END_OF_DF

That df output comes from a SYSV-like Unix machine. Here's output from a BSD-like server:


Trying...
Connected to turkey.CC.Lehigh.EDU.
Escape character is '^]'.
Filesystem   1024-blocks    Used Available Capacity  Mounted on
/dev/hda4         247871  171858     63212      90%  /
/dev/hda1         523968  389952    134016      74%  /dosc
/dev/hda2         189300  112014     77256      94%  /dosd
END_OF_DF

The representations differ, but monds normalizes them into a common format as the data arrives.
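The normalization code itself is not shown in this column, so what follows is only a minimal sketch of how the two layouts above might be reduced to one record. The sub name parse_df_line and the hash keys fs, total_kb, used_pct, and mount are hypothetical, and the column positions are assumed from the two sample listings alone; other df variants, or mount points containing spaces, would need more care.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one df line into a common record. Returns
# nothing for header lines, the END_OF_DF marker, and unknown layouts.
sub parse_df_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f >= 6 and $f[1] =~ /^\d+$/;   # skips headers and END_OF_DF

    my ($fs, $total, $pct, $mount);
    if (@f == 7) {        # SYSV-like: fs, total KB, free, %used, iused, %iused, mount
        ($fs, $total, $pct, $mount) = @f[0, 1, 3, 6];
    } elsif (@f == 6) {   # BSD-like: fs, 1024-blocks, used, available, capacity, mount
        ($fs, $total, $pct, $mount) = @f[0, 1, 4, 5];
    } else {
        return;
    }
    $pct =~ s/%$//;       # keep the percentage as a bare number
    return { fs => $fs, total_kb => $total, used_pct => $pct, mount => $mount };
}

# Feed it one line from each sample listing, plus the marker line.
for my $line ("/dev/hd2     303104     1612 99%   18496  24%  /usr",
              "/dev/hda1     523968  389952  134016    74%   /dosc",
              "END_OF_DF") {
    my $rec = parse_df_line($line) or next;
    print "$rec->{mount} is $rec->{used_pct}% full\n";
}
```

Counting whitespace-separated fields is the simplest way to tell the two samples apart; a sturdier version would probably key off the header line instead.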

The monds client is similar to the telnet client in that it expects an IP address and port number and writes the incoming df data to STDOUT, but there the similarity ends. Instead of exiting, the client then issues a read on STDIN, waiting for the go-ahead to re-connect to the server for its next df sample. Just like telnet, the monds client can be run interactively; later we'll see how the Perl/Tk parent creates the bi-directional pipes and attaches them to the client.

Here is the entire client, compact enough that we can see at a glance how it implements its half of the monds protocol.


#!/usr/bin/perl -w
#
# monitor_disk_space_client - request 'df' data from
# monitor_disk_space_daemon and feed to
# monitor_disk_space.

require 5.002;
use English;
use IO;
use strict;

do {print "Usage: monds_client host port\n"; exit}
  if scalar @ARGV != 2;

STDOUT->autoflush(1); # always flush output buffer

sub timeout {
    $SIG{ALRM} = \&timeout;
    print "Socket Timeout\n";
}

# Read socket data into a list until END_OF_DF detected; only
# then output to our parent's pipe. This ensures that the
# parent never blocks reading input data, since select()
# won't know we have data for our parent until we actually do.
# Cycle after we receive the ACK from monds.

while (1) {
    my $sock = IO::Socket::INET->new(PeerAddr => $ARGV[0],
                                     Proto    => 'tcp',
                                     PeerPort => "monds($ARGV[1])");
    if ($sock) {
        my(@sd) = ();
        # prevent infamous "Alarm clock" problem
        $SIG{ALRM} = \&timeout;
        alarm 60;
        while(<$sock>) {
            push @sd, $ARG;
            last if /^END_OF_DF$/;
        }
        alarm 0;
        print ((/^END_OF_DF$/ and $#sd > 0) ? @sd
          : "Daemon Failure\n");
    } else {
        print "Cannot Connect\n";
    }
    $_ = <STDIN>;    # wait for go-ahead from monitor_disk_space
}


At the top of the loop IO::Socket::INET->new() attempts to connect to the client's peer daemon using the IP address and port number specified on the command line. Assuming the connect succeeds and the END_OF_DF marker is detected, the data is printed. It might go to a pipe connected to the Perl/Tk parent, or simply STDOUT if the client is executed interactively.

The client also reports error conditions like "Cannot Connect", "Socket Timeout" or "Daemon Failure". In particular, it's free to use Unix signals like SIGALRM to time out socket reads; if the Perl/Tk parent were to attempt this, it would surely crash the application.

Connecting the TCP Client to the Perl/Tk Parent

The next hurdle is to propagate the filesystem data to the Perl/Tk program monds; the basic mechanism is taken directly from O'Reilly's Programming Perl and relies on the pipe(), fork(), and exec() system calls. monds uses the global hash %CHILD to manage its multiple client connections and bi-directional pipes. Each %CHILD key is the hostname of a monitored machine, and the key's value is a reference to another hash with three keys, so %CHILD is a hash of hashes, each with three elements. Thus for a machine named 'dandy' we have this structure:

$CHILD{'dandy'}->{pid}   child process ID (for KILLing)
$CHILD{'dandy'}->{pr}    filehandle of parent read pipe
$CHILD{'dandy'}->{pw}    filehandle of parent write pipe

With all that in mind (put on your thinking caps please), here's the code that fires off the clients:

$READ_BITS = '';          # "bitlist" of parent filehandles
my($fh) = ('fh0000');     # indirect filehandle names
my($cr, $cw);             # child read and write filehandles
foreach (@{$OPT{hosts}}) {
	$cr = $fh++;
	$CHILD{$ARG}->{pw} = $fh++;
	pipe($cr, $CHILD{$ARG}->{pw}) or abort 'cr/pw pipe';
	$CHILD{$ARG}->{pr} = $fh++;
	$cw = $fh++;
	pipe($CHILD{$ARG}->{pr}, $cw) or abort 'pr/cw pipe';
	if ($CHILD{$ARG}->{pid} = fork) { # parent
		close $cr;
		close $cw;
		$CHILD{$ARG}->{pw}->autoflush(1);
		vec($READ_BITS, fileno($CHILD{$ARG}->{pr}), 1) = 1;
	} elsif (defined($CHILD{$ARG}->{pid})) { # child
		close $CHILD{$ARG}->{pr};
		close $CHILD{$ARG}->{pw};
		open(STDIN,  "<&$cr") or abort 'STDIN open';
		open(STDOUT, ">&$cw") or abort 'STDOUT open';
		open(STDERR, ">&$cw") or abort 'STDERR open';
		STDOUT->autoflush(1);
		STDERR->autoflush(1);
		exec("$LIBDIR/monds_client",$ARG,$PORT)
			or abort 'exec';
	} else {
		abort 'fork';
	} # if fork
} # for each monitored machine

First, the global $READ_BITS is cleared. This variable represents a bit list of the Unix file descriptors (not filehandles) available for reading. You'll learn more about $READ_BITS as we proceed.

Then the variable $fh is initialized with an indirect filehandle template; that is, the value of $fh is the name of the filehandle in question, rather than a plain Perl filehandle bareword like STDOUT. We do this for simplicity, so filehandles can be generated dynamically without regard to the number of clients. (Within reason, of course. The code depends upon Perl's ability to increment a string, and as long as we don't try to create more filehandles than your system supports, space-time behaves normally.)

For each host, inbound and outbound pipes are created, a requirement for the two-way protocol. The fork() call creates a clone subprocess of the Perl/Tk parent, and the flow of the code splits, like a fork in a road, depending upon whether the parent or child is executing it. If fork() succeeds it returns the Unix process ID of the child to the Perl/Tk parent, and zero to the child. Later we'll see why it's important for the parent to record the PID of all its children.

The parent cares nothing about the child's filehandles so it immediately closes them. Then it unbuffers its write filehandle, builds up the select() bit list, and goes about its way.

To understand exactly what vec() is doing, examine this statement (executed within the Perl debugger):

DB<1> @f = (fileno STDIN, fileno STDOUT, fileno STDERR)
DB<2> print "@f"
0 1 2

We see that fileno() maps a Perl filehandle to a small integer - the Unix file descriptor of the filehandle. Each file descriptor represents an index in the select() bit list. So each assignment to vec($READ_BITS, ...) flips on a particular bit, building it up so that eventually there is one bit set for every parent read pipe.
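To see what such a bit list holds, here is a small sketch that sets bits for a few descriptor numbers and prints the result as a string of 0s and 1s. The helper name fd_bits and the descriptor numbers are made up for illustration; monds builds its list from the real pipe descriptors instead.

```perl
use strict;
use warnings;

# Hypothetical helper: build a select() bit list from file descriptor numbers.
sub fd_bits {
    my (@fds) = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for @fds;   # turn on one bit per descriptor
    return $bits;
}

my $bits = fd_bits(0, 1, 2, 5);

# unpack 'b*' lists the bits in ascending order, so position N in the
# string corresponds to file descriptor N.
print unpack('b*', $bits), "\n";      # prints 11100100
```

Reading the string from the left, descriptors 0, 1, 2, and 5 are marked and 3, 4, 6, and 7 are not, which is the same membership test that vec($bits, $fd, 1) performs.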

Like its Perl/Tk parent, the child closes filehandles it doesn't need. It then prepares to invoke the TCP client in the standard Unix way: by connecting its read and write pipe filehandles to STDIN, STDOUT and STDERR, and calling exec() (with three arguments: the client's path name, an IP address, and port number) to overlay the client upon itself.
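The same pipe-and-fork plumbing can be tried in isolation. The sketch below is not the monds code: it assumes a Perl recent enough for lexical filehandles, uses a hypothetical helper named spawn_child, and runs a code reference in the child instead of calling exec(), so that the newline-ACK handshake can be seen in a single file.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Hypothetical helper: fork a child wired to the parent by two pipes.
# Returns (pid, parent's read handle, parent's write handle).
sub spawn_child {
    my ($work) = @_;                        # code ref to run in the child
    pipe(my $child_read,  my $parent_write) or die "pipe: $!";
    pipe(my $parent_read, my $child_write)  or die "pipe: $!";

    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                        # child
        close $parent_read;
        close $parent_write;
        $child_write->autoflush(1);
        $work->($child_read, $child_write);
        POSIX::_exit(0);                    # leave without running parent cleanup
    }

    close $child_read;                      # parent
    close $child_write;
    $parent_write->autoflush(1);
    return ($pid, $parent_read, $parent_write);
}

# The child sends two fake samples, waiting for a newline ACK after each.
my ($pid, $from_child, $to_child) = spawn_child(sub {
    my ($in, $out) = @_;
    for my $n (1 .. 2) {
        print {$out} "sample $n\n", "END_OF_DF\n";
        my $ack = <$in>;                    # block until the parent's go-ahead
        last unless defined $ack;
    }
});

my @samples;
for (1 .. 2) {
    while (my $line = <$from_child>) {
        last if $line =~ /^END_OF_DF$/;
        chomp $line;
        push @samples, $line;
    }
    print {$to_child} "\n";                 # ACK: ask for the next sample
}
waitpid($pid, 0);
print "received: @samples\n";
```

Each side closes the pipe ends it does not use, just as monds does; otherwise a reader would never see end-of-file when its partner exits.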

The Perl/Tk Parent's Secondary Main Loop

Once all the clients are running, monds enters its main loop, which coexists with, but is independent of, the ubiquitous Perl/Tk MainLoop().


$MW->repeat($poll_interval, \&poll_clients);

Subroutine poll_clients() is nominally called every minute, at which time it collects and processes all available df data, and then sends acknowledgements as required.


sub poll_clients {
    my($rbits, $nfound);
    # A timeout of 0 makes select() poll and return at once, so Tk never blocks.
    $nfound = select($rbits = $READ_BITS, undef, undef, 0);
    return if $nfound == 0;
    my(@go_ahead) = ();    # ACK hosts for another 'df'
    my(@host_list) = @{$OPT{hosts}};

    PROCESS_ALL_HOSTS: while ($nfound > 0) {
        my $them = shift(@host_list);
        if (vec($rbits, fileno($CHILD{$them}->{pr}), 1) == 0) {
            # if no incoming data from this client
            next PROCESS_ALL_HOSTS;
        }
        $nfound--;
        push @go_ahead, $them;    # give a "go-ahead" signal
        # parent's (monds') read filehandle
        my $fh = $CHILD{$them}->{pr};

        PROCESS_HOST_FILESYSTEMS: while (<$fh>) {
            last PROCESS_HOST_FILESYSTEMS if /^END_OF_DF$/;
            # Process file system data here ...
        }
    } # whilend PROCESS_ALL_HOSTS

    foreach (@go_ahead) {
        print {$CHILD{$ARG}->{pw}} "\n";
    }
    display_poll_results;
}

Here's our friend $READ_BITS again, that bit list of all possible input file descriptors. We copy it into $rbits. select() then checks the specified file descriptors, and for file descriptors with no input data, it clears the corresponding bits in $rbits. When select() returns, $rbits has bits set only for those file descriptors that have input data, and $nfound tells us how many there are.

Using vec(), the data collection loop finds a viable client, pushes its name on the acknowledgement list, and proceeds to read and process the client's df data. Once all the df data has been read, each client is given its ACK, the poll results are displayed, and the cycle repeats.
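The polling behavior of select() is easy to check with a single pipe. In this sketch, which stands apart from monds, the helper name ready_count is hypothetical; it asks select() how many handles are readable right now, using a timeout of 0 so the call returns immediately instead of waiting.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: how many of these handles have data waiting?
sub ready_count {
    my (@handles) = @_;
    my $rbits = '';
    vec($rbits, fileno($_), 1) = 1 for @handles;
    # Timeout 0: report the current state and return at once.
    my $nfound = select($rbits, undef, undef, 0);
    return $nfound;
}

pipe(my $reader, my $writer) or die "pipe: $!";
$writer->autoflush(1);

my $before = ready_count($reader);    # nothing has been written yet
print {$writer} "END_OF_DF\n";
my $after  = ready_count($reader);    # the pipe now holds one line

print "before=$before after=$after\n";    # prints before=0 after=1
```

Because the child writes a complete sample in one burst, a readable pipe means the whole block through END_OF_DF is on its way, so the line-by-line read that follows does not stall the display for long.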

Further Considerations

Termination. Unix etiquette dictates that the Perl/Tk parent properly terminate its children, so this code is bound to the application's Quit button:


foreach (@{$OPT{hosts}}) {
    kill 'SIGTERM', $CHILD{$ARG}->{pid}
      if defined $CHILD{$ARG}->{pid};
}

Efficiency. As usual, the marginal utility of an optimization isn't always worth the time and effort needed to create, implement, and test the change. However, the monds daemon is loaded, compiled, and executed 1440 times per day, and so it's worth looking at. One possibility is to translate the Perl code to C, resulting in a compact, lightweight process that would load very fast. Another approach is to do away with inetd altogether and write a true monds daemon that loads once and accepts and handles network connections itself. Then again, perhaps 1440 loads per day is mere noise, so the choice is up to you.
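For the curious, a standalone daemon along those lines might look roughly like the sketch below. It is an untested-in-production illustration, not part of monds: the sub name serve_df, the --serve switch, and the $max_clients argument are all assumptions, and it handles one connection at a time, which seems adequate for something as quick as df.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: answer each connection with df output and the
# END_OF_DF marker. $max_clients exists only so a caller can stop the
# loop; leave it undefined to serve forever.
sub serve_df {
    my ($listener, $max_clients) = @_;
    my $served = 0;
    while (my $client = $listener->accept) {
        print {$client} `/bin/df 2>&1`;
        print {$client} "END_OF_DF\n";
        close $client;
        $served++;
        last if defined $max_clients and $served >= $max_clients;
    }
    return $served;
}

# Only start listening when asked to, for example with a --serve argument.
if (@ARGV and $ARGV[0] eq '--serve') {
    my $listener = IO::Socket::INET->new(
        LocalPort => 10346,     # the port used in the telnet example
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "cannot listen: $!";
    serve_df($listener);
}
```

The trade-off is that Perl is compiled once instead of on every request, at the cost of a process that stays resident and must be started at boot by some other means than inetd.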

Intelligence. The current monds warning scheme is overly simplistic because it only notices a filesystem when its usage exceeds 90%. Suppose you have an empty two gigabyte filesystem, and a process, for whatever reason, starts writing a file upon it at 200,000 bytes per second. Assuming no disk limits, that filesystem becomes completely filled in 2e9/2e5 seconds, or about 2.78 hours, and monds will post its first alert with 16 minutes left. But suppose your filesystem had only 200 megabytes. That same process will fill the filesystem an order of magnitude faster, in around 16 minutes, and monds would only yelp with less than two minutes to go! So although the rate of filling is identical in each case, the percentage of disk usage is radically different, and that's what you care about. This is an interesting idea to throw into the heuristic stew - scale the threshold criteria so that every filesystem, regardless of size, is provided an identical "reaction time period."
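The arithmetic behind that idea is short. If a filesystem of S bytes fills at R bytes per second, then at a usage fraction u the time remaining is (1 - u) * S / R; setting that equal to a desired reaction time T gives an alert threshold of u = 1 - R * T / S. The sketch below works this out under the assumption that the fill rate is known or can be estimated from successive samples, something monds as described does not do; the sub names and the 600-second reaction time are hypothetical.

```perl
use strict;
use warnings;

# Seconds until a filesystem is full once usage reaches $used_pct percent.
sub seconds_left {
    my ($size_bytes, $fill_rate, $used_pct) = @_;
    return (1 - $used_pct / 100) * $size_bytes / $fill_rate;
}

# Usage percentage at which to alert so that $reaction_secs remain.
# Clamped at 0: a small, fast-filling filesystem should alert at once.
sub alert_threshold_pct {
    my ($size_bytes, $fill_rate, $reaction_secs) = @_;
    my $pct = 100 * (1 - $fill_rate * $reaction_secs / $size_bytes);
    return $pct < 0 ? 0 : $pct;
}

my $rate = 200_000;                    # bytes per second, as in the text
for my $size (2e9, 2e8) {              # two gigabytes and 200 megabytes
    printf "%.0f bytes: %.0f s left at 90%%, alert at %.0f%% for a 600 s reaction time\n",
        $size,
        seconds_left($size, $rate, 90),
        alert_threshold_pct($size, $rate, 600);
}
```

With these numbers the large filesystem could wait until it is about 94% full, while the small one would need its alert at about 40% to leave the same ten minutes.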

Action. But wait, there's more! Since monds has a modicum of intelligence, why not let the program load-level a machine's filesystems? By that I mean actually move directories from a file system low on space to one with excess space. If monds was supplied with a list of filesystems among which it was permitted to distribute directories, we can imagine a Perl/Tk window like this:

We can take this idea even further and envision load-leveling AFS volumes across the Internet, but now we need to deal with multiple mount points possibly spread across multiple AFS servers, a topic, as they say, beyond the scope of this column. 'Nuff said.

__END__

