
FTP: File Transfer Using Perl

Graham Barr

In my last article I showed you how to use Perl to create, process and send e-mail. I also gave a brief introduction to the Simple Mail Transfer Protocol (SMTP), one of the many protocols used across the Internet. In this article and those that follow I'll introduce some of the others.

In particular, this article will show you how to create an FTP (File Transfer Protocol) client. You've probably used the ftp program, which is a user interface to the FTP protocol. The distinction between the program and the protocol is subtle but important, because the program I'll develop in this column is also an interface to FTP.

It might not surprise you to hear that most of the work has already been done: a Perl 5 module, Net::FTP, interprets the FTP protocol for you. So you won't have to mess with the nuts and bolts of FTP (defined in RFC 959) to have your program send and receive files all by itself.

FTP is a client-server protocol. That is, there's a server which listens for connections on an agreed-upon port address (FTP uses 21 by default). Once a connection is made, the server hands it off to a new socket (typically serviced by a child process), which leaves the listening socket on port 21 free to accept the connection from the next client. The client and server communicate conversationally, with the client sending commands defined in the FTP protocol to the server, and the server sending responses back to the client. This is the architecture for many well-known protocols on the Internet such as SMTP, NNTP, and HTTP.

Here's an example of a conversation between an FTP server and a client. It shows what communication is necessary to connect, log in, change directory and retrieve a file. The lines that begin with a command word (USER, PASS, CWD, and so on) are sent by the client; the lines that begin with a three-digit code are the server's responses.

220 ftphost FTP server (SunOS 4.1) ready. 
USER anonymous 
331 Guest login ok, send ident as password. 
PASS perl-journal-staff@perl.com
230 Guest login ok, access restrictions apply. 
CWD pub 
250 CWD command successful. 
PWD 
257 "/pub" is current directory. 
PORT 127,0,0,1,16,110 
200 PORT command successful. 
RETR testfile 
150 ASCII data connection for testfile (127.0.0.1,4206) (0 bytes).
226 ASCII Transfer complete. 
QUIT 
221 Goodbye. 
 

The FTP protocol actually uses two connections: one for the commands just shown, and one for the actual data transfer. The PORT command tells the server which socket address the client is using. The server uses this information (4 IP octets and a 2-byte port address) to make the data connection.
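As a rough illustration of that encoding, the sketch below converts between the six comma-separated numbers of a PORT argument and the more familiar address-and-port form. The helper names port_to_addr() and addr_to_port() are made up for this example and are not part of Net::FTP.

```perl
use strict;
use warnings;

# Decode the six comma-separated numbers of a PORT argument
# into a dotted IP address and a port number.
# (port_to_addr is a hypothetical helper, not a Net::FTP method.)
sub port_to_addr {
    my ($arg) = @_;
    my @n = split /,/, $arg;
    die "Expected six numbers in '$arg'\n" unless @n == 6;
    my $ip   = join '.', @n[0 .. 3];
    my $port = $n[4] * 256 + $n[5];    # high byte first, then low byte
    return ($ip, $port);
}

# Encode an IP address and port back into PORT argument form.
# (addr_to_port is likewise hypothetical.)
sub addr_to_port {
    my ($ip, $port) = @_;
    return join ',', split(/\./, $ip), $port >> 8, $port & 0xFF;
}

my ($ip, $port) = port_to_addr('127,0,0,1,16,110');
print "$ip port $port\n";                     # 127.0.0.1 port 4206
print addr_to_port('127.0.0.1', 4206), "\n";  # 127,0,0,1,16,110
```

This matches the conversation above: the client sent 16,110 as the last two numbers, and the server's 150 response reports port 4206.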

You will see from the examples that Net::FTP simplifies this interface by keeping track of the status and providing methods for each of the commands.
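To give a feel for the status being tracked, here is a minimal sketch that splits a server response into its three-digit code and text, and classifies it by the first digit as RFC 959 describes. The parse_reply() helper is hypothetical and only illustrates the idea; it is not how Net::FTP is implemented.

```perl
use strict;
use warnings;

# Meaning of the first digit of a reply code (RFC 959).
my %class = (
    1 => 'positive preliminary',
    2 => 'positive completion',
    3 => 'positive intermediate',
    4 => 'transient negative',
    5 => 'permanent negative',
);

# Split a reply line into (code, class, text).
# Returns an empty list if the line is not a reply.
# (parse_reply is a hypothetical helper for illustration.)
sub parse_reply {
    my ($line) = @_;
    my ($code, $text) = $line =~ /^(\d{3})[ -](.*?)\s*$/
        or return;
    return ($code, $class{ substr($code, 0, 1) } || 'unknown', $text);
}

for my $line ('331 Guest login ok, send ident as password.',
              '226 ASCII Transfer complete.') {
    my ($code, $class) = parse_reply($line);
    print "$code: $class\n";
}
# 331: positive intermediate
# 226: positive completion
```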

My first program contacts a Comprehensive Perl Archive Network (CPAN) site and retrieves all modules that have been uploaded within a given number of days. First, initialization:

#!/usr/bin/perl  
# Load the Net::FTP package 
use Net::FTP;  
use File::Listing qw(parse_dir);  

# Look for files less than 7 days old (age in seconds)
$age = 7*24*60*60;  

# Change this to the name of your nearest CPAN host
$CPANhost = 'CPAN';  

# A likely path to the CPAN/modules directory 	
$CPANpath = '/mirrors/CPAN/modules';  

Now we need to construct a Net::FTP object which will talk to the remote server. The Net::FTP constructor takes, as arguments, the FTP hostname followed by some options:

# Create a new Net::FTP object, changing the
# timeout to 60 seconds. If the constructor fails,
# it leaves its error message in $@.
$ftp = Net::FTP->new($CPANhost, Timeout => 60) 
               or die "Cannot contact $CPANhost: $@";  

Once a connection has been made, the login() method must be called before any other methods. login() takes three optional arguments: login, password and account.

If no arguments are supplied, Net::FTP searches the .netrc file in your home directory (on UNIX machines). (Net::Netrc, which interprets the .netrc file, has only been tested on UNIX platforms.) If no login information is found, the login defaults to "anonymous".

When doing .netrc lookups, Net::FTP performs certain security checks, just like the ftp program. You must own the file, and nobody else should be able to read or write to it. If these checks fail, Net::FTP ignores your .netrc.
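The kind of check described here can be sketched with stat(). This is only an approximation written for this article, assuming a UNIX-style filesystem; the netrc_looks_safe() helper is hypothetical and is not the code Net::Netrc actually runs.

```perl
use strict;
use warnings;

# Return true if $path is owned by the current user and cannot be
# read or written by group or others.
# (netrc_looks_safe is a hypothetical helper, not part of Net::Netrc.)
sub netrc_looks_safe {
    my ($path) = @_;
    my @st = stat($path) or return 0;    # missing or unreadable
    my ($mode, $uid) = @st[2, 4];
    return 0 unless $uid == $<;          # must be owned by us
    return 0 if $mode & 066;             # group/other read or write bits
    return 1;
}

my $netrc = ($ENV{HOME} || '.') . '/.netrc';
print "$netrc: ",
      (netrc_looks_safe($netrc) ? "would pass the check"
                                : "is missing or would be ignored"),
      "\n";
```

A file created with mode 0600 passes this test, while the common default of 0644 does not.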

If no password is given and the login is 'anonymous' then Net::FTP guesses your e-mail address and sends it as the password.

The third argument is account information which might be required by the FTP server. For anonymous FTP it's unnecessary.

# We'll login to the ftp server as anonymous; 
# specifying a login id prevents a .netrc lookup. 

$ftp->login('anonymous') 
   or die "Can't login ($CPANhost):" . $ftp->message;  

O.K., so the server has accepted us. Little does it know that we aren't mere surfers! First we need to change directory to the root of the CPAN modules and retrieve a recursive directory listing. (Recursive directory listings take a long time on large filesystems. That can annoy FTP site maintainers, so only do this when necessary.) By changing directories first, we reduce the size of the listing and therefore the time required to transmit it.

# Change the working directory 	
$ftp->cwd($CPANpath) or 
    die "Can't change directory ($CPANhost):" . 
        $ftp->message;  

# Retrieve a recursive directory listing 
@ls = $ftp->ls('-lR');  

Before we start to transfer the files we need to tell the FTP server what type of file we're expecting. Different machines store files in different ways (what a wonderful world we live in), which is why FTP supports several transfer types; RFC 959 defines ASCII, EBCDIC, IMAGE, and LOCAL.

However, only two of these are supported by Net::FTP: ASCII and IMAGE. In binary (IMAGE) mode the files are transferred as is, but in ASCII mode some translations, such as <CRLF> to <NL>, can be performed.

  
# We probably want binary, 
# although some files may be ASCII
$ftp->binary();  
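To see why the choice matters, here is a small sketch of the <CRLF> to <NL> translation mentioned above. The crlf_to_nl() function is a hypothetical stand-in for what an ASCII-mode receiver on a UNIX machine might do, not code taken from Net::FTP. Text survives the translation, but any binary file that happens to contain the byte pair 0D 0A would silently lose a byte.

```perl
use strict;
use warnings;

# Translate network line endings (CR LF) into local newlines (LF).
# (crlf_to_nl is a hypothetical helper for illustration.)
sub crlf_to_nl {
    my ($text) = @_;
    $text =~ s/\015\012/\012/g;
    return $text;
}

my $wire  = "line one\015\012line two\015\012";
my $local = crlf_to_nl($wire);
printf "%d bytes on the wire, %d bytes stored\n",
       length($wire), length($local);
# 20 bytes on the wire, 18 bytes stored
```

This sketch works on a whole string at once. A translator that works on 1024-byte buffers would also have to cope with a CR at the end of one buffer and its LF at the start of the next.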

Now we have a recursive directory listing in @ls and an FTP connection in $ftp. We use the parse_dir() subroutine in the File::Listing module to split our directory listing into its components. (File::Listing is available in the libwww distribution in the CPAN.)

From these components we can access the filename, the last time the file was written, and its type, which can be one of l, d, or f, representing links, directories, and files.

foreach $file (parse_dir(\@ls)) { 	    
    my($name, $type, $size, $mtime, $mode) = @$file;  

    # We only want to process plain files, 	    
    # we shall ignore symbolic links 	    
    next unless $type eq 'f';  

    # Check age of file against $age 	    
    # $mtime is a UNIX time: seconds since 1 Jan 1970
    # $^T is the time this script started.
    if ($^T - $mtime < $age) { 		
        print "Retrieving ", $name, "\n";  

        # Get the file from the ftp server 		
        $ftp->get($name) or 
            warn "Couldn't get '$name', skipped: $!";
    }
}
# Close the connection to the FTP server.
$ftp->quit or die "Couldn't close the connection cleanly: $!";  

# We're done! 
exit;  
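The selection logic in the loop above can be tried without a network connection. The sketch below applies the same two tests (plain file, modified within $age seconds) to a few hand-built records laid out like the ones parse_dir() returns: name, type, size, mtime, mode. The file names, sizes and times are invented for the example, and recent_files() is a hypothetical helper.

```perl
use strict;
use warnings;

# Return the names of plain files modified less than $age seconds
# before $now. Each entry is [name, type, size, mtime, mode].
# (recent_files is a hypothetical helper for illustration.)
sub recent_files {
    my ($age, $now, @entries) = @_;
    return map  { $_->[0] }
           grep { $_->[1] eq 'f' && $now - $_->[3] < $age } @entries;
}

my $day  = 24*60*60;
my $week = 7 * $day;
my $now  = 850_000_000;    # a fixed "current time" for the example

# Invented sample records
my @entries = (
    ['by-module/Net/libnet-1.00.tar.gz', 'f', 51200, $now -  2 * $day, 0100644],
    ['by-module/Net',                    'd',   512, $now -  1 * $day, 040755],
    ['by-module/Old/Old-0.01.tar.gz',    'f',  1024, $now - 30 * $day, 0100644],
);

print "$_\n" for recent_files($week, $now, @entries);
# by-module/Net/libnet-1.00.tar.gz
```

Passing the current time in as an argument, rather than reading $^T inside the function, makes the filter easy to test.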

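Before I go into more detail you'll need to know the four commands made available by FTP for retrieving and storing files: RETR retrieves a file from the server, STOR stores a file on the server, STOU stores a file under a name that is unique on the server, and APPE appends to a file on the server.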

Now if you want to get adventurous and speed up transfer, you can use multiple FTP connections managed either by multiple processes or by a select() call. The latter is demonstrated below, with several Net::FTP objects, one per connection.

#!/usr/bin/perl  

use Net::FTP; 	
use File::Listing qw(parse_dir); 	

# We'll need to open and write some files 	
use FileHandle;   

# Look for files less than 7 days old (age in seconds)
$age = 7*24*60*60;  

# Change this to the name of your nearest CPAN host
$CPANhost = 'CPAN';  

# The path to the CPAN/modules directory on most CPAN hosts
$CPANpath = '/mirrors/CPAN/modules';  

# Create the initial connection 	
$ftp = connection();  

# Retrieve a recursive directory listing 
@ls = $ftp->ls('-lR');  

# Set the transfer mode to binary 	
$ftp->binary or die "Cannot set binary mode: $!";  

# Create a list of files we want to get 	
@files = (); 	
foreach $file (parse_dir(\@ls)) {
    my($name, $type, $size, $mtime, $mode) = @$file;

    # We only want to process plain files
    next unless $type eq 'f';

    # Compare the age of file to $age
    if ($^T - $mtime < $age) { push(@files, $name) }
}

# The maximum number of connections to make 	
$max_connection = 4; 	
$max_connection = @files   if @files < $max_connection;  

# Create a list of connections. We already have one: $ftp.
@ftp = ($ftp);  

for ($i = 1; $i < $max_connection; $i++) {
    my $ftp = connection();
    $ftp->binary or die "Cannot set binary mode: $!";
    push(@ftp, $ftp);
}

print "Using ", scalar(@ftp), " connections,\n"; 	
print " to download ",scalar(@files)," files.\n";  

# Keep a list of data connections 	
@data = ();  

# We'll start off with an empty file set.
$fdset = "";  

# Prime the ftp servers with RETR commands 
while (@ftp && @files) {
    my $ftp  = shift @ftp;
    my $file = shift @files;
    my($data, $fh) = init_xfer($ftp, $file);
    push(@data, [$data, $fh]);
}

# Close any unused connections 	
while (@ftp) { 	    
    my $ftp  = shift @ftp; 	    
    $ftp->close or warn "Can't close connection cleanly: $!"; 	
} 

We now have several FTP data connections to the same server, each in charge of one file. To service all of these connections simultaneously, we need select() to tell us when there's data to be read. We loop for as long as there is data to read; on each iteration, up to 1024 bytes are read from any descriptor with data available. If an EOF is found, the descriptor is closed. If there are still more files to be retrieved, a new file is requested on the corresponding command socket. This creates another descriptor. If there are no more files to transfer, the command socket is closed - when the list of data descriptors is empty, we'll know the transfer is complete.

# Loop while we have connections. They'll be closed and
# removed from @data when transfers finish and @files
# is empty.

while (@data) {
    $nfound = select($rout=$fdset, undef, undef, undef);
    die "select: $!" if $nfound == -1;
    next unless $nfound;
    my @d = @data;

    # Empty @data; connections that are still in use are
    # added back into @data below.
    @data = ();

    foreach $con (@d) {
        my($data, $fh) = @$con;

        # Do we have data waiting on this connection?
        if (vec($rout, fileno($data), 1)) {
            my $buf = "";

            # Read some data. This may block if there's
            # less than 1024 bytes ready for reading. To
            # reduce the blocking time, use a smaller number.
            my $l = $data->read($buf, 1024);
            die "Error reading data: $!"
                if !defined($l) || $l < 0;

            if ($l) {
                # Write the data to the local file
                syswrite($fh, $buf, $l);
            } else {
                # The data transfer is complete, so we can
                # close the data connection
                my $ftp = finish_xfer($data, $fh);

                # Reuse the FTP connection if there are
                # files left to retrieve.
                if (@files) {
                    my $file = shift @files;
                    @$con = init_xfer($ftp, $file);
                } else {
                    # Close the FTP connection and remove it
                    # from @data
                    $ftp->close or
                        warn "Can't close connection: $!";

                    # The connection is no longer in use
                    undef $con;
                }
            }
        }

        # If the connection is still in use, return it to
        # @data
        push(@data, $con) if defined $con;
    }
}
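If the select() and vec() bookkeeping is unfamiliar, it can be tried in isolation. The following sketch, which assumes a UNIX-like system where select() works on pipes, sets up two pipes in place of FTP data connections, writes to only one of them, and asks select() which descriptors are ready. Nothing here is specific to Net::FTP.

```perl
use strict;
use warnings;

# Two pipes stand in for two data connections.
pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";

# Only the second "connection" has data waiting.
syswrite($w2, "hello") or die "syswrite: $!";

# Build the bit vector: one bit per file descriptor we care about.
my $fdset = '';
vec($fdset, fileno($r1), 1) = 1;
vec($fdset, fileno($r2), 1) = 1;

# select() modifies its arguments, so pass it a copy of $fdset.
# The final argument is a timeout of one second.
my $rout;
my $nfound = select($rout = $fdset, undef, undef, 1);
die "select: $!" if $nfound == -1;

# Keep only the handles whose bit is still set in $rout.
my @ready = grep { vec($rout, fileno($_), 1) } ($r1, $r2);

my $buf = '';
sysread($ready[0], $buf, 1024) if @ready;
print "$nfound descriptor ready, read '$buf'\n";
# 1 descriptor ready, read 'hello'
```

The sketch uses sysread() rather than read() because mixing select() with buffered input can leave data sitting unseen in a buffer.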

And finally, the three subroutines we've been using: connection(), init_xfer(), and finish_xfer().

# Create a new connection to the ftp server
sub connection {
    # Create a new Net::FTP object. A lexical is used so
    # the caller's $ftp isn't overwritten.
    my $ftp = Net::FTP->new($CPANhost, Timeout => 60)
        or die "Can't contact $CPANhost: $@";

    # We shall login to the ftp server as anonymous
    $ftp->login('anonymous')
        or die "Can't login ($CPANhost):" . $ftp->message;

    # Change the working directory
    $ftp->cwd($CPANpath)
        or die "Can't change directory ($CPANhost):" . $ftp->message;

    return $ftp;
}

# Initialize a file transfer 
sub init_xfer {
    my($ftp,$file) = @_; 

    # Send the retr command, and get a file descriptor 
    # for the socket 
    my $data = $ftp->retr($file) or die
               "Can't retrieve file '$file': $!";  

    # Store all files locally, in the current directory  
    my ($path) = ($file =~ m!([^/]+)$!); 	    

    # Open a filehandle to the local file  
    my $fh = FileHandle->new($path, "w") 
             or die "Cannot open file '$path': $!";  
    print "Retrieving $file as $path ...\n";  

    # Add data connection into fdset for select()     
    vec($fdset, fileno($data), 1) = 1;  
    return ($data, $fh); 
}  

# Cleanup after a file transfer has completed  
sub finish_xfer {
    my($data, $fh) = @_;

    # Get the ftp command object
    my $ftp = $data->cmd;

    # Remove data connection from fdset for select()
    vec($fdset, fileno($data), 1) = 0;

    # Close the data connection
    $data->close or warn "Cannot close data connection: $!";

    # Close the local file
    close($fh) or warn "Can't close filehandle: $!";

    return $ftp;
}

As you can see, the whole problem becomes a lot more complex, fun, or obscure, depending on how twisted you are.

So far we've looked at transferring files to and from one server. But what if we have two remote servers and want to transfer a file from one to the other? FTP contains a powerful facility for doing this, but first let's consider the obvious solution.

You could transfer the remote file to the local filesystem and then transfer it to the other remote server. Better would be to connect to each of the servers simultaneously, and perform sequential reads and writes between them using the local machine as a waystation. The code for this is shown below.

 
#!/usr/bin/perl  

use Net::FTP;  

# Create connections to both remote servers...
$ftpf = Net::FTP->new('from') or die 
   "Cannot connect to 'from': $@";  
$ftpd = Net::FTP->new('dest') or die 
   "Cannot connect to 'dest': $@";  

# ...and login to them.
$ftpf->login('anonymous') or die "Can't login to 'from'";  
$ftpd->login('anonymous') or die "Can't login to 'dest'";  

# Place both servers into the correct transfer mode. 
# In this case I'm using ASCII. 
$ftpf->ascii() &&  $ftpd->ascii() or die 
    "Can't set ASCII mode: $!";  

# Send the RETR command to the source server 	
# and obtain a file descriptor  
$ffile = '/pub/testfile';  
$fdf = $ftpf->retr($ffile) 
    or die "Can't retrieve '$ffile': $!";  

# Send the STOR command to the destination server 	
# and obtain a file descriptor  
$sfile = '/pub/outfile';  
$fdd = $ftpd->stor($sfile) or die "Cannot store '$sfile': $!";  

# Read and write the data between the two file descriptors  
while ($fdf->read($buf,1024)) { 	
    $fdd->write($buf, length $buf); 	
}  

# Close the two data connections, then the command connections
$fdf->close() && $fdd->close() or die 
    "Can't close connections: $!";  
$ftpf->quit() && $ftpd->quit() or die 
    "Can't quit ftp connections: $!";  

While this is better than reading the whole file to the local filesystem and re-sending it, this process is still not as good as it could be. Consider the situation when the file in question is rather large, say over 10MB. It takes a long time to transfer just once, and here we're actually transferring it twice which could, potentially, double the transfer time. For those who pay by the minute, this could get expensive.
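To make that cost concrete, the relay loop can be sketched locally with in-memory filehandles standing in for the two data connections. The relay() helper is hypothetical, and the 2500-byte payload is arbitrary. The point is simply that every byte is handled twice by the machine in the middle: once on the way in and once on the way out.

```perl
use strict;
use warnings;

# Copy everything from $in to $out in chunks of $chunk bytes.
# Returns (bytes read, bytes written).
# (relay is a hypothetical helper for illustration.)
sub relay {
    my ($in, $out, $chunk) = @_;
    my ($bytes_in, $bytes_out) = (0, 0);
    while (my $n = read($in, my $buf, $chunk)) {
        $bytes_in += $n;
        print {$out} $buf or die "write failed: $!";
        $bytes_out += length $buf;
    }
    return ($bytes_in, $bytes_out);
}

# In-memory filehandles stand in for the source and destination.
my $source = "x" x 2500;
my $dest   = '';
open(my $in,  '<', \$source) or die "open source: $!";
open(my $out, '>', \$dest)   or die "open dest: $!";

my ($got, $sent) = relay($in, $out, 1024);
close($out);

print "received $got bytes, sent $sent bytes, ",
      $got + $sent, " bytes moved in total\n";
# received 2500 bytes, sent 2500 bytes, 5000 bytes moved in total
```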

This is where the PASV ("passive") command comes in handy. Assuming that both of the remote servers can connect to one another, you can transfer the file directly:

#!/usr/bin/perl  

use Net::FTP;  

# Create connections to both remote servers...
$ftpf = Net::FTP->new('from') or die 
    "Can't connect to 'from': $@";  
$ftpd = Net::FTP->new('dest') or die 
    "Can't connect to 'dest': $@";  

# ...and login to them.
$ftpf->login('anonymous') or die "Can't login to 'from'";  
$ftpd->login('anonymous') or die "Can't login to 'dest'";  

# Place both servers into the correct transfer mode. 
# In this case I'm using ASCII. 
$ftpf->ascii() &&  $ftpd->ascii() or die 
    "Can't set ASCII mode: $!";  

# Send the PASV command to the destination server. 
# This returns a port address.
$port = $ftpd->pasv or die 
    "Can't put FTP host in passive mode: $!";  

# Send the port address to the source server so it 
# knows where to send the data.
$ftpf->port($port) or die "Error sending port: $!";  

# Send the RETR and STOU commands to the servers  
$rfile = '/pub/testfile';  
$ftpf->retr($rfile) or $ftpf->ok or die 
    "Can't retrieve '$rfile': $!";  
$sfile = '/pub/outfile';  
$ftpd->stou($sfile) or die "Can't store '$sfile': $!";  

# Wait for the transfer to complete  
$ftpd->pasv_wait($ftpf) or die "Transfer failed: $!";  

# The data flowed directly between the two servers, so
# there are no local data connections to close; we only
# need to close the two command connections.
$ftpf->quit() && $ftpd->quit() or die
    "Can't quit ftp connections: $!";  

After creating the connections, and placing them in the correct transfer mode, we send the destination server a PASV command. This tells the server, for the next command, that it should listen on a port for a connection instead of making the connection itself. The PASV command returns the port at which it is listening. We then send this information to the source server with a PORT command, which tells the server where to make the data connection for the next command. Once this is done we send the two commands, which start the transfer between the two servers, and wait for the transfer to complete.
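The hand-off between the two commands can be illustrated with plain string handling. A server answers PASV with a 227 response that carries its address in the same six-number form that PORT expects, so the client's job is largely to copy it across. In the sketch below the response text and the address 10,0,0,2,19,137 are invented, and pasv_to_port_cmd() is a hypothetical helper rather than a Net::FTP method.

```perl
use strict;
use warnings;

# Extract the h1,h2,h3,h4,p1,p2 group from a PASV response and
# build the PORT command to send to the other server.
# Returns nothing if the response has no address in it.
# (pasv_to_port_cmd is a hypothetical helper for illustration.)
sub pasv_to_port_cmd {
    my ($reply) = @_;
    my ($addr) = $reply =~ /\((\d+(?:,\d+){5})\)/
        or return;
    return "PORT $addr";
}

# An invented response from the destination server
my $reply = '227 Entering Passive Mode (10,0,0,2,19,137)';
my $cmd   = pasv_to_port_cmd($reply);

# The last two numbers are the high and low bytes of the port.
my @n = split /,/, ($cmd =~ /PORT (.*)/)[0];
print "$cmd\n";                                   # PORT 10,0,0,2,19,137
print "listening port: ", $n[4] * 256 + $n[5], "\n";   # listening port: 5001
```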

The programs in this article are available on CPAN at modules/by-author/id/GBARR/ftp_eg.tar.gz and on the TPJ web site.

__END__

