Project 3: Remote Procedure Call

CS233/333 - Networks and Distributed Systems
Fall, 1999.

Due Date: Friday, November 19, 11:59 p.m.

1. Overview

One difficulty of using remote procedure call (RPC) with languages like C and C++ is that the programmer is required to precisely define an interface specification (usually in a separate file such as the .x files used by rpcgen). The primary purpose of this file is to identify all of the datatypes and calling conventions for the functions that will be used. This file is, in a sense, necessary because there is no easy way for a C program to discover the calling conventions of a library once it has been compiled (i.e., calling conventions and type information are not encoded into C libraries or executables).

More modern languages like Java and Python, however, have the ability to perform "introspection." That is, a program can inspect the contents of classes and modules at run time and, in some cases, even dynamically generate code to be executed later. As a result, it is possible to support an RPC-like mechanism without requiring a special interface specification or even a stub generator.

In this project, you are going to create a module rpc.py that uses the introspection capabilities of Python to allow remote procedure calls to be made to arbitrary Python modules.

2. The RPC Module

Your rpc module should work for both servers wanting to provide an RPC service and clients that want to connect to those servers. You only need to implement three functions:

register_module(module, portmap).
Creates a new RPC service and registers it with the portmapper. module is any valid Python module (already loaded with import) and portmap is a tuple of the form (ipaddr, port) containing the IP address and port number of the portmapping service (which may live on any machine).
serve_forever().
This function is used by servers to start the RPC runtime. Once started, this function listens for incoming client connections and dispatches procedures to any of the modules that have been previously registered using the register_module() function.
remote_import(modulename, portmap).
Perform a remote module import. modulename is the name of the module as a string and portmap is a tuple of the form (ipaddr,port) containing the IP address and port number of the portmapping server. This function should return a module object with a collection of stubs that behave exactly like local procedures, but which actually execute on a remote server (see the example below). If the remote module name is unknown to the portmapper or an error occurs, this function should raise the ImportError exception.

Here is a simple example showing how your module is supposed to work:

Start the portmap server on some machine and port. For example:

% python portmap.py 10000
Portmapper started on gargoyle.cs.uchicago.edu:10000

Write a simple server that provides an RPC service. For example, the following code turns the "string" module into an RPC service:
```
# RPC server
import string
import rpc

rpc.register_module(string,("gargoyle.cs.uchicago.edu",10000))
rpc.serve_forever()
```

Now, start your server on some machine and try to connect to it with a client as follows:

>>> import rpc
>>> rpcstring = rpc.remote_import("string",("gargoyle.cs.uchicago.edu",10000))
Remote module 'string' loaded.
>>> rpcstring.split("Hello world")       # Makes a remote procedure call
['Hello', 'world']
>>> rpcstring.split(3)  
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: argument 1: expected read-only character buffer, int found
>>>

And that's it. Hopefully, the big picture is clear.

The next few sections describe the pieces you need to implement.

3. The portmapper

The first thing you should implement is the portmapping server. All this server does is keep track of remote services. Both RPC servers and clients will contact the portmapper.

Registering a service

When an RPC server wants to publish the availability of a remote module, it should contact the portmapper and send it the following information:

The remote server's IP address and port number (on which it will receive connections by clients).
The remote module name as a string.
A list of strings containing all of the procedure names exported by that module.

This information should then be saved by the portmapper so that it can later hand it out to clients.

To illustrate, suppose you executed the following code on rustler.cs.uchicago.edu:

# RPC server
import string
import rpc

rpc.register_module(string,("gargoyle.cs.uchicago.edu",10000))

This might contact the portmapper and send it the following information:

("rustler.cs.uchicago.edu", 18736, "string", [ 'atof', 
'atoi', 'atol', 'capitalize', 'capwords', 'center', 'count',
'expandtabs', 'find', 'index', 'join', 'joinfields', 'ljust', 'lower',
'lstrip', 'maketrans', 'replace', 'rfind', 'rindex', 'rjust',
'rstrip', 'split', 'splitfields', 'strip', 'swapcase', 'translate',
'upper', 'zfill'])

In this case 18736 is the port number selected by the server for client connections (the value is completely arbitrary). The list of strings starting with 'atof' simply contains all of the function names contained within the string module.
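As a rough illustration of what "sending this information" can mean, the sketch below pickles a registration tuple and recovers it on the other side. It is written in current Python 3 syntax (an assumption; the snippets in this handout use the Python of their day), the function list is shortened, and the names record, wire_bytes, and received are made up for the example.

```python
import pickle

# Registration record in the order described above:
# (server host, server port, module name, exported function names).
# The function list is shortened here purely for illustration.
record = ("rustler.cs.uchicago.edu", 18736, "string",
          ["atof", "atoi", "split", "strip"])

wire_bytes = pickle.dumps(record)      # bytes the RPC server would send
received = pickle.loads(wire_bytes)    # tuple the portmapper would recover

host, port, module_name, func_names = received
print(host, port, module_name, len(func_names))
```

Because pickle preserves the tuple structure, the portmapper can store the unpickled record as is and later hand it back to clients unchanged.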

Requesting a service

When a client requests a service using the remote_import() function, it contacts the portmapper and asks it for a particular module name. If the portmapper knows about that module, it should send the above information back to the client (at which point it is up to the client to contact the remote service directly). Otherwise, if the portmapper does not know about the module, it should return an error to the client.
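The register and request behavior described above might be modeled, without any networking, as a single function over a dictionary. This is only a sketch in current Python 3 syntax; the request tags "register" and "lookup", the reply tags "ok" and "error", and the name handle_request are all hypothetical choices, since the protocol is yours to design.

```python
def handle_request(registry, request):
    """Process one request tuple and return the reply tuple.

    Hypothetical protocol:
      ("register", host, port, modulename, func_names) -> ("ok",)
      ("lookup", modulename) -> ("ok", host, port, modulename, func_names)
                                or ("error", reason)
    """
    kind = request[0]
    if kind == "register":
        _, host, port, modulename, func_names = request
        # Remember where the module lives and what it exports.
        registry[modulename] = (host, port, list(func_names))
        return ("ok",)
    if kind == "lookup":
        modulename = request[1]
        if modulename not in registry:
            return ("error", "unknown module: %s" % modulename)
        host, port, func_names = registry[modulename]
        return ("ok", host, port, modulename, func_names)
    return ("error", "unknown request type: %r" % (kind,))


registry = {}   # module name -> (host, port, func_names)
reply_register = handle_request(
    registry,
    ("register", "rustler.cs.uchicago.edu", 18736, "string", ["atoi", "split"]))
reply_found = handle_request(registry, ("lookup", "string"))
reply_missing = handle_request(registry, ("lookup", "nosuchmodule"))
print(reply_found)
print(reply_missing)
```

On the client side, an "error" reply would be the natural place for remote_import() to raise ImportError.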

Implementation details of the portmapper

Implementation of the portmapper should be relatively straightforward:

Implement the portmapper in a file portmap.py.
Use the socket module to write a simple portmapping server using TCP sockets.
Create a simple protocol for both registering and requesting services. (It should be a very simple protocol indeed).
Use the pickle module to marshal all of the data sent to and from the portmapper. (See the details on pickle in the Python book).
It should be possible to run the portmapper as follows:
```
% python portmap.py 10000
```
where 10000 is a port number (you can pick whatever port number you want).

I would be very surprised if your implementation of the portmapper is more than 50 lines of code. Think simple (it's not much more than a dictionary, a socket connection, and a few calls to the pickle module).
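To give a feel for how small the socket and pickle parts can be, here is one possible shape for the transport, written in current Python 3 syntax (an assumption). It handles exactly one pickled request per connection and delegates the decision to a handler function supplied by the caller. The names handle_connection, run_portmapper, and demo are hypothetical, and the demo uses socket.socketpair() in place of a real network connection.

```python
import pickle
import socket
import threading


def handle_connection(conn, handler):
    """Read one pickled request from conn and reply with handler(request)."""
    with conn, conn.makefile("rwb") as stream:
        request = pickle.load(stream)          # reads exactly one pickle
        pickle.dump(handler(request), stream)
        stream.flush()


def run_portmapper(port, handler):
    """Accept connections forever, answering one request per connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))
    listener.listen(5)
    while True:
        conn, _addr = listener.accept()
        handle_connection(conn, handler)


def demo():
    """Exercise handle_connection over a connected socket pair."""
    services = {"string": ("rustler.cs.uchicago.edu", 18736)}
    client, server = socket.socketpair()
    worker = threading.Thread(
        target=handle_connection,
        args=(server, lambda name: services.get(name)))
    worker.start()
    with client, client.makefile("rwb") as stream:
        pickle.dump("string", stream)          # ask about the "string" module
        stream.flush()
        reply = pickle.load(stream)
    worker.join()
    return reply


print(demo())
```

Wrapping the socket with makefile() lets pickle.load() and pickle.dump() work directly on the connection, so no separate length header is needed in this sketch.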

4. The RPC server runtime

After you have the portmapper working, create a file rpc.py and implement the register_module() and serve_forever() functions. Follow the basic steps below:

The rpc module, when first imported, should create a TCP socket for receiving incoming connections. Unlike your past assignments, you should set this up so that it binds the socket to any available port number. This is easy:

# rpc.py

import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("",0))     # Bind to any available port; the address is a (host, port) tuple
hostname = socket.gethostname()
port     = sock.getsockname()[1]
print "socket open at %s:%d" % (hostname,port)

Implement the register_module(module, portmap) function. This is also relatively easy. First, the module argument should be a module already loaded using the import statement. Next, the first thing that your function should do is examine the contents of the module and locate all its function names. For example:
```
def register_module(module, portmap):
    mod_name = module.__name__
    func_names = [ ]
    for name, object in module.__dict__.items():
         if callable(object) and name[0] != '_':
               func_names.append(name)
```
The first statement simply extracts the string name of a module. The callable() function tests an object to see if it is callable like a function. The check for a leading underscore ('_') is needed because Python treats all function names of this form as private (when importing modules).
Next, contact the portmapper and send it a message containing the local socket address, module name, and list of functions you extracted above. The easy way to do this is to simply package everything up in a tuple (as shown earlier), run it through the pickle module and send it to the portmapper. Note: when contacting the portmapper, you should use a *different* socket than the one created above.
Finally, have the rpc module keep an internal record of all of the modules that have been registered. You will need this to handle incoming requests. This should be pretty easy, just keep a global dictionary mapping module names to module objects. For example:
```
modules = { }
...
def register_module(module,portmap):
    ...
    modules[module.__name__] = module
    ...
```
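Putting the fragments above together, the data that register_module() sends could be assembled roughly as follows. The sketch uses current Python 3 syntax (an assumption), uses the math module as a stand-in for whatever module is being published, and introduces the hypothetical helpers exported_functions, build_registration, and send_registration. The last of these opens its own short-lived socket to the portmapper, separate from the listening socket, and is defined but not called here.

```python
import math      # stands in for any module that is already imported
import pickle
import socket


def exported_functions(module):
    """Return the sorted names of the public callables found in module."""
    return sorted(name for name, obj in vars(module).items()
                  if callable(obj) and not name.startswith("_"))


def build_registration(module, host, port):
    """Assemble the tuple that register_module() would pickle and send."""
    return (host, port, module.__name__, exported_functions(module))


def send_registration(portmap, registration):
    """Send the pickled tuple to the portmapper over a separate socket."""
    with socket.create_connection(portmap) as conn:
        conn.sendall(pickle.dumps(registration))


registration = build_registration(math, "rustler.cs.uchicago.edu", 18736)
print(registration[2], len(registration[3]), "functions")
```

Note that constants such as math.pi are filtered out by the callable() test, so only function names end up in the list.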
Implement the serve_forever() function. This function should start listening for incoming connections on the socket created in the first step. When a connection arrives, it should look at the incoming message and try to dispatch one of the functions contained in the registered modules. To do this, you will first need to come up with an appropriate message format. One option is to simply pass a tuple containing something like this:
```
(modulename, functionname, args)
```
Where modulename is a string containing the module name, functionname is a string containing the name of the function, and args is a tuple containing the function arguments. Given this information, it is easy to invoke a function in the module. Simply do this:
```
result = apply(modules[modulename].__dict__[functionname],args)
```
(Read the Python book to know exactly what's going on here). Of course, you will probably want to add in some security and error checking too.
As for the result, you need to follow a similar procedure. The result of a function should be packaged in a message format suitable for sending back to the client. Furthermore, if an error occurs, you should propagate the error back to the client (exceptions on the server should generate exceptions on the client). Note: an exception on the server should not cause the server to stop running, so you will need to do some exception handling using try and except.
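One way the dispatch and error propagation described above might fit together is sketched here in current Python 3 syntax (an assumption), where func(*args) takes the place of apply(). The reply tags "ok" and "error" and the names dispatch and unwrap are hypothetical, and the sketch assumes that the exception object itself can be pickled for the trip back to the client.

```python
import math


def dispatch(modules, request):
    """Run one (modulename, functionname, args) request.

    Returns ("ok", result) or ("error", exception) so that the reply can be
    pickled and sent back to the client.
    """
    try:
        modulename, functionname, args = request
        if modulename not in modules:
            raise ImportError("module not registered: %s" % modulename)
        func = vars(modules[modulename]).get(functionname)
        # Refuse private names and anything that is not a function.
        if functionname.startswith("_") or not callable(func):
            raise AttributeError("no such remote function: %s" % functionname)
        return ("ok", func(*args))
    except Exception as exc:        # the server must survive any failure
        return ("error", exc)


def unwrap(reply):
    """Client side: return the result or re-raise the server's exception."""
    status, payload = reply
    if status == "error":
        raise payload
    return payload


modules = {"math": math}
print(unwrap(dispatch(modules, ("math", "sqrt", (9.0,)))))
print(dispatch(modules, ("math", "sqrt", ("x",))))
```

Catching the exception on the server and re-raising it in unwrap() is what makes a bad argument look like an ordinary local TypeError to the caller.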

Now, a few miscellaneous implementation notes:

You should use TCP for client connections. Furthermore, a client will keep its TCP connection open the entire time it is connected. Thus, you will need to figure out how to manage multiple requests and responses going across the same connection (it shouldn't be hard).
Your RPC runtime should use either fork() or threads to allow multiple clients to connect simultaneously.
Use the pickle module to marshal and unmarshal data sent across the TCP connection. This will simplify your life considerably.
It is an error for the RPC runtime to allow a client to execute any function not explicitly registered with the portmapper. Thus, you will need to add some error checking to make sure the client isn't accessing private functions, procedures in non-registered modules, etc...
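The first two notes above (a long-lived connection per client, and several clients at once) could be handled along the following lines. This is a sketch in current Python 3 syntax (an assumption) that uses threads rather than fork(); the names serve_client, accept_loop, and demo are hypothetical, and the handle argument stands for whatever function turns a request into a reply. The demo substitutes socket.socketpair() for a real client connection.

```python
import pickle
import socket
import threading


def serve_client(conn, handle):
    """Answer pickled requests on one connection until the client hangs up."""
    with conn, conn.makefile("rwb") as stream:
        while True:
            try:
                request = pickle.load(stream)   # one pickle per request
            except EOFError:                    # client closed the connection
                break
            pickle.dump(handle(request), stream)
            stream.flush()


def accept_loop(listener, handle):
    """Start one thread per client so several clients can be served at once."""
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=serve_client, args=(conn, handle),
                         daemon=True).start()


def demo():
    """Send three requests over a single connection and collect the replies."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_client,
                              args=(server, lambda request: request * 2))
    worker.start()
    replies = []
    with client, client.makefile("rwb") as stream:
        for request in (1, 2, 3):
            pickle.dump(request, stream)
            stream.flush()
            replies.append(pickle.load(stream))
    worker.join()
    return replies


print(demo())
```

Since each pickle is self-delimiting, successive requests and replies can share the same stream without any extra framing in this sketch.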

5. Client Stubs

Finally, your last step is to figure out the problem of client stubs. First, keep in mind that the real implementation of the remote functions lives on the server. A stub is just a little function that lives on the client that takes the arguments passed by the user and packages them up into a network message to be sent to the server. It then needs to be able to receive the return result. In Python, this is going to be relatively easy. A stub might look something similar to the following:

# atoi stub
def atoi(*args):
    message = ("atoi", args)
    return send_message(server, message)   # hand the server's reply back to the caller

The *args is used to collect any sequence of function arguments into a tuple. Once in this form, it's pretty easy to package. We'll just stuff them all into a network message and let the server figure them out (if the arguments are invalid, it should send an error back).

Now, there are a few somewhat complicated details to work out on the client. However, the general idea of how this is going to work is that when the client contacts the portmapper, it is going to receive a list of function names. Using this list, you are going to dynamically generate a set of stub functions as a big Python string. Then, using some magic, you are going to execute this string to generate stub functions "on the fly"--at which point you will have a working stub module.

Here's how to proceed:

First, in the rpc.py file (created earlier), write an internal function send_message() that knows how to send an RPC call message to an RPC server. Basically, this should create a message that is compatible with what is expected by the RPC server runtime code written earlier. Use the pickle module to marshal data.
Now, start implementing the remote_import(name, portmap) function. The first thing that this function should do is establish a connection with the portmap server and see if it knows anything about the module name. If it does, you should receive the IP address, port number, module name, and a list of function names exported by the RPC service back in its response. Otherwise, raise an ImportError to indicate an unknown module.
Next, using the list of function names returned by the portmapper, continue to work on remote_import() by writing some code that generates a big string containing Python function definitions. It's going to look a little funky, but pieces of it might look like this:
```
print """
def %s(*args):
    rpc.send_message(__sock__, "%s","%s",args)
""" % (fname, modulename, fname)
```
(Note: the specific details will depend on how you have implemented things.) The __sock__ variable is just something I picked to indicate the socket connection to the remote server. You will need to have something like this somewhere.
Now, the magic begins. Continuing to work on the remote_import() function, you need to construct a new Python module out of thin-air. The way to do this is as follows:
```
import new
m = new.module(modname)    # Create a module object
```
Next, execute all of the stub code you placed into a string inside this new module like this:
```
# stubs = string of stub code
# execute stub code inside the newly created module
exec stubs in m.__dict__, m.__dict__
```
Patch up the newly created module with any additional information needed to make it work (such as having a reference to the socket object connected to the remote server).
Return the stub module back to the caller.
And that's it.
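For readers following along with a current Python 3 interpreter (an assumption; the snippets above use the new module and the exec statement of the Python of their day), the stub-generation steps might be sketched as below. Here types.ModuleType stands in for new.module() and exec() is called as a function. The names STUB_TEMPLATE, make_stub_module, __send__, and fake_send are hypothetical, and fake_send simply records calls instead of talking to a server.

```python
import keyword
import types

# Source text for one stub; filled in once per remote function name.
STUB_TEMPLATE = '''
def {fname}(*args):
    return __send__(__sock__, {modulename!r}, {fname!r}, args)
'''


def make_stub_module(modulename, func_names, sock, send_message):
    """Build a module whose functions forward their arguments to send_message."""
    for fname in func_names:
        # Names come from the network, so refuse anything that is not a
        # plain identifier before pasting it into generated source code.
        if not fname.isidentifier() or keyword.iskeyword(fname):
            raise ImportError("bad remote function name: %r" % (fname,))
    stubs = "".join(STUB_TEMPLATE.format(fname=fname, modulename=modulename)
                    for fname in func_names)
    module = types.ModuleType(modulename)   # create an empty module object
    module.__sock__ = sock                  # patch in what the stubs need
    module.__send__ = send_message
    exec(stubs, module.__dict__)            # define the stubs inside it
    return module


calls = []

def fake_send(sock, modulename, fname, args):
    """Stand-in for the real send_message(); records the call it receives."""
    calls.append((modulename, fname, args))
    return "reply for %s" % fname


rpcstring = make_stub_module("string", ["atoi", "split"], None, fake_send)
result = rpcstring.split("Hello world")
print(result, calls)
```

Each generated stub returns whatever send_message() returns, which is how the server's result would reach the caller.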

Now, if you make it this far (as I'm sure you will), you will definitely know something about how Python operates.

6. Testing

Testing is pretty simple. I should be able to start your portmapper on some machine. Then, using code similar to that listed at the beginning, I should be able to publish modules on the network and import modules remotely.

7. Extra Credit

Make your RPC module support Python's keyword arguments.
Use UDP instead of TCP.
Modify Python's import statement to automatically contact the portmapper and remotely load a module if available.