Interprocess Communication

Before we look at some papers describing comparatively recent IPC mechanisms, I'd like to spend a lecture talking about the low-level IPC primitives available on Unix systems.

Pipes

Early versions of Unix were quite weak in their interprocess communications facilities. The only real IPC facility available in early Unices was the ``pipe.''

Pipes are created using the pipe system call (not surprisingly!). When a process calls pipe, a pipe is created and assigned two file descriptors. One of these is used for reading; the other for writing.
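For example, a minimal sketch (error checking omitted, and assuming the usual <unistd.h> declarations) looks like this:

int fds[2];
char buf[2];

pipe(fds);               /* fds[0] is the read end, fds[1] is the write end */
write(fds[1], "hi", 2);  /* bytes written to the write end... */
read(fds[0], buf, 2);    /* ...come back out of the read end, in order */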

Pipes are used in the standard shell construct


% prog1 | prog2

When a command like this is given to the shell, the shell creates a pipe before forking prog1 and prog2. The ``write to'' end of the pipe is connected to prog1's stdout, while the ``read from'' end of the pipe is connected to prog2's stdin. Now, output from prog1 is fed to prog2's input.
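Very roughly, the shell's job for prog1 | prog2 might be sketched like this (a simplification; error checking and the shell's other bookkeeping are omitted):

int fds[2];
pipe(fds);

if (fork() == 0) {                  /* child that will become prog1 */
    dup2(fds[1], 1);                /* make stdout refer to the pipe's write end */
    close(fds[0]); close(fds[1]);
    execlp("prog1", "prog1", (char *)0);
}

if (fork() == 0) {                  /* child that will become prog2 */
    dup2(fds[0], 0);                /* make stdin refer to the pipe's read end */
    close(fds[0]); close(fds[1]);
    execlp("prog2", "prog2", (char *)0);
}

close(fds[0]); close(fds[1]);       /* the shell itself keeps neither end open */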

This mechanism is very inflexible, not least because (1) only processes with a common ancestor can be connected with a pipe, and (2) that common ancestor has to know that its descendants will wish to communicate (in the example, the shell serves as the common ancestor, and the command line it was given tells it to have the children communicate).

Named Pipes (FIFOs)

Closely related to the pipe is the ``named pipe,'' also called a FIFO. A FIFO is an IPC channel that is given a name in the file system space, using the mkfifo system call (there is also a mkfifo command that is just a wrapper around the mkfifo system call). Once a FIFO has been created, processes can open it just like a file, and write to it or read from it. The difference is that the data is not actually written to a file; it's maintained in a buffer by the kernel.
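For instance, one process might create a FIFO and write to it roughly like this (the path /tmp/myfifo is purely illustrative; error checking omitted):

mkfifo("/tmp/myfifo", 0666);             /* create the named pipe in the file system */
int fd = open("/tmp/myfifo", O_WRONLY);  /* blocks until some process opens it for reading */
write(fd, "hello", 5);
close(fd);

Another process can then open /tmp/myfifo for reading and read the data back, just as if it were an ordinary file.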

Named pipes were a huge step forward, but they could still only be used between processes on a single system, not across a network.

Sockets

The Berkeley 4BSD series was an almost unbelievable advance in Unix development. Among the features it added were virtual memory, shared memory, and sockets. Sockets were added in 4.2BSD. There is a lot more you can do with the interprocess communications capabilities in Unix than I'm going to talk about here; the Advanced 4.4BSD Interprocess Communication Tutorial and relevant man pages go into a lot more detail. In fact, nearly all of the information to follow is taken directly from that document.

Conceptually, internet sockets on a Unix system look like a numbered array of interprocess communication channels -- so there is a socket 0, socket 1, socket 2, and so forth. They pretty much expect to be used in a client-server relationship; a daemon wishing to provide a service creates a socket and listens to it; a client program connects to the socket and makes requests. The daemon is also able to send messages back to the client.

Even though the process of establishing a socket is asymmetrical, the actual use of the socket doesn't have to be - it's sort of like making a phone call. Making the call is asymmetrical (somebody is the caller), but the conversation needn't be.

Creating a Socket

The best way I could think of to introduce sockets is to discuss the socket system call. It looks like this:


s = socket(domain, type, protocol);

The domain is either AF_UNIX or AF_INET. An AF_UNIX socket can only be used for interprocess communications on a single system, while an AF_INET socket can be used for communications between systems. We're only going to be worrying about AF_INET sockets.

The type specifies the characteristics of communication on the socket. SOCK_STREAM creates a socket that will reliably deliver bytes in order, but does not respect message boundaries; SOCK_DGRAM creates a socket that does respect message boundaries, but does not guarantee to deliver data reliably, uniquely (a packet may be delivered multiple times), or in order. A SOCK_STREAM socket corresponds to TCP; a SOCK_DGRAM socket corresponds to UDP.

The protocol argument selects a specific protocol. Ordinarily this is 0, which lets the call select an appropriate protocol itself. This is almost always the right thing to do, though in some special cases you may want to select the protocol yourself. Remember that this refers to the underlying network protocol: such well-known protocols as HTTP, FTP, and SSH are all built on top of TCP, so TCP is the right choice for any of them. And if you specify protocol 0 with SOCK_STREAM, that's what you get.

For example:


s = socket(AF_INET, SOCK_STREAM, 0);

will create a socket that will use TCP to communicate.

The return value (s) is a file descriptor for the socket.

Binding a Socket

At this point we've created a socket, but we haven't given it a name so it isn't very useful. We give the socket a name using the bind system call:


bind(s, name, namelen);

This call gives the socket a name. For an internet socket, the name is a struct defined as

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    u_int16_t      sin_port;   /* port in network byte order */
    struct in_addr  sin_addr;  /* internet address */
};

/* Internet address. */
struct in_addr {
    u_int32_t      s_addr;     /* address in network byte order */
};

sin_family is always AF_INET; sin_port is the port number, and sin_addr is the IP address. It's a bit of a surprise to me that you have to specify the family; after all, that was already specified when the socket was created. It takes a bit of thought to realize why the IP address has to be specified - it's quite common for a machine to have more than one IP address; for instance, the gateway machine in my house has one address for the in-home network, and a second address for talking to my ISP. You can get the list of IP addresses for a host with the gethostbyname() call.

One little wrinkle: port numbers below 1024 are reserved, meaning that only processes with an effective user id of 0 (i.e., root) can bind to those ports.
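Putting the pieces together, a daemon might create and bind its socket roughly like this (port 8000 is just an illustration; INADDR_ANY is the usual way of saying ``any of this machine's addresses''; error checking omitted):

struct sockaddr_in name;
int s = socket(AF_INET, SOCK_STREAM, 0);

memset(&name, 0, sizeof(name));
name.sin_family = AF_INET;
name.sin_port = htons(8000);               /* port number, converted to network byte order */
name.sin_addr.s_addr = htonl(INADDR_ANY);  /* accept connections on any of the host's addresses */

bind(s, (struct sockaddr *)&name, sizeof(name));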

Listening to the Socket

Once the socket has been created and bound, the daemon needs to indicate that it is ready to listen to it. It does this with (surprise!) the listen system call, as in


listen(s, 5);

The main thing this does is set a limit on how many would-be clients can be queued up waiting to connect to the socket (the limit in this example is 5). If the limit is exceeded, the clients don't actually get refused; instead their connection requests get dropped on the floor, and eventually they end up retrying. This only really matters if you've got a horribly poorly written daemon or somebody's mounting a Denial of Service (DoS) attack on it.

Accepting Connections

Finally! The server is able to accept connections by calling accept:


newsock = accept(s, (struct sockaddr *)&from, &fromlen);

For this call, s is, as you'd expect, the socket that was returned oh so long ago by the socket call. The accept() call blocks until a client connects to the socket.

Connecting to the Daemon

A client connects to the socket using the connect call. First it creates a socket using the socket call, then it connects it to the daemon's socket using connect:


connect(s, (struct sockaddr *)&server, sizeof(server));

The interesting thing here is back on the daemon's side: accept returns a new socket. This means the daemon can communicate with the client using the newly created (and unnamed) socket, while continuing to listen on the old one. The client, for its part, simply keeps using s once connect succeeds.
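On the client side, the whole sequence might look roughly like this (the host name and port are purely illustrative; error checking omitted):

struct sockaddr_in server;
struct hostent *hp;
int s = socket(AF_INET, SOCK_STREAM, 0);

hp = gethostbyname("server.example.com");   /* look up the daemon's IP address */
memset(&server, 0, sizeof(server));
server.sin_family = AF_INET;
server.sin_port = htons(8000);
memcpy(&server.sin_addr, hp->h_addr, hp->h_length);

connect(s, (struct sockaddr *)&server, sizeof(server));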

At this point, the server normally forks a child process (or starts a thread) to handle the client and goes back to its accept loop.
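A sketch of that accept loop might look like this (error checking omitted):

int newsock;
struct sockaddr_in from;
socklen_t fromlen;

for (;;) {
    fromlen = sizeof(from);
    newsock = accept(s, (struct sockaddr *)&from, &fromlen);  /* wait for the next client */

    if (fork() == 0) {       /* child: handle this one client */
        close(s);            /* the child doesn't need the listening socket */
        /* ... read from and write to newsock here ... */
        close(newsock);
        exit(0);
    }

    close(newsock);          /* parent: go back to waiting in accept */
}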

The child doing the communication can either use the standard read and write calls, or it can use send and recv. These work just like read and write, except that they take an extra flags argument that enables a few options.
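For instance, recv can be asked to look at waiting data without consuming it, using the MSG_PEEK flag; with a flags argument of 0, send and recv behave just like write and read:

char buf[128];

recv(newsock, buf, sizeof(buf), MSG_PEEK);  /* look at the data, but leave it queued */
recv(newsock, buf, sizeof(buf), 0);         /* no flags: this behaves just like read */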

