Before we turn to some papers describing comparatively recent IPC mechanisms, I'd like to spend a lecture talking about the low-level IPC primitives available on Unix systems.
Early versions of Unix were quite weak in their interprocess communications facilities. The only real IPC facility available in early Unices was the ``pipe.''
Pipes are created using the pipe system call (not
surprisingly!). When a process calls pipe, a pipe is
created and assigned two file descriptors. One of these is used for
reading; the other for writing.
Pipes are used in the standard shell construct

% prog1 | prog2

When a command like this is given to the shell, the shell creates a
pipe before forking prog1 and prog2. The
``write to'' end of the pipe is connected to prog1's
stdout, while the ``read from'' end of the pipe is
connected to prog2's stdin. Now, output from
prog1 is fed to prog2's input.
This mechanism is very inflexible, not least because (1) only processes with a common ancestor can be connected with a pipe, and (2) that common ancestor has to know that its descendants will wish to communicate (in the example, the shell serves as the common ancestor, and the command line it was given tells it to have the children communicate).
Closely related to the pipe is the ``named pipe,'' also called a
FIFO. A FIFO is an IPC channel that is given a name in the file
system space, using the mkfifo system call (there is also
a mkfifo command that is just a wrapper around the
mkfifo system call). Once a FIFO has been created
processes can open it just like a file, and write to it or read from
it. The only thing is, the data that is written is not actually
written to a file; it's maintained in a buffer by the kernel.
Named pipes were a huge step forward, but still suffered from only being able to be used between two processes on a single system, not over a network.
The Berkeley 4BSD series was an almost unbelievable advance in Unix development. Among the features it added were virtual memory, shared memory, and sockets. Sockets were added in 4.2BSD. There is a lot more you can do with the interprocess communications capabilities in Unix than I'm going to talk about here; the Advanced 4.4BSD Interprocess Communication Tutorial and relevant man pages go into a lot more detail. In fact, nearly all of the information to follow is taken directly from that document.
Conceptually, internet sockets on a Unix system look like a numbered array of interprocess communication channels -- so there is a socket 0, socket 1, socket 2, and so forth. They pretty much expect to be used in a client-server relationship; a daemon wishing to provide a service creates a socket and listens to it; a client program connects to the socket and makes requests. The daemon is also able to send messages back to the client.
Even though the process of establishing a socket is asymmetrical, the actual use of the socket doesn't have to be - it's sort of like making a phone call. Making the call is asymmetrical (somebody is the caller), but the conversation needn't be.
The best way I could think of to introduce sockets is to discuss the
socket system call. It looks like this:
s = socket(domain, type, protocol);
The domain is either AF_UNIX or
AF_INET. An AF_UNIX socket can only be used
for interprocess communications on a single system, while an
AF_INET socket can be used for communications between
systems. We're only going to be worrying about AF_INET
sockets.
The type specifies the characteristics of communication
on the socket. SOCK_STREAM creates a socket that will
reliably deliver bytes in-order, but does not respect message
boundaries; SOCK_DGRAM creates a socket that does respect
message boundaries, but does not guarantee to deliver data reliably,
uniquely (so a packet may get delivered multiple times), or in order.
A SOCK_STREAM socket corresponds to TCP; a
SOCK_DGRAM socket corresponds to UDP.
The protocol argument selects a specific protocol. Ordinarily this is
0, allowing the call to pick a suitable protocol itself. This is
almost always the right thing to do, though in some special cases you
may want to select the protocol yourself. Remember that this refers
to the underlying network protocol: such well-known protocols as
http, ftp, and ssh are all built on top of tcp, so tcp is the right
choice for any of those protocols. And if you specify protocol
0 with SOCK_STREAM, that's what you get.
For example:

s = socket(AF_INET, SOCK_STREAM, 0);

will create a socket that will use TCP to communicate.
The return value (s) is a file descriptor for the
socket.
At this point we've created a socket, but we haven't given it a name
so it isn't very useful. We give the socket a name using the
bind system call:
bind(s, name, namelen);

This call gives the socket a name. For an internet socket, the
name is a struct defined as

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    u_int16_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* Internet address. */
struct in_addr {
    u_int32_t s_addr; /* address in network byte order */
};
sin_family is always AF_INET;
sin_port is the port number, and sin_addr is
the IP address. It's a bit of a surprise to me that you have to
specify the family; after all, that was already specified when the
socket was created. It takes a bit of thought to realize why the IP
address has to be specified - it's quite common for a machine to have
more than one IP address; for instance, the gateway machine in my
house has one address for the in-home network, and a second address
for talking to my ISP. You can get the list of IP addresses for a
host with the gethostbyname() call.
One little wrinkle on this is that port numbers below 1024 are reserved - that means only processes with an effective user id of 0 (i.e., root) can bind to those ports.
Once the socket has been created and bound, the daemon needs to
indicate that it is ready to listen to it. It does this with
(surprise!) the listen system call, as in
listen(s, 5);
The main thing this does is set a limit on how many would-be clients can be queued up waiting to connect to the socket (the limit in this example is 5). If the limit is exceeded the clients don't actually get refused; instead, their connection requests get dumped on the floor, and eventually they will retry. This only really matters if you've got a horribly written daemon or somebody is trying a Denial of Service (DoS) attack against it.
Finally! The server is able to accept connections by calling
accept:
newsock = accept(s, (struct sockaddr *)&from, &fromlen);
For this call, s is, as you'd expect, the socket that was
returned oh so long ago by the socket call. The
accept() call blocks until a client connects to the socket.
A client connects to the socket using the connect call.
First it creates a socket using the socket call, then it
connects it to the daemon's socket using connect:
connect(s, (struct sockaddr *)&server, sizeof(server));
The interesting thing about the accept call is that it returns a new socket. This means the daemon can communicate with the client using the newly created (and unnamed) socket, while continuing to listen on the old one.
At this point, the server normally forks a child process (or starts a
thread) to handle the client and goes back to its accept
loop.
The child doing the communication can either use standard
read and write calls, or it can use
send and recv. These calls work like
read and write, except that you can also
pass flags allowing for some options.