Networks and Topologies

Some Terms

There are two kinds of people: those who divide the world up into two kinds of people, and those who don't.

There are a large number of ways we can classify networks, depending on what we're interested in. Most of these are orthogonal...

Regular vs. Irregular
A regular network topology is defined in terms of some sort of regular graph structure (such as rings, meshes, hypercubes, etc.); an irregular topology isn't. We tend to talk more about regular topologies, since it's possible to analyze them reasonably; even discovering the topology of an irregular network can be a challenge. Regular topologies are used in applications such as parallel processors and small LANs; irregular topologies are used in larger LANs and in internets (the Internet is extraordinarily irregular -- and there are even tools out there for attempting to discover the topology of the Internet near your site!).

Static vs. Dynamic
There are two basic ways to construct a network: we can use the processors themselves as the routing nodes, or we can let the processors and memory sit "outside" the network and have specialized switching nodes transfer the messages. The former is a static network; the latter is a dynamic network (the idea is that with a static network you can only send a message to your neighbors, while in a dynamic network you can drop a message with routing information and the network can get it anywhere). As it turns out, we can easily find examples of both dynamic and static networks for nearly any topology we care to come up with.

Circuit Switching vs. Packet Switching
There are also two basic ways to set up the communication paths in a network: we can put each packet on the net with either routing information or just information about its destination, or else we can set up all the switches once and let all the packets follow the path that was established. The first way is called packet switching; the second is circuit switching. Historically the phone system used circuit switching; now just about everything uses packet switching.

Note: while just about everything today is packet-switched, the way it is normally presented to the user is through virtual circuits.

Source-based routing vs. others
We also have a choice of sending a packet along with just the destination address, leaving the network to figure out how to get the data to its destination, or actually specifying the full route in the header. The old Usenet bang-paths specifying e-mail addresses were an example of source-based routing. Today, we almost never see source-based routing; we nearly always let the network do the routing. (Note: the term "source-based routing" has been recycled in recent years to refer to making routing decisions based on the source of a packet. This is a completely different and unrelated use of the term, and is in fact used in an environment that is not source-based as we are using it here.)

Store-and-forward vs. Wormhole routing
Normally, we think of data being shipped through a network a packet at a time: we send the packet to the first intermediate node, then on to the second, and so forth. This is called store-and-forward routing. An alternative which has become popular recently is wormhole routing. Remember that ordinarily, a packet contains a header with routing information followed by a payload containing the actual data, probably followed by a checksum or something to guarantee integrity. This implies that once the header has arrived at a node, it's possible to make routing decisions and pass it along immediately, rather than waiting for the entire packet to arrive first. This is called wormhole routing, in analogy with a worm crawling through a wormhole. Wormhole routing dramatically reduces latency, but creates new possibilities for deadlock.
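To see why the latency difference is so dramatic, here is a rough back-of-the-envelope model. It is only a sketch under assumptions that are mine, not the notes': every link moves one bit per time unit, routing decisions take no time, and there is no contention. The function names are made up for illustration.

```python
def store_and_forward_bit_times(hops, packet_bits):
    """Every node waits for the whole packet before sending it on,
    so the full packet time is paid once per hop."""
    return hops * packet_bits


def wormhole_bit_times(hops, packet_bits, header_bits):
    """Only the header is delayed at each hop; the rest of the packet
    streams along behind it, so its time is paid just once."""
    return hops * header_bits + (packet_bits - header_bits)


if __name__ == "__main__":
    # A 1000-bit packet with a 20-bit header crossing 10 links.
    print(store_and_forward_bit_times(10, 1000))   # 10000
    print(wormhole_bit_times(10, 1000, 20))        # 1180
```

With one hop the two models agree; the gap grows with the number of hops, which is why wormhole routing matters most for topologies with long paths.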

Blocking vs. non-blocking vs. rearrangeable
In a dynamic network, a question that arises is whether it's possible to realize any permutation of the sources and destinations. If not, it's a blocking network; if so, it's non-blocking. One last possibility, somewhat counterintuitive to my mind, is a network in which there is more than one possible path from sources to destinations, and you have to find the right one to take to avoid blocking. This is called a rearrangeable network, because you can rearrange your paths to fix blocking.

Topologies

Good-fast-cheap: pick two.

Analogy to interconnection networks: bandwidth, latency, cost: pick 2.

All of these factors are usually expressed in big-O notation, to maintain technological independence. So the latency is measured by the number of hops required from source to destination and the cost is measured by the required number of switches or network interfaces. There are two ways to measure bandwidth: we can measure the aggregate bandwidth (how many hosts can simultaneously send messages) for a performance measure, or we can measure the bisection bandwidth (how many links have to break before the network is cut into two halves) for a fault-tolerance measure. I've always liked aggregate bandwidth precisely because it measures performance; I have to admit, though, that it's maybe not as interesting as bisection bandwidth, since just about every network topology has an aggregate bandwidth of O(n).

We can put some bounds on just how good any of the three parameters can get, and use them for gold standards: obviously, we can't do better than constant latency. Since we have n processors, and each of them needs an interface to the network, our cost can't get better than O(n). And even if we come up with a network that could support better than O(n) bandwidth, the processors can't provide data faster than that, so O(n) is our best bandwidth.

There are standard examples of networks that each meet two of the three criteria well:

Bus, Ring, and Completely Connected topologies

As we can see, with a bus every node is connected to a single wire. If we assume the time to communicate on that wire is negligible then the distance from any node to any other node is the same: one "hop." The bus is also as cheap as we can get: every node has exactly one network interface. Unfortunately, we can only send one message on the bus at a time.

The ring quite directly trades latency for aggregate bandwidth. The cost is the same as for the bus, as each node has exactly one transmitter and one receiver. The bandwidth is much better than for the bus, since every node can simultaneously be sending a message to its neighbor. Unfortunately, the latency becomes very bad since we typically have to send a message across several intermediate nodes for it to reach its destination.

Finally, the completely connected network optimizes both bandwidth and latency, but is very expensive. The distance between any two nodes is exactly one (since they're directly connected), and every node can be sending a message simultaneously. Unfortunately, every node needs a network transceiver connecting it to every other node.

We can summarize the costs of these three networks in the following table. The table shows what an ideal (but not realizable) network's behavior would be for each of the parameters, and compares these three real networks to it. In all cases, N is the number of nodes in the network.

                        Bandwidth
Topology              Aggregate  Bisection  Latency  Cost    Best At
ideal                 O(N)       O(N)       O(1)     O(N)    All, of course!
bus                   O(1)       O(1)       O(1)     O(N)    latency, cost
ring                  O(N)       O(1)       O(N)     O(N)    bandwidth, cost
completely connected  O(N)       O(N)       O(1)     O(N^2)  bandwidth, latency
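To put concrete numbers behind the big-O entries, here is a small tally of the three topologies. It is a sketch with assumptions of my own: the ring is taken to be one-way (one transmitter, one receiver per node), "cost" counts network interfaces, and the dictionary keys are invented for this example.

```python
def bus(n):
    # One shared wire: one interface per node, one hop,
    # but only one sender at a time.
    return {"interfaces": n, "worst_hops": 1, "senders": 1}


def ring(n):
    # Assumed one-way ring: in the worst case a message travels
    # all the way around, crossing n - 1 links.
    return {"interfaces": n, "worst_hops": n - 1, "senders": n}


def completely_connected(n):
    # Every node needs an interface to each of the other n - 1 nodes.
    return {"interfaces": n * (n - 1), "worst_hops": 1, "senders": n}


if __name__ == "__main__":
    for n in (8, 64):
        for topology in (bus, ring, completely_connected):
            print(n, topology.__name__, topology(n))
```

Going from 8 to 64 nodes multiplies the ring's worst-case hop count by about eight and the completely connected network's interface count by about seventy, which is the "pick two" trade-off in miniature.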

The ring and the crossbar both provide good examples of how we can have both static and dynamic examples of topologies. You can either implement a ring in which each processor is directly connected to the next, or one in which there is a ring of switches, and a processor sends a message by dropping it into the switches. Likewise, a crossbar is a dynamic network; the static equivalent is a completely connected graph of all the processors (the wires in the static network take the place of the switches in the dynamic).

One last point to make on comparing these topologies, and deciding between them, is that the relative importance of the parameters is changing over time. Not long ago, the cost of a crossbar network of any reasonable size was completely prohibitive; today, a 32-node crossbar isn't unreasonable at all. Similarly, the latency of a mesh was far too high; using wormhole routing has made it into a contender.

Hypercubes

Now, let's try to get a compromise between these. So, instead of picking two, we'll pick part of all three. There is a whole family of networks that turn out to be topologically equivalent to hypercubes of various sizes.

We can start by looking at a static hypercube network: put several interface cards on each node, and connect them directly (this is what the Caltech Cosmic Cube did, remember).

Building a Hypercube

We can see the recursive construction of a hypercube in the following sequence of images. The algorithm for constructing a hypercube of dimension n is:

if (n == 0)
    draw a node
else
    draw two hypercubes of dimension n-1
    put arcs between corresponding nodes of the two hypercubes

In the figure, the new arcs connecting the two subcubes in each step are shown with darker lines.
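The recursive recipe translates almost line for line into code. The following is a minimal sketch, not anything from the original course materials: instead of drawing, it returns a node count and a list of arcs, and it numbers the second subcube's nodes by adding the size of a subcube (which amounts to setting one extra high-order bit). The function name is an assumption.

```python
def hypercube_edges(n):
    """Return (node_count, arcs) for an n-dimensional hypercube."""
    if n == 0:
        return 1, []                      # "draw a node": a single node, no arcs

    # "draw two hypercubes of dimension n-1"
    size, sub = hypercube_edges(n - 1)
    first_copy = list(sub)
    second_copy = [(a + size, b + size) for a, b in sub]

    # "put arcs between corresponding nodes of the two hypercubes"
    joining = [(v, v + size) for v in range(size)]

    return 2 * size, first_copy + second_copy + joining


if __name__ == "__main__":
    nodes, arcs = hypercube_edges(3)
    print(nodes, len(arcs))               # 8 12
```

Each level of recursion doubles the node count and adds one arc per node of a subcube, so an n-dimensional hypercube ends up with n * 2^(n-1) arcs.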

This also gives us a scheme for building up a unique address for each node in an n-dimensional hypercube:

use the first bit to decide which of the two sub-hypercubes the node is in.
use the remaining n-1 bits as an address within that sub-hypercube.

The following figure shows how this scheme labels the nodes in a 3-cube:

cube (3-d hypercube)

The 3-cube has eight nodes, which this figure labels in binary from 000 to 111. Notice that the least significant bit (the 1's place) is used for the left-right dimension; the next bit (the 2's place) is the up-down dimension, and the most significant bit (the 4's place) is forward-back.
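One consequence of this labelling is that a node's neighbors are exactly the addresses that differ from it in one bit, so we can find them by flipping each bit in turn. A small sketch (the function name and the DIMENSION_NAMES list are mine; the names simply follow the left-right, up-down, forward-back convention described above):

```python
DIMENSION_NAMES = ["left-right", "up-down", "forward-back"]  # bit 0, bit 1, bit 2


def neighbors(node, dims):
    """Flip each address bit in turn, least significant bit first."""
    return [node ^ (1 << d) for d in range(dims)]


if __name__ == "__main__":
    node = 0b110
    for d, other in enumerate(neighbors(node, 3)):
        print(f"{node:03b} --{DIMENSION_NAMES[d]}--> {other:03b}")
```

For node 110 this lists 111, 100, and 010: one neighbor along each of the three dimensions.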

We can get from any node to any other node in at most log2 N hops (unless there is contention), so the latency is O(log N). There are N processors, each with log2 N interfaces, so the cost is O(N log N). All the processors can use their links simultaneously, so our aggregate bandwidth is O(N). The bisection bandwidth is also O(N): separating the two (n-1)-dimensional subcubes means cutting the N/2 arcs that join them.
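The hop count and cost figures are easy to check by brute force, since the number of hops between two nodes is just the number of address bits in which they differ. A quick sketch, assuming no contention; the function names are made up for this example:

```python
def hops(src, dst):
    """Number of differing address bits = number of links to cross."""
    return bin(src ^ dst).count("1")


def diameter(dims):
    """Worst-case hop count over all pairs in a dims-dimensional hypercube."""
    n = 1 << dims
    return max(hops(a, b) for a in range(n) for b in range(n))


def total_interfaces(dims):
    """N nodes, each with log2(N) = dims interfaces."""
    return (1 << dims) * dims


if __name__ == "__main__":
    for dims in (3, 4, 5):
        print(1 << dims, diameter(dims), total_interfaces(dims))
```

The worst case is always a pair of nodes whose addresses are bitwise complements, such as 000 and 111.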

The translation from a static hypercube to a dynamic one is a step that tends to cause a lot of heartburn. So let's take a look at how we can do this.

First, let's impose some arbitrary restrictions on how we transport stuff: on any given cycle, we will only use the links along one dimension. So we'll go front/back (if needed) on the first cycle, up/down on the second, and left/right on the third. This means we only process the most significant address bit on the first cycle, the second bit on the second cycle, and so on until we swap the least significant bit on the last cycle.

We can draw a flow diagram showing how this goes on each cycle, like this:

multistage hypercube

The idea is that this shows how a packet is routed between the nodes in each time unit. The dark line shows an example, the route from node 110 to node 011. On time unit 1, node 110 transfers the data to node 010. On time unit 2, the data stays put. On time unit 3, node 010 transfers the data to node 011.
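The example route can be reproduced with a few lines of code that fix one address bit per time unit, most significant bit first. This is just a sketch of the restriction described above, with a function name of my own choosing:

```python
def route_msb_first(src, dst, dims):
    """Return the node holding the packet at time 0, 1, ..., dims."""
    node = src
    holders = [node]
    for bit in reversed(range(dims)):     # most significant bit first
        mask = 1 << bit
        if (node ^ dst) & mask:           # this bit is wrong: cross that dimension
            node ^= mask
        holders.append(node)              # otherwise the data stays put
    return holders


if __name__ == "__main__":
    path = route_msb_first(0b110, 0b011, 3)
    print([f"{p:03b}" for p in path])     # ['110', '010', '010', '011']
```

The repeated 010 in the middle is the time unit in which the data stays put, because the second bit of the source and destination already agree.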

Now we can take the last step of the communication, and replace the direct communication between the nodes with a switch box:

Change last level to switch boxes

We can do the same thing for the other levels, but this will require rerouting some wires to make the switch boxes fit:

Omega network

If we look carefully at the spaghetti nest of wire leading to each switch stage, we can see that it performs a perfect shuffle -- so we have a shuffle-exchange network. This is a blocking network -- it can't simultaneously route 2->3 and 6->1, for instance. If the switch boxes are also capable of broadcasting (selecting one of their inputs and sending a message on both outputs), this is an "Omega" network.
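We can check the blocking example with a small simulation. This sketch assumes the usual textbook model of the network (the figure's exact wiring may be drawn differently): each stage is a perfect shuffle of the line numbers followed by a 2x2 switch that sets the low bit of the line number to the next destination bit, most significant bit first. Two messages collide if they need the same switch output in the same stage. The function names are invented for illustration.

```python
def omega_lines(src, dst, dims):
    """Line number occupied by the message after each of the dims stages."""
    mask = (1 << dims) - 1
    pos = src
    lines = []
    for stage in range(dims):
        # Perfect shuffle: rotate the line number left by one bit.
        pos = ((pos << 1) | (pos >> (dims - 1))) & mask
        # Switch box: the low bit becomes the next destination bit.
        dst_bit = (dst >> (dims - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        lines.append(pos)
    return lines


def conflicts(route_a, route_b, dims=3):
    """True if the two (src, dst) routes need the same line in some stage."""
    a = omega_lines(*route_a, dims)
    b = omega_lines(*route_b, dims)
    return any(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    print(omega_lines(2, 3, 3))           # [4, 1, 3]
    print(omega_lines(6, 1, 3))           # [4, 0, 1]
    print(conflicts((2, 3), (6, 1)))      # True
```

Both messages want line 4 coming out of the first stage, so one of them has to wait.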

We can also go back to a static shuffle-exchange network, like this:

Static shuffle-exchange

In the static shuffle-exchange, the curved arcs take the same role as the shuffles between the switch boxes in the Omega, and the straight arcs take the same role as the switch boxes. This gives us, once again, an O(log n) latency and an O(n) aggregate bandwidth; the bisection bandwidth is always 4 (so it's O(1)) and the nodes always have 3 links, so the cost is O(n).

Note that most pictures of the static shuffle-exchange show links from 000 around back to itself, and similarly for 111.

One other interesting property of the perfect shuffle: it does the same thing as left-rotating the address of the node by one bit.
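This property is easy to verify. The sketch below defines the shuffle the way a card player would (cut the deck in half and interleave the halves) and compares it with a one-bit left rotation of the address. Both function names are my own.

```python
def perfect_shuffle_position(i, n):
    """Where item i lands when n items (n even) are cut in half and interleaved."""
    half = n // 2
    if i < half:
        return 2 * i                  # first half goes to the even positions
    return 2 * (i - half) + 1         # second half goes to the odd positions


def rotate_left(addr, dims):
    """Rotate a dims-bit address left by one bit."""
    mask = (1 << dims) - 1
    return ((addr << 1) | (addr >> (dims - 1))) & mask


if __name__ == "__main__":
    dims = 3
    for i in range(1 << dims):
        print(f"{i:03b} -> {perfect_shuffle_position(i, 1 << dims):03b}")
```

Note that 000 and 111 map to themselves, which is why pictures of the static shuffle-exchange show those two nodes with arcs looping back.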

Multidimensional Meshes and Toroids

A mesh is a generalization of the hypercube, in which we have more than two nodes along a dimension. The most popular meshes are 2- and 3-dimensional meshes; here's a picture of a 2-d mesh:

2D Mesh

A 2-d mesh has O(n) cost, O(sqrt(n)) bisection bandwidth, O(n) aggregate bandwidth, and O(sqrt(n)) latency. This latency was regarded as unacceptable when store-and-forward was the order of the day, but meshes have become quite popular as wormhole routing has become more common.
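For a square two-dimensional mesh these figures can be worked out directly. The sketch below assumes a k-by-k mesh (so n = k*k nodes) with k even, and counts the links crossing a cut down the middle as the bisection; the function names are made up for this example.

```python
def mesh_hops(a, b):
    """Hops between (row, col) nodes: we can only move along rows and columns."""
    (r1, c1), (r2, c2) = a, b
    return abs(r1 - r2) + abs(c1 - c2)


def mesh_diameter(k):
    """Corner to opposite corner: k - 1 hops in each dimension."""
    return 2 * (k - 1)


def mesh_links(k):
    """k rows of k - 1 horizontal links, plus k columns of k - 1 vertical links."""
    return 2 * k * (k - 1)


def mesh_bisection_links(k):
    """Cutting between the two middle columns severs one link per row."""
    return k


if __name__ == "__main__":
    for k in (4, 8, 16):
        print(k * k, mesh_diameter(k), mesh_links(k), mesh_bisection_links(k))
```

Quadrupling the node count only doubles the worst-case hop count and the bisection, which is where the sqrt(n) terms come from.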

If we wrap the ends around, we have a toroid instead of a mesh; a two-dimensional toroid is normally drawn so it looks a lot like a donut but I'm not going to try to render it like that!

2D Toroid

A ring is actually a one-dimensional toroid.
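The effect of the wraparound links shows up in the hop count: along each dimension we can go whichever way around is shorter. A short sketch, assuming two-way links and k nodes along every dimension (the function names are mine):

```python
def ring_hops(a, b, k):
    """Shorter way around a two-way ring of k nodes."""
    d = abs(a - b) % k
    return min(d, k - d)


def toroid_hops(a, b, k):
    """Sum the ring distance along each coordinate of the two nodes."""
    return sum(ring_hops(x, y, k) for x, y in zip(a, b))


if __name__ == "__main__":
    # Opposite corners of a 4x4 grid: 6 hops in a mesh, 2 with wraparound.
    print(toroid_hops((0, 0), (3, 3), 4))   # 2
    # A ring is the one-coordinate case.
    print(toroid_hops((0,), (5,), 8))       # 3
```

The worst case in each dimension drops from k - 1 hops to k // 2, so the toroid roughly halves the mesh's latency for the price of the extra wraparound wires.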