How the Internet Works

A very basic introduction, from old CS 111 notes…

This page is supplementary and captures the fundamentals of what we presented in class about how the basic communication layers of the Internet work.

The Internet is a great example of using layered abstractions to solve a complex problem. Each layer solves one piece of the problem and then the next layer up can simply ignore the details below and build upon the abstraction to offer the next set of capabilities. The fundamental layered model is shown below:

Layer	Purpose	Service Quality	Key Terms
Application	Transfer meaningful information between two specific applications	Reliable data streams (typically)	HTTP, SMTP, many other application protocols
Transport	Transfer data between two applications	Reliable data streams (typically)	TCP, port numbers
Network	Transfer data between two global devices	Unreliable data packets	IP addresses, router, NAT
Data-link (Local Area Network)	Transfer data between two local devices	Unreliable data packets	hub, switch, wireless access point, MAC addresses, WEP, WPA
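One way to picture the layers working together: on the way out, each layer wraps the data handed down from the layer above with its own addressing information, and on the way in, each layer strips off only its own part and passes the rest up. The sketch below is a simplified illustration, not real packet formats. The text headers like `[IP dst=...]` and the function names `encapsulate` and `decapsulate` are made up for this example (real headers are binary and carry much more), and the addresses are just placeholders borrowed from this page.

```python
def encapsulate(message: str, dst_mac: str, dst_ip: str, dst_port: int) -> str:
    """Wrap application data in one made-up text header per layer, going down the stack."""
    segment = f"[TCP dst_port={dst_port}]{message}"  # transport: which application
    packet = f"[IP dst={dst_ip}]{segment}"          # network: which device, worldwide
    frame = f"[ETH dst={dst_mac}]{packet}"          # data-link: which device on the local network
    return frame


def decapsulate(frame: str) -> str:
    """Going up the stack, each layer removes only its own header and passes the rest up."""
    data = frame
    for tag in ("[ETH ", "[IP ", "[TCP "):
        if not data.startswith(tag):
            raise ValueError(f"expected a header starting with {tag!r}")
        data = data[data.index("]") + 1:]  # drop everything up to the end of this header
    return data


frame = encapsulate("GET / HTTP/1.1", "00:1E:C9:BB:92:37", "128.123.64.36", 80)
print(frame)
# [ETH dst=00:1E:C9:BB:92:37][IP dst=128.123.64.36][TCP dst_port=80]GET / HTTP/1.1
print(decapsulate(frame))
# GET / HTTP/1.1
```

Notice that the data-link layer never looks inside the IP header, and the network layer never looks at the port: that is the layered abstraction at work.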

Starting at the lowest level, the data-link layer is responsible only for delivering data between two devices on the same local network. This is what an Ethernet hub or switch does, and what a wireless access point does. An Ethernet hub is a “dumb” device that simply repeats the electrical signal it receives out to all of its other ports, while a switch is smarter and sends the data only on the port that the target device is connected to. All local area communication uses MAC addresses to identify devices; these are six bytes long and normally written in hexadecimal with colons between the bytes (the Ethernet interface on my computer has the MAC address 00:1E:C9:BB:92:37). MAC addresses are assigned by the manufacturer to each network device (Ethernet interface card or WiFi card), which means they end up being “randomly” scattered around the world. Even though they are unique, they would be terrible for global data communication: nothing about an address tells you where its device is, so every router would have to keep track of every single device! Since WiFi signals are broadcast and anyone nearby can snoop on the traffic, encryption schemes such as “Wired Equivalent Privacy” (WEP) and WPA (Wi-Fi Protected Access) are used to protect it from snoopers. WEP has since been shown to be easily broken and has been replaced by WPA and its successors.
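The difference between a hub and a switch can be sketched as a “learning switch”: the switch notes which port each source MAC address arrives on, sends frames for a known destination only to that port, and falls back to hub-like flooding when it has not seen the destination yet. This is a simplified model under stated assumptions: the class `LearningSwitch` and the second MAC address 00:11:22:33:44:55 are made up, and real switches also expire old table entries.

```python
class LearningSwitch:
    """Toy Ethernet switch that learns which port each MAC address is connected to."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.mac_table: dict[str, int] = {}  # MAC address -> port it was last seen on

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the ports a frame arriving on in_port is sent out on."""
        self.mac_table[src_mac.upper()] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac.upper())
        if out_port is None:
            # Destination not learned yet: flood to every other port, just like a hub.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            return []  # destination is on the port the frame came from; nothing to send
        return [out_port]


MY_PC = "00:1E:C9:BB:92:37"    # the MAC address from the text
PRINTER = "00:11:22:33:44:55"  # hypothetical second device

switch = LearningSwitch(num_ports=4)
first = switch.handle_frame(0, MY_PC, PRINTER)   # printer not learned yet: flood
reply = switch.handle_frame(2, PRINTER, MY_PC)   # switch already learned MY_PC is on port 0
second = switch.handle_frame(0, MY_PC, PRINTER)  # now the printer's port is known too
print(first, reply, second)  # [1, 2, 3] [0] [2]
```

After a single exchange the switch stops flooding, which is exactly the “smarter” behavior described above.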

The network layer is built on top of the local area communication abstraction, and it provides the ability to deliver data between devices anywhere in the world. It is the backbone of the Internet, and is the narrow center of the hourglass model in Appendix A of Blown to Bits. The network layer is implemented by the Internet Protocol (IP), which is packet-oriented, meaning that it delivers small data packets from one device to another. IP is unreliable, meaning that packets can be lost, duplicated, or even delivered out of order! The network layer uses IP addresses to identify devices. These are handed out in large blocks to organizations such as NMSU, so all of an organization's addresses share a common beginning, and this organization is what makes IP addresses very good for global data delivery. IP addresses (in IPv4, the version described here) are four bytes long and are normally written in “dotted decimal” notation, with each byte written as a decimal number between 0 and 255. One of my department computers has the IP address 128.123.64.36, and indeed NMSU owns all of the IP addresses beginning with 128.123! Even though we use names like www.nmsu.edu when surfing the web, it really is the IP address that matters (a system called DNS, the Domain Name System, is responsible for translating a name into an IP address).
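Because an IPv4 address is just four bytes, dotted-decimal notation is simply each byte written in decimal. The sketch below packs those bytes into a single 32-bit number by hand (the helper name `dotted_to_int` is made up), then uses Python's standard `ipaddress` module to check that the address falls inside NMSU's block. Writing that block as 128.123.0.0/16 (all addresses whose first two bytes are 128.123) is our reading of the text, not an official registry record.

```python
import ipaddress


def dotted_to_int(addr: str) -> int:
    """Combine the four dotted-decimal bytes of an IPv4 address into one 32-bit number."""
    parts = [int(p) for p in addr.split(".")]
    if len(parts) != 4 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted-decimal IPv4 address: {addr!r}")
    value = 0
    for byte in parts:
        value = (value << 8) | byte  # shift the earlier bytes left and append this one
    return value


addr = "128.123.64.36"
print(hex(dotted_to_int(addr)))  # 0x807b4024 (one hex pair per byte: 128, 123, 64, 36)

# The same membership check two ways: by hand (compare the first two bytes) and with ipaddress.
same_first_two_bytes = dotted_to_int(addr) >> 16 == dotted_to_int("128.123.0.0") >> 16
in_nmsu_block = ipaddress.ip_address(addr) in ipaddress.ip_network("128.123.0.0/16")
print(same_first_two_bytes, in_nmsu_block)  # True True
```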

To deliver data packets globally, the Internet is made up of many interconnected devices called routers. Each router inspects the destination IP address of every data packet, figures out which direction leads toward that address, and sends the packet on to the next router in that direction. Eventually the data packet reaches a router that is on the same local network as the destination device, and that router then uses the local area network (data-link layer) capability to deliver the data packet to the destination device.
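Routers typically do this with a forwarding table of address prefixes, choosing the most specific (longest) prefix that matches the destination. This is also why organized IP addresses beat “random” MAC addresses: a single entry for 128.123.0.0/16 covers every NMSU device. In this sketch the table contents and next-hop names (`upstream-isp`, `nmsu-border`, `local-lan`) are hypothetical; real routers usually build their tables automatically with routing protocols.

```python
import ipaddress

# Hypothetical forwarding table for one router: (destination prefix, next hop).
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream-isp"),      # default route: everything else
    (ipaddress.ip_network("128.123.0.0/16"), "nmsu-border"),  # every NMSU address
    (ipaddress.ip_network("128.123.64.0/24"), "local-lan"),   # a network this router is attached to
]


def next_hop(destination: str) -> str:
    """Choose the most specific (longest) prefix that contains the destination address."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in FORWARDING_TABLE if dest in net]
    _, best_hop = max(matches, key=lambda match: match[0].prefixlen)
    return best_hop


for dest in ("128.123.64.36", "128.123.1.1", "8.8.8.8"):
    print(dest, "->", next_hop(dest))
# 128.123.64.36 -> local-lan
# 128.123.1.1 -> nmsu-border
# 8.8.8.8 -> upstream-isp
```

Three table entries are enough to route every IPv4 address, because the default route catches anything the more specific prefixes miss.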

Today there are more devices on the Internet than there are IPv4 addresses: four bytes allow only about 4.3 billion different addresses. Yet every device gets an IP address! How does this work? The answer is Network Address Translation, or NAT. The key thing to realize is that an IP address must be globally unique only on the public Internet; inside a private network, an organization can use whatever addresses it wants, typically from ranges reserved for private use such as 10.x.x.x and 192.168.x.x. For example, NMSU assigns private IP addresses to devices that connect to AggieAir. When a device wants to communicate with the public Internet, the AggieAir router temporarily replaces the private IP address with a public one and remembers the substitution, and when the reply comes back it does the reverse. This way many devices can share a few public IP addresses, so we can have more devices than actual public IP addresses!
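Here is a minimal sketch of the translation table a NAT router keeps. It assumes the common variant in which the router also rewrites the transport-layer port number (often called port address translation), since that is how the router can tell which inside device a reply belongs to when many of them share one public address. The class name `NatRouter`, the public address 203.0.113.5 (from a range reserved for documentation), and the starting port 40000 are all made up for illustration; real NAT devices also expire idle entries, which this sketch omits.

```python
import ipaddress
import itertools


class NatRouter:
    """Toy NAT that lets many private (address, port) pairs share one public address."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._free_ports = itertools.count(first_port)  # next unused public port
        self.outbound: dict[tuple[str, int], int] = {}   # (private ip, port) -> public port
        self.inbound: dict[int, tuple[str, int]] = {}    # public port -> (private ip, port)

    def translate_outgoing(self, src_ip: str, src_port: int) -> tuple[str, int]:
        """Rewrite a private source address into the shared public one."""
        if not ipaddress.ip_address(src_ip).is_private:
            raise ValueError(f"{src_ip} is not a private address")
        key = (src_ip, src_port)
        if key not in self.outbound:  # first packet of this conversation: record it
            public_port = next(self._free_ports)
            self.outbound[key] = public_port
            self.inbound[public_port] = key
        return self.public_ip, self.outbound[key]

    def translate_incoming(self, dst_port: int) -> tuple[str, int]:
        """Do the reverse for a reply; raises KeyError if no inside device started it."""
        return self.inbound[dst_port]


nat = NatRouter("203.0.113.5")
laptop = nat.translate_outgoing("10.0.0.7", 51000)
phone = nat.translate_outgoing("10.0.0.8", 51000)  # same private port, different device
print(laptop, phone)                  # ('203.0.113.5', 40000) ('203.0.113.5', 40001)
print(nat.translate_incoming(40001))  # ('10.0.0.8', 51000)
```

A side effect of this design is that unsolicited traffic from outside has no table entry, so it has nowhere to go.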

Now that the network layer can deliver data to any device in the world, we can build the capability to deliver data to a specific application. After all, it is our applications (like our email program, our web browser, or our multi-player game) that need to communicate, not our devices. The transport layer implements this capability. The IP address identifies the device, and the transport layer then identifies applications on that device by port numbers. Well-known server applications use an assigned well-known port: for example, web servers use port 80 (443 for encrypted HTTPS), and email servers use port 25. Your web browser can use any port (normally the operating system simply picks a free one), because it contacts the web server first, and in doing so it tells the web server which port it is using. It is generally only servers that need well-known port numbers. The other thing the transport layer does, and this is the heart of TCP (the Transmission Control Protocol), is to build a reliable, stream-oriented data communication service on top of the unreliable IP service. Applications do not want to deal with missing, duplicated, or out-of-order data; they want to send and receive data reliably and in order. TCP does this by numbering the data it sends, having the receiver acknowledge what has arrived, resending anything that is not acknowledged, and putting everything back in order at the receiving end.
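The core of TCP's reliability trick can be sketched in a few lines: every piece of data carries its position in the stream (a sequence number), the receiver holds early arrivals until the gap before them is filled, discards duplicates, and reports how far it has received in order (an acknowledgement) so the sender knows what to resend. This is a toy model under simplifying assumptions (fixed segment boundaries, no timers, no flow control); `Segment` and `Reassembler` are hypothetical names, not part of any real TCP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    seq: int     # position (byte offset) of this data in the stream
    data: bytes


class Reassembler:
    """Toy TCP-style receiver: turns lost, duplicated, out-of-order segments into an in-order stream."""

    def __init__(self) -> None:
        self.next_seq = 0                    # next byte offset the application is waiting for
        self.pending: dict[int, bytes] = {}  # segments that arrived early, keyed by offset
        self.delivered = bytearray()         # data handed to the application, in order

    def receive(self, seg: Segment) -> int:
        """Accept one segment and return the acknowledgement: the next byte still needed."""
        if seg.seq >= self.next_seq:  # older data is a duplicate of what we already delivered
            self.pending.setdefault(seg.seq, seg.data)
        while self.next_seq in self.pending:  # deliver every segment that now fits in order
            data = self.pending.pop(self.next_seq)
            self.delivered += data
            self.next_seq += len(data)
        return self.next_seq


message = b"Hello, reliable world!"
segments = [Segment(i, message[i:i + 5]) for i in range(0, len(message), 5)]

# The unreliable network loses segment 1, duplicates segment 0, and reorders the rest.
receiver = Reassembler()
for seg in [segments[2], segments[0], segments[0], segments[3], segments[4]]:
    ack = receiver.receive(seg)
print(ack)  # 5: the receiver is still waiting for byte 5, so the sender retransmits

final_ack = receiver.receive(segments[1])  # the retransmitted segment fills the gap
print(final_ack, receiver.delivered.decode())  # 22 Hello, reliable world!
```

The application only ever sees `receiver.delivered`, which grows strictly in order; all the loss, duplication, and reordering stays hidden below it.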

Finally, each application must decide on its own data formats and protocols. For example, web servers and browsers use HTTP (HyperText Transfer Protocol), and email programs use SMTP (Simple Mail Transfer Protocol) to send mail and IMAP (Internet Message Access Protocol) or POP (Post Office Protocol) to retrieve it. These application protocols vary widely, but they all build on TCP/IP and the data-link layer, so they only have to worry about their own data and not about all the other networking details; those have been taken care of by the lower layers!
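To see that an application protocol really is just an agreed-upon data format carried over a TCP stream, the sketch below starts a tiny web server on the local machine using Python's standard `http.server` module and then talks to it through a plain TCP socket, writing the HTTP request out by hand. Assumptions: the server listens on 127.0.0.1 and lets the operating system pick a free port (a real web server would use port 80, which usually requires administrator rights), and `HelloHandler` and its page text are made up.

```python
import http.server
import socket
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical tiny web server that answers every GET with a fixed page."""

    def do_GET(self) -> None:
        body = b"Hello from the application layer!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo output quiet


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
server_port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(("127.0.0.1", server_port)) as sock:
    client_port = sock.getsockname()[1]  # the "browser" side got an OS-chosen port
    request = (                          # HTTP is just text written into the TCP stream
        "GET / HTTP/1.1\r\n"
        f"Host: 127.0.0.1:{server_port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):  # read until the server closes the connection
        response += chunk

server.shutdown()
server.server_close()

status_line = response.split(b"\r\n", 1)[0].decode()
print(f"client port {client_port} -> server port {server_port}")
print(status_line)  # HTTP/1.0 200 OK
```

Python's built-in handler answers with HTTP/1.0 by default, so the status line says HTTP/1.0 even though the request said HTTP/1.1. Everything below the text of the request (ports, IP, reliable delivery) was handled by the lower layers.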

That’s the power of abstraction!