Device Drivers

Standard IO Operations

One of the real strengths of the abstractions provided by modern operating systems is the extent to which the same operations are used for access to all devices: a program can take its input from files, networks, or keyboards, all without modification to the program.

In the POSIX world, there are six important IO operations which must be supported by a device driver. This brief introduction will discuss what the functions do but will not attempt to fully describe the parameters or the possible errors. For complete information, read the man pages.

int open(const char *pathname, int flags, mode_t mode);
The open() function connects the process to a file or a device driver. Its return value, called a file descriptor, is passed to the other functions in this list as the fd parameter.

ssize_t read(int fd, void *buf, size_t count);
This function obtains data from the device or file. You pass it the address of the buffer you want to put the data in and the amount you want to read. The return value is the number of bytes returned to you.

Note that read() can return fewer bytes than requested without this being an error. For instance, if you are reading from the keyboard, then (in the default mode) you will only have a single line of input returned to you. In some cases it may even return 0 bytes, which means the data source will not be providing any more data. It's only an error if it returns -1.
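Because short reads are legal, a program that really needs a fixed number of bytes has to call read() in a loop. Here is a minimal sketch of that idea; the name read_fully is my own invention for illustration, not a standard function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `count` bytes have
 * arrived, the data source reports end of data, or a real error occurs.
 * Returns the number of bytes actually read, or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == 0)                 /* 0 bytes: no more data is coming */
            break;
        if (n == -1) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;              /* anything else is a real error */
        }
        total += (size_t)n;         /* short read: loop for the rest */
    }
    return (ssize_t)total;
}
```

A return value smaller than count then means the data ran out, not that something went wrong.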

ssize_t write(int fd, const void *buf, size_t count);
This is, in some sense, the opposite of read(): it is used to write data out to a device or a file.

off_t lseek(int fd, off_t offset, int whence);
When you're reading or writing a file, the OS keeps track of where in the file you are at the moment. lseek() moves this pointer.

There are some devices which do not support lseek().
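One way to find out whether a descriptor supports lseek() is simply to try it. The sketch below asks for the current position, which moves nothing; is_seekable is a made-up name for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd supports seeking, 0 if not. */
int is_seekable(int fd)
{
    /* Seeking 0 bytes from the current position is a harmless probe. */
    if (lseek(fd, 0, SEEK_CUR) == (off_t)-1)
        return 0;   /* errno is ESPIPE for pipes, FIFOs, and sockets */
    return 1;
}
```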

int ioctl(int fd, int request, ...);
ioctl() is a catch-all function for operations on devices that don't fit into the read(), write(), lseek() paradigm. The meanings of the parameters are completely up to the particular device driver.
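As one concrete example of an ioctl() request, Linux lets you ask how many bytes are waiting to be read with FIONREAD. This is only a sketch: FIONREAD is not supported by every driver, and bytes_waiting is a name I made up.

```c
#include <sys/ioctl.h>

/* Hypothetical helper: ask the driver behind fd how many bytes could be
 * read right now. Returns that count, or -1 if the request fails
 * (for example because this driver does not implement FIONREAD). */
int bytes_waiting(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```

Notice that the third argument here is a pointer to an int; for a different request it could be something else entirely, which is exactly the point of ioctl().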

int close(int fd);
This is used to disassociate the device or file from the process.

It is the responsibility of each device driver to supply functions that implement these operations. This is accomplished through a C struct called struct file_operations. I'll be using the line printer device driver as something of a running example here; on my Debian system, this driver is in /usr/src/linux-source-2.6.12/drivers/char/lp.c. In this file, the definition appears as

static struct file_operations lp_fops = {
	.owner		= THIS_MODULE,
	.write		= lp_write,
	.ioctl		= lp_ioctl,
	.open		= lp_open,
	.release	= lp_release,
#ifdef CONFIG_PARPORT_1284
	.read		= lp_read,
#endif
};

lp_write, lp_ioctl and so forth are functions defined in the driver to provide the operations described above, as well as a few others (the close() call doesn't end up needing to be directly implemented in the driver; the driver's release function is called when the last reference to the open file goes away).

In Linux, when a device driver is initialized it calls either register_chrdev() for a character device or register_blkdev() for a block device (we'll be discussing block devices and character devices later). The parameters of these functions are a major number, the device name, and a pointer to a struct file_operations. Taking a quick look at the lp driver, we can see that it calls register_chrdev with the parameters

register_chrdev (LP_MAJOR, "lp", &lp_fops);

There are two arrays in the kernel, one of struct file_operations pointers for character devices and one of struct file_operations pointers for block devices. The register_chrdev() and register_blkdev() functions insert the struct file_operations pointers into the index specified by the major number.
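To make the table idea concrete, here is a small user-space model of it. Everything in it is an assumption for illustration: struct file_ops is a cut-down stand-in for struct file_operations, MAX_MAJOR is an arbitrary table size, and the model_ functions only imitate what the kernel does.

```c
#include <stddef.h>

#define MAX_MAJOR 256   /* assumed table size for this model */

/* Cut-down stand-in for the kernel's struct file_operations. */
struct file_ops {
    int  (*open)(int minor);
    long (*write)(int minor, const char *buf, size_t count);
};

/* One slot per major number, like the character-device array. */
static const struct file_ops *chr_table[MAX_MAJOR];

/* Model of register_chrdev(): claim slot `major` for a driver. */
int model_register_chrdev(unsigned int major, const struct file_ops *fops)
{
    if (major >= MAX_MAJOR || chr_table[major] != NULL)
        return -1;                  /* out of range or already taken */
    chr_table[major] = fops;
    return 0;
}

/* Model of what happens on write(): find the driver by major number
 * and call through its function pointer. */
long model_write(unsigned int major, int minor, const char *buf, size_t count)
{
    if (major >= MAX_MAJOR || chr_table[major] == NULL
        || chr_table[major]->write == NULL)
        return -1;                  /* no driver, or no write support */
    return chr_table[major]->write(minor, buf, count);
}

/* Toy driver: accepts everything and reports it as written. */
long toy_write(int minor, const char *buf, size_t count)
{
    (void)minor;
    (void)buf;
    return (long)count;
}

const struct file_ops toy_fops = { .write = toy_write };
```

After model_register_chrdev(6, &toy_fops), a call to model_write(6, 0, ...) ends up in toy_write without the caller ever naming the driver.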

In addition to the major number, which identifies the device type, each device is identified by a minor number which says which device of that type is being referred to.

It is, once again, the responsibility of the device driver to map the minor number to the particular device (in this case, one of the line printers). This is usually done with an array in the device driver. In the line printer driver, the table is declared as

static struct lp_struct lp_table[LP_NO];

where struct lp_struct (declared in /usr/src/linux-source-2.6.12/include/linux/lp.h) contains flags and other parameters regarding the state of each line printer on the system, and LP_NO is #defined to be 8 (with a reminder in the code that if you've got more printers than this you'd better change the #define!).
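The lookup from minor number to table entry can be sketched in user space like this. It is a model rather than the driver's actual code: struct lp_model is a cut-down stand-in for struct lp_struct, and the value given for the LP_EXIST flag bit is an assumption.

```c
#include <errno.h>

#define LP_NO    8        /* same table size as in the text */
#define LP_EXIST 0x0001   /* assumed flag bit: a printer is attached */

struct lp_model {         /* cut-down stand-in for struct lp_struct */
    unsigned int flags;
};

static struct lp_model lp_table[LP_NO];

/* Model of the check an open routine might make: returns 0 if the
 * minor number names a printer that exists, or -ENXIO otherwise. */
int lp_model_open(unsigned int minor)
{
    if (minor >= LP_NO)
        return -ENXIO;                          /* beyond the table */
    if ((lp_table[minor].flags & LP_EXIST) == 0)
        return -ENXIO;                          /* slot is empty */
    return 0;
}
```

The bounds check comes first so that a bad minor number can never index past the end of the table.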

Special Files

A program communicates with a device driver through a special file. As we discuss elsewhere, a special file is a file whose inode identifies it as being not an actual file, but instead as the connection to a device driver. The inode identifies the device as being either a block or a character device, and gives a major number and a minor number.

The major number in the inode is the same as the major number in a device driver, as described above. The minor number specifies which instance of the device should be associated with the special file. Typically, the devices will be numbered 0, 1, 2, and so forth. Taking a look on viper.cs.nmsu.edu, we can see

viper:104% ls -l /dev/lp*
crw-rw----  1 root lp 6, 0 2005-03-19 12:36 /dev/lp0
crw-rw----  1 root lp 6, 1 2005-03-19 12:36 /dev/lp1
crw-rw----  1 root lp 6, 2 2005-03-19 12:36 /dev/lp2

There are three printers in /dev, named /dev/lp0, /dev/lp1, and /dev/lp2. The first character on each line of the listing (the c) says these are character special files; the number pairs (6, 0), (6, 1), and (6, 2) identify the three special files as major device 6 with minor devices 0, 1, and 2.
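A program can get at the same information that ls prints by calling stat() and pulling the numbers out of the st_rdev field. The sketch below assumes Linux with glibc, where the major() and minor() macros live in <sys/sysmacros.h>; device_numbers is a made-up helper name.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on Linux/glibc */
#include <sys/types.h>

/* Hypothetical helper: fetch the major and minor numbers of a special
 * file. Returns 'c' for a character device, 'b' for a block device,
 * 0 if the path is not a special file, or -1 if stat() fails. */
int device_numbers(const char *path, unsigned int *maj_out,
                   unsigned int *min_out)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return 0;
    *maj_out = major(st.st_rdev);   /* st_rdev holds both numbers */
    *min_out = minor(st.st_rdev);
    return S_ISCHR(st.st_mode) ? 'c' : 'b';
}
```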

This also demonstrates a flaw in the scheme: there are not, in fact, any line printers on viper. You won't find this out until you try to open() one. Some systems provide a way to make sure that only special files corresponding to actual devices appear in /dev; at home I run a daemon called udev which only creates special files when it recognizes that a device is actually present.

Character Devices

Character devices are the sorts of things we normally think of as "real" IO devices: things like keyboards, mice, serial ports...

In general, character device drivers are pretty simple: write() just copies the data into an internal buffer, while read() just copies an internal buffer into a user-provided buffer. An interrupt handling routine transfers the data between the driver's buffer and the device.
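Here is a user-space model of the output half of that arrangement: write() fills an internal ring buffer, and the interrupt handler drains it one byte at a time. The names and the buffer size are assumptions for illustration, and a real driver would also need locking between the two routines.

```c
#include <stddef.h>

#define BUF_SIZE 16   /* assumed size; real drivers choose their own */

/* Driver-internal ring buffer shared by write() and the interrupt
 * handler. head is where the next byte goes in; tail is where the
 * next byte comes out. */
struct ring {
    unsigned char data[BUF_SIZE];
    size_t head, tail, count;
};

/* Model of a driver's write(): copy as much user data as fits into
 * the internal buffer and report how many bytes were accepted. */
size_t model_dev_write(struct ring *r, const unsigned char *user, size_t n)
{
    size_t done = 0;

    while (done < n && r->count < BUF_SIZE) {
        r->data[r->head] = user[done++];
        r->head = (r->head + 1) % BUF_SIZE;
        r->count++;
    }
    return done;
}

/* Model of the interrupt handler: the device is ready for one more
 * byte, so hand it the oldest one. Returns -1 if nothing is queued. */
int model_dev_interrupt(struct ring *r)
{
    int byte;

    if (r->count == 0)
        return -1;
    byte = r->data[r->tail];
    r->tail = (r->tail + 1) % BUF_SIZE;
    r->count--;
    return byte;
}
```

When the buffer is full, model_dev_write() accepts fewer bytes than it was offered, which is one place the short counts from write() come from.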

Block Devices

The main point of block devices is that they are, in some sense, structured repositories of data. They are addressable, in that we can specify a location at which to put data, and they don't really behave like IO devices: instead of communicating with the real outside world, they give us a place to store and retrieve data (with read-only devices serving as a special case).

Because of this, we can be much more aggressive in optimizing access to block devices. We can schedule reads and writes to occur in completely different orders than specified by a program; we can buffer data within the operating system. So long as the results are the same as if the operations had been performed in program order, the device driver has done the right thing.

Disk Drives

It's probably a good idea to review disk hardware here for a minute...

The basic idea is that we have a disk with magnetic material on it, and a read/write head that flies above the disk. The surface of the disk is divided up into "tracks", defined by where the head is, and "sectors", which are divisions of the tracks. Here's a picture of a 500 GB Seagate Barracuda:

[Picture: Seagate Barracuda ST3500641AS]

The disk drive will normally have several disks (called platters) like the one described above, on a single spindle. The corresponding tracks on the different platters are called cylinders; disks have historically been addressed using a three-dimensional cylinder-head-sector scheme. More recent disks are addressed using a logical block number on the disk.
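The two addressing schemes are related by a simple formula, as long as we assume a fixed geometry (the same number of sectors on every track, which real modern drives do not have). The function name and parameter names below are my own.

```c
/* Convert a cylinder/head/sector address to a logical block number.
 * Cylinders and heads count from 0; sectors traditionally count from 1.
 * `heads` is the number of heads per cylinder. */
unsigned long chs_to_lba(unsigned long c, unsigned long h, unsigned long s,
                         unsigned long heads, unsigned long sectors_per_track)
{
    return (c * heads + h) * sectors_per_track + (s - 1);
}
```

For example, with 16 heads and 63 sectors per track, cylinder 2, head 3, sector 4 works out to (2 * 16 + 3) * 63 + 3 = 2208.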

One thing that needs to be mentioned about disks is that they are just about the only part of a computer system that hasn't been speeding up much over the years. Comparing the 18GB disk described in the book with the Seagate in the picture, the two rotate at the same speed, and the average seek time is slower on the newer disk!

Block Device Drivers

Block device drivers use buffers to store data for transfer to and from the device. One buffer corresponds to a block of data on the disk. A structure called a buffer_head serves as a descriptor for each buffer; it contains information such as whether the buffer is empty, up to date, dirty, etc.

When something needs to read or write a disk block, it creates a block device request in the "high-level" device driver. This request specifies the block to be operated on, and the operation (read or write) to be performed. The request is never satisfied immediately; instead, it is scheduled to be performed. The kernel control path never has to wait for the request to be satisfied (though, on a read, the process may be put in a resource wait state).

Once the request has been created by the high-level driver, it is passed to a low-level driver to actually be satisfied. The low-level driver is actually interrupt-driven; when the disk satisfies one request there is an interrupt, and the low-level driver is activated to provide the disk with another request. A "strategy routine" takes pending requests and passes them to the disk drive.

Each block device driver maintains its own request queues; there will be one for each physical block device, and the requests in each queue are ordered in an attempt to optimize disk behavior.
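One simple way to order a queue is to keep it sorted by block number, so the head can sweep across the disk in one direction. The sketch below shows only that idea; the structure names and the queue capacity are assumptions, and real I/O schedulers do a good deal more (merging adjacent requests and making sure no request waits forever, for instance).

```c
#include <stddef.h>

#define QUEUE_MAX 32   /* assumed capacity for this sketch */

struct request_model {
    unsigned long block;   /* block to operate on */
    int is_write;          /* 0 = read, 1 = write */
};

struct queue_model {
    struct request_model req[QUEUE_MAX];
    size_t len;
};

/* Insert a request so the queue stays sorted by block number.
 * Requests for the same block keep their arrival order.
 * Returns 0 on success, -1 if the queue is full. */
int queue_add(struct queue_model *q, unsigned long block, int is_write)
{
    size_t i;

    if (q->len == QUEUE_MAX)
        return -1;

    /* Shift larger block numbers up one slot to open a gap. */
    i = q->len;
    while (i > 0 && q->req[i - 1].block > block) {
        q->req[i] = q->req[i - 1];
        i--;
    }
    q->req[i].block = block;
    q->req[i].is_write = is_write;
    q->len++;
    return 0;
}
```

Keeping same-block requests in arrival order matters: a read that follows a write to the same block must still see the written data.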


Last modified: Thu Nov 17 17:45:16 MST 2005