One of the real strengths of the abstractions provided by modern operating systems is the extent to which the same operations are used for access to all devices. A program can take its input from files, networks, or keyboards, all without modification to the program.
In the POSIX world, there are six important IO operations which must be supported by a device driver. This brief introduction will discuss what the functions do but will not attempt to fully describe the parameters or the possible errors. For complete information, read the man pages.
int open(const char *pathname, int flags, mode_t mode);
The open() function connects the process to a file or a device driver. Its return value, called a file descriptor, is passed to the other functions in this list as the fd parameter.
ssize_t read(int fd, void *buf, size_t count);
This function obtains data from the device or file. You pass it the address of the buffer you want to put the data in and the amount you want to read. The return value is the number of bytes returned to you.
It should be mentioned that the number of bytes can be less than requested without error. For instance, if you are reading from the keyboard, then (in the default mode) you will actually only have a single line of input returned to you. In some cases it may even return 0 bytes, which means the data source will not be providing any more data. It's only an error if it returns -1.
ssize_t write(int fd, const void *buf, size_t count);
This is, in some sense, the opposite of read(): it is used to write data out to a device or a file.
off_t lseek(int fd, off_t offset, int whence);
When you're reading or writing a file, the OS keeps track of where in the file you are at the moment. lseek() moves this pointer. There are some devices which do not support lseek().
int ioctl(int fd, int request, ...);
ioctl() is a catch-all function for operations on devices that don't fit into the read(), write(), lseek() paradigm. The meanings of the parameters are completely up to the particular device driver.
int close(int fd);
This is used to disassociate the device or file from the process.
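To see how these calls fit together, here is a minimal sketch that uses an ordinary file rather than a device. The helper names read_fully and round_trip are made up for this example and the caller supplies the scratch path. It writes a few bytes, seeks back to the start, and reads them again, looping because read() may legitimately return fewer bytes than requested.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `count` bytes arrive,
 * the data source is exhausted (read() returns 0), or an error occurs. */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;             /* a real error */
        }
        if (n == 0)
            break;                 /* no more data will be provided */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Write `msg` to `path`, rewind, and read it back into `out`.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t round_trip(const char *path, const char *msg, char *out, size_t outlen)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = -1;
    /* For simplicity, a short write is treated as a failure here. */
    if (write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) != (off_t)-1)
        n = read_fully(fd, out, outlen);

    close(fd);
    return n;
}
```

Calling round_trip() with a buffer larger than the message shows the short-read case: the loop stops when read() returns 0, not when the buffer is full.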
It is the responsibility of each device driver to supply functions that
implement these operations. The driver collects pointers to them in a C struct called a
struct file_operations. I'll be using the line printer
device driver as something of a running example here; on my Debian
system, this driver is in
/usr/src/linux-source-2.6.12/drivers/char/lp.c. In this
file, we can see this definition as
static struct file_operations lp_fops = {
	.owner		= THIS_MODULE,
	.write		= lp_write,
	.ioctl		= lp_ioctl,
	.open		= lp_open,
	.release	= lp_release,
#ifdef CONFIG_PARPORT_1284
	.read		= lp_read,
#endif
};
lp_write, lp_ioctl and so forth are
functions defined in the driver to provide the operations described above,
as well as a few others
(the close() call doesn't end up needing to be directly
implemented in the driver).
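The idea behind struct file_operations can be tried out in user space. The sketch below is a toy model, not kernel code: struct toy_file_operations, toy_read(), toy_write() and the toy_lp_* names are all invented here. It shows generic code calling through a table of function pointers, and it treats an operation left NULL (as .read is in lp_fops when CONFIG_PARPORT_1284 is not defined) as unsupported.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Toy stand-in for the kernel's struct file_operations. */
struct toy_file_operations {
    ssize_t (*read)(char *buf, size_t count);
    ssize_t (*write)(const char *buf, size_t count);
};

static char toy_lp_paper[64];   /* what the pretend printer has "printed" */
static size_t toy_lp_used;

static ssize_t toy_lp_write(const char *buf, size_t count)
{
    size_t room = sizeof toy_lp_paper - toy_lp_used;
    if (count > room)
        count = room;           /* short write when space runs out */
    memcpy(toy_lp_paper + toy_lp_used, buf, count);
    toy_lp_used += count;
    return (ssize_t)count;
}

/* Designated initializers, as in lp_fops; .read is left NULL. */
static struct toy_file_operations toy_lp_fops = {
    .write = toy_lp_write,
};

/* Generic layer: dispatch through the table, failing if the slot is empty. */
ssize_t toy_write(struct toy_file_operations *fops, const char *buf, size_t count)
{
    return fops->write ? fops->write(buf, count) : -1;
}

ssize_t toy_read(struct toy_file_operations *fops, char *buf, size_t count)
{
    return fops->read ? fops->read(buf, count) : -1;
}
```

The generic functions never mention the printer by name; everything device-specific is reached through the table.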
In Linux, when a device driver is initialized it calls either
register_chrdev() for a character device or
register_blkdev() for a block device (we'll be discussing
block devices and character devices later). The parameters
of these functions are a major number, the device name, and a
pointer to a struct file_operations. Taking a quick look
at the lp driver, we can see that it calls
register_chrdev with the parameters
register_chrdev (LP_MAJOR, "lp", &lp_fops);
There are two arrays in the kernel, one of struct
file_operations pointers for character devices and one of
struct file_operations pointers for block devices. The
register_chrdev() and register_blkdev()
functions store the struct file_operations pointer in the
appropriate array at the index given by the major number.
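A table indexed by major number can be modelled in a few lines. This is only a user-space sketch of the idea described above; toy_register_chrdev(), toy_lookup_chrdev(), struct toy_fops and the table size TOY_MAX_MAJOR are assumptions made for the example, and the kernel's own bookkeeping is more involved.

```c
#include <stddef.h>

#define TOY_MAX_MAJOR 256          /* assumed table size for this sketch */

/* Toy stand-in for struct file_operations. */
struct toy_fops {
    int (*open)(void);
};

/* One slot per major number; NULL means no driver has registered. */
static const struct toy_fops *toy_chrdevs[TOY_MAX_MAJOR];

/* Returns 0 on success, -1 if the major is out of range or already taken. */
int toy_register_chrdev(unsigned major, const struct toy_fops *fops)
{
    if (major >= TOY_MAX_MAJOR || toy_chrdevs[major] != NULL)
        return -1;
    toy_chrdevs[major] = fops;
    return 0;
}

/* Find the operations table for a major number, or NULL if none. */
const struct toy_fops *toy_lookup_chrdev(unsigned major)
{
    return major < TOY_MAX_MAJOR ? toy_chrdevs[major] : NULL;
}

/* A pretend driver to register. */
static int toy_printer_open(void) { return 0; }
static const struct toy_fops toy_printer_fops = { .open = toy_printer_open };
```

Refusing a second registration for the same major is a choice made for this sketch so that one driver cannot silently replace another.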
In addition to the major number, which identifies the device type, each device is identified by a minor number which says which device of that type is being referred to.
It is, once again, the responsibility of the device driver to map the minor number to the particular device (in this case, one of the line printers). This is usually done with an array in the device driver. In the line printer driver, the table is declared as
static struct lp_struct lp_table[LP_NO];
where struct lp_struct (declared in
/usr/src/linux-source-2.6.12/include/linux/lp.h) contains
flags and other parameters regarding the state of each line printer on
the system, and LP_NO is #defined to be 8
(with a reminder in the code that if you've got more printers than
this you'd better change the #define!).
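The minor-number lookup amounts to a bounds-checked array index. Here is a small sketch of that step with invented names (struct toy_lp_struct, toy_lp_table, toy_lp_lookup, and the TOY_LP_EXIST flag are all hypothetical); the real struct lp_struct carries much more state.

```c
#include <stddef.h>

#define TOY_LP_NO 8                  /* mirrors LP_NO in the text */
#define TOY_LP_EXIST 0x0001          /* hypothetical "printer present" flag */

struct toy_lp_struct {
    unsigned long flags;             /* per-printer state bits */
};

static struct toy_lp_struct toy_lp_table[TOY_LP_NO];

/* Map a minor number to its table entry.
 * Returns NULL if the minor is out of range or no printer is present. */
struct toy_lp_struct *toy_lp_lookup(unsigned minor)
{
    if (minor >= TOY_LP_NO)
        return NULL;
    if (!(toy_lp_table[minor].flags & TOY_LP_EXIST))
        return NULL;
    return &toy_lp_table[minor];
}
```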
A program communicates with a device driver through a special file. As we discuss elsewhere, a special file is a file whose inode identifies it as being not an actual file, but instead as the connection to a device driver. The inode identifies the device as being either a block or a character device, and gives a major number and a minor number.
The major number in the inode is the same as the major number in a device driver, as described above. The minor number specifies which instance of the device should be associated with the special file. Typically, the devices will be numbered 0, 1, 2, and so forth. Taking a look on viper.cs.nmsu.edu, we can see
viper:104% ls -l /dev/lp*
crw-rw---- 1 root lp 6, 0 2005-03-19 12:36 /dev/lp0
crw-rw---- 1 root lp 6, 1 2005-03-19 12:36 /dev/lp1
crw-rw---- 1 root lp 6, 2 2005-03-19 12:36 /dev/lp2
The listing shows three printers in /dev, named
/dev/lp0, /dev/lp1, and
/dev/lp2. The first character on each line of the
listing (the c) says these are character special files; the
6, 0, 6, 1 and
6, 2 identify the three special files as having major
device number 6 and minor device numbers 0, 1, and 2.
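The same information that ls -l prints can be read from a program with stat(). The sketch below assumes Linux with glibc, where the major() and minor() macros come from <sys/sysmacros.h>; the function name device_kind and its parameter names are made up for the example.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on Linux with glibc */
#include <sys/types.h>

/* Returns 'c' for a character special file, 'b' for a block special file,
 * '-' for anything else, or 0 if stat() fails. For special files the
 * device numbers are stored through maj_out and min_out. */
char device_kind(const char *path, unsigned *maj_out, unsigned *min_out)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return 0;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return '-';
    /* st_rdev holds the numbers of the device the special file refers to. */
    *maj_out = major(st.st_rdev);
    *min_out = minor(st.st_rdev);
    return S_ISCHR(st.st_mode) ? 'c' : 'b';
}
```

Note that stat() only inspects the inode. It succeeds on a special file even when no hardware is attached.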
This also demonstrates a flaw in the scheme: there are not, in fact,
any line printers on viper. You won't find this out until you try to
open() one. Some systems provide a way to make sure that
only special files corresponding to actual devices appear in
/dev; at home I run a daemon called udev
which only creates special files when it recognizes that a device is
actually present.
Character devices are the sorts of things we normally think of as "real" IO devices: things like keyboards, mice, serial ports...
In general, character device drivers are pretty simple:
write() just copies the data into an internal buffer,
while read() just copies an internal buffer into a
user-provided buffer. An interrupt handling routine transfers the
data between the driver's buffer and the device.
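For an output device, that structure can be sketched as a ring buffer with a producer and a consumer. This is a user-space toy with invented names (toy_dev_write, toy_dev_interrupt, TOY_BUF_SIZE): the "interrupt handler" is just a function the test calls by hand, and a real driver would also need locking and would put the writing process to sleep when the buffer is full.

```c
#include <stddef.h>

#define TOY_BUF_SIZE 16

static char toy_buf[TOY_BUF_SIZE];
static size_t toy_head, toy_tail, toy_count;   /* ring buffer state */

/* write(): copy as much user data as fits into the driver's buffer.
 * Returns the number of bytes accepted, which may be less than count. */
size_t toy_dev_write(const char *user, size_t count)
{
    size_t n = 0;
    while (n < count && toy_count < TOY_BUF_SIZE) {
        toy_buf[toy_head] = user[n++];
        toy_head = (toy_head + 1) % TOY_BUF_SIZE;
        toy_count++;
    }
    return n;
}

/* "Interrupt handler": the device is ready for one more byte.
 * Returns 1 and stores the byte in *out, or 0 if the buffer is empty. */
int toy_dev_interrupt(char *out)
{
    if (toy_count == 0)
        return 0;
    *out = toy_buf[toy_tail];
    toy_tail = (toy_tail + 1) % TOY_BUF_SIZE;
    toy_count--;
    return 1;
}
```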
The main point of block devices is that they are in some sense structured repositories of data. They are addressable, in the sense that we can specify a location to put data at, and they really don't behave like IO devices, as instead of communicating with the real outside world they give us a place to store and retrieve data (with read-only devices serving as a special case).
Because of this, we can be much more aggressive in optimizing access to block devices. We can schedule reads and writes to occur in completely different orders than specified by a program; we can buffer data within the operating system. So long as the program-order semantics specified are honored, the device driver has done the right thing.
It's probably a good idea to review disk hardware here for a minute...
The basic idea is that we have a disk with magnetic material on it, and a read/write head that flies above the disk. The surface of the disk is divided up into "tracks", defined by where the head is, and "sectors", which are divisions of the tracks. Here's a picture of a 500 GB Seagate Barracuda:
The disk drive will normally have several disks (called platters) like I described above, on a single spindle. The corresponding tracks on the different platters are called cylinders; disks have historically been addressed using a three-dimensional cylinder-head-sector scheme. Recent disks have been addressed using a logical block number on the disk.
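The two addressing schemes are related by simple arithmetic. The sketch below uses the conventional mapping, in which sectors within a track are numbered from 1 while cylinders and heads are numbered from 0; the struct and function names are invented for the example, and the geometry is whatever the caller supplies.

```c
#include <stdint.h>

struct toy_geometry {
    uint32_t heads;               /* heads (surfaces) per cylinder */
    uint32_t sectors_per_track;
};

struct toy_chs {
    uint32_t cylinder, head, sector;   /* sector is 1-based */
};

/* LBA = (cylinder * heads + head) * sectors_per_track + (sector - 1) */
uint64_t chs_to_lba(struct toy_geometry g, struct toy_chs a)
{
    return ((uint64_t)a.cylinder * g.heads + a.head) * g.sectors_per_track
           + (a.sector - 1);
}

/* Inverse of the mapping above. */
struct toy_chs lba_to_chs(struct toy_geometry g, uint64_t lba)
{
    struct toy_chs a;
    a.cylinder = (uint32_t)(lba / ((uint64_t)g.heads * g.sectors_per_track));
    a.head     = (uint32_t)((lba / g.sectors_per_track) % g.heads);
    a.sector   = (uint32_t)(lba % g.sectors_per_track) + 1;
    return a;
}
```

With 16 heads and 63 sectors per track, for instance, cylinder 2, head 3, sector 4 works out to (2 * 16 + 3) * 63 + 3 = 2208.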
One thing that needs to be mentioned about disks is that they are just about the only part of a computer system that hasn't been speeding up much over the years. Comparing the 18GB disk described in the book with the Seagate in the picture, both rotate at the same speed, and the average seek time is actually slower on the newer disk!
Block device drivers use buffers to store data for transfer
to and from the device. One buffer corresponds to a block of data on
the disk. A structure called a buffer_head serves as a
descriptor for each buffer; it contains information such as whether
the buffer is empty, up to date, dirty, etc.
When something needs to read or write a disk block, it creates a block device request in the "high-level" device driver. This request specifies the block to be operated on, and the operation (read or write) to be performed. The request is never satisfied immediately; instead, it is scheduled to be performed. The kernel control path never has to wait for the request to be satisfied (though, on a read, the process may be put in a resource wait state).
Once the request has been created by the high-level driver, it is passed to a low-level driver to actually be satisfied. The low-level driver is actually interrupt-driven; when the disk satisfies one request there is an interrupt, and the low-level driver is activated to provide the disk with another request. A "strategy routine" takes pending requests and passes them to the disk drive.
Each block device driver maintains its own request queues; there will be one for each physical block device, and the requests in each queue are ordered in an attempt to optimize disk behavior.
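One simple way to order a queue is to keep it sorted by block number, so the head sweeps across the disk instead of jumping back and forth. The sketch below is a deliberately simplified user-space model with invented names (struct toy_request, toy_queue_add, toy_queue_next); it is not the kernel's scheduler, and it ignores issues such as starvation of requests far from the head.

```c
#include <stddef.h>

#define TOY_QUEUE_MAX 32

struct toy_request {
    unsigned long block;   /* block to operate on */
    int is_write;          /* 0 = read, 1 = write */
};

struct toy_queue {
    struct toy_request reqs[TOY_QUEUE_MAX];
    size_t len;
};

/* Insert a request, keeping the queue in ascending block order.
 * Requests for the same block stay in arrival order.
 * Returns 0 on success, -1 if the queue is full. */
int toy_queue_add(struct toy_queue *q, struct toy_request r)
{
    if (q->len == TOY_QUEUE_MAX)
        return -1;
    size_t i = q->len;
    while (i > 0 && q->reqs[i - 1].block > r.block) {
        q->reqs[i] = q->reqs[i - 1];   /* shift larger blocks up one slot */
        i--;
    }
    q->reqs[i] = r;
    q->len++;
    return 0;
}

/* One step of a pretend strategy routine: take the next request to give
 * to the disk. Returns 1 and fills *out, or 0 if nothing is pending. */
int toy_queue_next(struct toy_queue *q, struct toy_request *out)
{
    if (q->len == 0)
        return 0;
    *out = q->reqs[0];
    for (size_t i = 1; i < q->len; i++)
        q->reqs[i - 1] = q->reqs[i];
    q->len--;
    return 1;
}
```

Because the comparison is strict, two requests for the same block are never swapped, which is the least a reordering queue must do to keep a read from overtaking an earlier write to that block.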