The Implementation of the Python Programming Language
Clinton Jeffery and Richard Saunders
(C) Copyright 2004, Clinton Jeffery and Richard Saunders.
This prepublication material is not to be copied or linked
without permission.
Contents
- Chapter 1: Introduction
- Part I: the interpreter front-end
- Chapter 2: interactive prompt environment
- Chapter 3: lexical analysis
- Chapter 4: syntax analysis
- Chapter 5: semantic analysis
- Chapter 6: code generation
- Part II: the virtual machine
- Chapter 7: execution model
- Chapter 8: instruction set
- Chapter 9: binary representation of code
- Chapter 10: code loading and linking
-
- Part III: the runtime system
- Chapter 11: data representation
- Chapter 12: built-in functions
- Chapter 13: classes and bindings
- Chapter 14: memory management
- Part IV: instrumentation
- Chapter 14: tracing
- Chapter 15: profiling
- Chapter 16: debugging
Preface
What, another book on Python? There are lots of books on how
to write programs using Python, but this book is not
one of them!
Instead, this book is a guide to how the Python language is implemented,
as of Python version 2.3.4. It is intended for Python language
maintainers and extenders, for implementors of other similar
languages, and for advanced students interested in programming
language compilers and interpreters.
The book originated when the authors began a project to extend
Python's execution monitoring capabilities, and discovered the
profound lack of documentation on the implementation. We wrote
this book as our own study guide, and also to support the study
of Python's implementation by our own students.
This book's authorial biases are those of outsiders: we are expert
programmers and language implementors who can write Python programs, but
not Python experts, nor part of the Python cult. We will be as fair and
objective as we can, striving for accuracy and higher quality than some
of the Python-related documentation we have seen.
Chapter 1: Introduction
Python Origins and Design Influences
Python's history has no doubt been told better elsewhere.
Python is a descendant of ABC, developed at CWI in the Netherlands
by Guido van Rossum. ABC was designed for simplicity and interactivity,
in some ways an improvement over BASIC for beginning programmers.
Unlike ABC, Python was intended to be suitable for experts writing
many kinds of programs.
Python's start, as an unsupported one-student hobby effort,
predates and resembles Linus Torvald's invention of Linux.
In addition to its simplicity and interactivity, Python's popularity as a
scripting language is also due to its clean object-oriented design, which
allows Python programs to be far more readable and maintainable than similar
scripting languages such as Perl and Tcl. Also, unlike the authors of Perl
and Tcl, van Rossum bothered to actually learn from and occasionally borrow
the best ideas from important languages including Lisp and Icon.
Arguably, van Rossum came up with a better syntax than Lisp, and a more
extensible but weaker and clumsier syntax and semantics than Icon.
In any case, Python is much better for having learned from other languages,
something untrue of a surprising number of scripting languages.
Organization of the Source Code
Python's source code is organized into around 250 directories in a
fairly broad-but-shallow hierarchy. Many of the directories are
named for specific platforms, such as PC or Mac. Others, such as
Demo, Doc, and Misc contain non-code which we will not consider here.
The Include/ and Python/ directories are obvious places for a new reader
of the C implementation to start. Other directories with core C code
are Modules/, Objects/, and Parser/. Lib/ contains Python standard
library code. Python's Grammar/Grammar gives a non-YACC EBNF, which
is implemented handwritten code in Modules/parsermodule.c
Header Files
Include/ contains around 70 C header files comprising around 9K lines of
code. About 50 of the .h files are less than 100 lines in size. The
logical starting point for looking at Python's headers is Python.h,
which mainly includes other header files, including a small
number of "universal" system includes, as well as most
(~45) of the other Python header files from this directory.
Source Files
The Python/ directory contains around 50 C source files comprising some
30K lines of code. About half of these files are under 100 lines long,
and several of these small files contain semistandard functions that are
in the standard C libraries on some but not all platforms.
Configuration and Building
Chapter 11: Data Representation
The PyObject Structure
Type PyObject, defined in Include/object.h, provides a uniform interface
for all Python objects. It includes a reference count, a pointer to an
object type, and optionally at interpreter build-time, a pair of object
pointers with which every heap object in the system can be inserted into
a doubly-linked list.
Chapter 12: Built-in Functions
File Python/bltinmodule.c contains a great deal of the C code for
functions that are built-in to the runtime system. Functions in
this file take and return pointers to PyObject instances for the
most part; their prototypes vary slightly depending on whether they
operate on a single parameter, or on an arbitrary number of
parameters.
Built-in Methods
Built-in methods are are all C static functions, visible through a static
array builtin_methods[], which in turn is used to initialize module
__builtin__. Built-in methods are exemplified by the abs()
function, given below. The C function is named builtin_abs(), but that
is just a wrapper providing the interpreter with a standard parameter
signature. The actual logic for calculating the absolute value of a
Python object is determined by calling an internal function
PyNumber_Absolute(), defined in Objects/abstract.c, and used in three
other places in the runtime system, in addition to being used to
implement the built-in function.
static PyObject *
builtin_abs(PyObject *self, PyObject *v)
{
return PyNumber_Absolute(v);
}
PyDoc_STRVAR(abs_doc,
"abs(number) -> number\n\
\n\
Return the absolute value of the argument.");
Chapter 14: Memory Management
Memory management is one of the most important areas of the Python runtime
system, and fortunately, one of the best documented in the source code.
Extensive documentation can be found in comments in Objects/obmalloc.c.
Python uses three progressively higher level abstraction levels on top of
the C malloc() library: a raw memory allocator, a generic object allocator,
and a type-specific allocator.
Raw Memory Allocator
Object Allocator
The object allocator lives in Objects/obmalloc.c. Allocations are rounded
up to the nearest 8-byte blocksize and categorized by size, up to the
large-object threshhold of 256 bytes. There are 33 different size
classes, classes 0-31 corresponding to block sizes 8-256, and then a
final class holding blocks > 256 bytes.
Type-specific Allocators
Allocation of course must be followed by constructor/initialization code.
This magic is found in Objects/object.c/_PyObject_New(), which
takes a type object, malloc's the correct space for an instance, and
calls the init() for that instance.
By inserting instrumentation into _PyObject_New(), it is possible to report
memory allocations by type and size. Warning: memory allocations are a
frequent event! Performance may be significantly impacted!
Visualizing Memory Allocation
The complexity and importance of this
subsystem make it a ripe target for visualization.
Garbage Collection
Historically, Python used reference counting to perform its garbage
collection, causing problems in the small fraction of programs that
created cyclic data structures. Eventually, Python followed the same
evolutionary path as Lisp, and added cycle detection and collection.
However, Python remains primarily a reference-counting system. One
can expect the biggest impact of this decision is the performance
loss associated with pervasive Py_INCREF and Py_DECREF macros throughout
the implementation, especially adding two memory reads/writes to every
assignment. A lesser side-effect is the addition of a reference
count field to every heap object in the system.