The key point about the instruction set architecture (ISA) is that it is the interface between the programmer and the hardware. Over time, views on how best to design an instruction set to make good use of the hardware have evolved:
Early, primitive instruction sets simply exposed the hardware as directly as they could: the ISA was really just the set of operations performed by the underlying hardware.
In the early 1960s, IBM saw the benefits of producing a compatible line of computers with a common instruction set but different implementations. The IBM System/360 was the first such family of computers, and IBM still builds computers today that are descended from it. This led Gene Amdahl (chief architect of the 360) to define computer architecture specifically as the instruction set. That definition is still in use today, even though courses in "computer architecture" typically cover both the architecture and the implementation.
This also encouraged microprogrammed implementations of ISAs: in essence, a small, primitive "computer" inside the processor is programmed to emulate the architected ISA.
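As a rough illustration of the idea, the Python sketch below models a microprogrammed implementation: each architected (macro) instruction is looked up in a control store and carried out as a short sequence of primitive micro-operations. The two-instruction ISA, the micro-operation names, and the register names are all hypothetical, invented for this example.

```python
# Hypothetical microprogrammed machine: the architected ISA (ADD, LOAD) is
# emulated by a "micro-engine" that only knows a few primitive steps.

# Control store: maps each architected opcode to its micro-routine.
CONTROL_STORE = {
    "ADD": ["read_a", "read_b", "alu_add", "write_dest"],
    "LOAD": ["read_a", "mem_read", "write_dest"],
}

def run_macro(instr, regs, mem):
    """Execute one architected instruction by running its micro-routine."""
    op, *args = instr
    latch = {}  # internal latches, invisible at the architected level
    for uop in CONTROL_STORE[op]:
        if uop == "read_a":
            latch["a"] = regs[args[1]]
        elif uop == "read_b":
            latch["b"] = regs[args[2]]
        elif uop == "alu_add":
            latch["result"] = latch["a"] + latch["b"]
        elif uop == "mem_read":
            latch["result"] = mem[latch["a"]]
        elif uop == "write_dest":
            regs[args[0]] = latch["result"]

regs = {"r0": 0, "r1": 100, "r2": 0}
mem = {100: 7}
run_macro(("LOAD", "r2", "r1"), regs, mem)       # r2 <- mem[r1]
run_macro(("ADD", "r0", "r2", "r2"), regs, mem)  # r0 <- r2 + r2
print(regs["r0"])  # 14
```

A real microprogram drives datapath control signals rather than Python branches, but the structure is similar: a fixed micro-engine plus a control store, so a cheaper datapath can still present the full architected instruction set.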
Once the ISA was seen as an entity separate from its implementation, it became both possible and relevant to ask what made an ISA effective. In the 1970s, two characteristics, orthogonality and compactness, were seen as being of paramount importance:
Orthogonality meant that operations and operands were treated as separate "dimensions": any instruction should be applicable to any type of data (integer, floating point, ...) and any kind of operand (register, immediate, memory, ...).
Compactness meant that the encoded instruction set should take up as little memory as possible. At the time, memory was almost inconceivably expensive by today's standards; paying something on the order of a dollar a byte wouldn't have been unreasonable in the mid-1970s. Keeping programs small was vitally important!
As we describe these views, it's important to remember that a significant amount of programming was done in assembly language at that time. An instruction set that was understandable to a human being was a Very Good Thing.
An alternative, but entirely complementary, view was that the ISA should be designed to make it easy to compile code written in a high level language to it. By analogy with human language translation, where translating a document from one language to another is easier the more similar the two languages are, an ISA that directly expressed the concepts of high level languages was also seen as a virtue. There were even research efforts directed toward building computers that would execute high level language code directly.
Other factors also shaped instruction set design. Just as instructions were created to directly support high level language constructs, instructions were also created to directly support operating system functionality; for instance, some ISAs have single instructions capable of saving or restoring an entire processor context, for use in context switching. At times, the goals of performance, understandability, and high level language support conflicted, and in many ways the genius of the really great designers lay in how they balanced them. An example is the vector instructions of the Cray-1. Here, one decision was to create an underlying design capable of extremely high performance by pipelining operations on vectors; a second was to expose this feature directly in the ISA, in spite of the difficulties it created for compiler writers (the Cray FORTRAN compiler was indeed capable of recognizing many "vectorizable" loops and generating the appropriate vector instructions).
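To make "vectorizable" concrete, the Python sketch below contrasts two loops. The first has independent iterations, so a compiler can turn it into vector operations; since the Cray-1's vector registers hold 64 elements, a long loop is "strip-mined" into chunks of at most 64. The second carries a value from each iteration into the next, so it cannot simply be split into independent vector operations. The function names are hypothetical, and list slicing merely stands in for vector loads and arithmetic.

```python
VLEN = 64  # elements per vector register on the Cray-1

def saxpy_vector(a, x, y):
    """y[i] = a*x[i] + y[i]; iterations are independent, so it vectorizes."""
    n = len(x)
    for start in range(0, n, VLEN):          # strip-mine into 64-element chunks
        end = min(start + VLEN, n)
        vx = x[start:end]                    # stands in for a vector load of x
        vy = y[start:end]                    # stands in for a vector load of y
        # stands in for one vector multiply and one vector add over the strip
        y[start:end] = [a * xi + yi for xi, yi in zip(vx, vy)]
    return y

def prefix_sum(b):
    """s[i] = s[i-1] + b[i]; each iteration needs the previous result,
    so the loop cannot simply be split into independent vector operations."""
    s = [0] * len(b)
    running = 0
    for i, bi in enumerate(b):
        running += bi
        s[i] = running
    return s

x = list(range(150))
y = [1.0] * 150
saxpy_vector(2.0, x, y)
print(y[149])                    # 299.0
print(prefix_sum([1, 2, 3, 4]))  # [1, 3, 6, 10]
```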
The convergence of the views stated above - that ISAs should be easy for humans to understand, similar to high level languages, and provide OS support - led to the so-called CISC instruction set philosophy. In this philosophy of instruction set design, an instruction set would tend to have a great number of individual instructions (one can even see the number of instructions available in the ISA appear as a feature in advertising!), and the instructions themselves were capable of performing quite powerful operations.
One can see, for instance, instructions in the VAX to directly generate array indexes; instructions to save and restore processor context; and even instructions to manipulate doubly-linked lists and to evaluate polynomials (!).
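As an example of how much work one of these instructions did, the sketch below shows the computation a polynomial-evaluation instruction performs, written with Horner's rule, which needs only one multiply and one add per coefficient. On a RISC machine, the compiler or a library routine would express the same computation as an ordinary loop. The coefficient ordering (highest-degree term first) is an assumption chosen for this illustration, not a claim about the VAX's exact operand layout.

```python
def poly_horner(x, coeffs):
    """Evaluate c[0]*x**n + c[1]*x**(n-1) + ... + c[n] by Horner's rule.

    coeffs is ordered from the highest-degree term down (an assumption
    made for this sketch).
    """
    result = 0
    for c in coeffs:
        result = result * x + c   # one multiply and one add per coefficient
    return result

# 2x^2 + 3x + 4 at x = 5: 2*25 + 3*5 + 4 = 69
print(poly_horner(5, [2, 3, 4]))  # 69
```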
There was a reaction to this mentality in the late 1970s and early 1980s. A number of research projects (exemplified by the Berkeley RISC and Stanford MIPS projects) examined the underlying assumptions of the CISC design philosophy and found them wanting. They observed that many of the capabilities of CISC instruction sets went unused by actual compilers, and that the vast majority of programming was being done in high level languages. A conscious decision was made to revert to the view of an ISA as a means of exposing the hardware, while designing the underlying hardware to more directly support high level languages.
In addition, the cost of memory had come down substantially. While still hideously expensive by today's standards, memory in the late 1970s and early 1980s was extremely cheap from the perspective of the early 1970s. Consequently, compactness was no longer as important a goal in an instruction set, and it became reasonable to use a sequence of several instructions to perform an operation that earlier instruction sets would have performed with a single instruction.
The Berkeley RISC (in addition to giving its name to the "Reduced Instruction Set Computer" design philosophy) exemplifies this, for good and ill.
We can see direct hardware support for high level language features in its overlapped register windows. The most recent activation records on the execution stack are kept directly in registers: at any given moment, 24 registers containing stack values are visible. Instructions exist to move the "window" of visible registers, which lets us create or destroy an activation record in a single, fast instruction. Adjacent windows overlap, so the registers in which a caller places outgoing arguments become the callee's incoming-argument registers, and no copying is needed.
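The Python sketch below models overlapped register windows. It assumes a SPARC-style layout of 8 "in", 8 "local", and 8 "out" registers per 24-register window, with a callee's ins overlapping its caller's outs; the Berkeley RISC's exact register counts may differ, and the class and method names are hypothetical. Spilling windows to memory when the register file fills up is not modelled.

```python
IN, LOCAL, OUT = 0, 8, 16  # offsets of each group inside a 24-register window

class RegisterWindows:
    """Overlapped register windows (SPARC-style layout assumed).
    Window overflow/underflow spilling is not modelled."""

    def __init__(self, nwindows=8):
        # Each window adds 16 new registers; its 8 "out" registers are
        # shared with the next window's 8 "in" registers.
        self.phys = [0] * (16 * nwindows + 8)
        self.cwp = 0  # current window pointer

    def _index(self, group, n):
        return 16 * self.cwp + group + n

    def read(self, group, n):
        return self.phys[self._index(group, n)]

    def write(self, group, n, value):
        self.phys[self._index(group, n)] = value

    def call(self):
        # "Create an activation record": slide the window by 16 registers.
        self.cwp += 1

    def ret(self):
        # "Destroy an activation record": slide it back.
        self.cwp -= 1

rw = RegisterWindows()
rw.write(OUT, 0, 42)    # caller puts an argument in out0
rw.write(LOCAL, 0, 7)   # caller's local variable
rw.call()
print(rw.read(IN, 0))   # 42: the callee sees the argument in in0
rw.write(LOCAL, 0, 99)  # the callee's local does not clobber the caller's
rw.write(IN, 0, 43)     # the callee returns a value in in0
rw.ret()
print(rw.read(OUT, 0), rw.read(LOCAL, 0))  # 43 7
```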
We can also see the underlying hardware being exposed in the "delayed branch". The RISC was pipelined, so by the time it determined that a branch was to be taken, the next instruction had already been fetched. This leads to a decision: the prefetched instruction can be "flushed", wasting an execution cycle, or it can be executed. RISC took the latter choice: the instruction immediately following a branch (in the "delay slot") is executed before the branch takes effect.
Unfortunately, this ties later implementations to the semantics of earlier ones. As pipelines grow longer and more instructions are prefetched, a single delay slot no longer hides the cost of a branch, yet every new implementation must still emulate the pipeline of the original RISC and delay by exactly one slot. This might be regarded as the worst of both worlds.
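The toy interpreter below illustrates delayed-branch semantics under simplifying assumptions: every instruction occupies one slot, there is exactly one delay slot, and the only branch is unconditional. The instruction names are hypothetical. With delayed branches enabled, the instruction after the branch runs even though the branch is taken.

```python
def run_toy(program, delayed=True):
    """Run a list of (op, arg) tuples; return final registers and the
    list of executed addresses.

    ops: ("inc", reg) increments a register, ("br", target) jumps,
    ("halt", None) stops.
    """
    regs = {"a": 0, "b": 0}
    pc, trace, pending_target = 0, [], None
    while True:
        op, arg = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:  # this instruction is in the delay slot
            next_pc, pending_target = pending_target, None
        if op == "halt":
            break
        if op == "inc":
            regs[arg] += 1
        elif op == "br":
            if delayed:
                pending_target = arg    # branch takes effect after the next instruction
            else:
                next_pc = arg           # branch takes effect immediately
        pc = next_pc
    return regs, trace

program = [
    ("br", 3),       # 0: unconditional branch to 3
    ("inc", "a"),    # 1: delay slot
    ("inc", "b"),    # 2: skipped either way
    ("halt", None),  # 3
]
print(run_toy(program, delayed=True))   # ({'a': 1, 'b': 0}, [0, 1, 3])
print(run_toy(program, delayed=False))  # ({'a': 0, 'b': 0}, [0, 3])
```

A compiler for such a machine tries to fill the delay slot with a useful instruction from before the branch, falling back to a no-op when none is available.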
Finally, we can see the reduced importance of compactness in the simplified instruction set. RISC sets strict bounds on orthogonality: one can no longer apply any instruction to any data. Instead, only load and store instructions can access memory, so data has to be brought into registers before it can be worked on (this is called a load/store architecture). The variety of addressing modes is also very limited, with only indexed addressing allowed in loads and stores, and only register direct and immediate addressing allowed in arithmetic. The old, complex addressing modes can be emulated through sequences of instructions (Hennessy and Patterson have summarized this as "provide primitives, not solutions").
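As a sketch of "primitives, not solutions", the Python below emulates a CISC-style autoincrement operand (read memory through a register, then advance the register) using only load/store-machine primitives: a displacement load, an add-immediate, and a register-register add. The machine model, the register numbers, and the 4-byte word size are assumptions for the example.

```python
class LoadStoreMachine:
    """Hypothetical load/store machine: only loads touch memory here, and the
    only memory addressing mode is base register + displacement."""

    def __init__(self, mem):
        self.r = [0] * 8
        self.mem = mem

    def ld(self, rd, disp, rbase):  # rd <- mem[r[rbase] + disp]
        self.r[rd] = self.mem[self.r[rbase] + disp]

    def addi(self, rd, rs, imm):    # rd <- rs + imm (immediate addressing)
        self.r[rd] = self.r[rs] + imm

    def add(self, rd, rs, rt):      # rd <- rs + rt (register direct)
        self.r[rd] = self.r[rs] + self.r[rt]

# A CISC-style "add (r1)+ to r0" becomes three primitive instructions:
m = LoadStoreMachine({1000: 5, 1004: 6})
m.r[1] = 1000
for _ in range(2):
    m.ld(2, 0, 1)    # r2 <- mem[r1]  (register indirect = displacement 0)
    m.addi(1, 1, 4)  # r1 <- r1 + 4   (the "autoincrement")
    m.add(0, 0, 2)   # r0 <- r0 + r2
print(m.r[0], m.r[1])  # 11 1008
```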
Also, with a view to making implementation simpler at the expense of code density, instructions are typically all the same size. In a CISC machine, register indirect addressing could be encoded directly (in the VAX, it takes four bits to specify the addressing mode and four bits to specify the register, one byte in total). In a RISC machine, one would instead use indexed addressing with an index of 0, wasting all the bits used to specify the index.
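The sketch below encodes the same register-indirect operand both ways: as a VAX operand specifier (register deferred is mode 6, in the high nibble, with the register number in the low nibble) and as a MIPS `lw` instruction, used here as a representative fixed-length RISC format (6-bit opcode 35, 5-bit base register, 5-bit target register, 16-bit offset). Only the addressing is being compared; the VAX opcode byte and its other operand are left out.

```python
def vax_register_deferred(reg):
    """VAX operand specifier for (Rn): mode 6 in the high nibble."""
    return bytes([(6 << 4) | reg])

def mips_lw(rt, offset, rs):
    """32-bit MIPS I-type encoding of lw rt, offset(rs)."""
    opcode = 0x23  # lw
    return (opcode << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)

spec = vax_register_deferred(1)       # (R1)
word = mips_lw(rt=8, offset=0, rs=9)  # lw $t0, 0($t1)
print(spec.hex(), len(spec))          # 61 1
print(hex(word), word & 0xFFFF)       # 0x8d280000 0
```

Sixteen of the 32 bits in the MIPS instruction hold an offset of zero, which is the waste described above; in exchange, every instruction has the same length and its fields sit at fixed positions, which simplifies decoding.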
The philosophy of the 1990s, sometimes called "post-RISC", was really a continuation of the goals of the RISC processors. More careful quantification of goals continued, and more careful consideration was given to what parts of the hardware should be exposed, and how they should be exposed.
Architectures showing this view are the DEC Alpha and Intel IA-64.
Alpha took the approach of defining very clean, careful serial semantics that avoid hidden state and dependencies between instructions. For instance, it has neither delayed branches nor condition codes. The intent was to let the hardware discover the actual dependencies at run time and so permit maximum instruction-level parallelism.
IA-64 takes the opposite view, exposing the parallelism to the programmer (in practice, the compiler) and giving explicit control over parallel execution in the ISA.
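The difference between the two approaches is who finds the independent instructions. The Python sketch below detects dependencies purely from explicit register operands (the kind of information an Alpha-style implementation can use at run time, since there is no hidden state such as condition codes) and greedily packs instructions into groups of mutually independent ones, roughly what an IA-64 compiler marks explicitly with "stops" between instruction groups. The tuple format and the greedy policy are simplifications assumed for this example; real IA-64 grouping has additional rules.

```python
def split_into_groups(instrs):
    """instrs: list of (dest, src1, src2) register names in program order.

    Start a new group whenever an instruction reads or rewrites a register
    written earlier in the current group (a RAW or WAW hazard), so all
    instructions within a group could issue in parallel.
    """
    groups, current, written = [], [], set()
    for dest, *srcs in instrs:
        if written & set(srcs) or dest in written:
            groups.append(current)  # place a "stop" here
            current, written = [], set()
        current.append((dest, *srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

code = [
    ("r1", "r2", "r3"),  # r1 = r2 + r3
    ("r4", "r5", "r6"),  # r4 = r5 + r6  (independent of the first)
    ("r7", "r1", "r4"),  # r7 = r1 + r4  (needs both results)
]
for g in split_into_groups(code):
    print(g)
```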
While every instruction set is different, they can be discussed and classified in broad "families".
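The first, and most widely used, distinction is based on the number of operands explicitly specified by common arithmetic instructions such as addition. Each of the examples that follow computes res = xyzzy + blarg. On a zero-address (stack) machine, arithmetic instructions take their operands from the top of an evaluation stack and push their result back onto it: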
push xyzzy # push first operand
push blarg # push second operand
add # add them together
pop res # put result in location 'res'
In some ways, we might say that a stack machine epitomized the CISC design philosophy: since we think of expression evaluation in terms of a stack model, we should directly support that stack model in the instruction set. I've even got one early-1980s computer architecture text that makes the assertion that in the future we will look askance at instruction sets that use registers (!).
Unfortunately, a stack-based instruction set introduces dependencies, typically false dependencies, between virtually all the arithmetic instructions: every one of them reads and writes the top of the stack, so even logically independent computations are serialized. Such instruction sets are virtually unknown in hardware today. We still think of expression evaluation in terms of a stack, but we try to use the registers effectively to avoid unnecessary conflicts.
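The sketch below is a minimal Python interpreter for the stack-machine fragment above (push, add, pop), with memory modeled as a dictionary of named locations; the operand values are made up. Every arithmetic instruction implicitly reads and writes the top of the stack, which is where the serializing dependencies come from.

```python
def run_stack(program, mem):
    """Execute push/add/pop instructions against an evaluation stack."""
    stack = []
    for op, *operand in program:
        if op == "push":
            stack.append(mem[operand[0]])    # memory -> top of stack
        elif op == "add":
            b, a = stack.pop(), stack.pop()  # implicit operands: top two entries
            stack.append(a + b)              # implicit destination: new top
        elif op == "pop":
            mem[operand[0]] = stack.pop()    # top of stack -> memory
    return mem

mem = {"xyzzy": 3, "blarg": 4, "res": 0}
run_stack([("push", "xyzzy"), ("push", "blarg"), ("add",), ("pop", "res")], mem)
print(mem["res"])  # 7
```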
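On a one-address (accumulator) machine, an implicit accumulator register holds one operand and receives the result:
ld xyzzy # load first operand into accumulator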
add blarg # add second operand to it
st res # store result to res
Accumulator machines can still be found in the 8-bit microcontroller world. The very popular Freescale (née Motorola) HC11 family might be called a one-and-a-half address machine, since it has two accumulators.
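A corresponding Python sketch for the accumulator example: the single accumulator is the implicit source and destination of every instruction, so each instruction names only one memory operand. The values are again made up.

```python
def run_accumulator(program, mem):
    """Execute ld/add/st instructions on a one-accumulator machine."""
    acc = 0
    for op, addr in program:
        if op == "ld":
            acc = mem[addr]   # acc <- mem[addr]
        elif op == "add":
            acc += mem[addr]  # acc <- acc + mem[addr]
        elif op == "st":
            mem[addr] = acc   # mem[addr] <- acc
    return mem

mem = {"xyzzy": 3, "blarg": 4, "res": 0}
run_accumulator([("ld", "xyzzy"), ("add", "blarg"), ("st", "res")], mem)
print(mem["res"])  # 7
```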
On a two-address machine, one of the two operands also serves as the destination:
mov xyzzy, res # copy first operand to final destination
add blarg, res # add second operand to it
Notice that in this example, no registers are used! Two-operand computers would typically have several registers, and might require at least one of the operands to be in a register (the latter is true of the Intel x86 family).
These machines also typically have very powerful addressing modes: rather than simply specifying a memory address, an operand could specify quite intricate arithmetic on the address itself. A DEC PDP-11 could even emulate a stack machine with its addressing modes! A stack add would be the single instruction
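For the two-address example, the Python sketch below treats both operands as memory locations, with the second operand acting as both a source and the destination; that is why the first instruction must copy xyzzy into res before the add. The (source, destination) operand order follows the example above, and the values are made up.

```python
def run_two_address(program, mem):
    """Execute mov/add with (source, destination) memory operands."""
    for op, src, dst in program:
        if op == "mov":
            mem[dst] = mem[src]             # dst <- src
        elif op == "add":
            mem[dst] = mem[dst] + mem[src]  # dst <- dst + src (dst is overwritten)
    return mem

mem = {"xyzzy": 3, "blarg": 4, "res": 0}
run_two_address([("mov", "xyzzy", "res"), ("add", "blarg", "res")], mem)
print(mem["res"])  # 7
```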
add (sp)+, (sp)
This instruction uses register 6 (the stack pointer) as a pointer into memory to read the first operand, then increments the stack pointer by 2 (one 16-bit word); because the PDP-11 stack grows toward lower addresses, this pops the operand. It then reads the second operand (again using the stack pointer as a pointer into memory), adds the operands, and puts the result into the location pointed to by the SP.
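The effect of that single instruction can be traced step by step. The Python sketch below models memory as a dictionary of 16-bit words at even byte addresses and the stack pointer as a plain integer; the addresses and values are made up, and condition-code updates are not modelled.

```python
def pdp11_add_pop(mem, sp):
    """Emulate ADD (SP)+,(SP): pop the top word and add it into the new top."""
    src = mem[sp]   # (SP)+ : read the source operand through SP ...
    sp += 2         # ... then autoincrement by one 16-bit word (a pop)
    dst_addr = sp   # (SP)  : the destination is the new top of stack
    mem[dst_addr] = (mem[dst_addr] + src) & 0xFFFF  # 16-bit add, result in place
    return sp

# The stack grows downward: 4 was pushed first (at 0o1002), then 3 (at 0o1000).
mem = {0o1000: 3, 0o1002: 4}
sp = pdp11_add_pop(mem, 0o1000)
print(oct(sp), mem[sp])  # 0o1002 7
```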
The Intel x86 family is, of course, the dominant instruction set in desktop and server computers today. It's a two-operand instruction set, and it has an addressing mode that allows a single operand of an instruction to perform an array lookup in a local variable! On the other hand, for most arithmetic instructions at most one of its operands may be in memory.
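The x86 addressing mode referred to above computes an effective address of the form base + index*scale + displacement, where the scale must be 1, 2, 4, or 8. The Python sketch below performs that calculation for a hypothetical local array of 4-byte integers that starts 40 bytes below the frame pointer (in AT&T syntax, such an operand might be written `-40(%ebp,%esi,4)`). The register values and offsets are made up.

```python
def effective_address(base, index=0, scale=1, disp=0):
    """x86-style effective address: base + index*scale + disp (32-bit)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF  # wrap to 32 bits

ebp = 0x1000  # frame pointer
esi = 3       # array index i
# address of a[i] for a local int a[10] starting at ebp - 40
print(hex(effective_address(ebp, esi, 4, -40)))  # 0xfe4
```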
The RISC computers were all three-operand computers, but with the restriction that all operands of arithmetic instructions had to be in registers, with separate instructions required to load and store the registers. These were referred to as load-store instruction sets.
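For comparison with the earlier examples, a three-address load/store machine would compute res = xyzzy + blarg as two loads, a register-to-register add, and a store. The Python sketch below runs that four-instruction sequence; the mnemonics, register numbers, and values are assumptions for the example.

```python
def run_load_store(program, mem, nregs=8):
    """Execute ld/add/st; only ld and st access memory."""
    r = [0] * nregs
    for op, *args in program:
        if op == "ld":     # ld rd, addr
            rd, addr = args
            r[rd] = mem[addr]
        elif op == "add":  # add rd, rs, rt (registers only)
            rd, rs, rt = args
            r[rd] = r[rs] + r[rt]
        elif op == "st":   # st rs, addr
            rs, addr = args
            mem[addr] = r[rs]
    return mem

mem = {"xyzzy": 3, "blarg": 4, "res": 0}
run_load_store([
    ("ld", 1, "xyzzy"),
    ("ld", 2, "blarg"),
    ("add", 3, 1, 2),
    ("st", 3, "res"),
], mem)
print(mem["res"])  # 7
```

This takes more instructions than the two-address version, but the two loads are independent of each other, and because the add names a separate destination register, neither input value is destroyed.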