The issue of efficiency is becoming of major importance in the area of parallel implementation of logic programming. It is relatively straightforward to produce a naive parallel model, possibly based on an intuitive view of the execution mechanism. This is the case, for example, with the process-based view adopted in some of the seminal works on parallel Prolog [119, 17, 18], in which each subgoal is treated as a separate process and shared variables act as communication channels--a useful view for understanding the model, but a very inefficient approach if used directly as a design principle.
As already mentioned in a previous section, one of the objectives of an efficient parallel implementation is sequential efficiency, i.e. the ability to maintain, during sequential execution, a speed comparable to that of state-of-the-art sequential implementations. Achieving this target is far from easy, and very few systems can actually claim to obtain such results.
Supporting parallelism imposes various costs, which translate into overhead with respect to the standard sequential behaviour. Furthermore, since the engine cannot know in advance whether parallelism will actually be exploited, part of this overhead is paid even if the execution is eventually carried out by a single agent. The merit of a good parallel model is that it forces most of the necessary overhead to occur at the moment when parallelism is actually exploited--avoiding it in the case of purely sequential execution.
Parallel execution in a logic programming system can be viewed as the parallel traversal of an and-or tree. An and-or tree has or-nodes and and-nodes. Or-nodes are created when there are multiple ways to solve a goal. An and-node is created when a goal invokes several conjunctive subgoals. The or- and and-nodes represent sources of parallelism, i.e. points of the execution where it is possible to fork parallel computations. Figure 11 illustrates an and-or tree for a logic program.
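The structure of an and-or tree can be made concrete with a small sketch. The following Python fragment (illustrative only; the node classes and the example program are not taken from the original text) builds the and-or tree of a hypothetical program `p :- q, r.  p :- s.`, where `q`, `r`, and `s` each match two facts, and counts its solutions by a sequential depth-first traversal. Each or-node's alternatives and each and-node's subgoals are exactly the points where parallel computations could be forked.

```python
# Hedged sketch of an and-or tree for a logic program (hypothetical
# example program: p :- q, r.  p :- s.  with q, r, s each having two
# matching facts). Explored sequentially, depth-first.

from dataclasses import dataclass, field

@dataclass
class OrNode:
    # Created when there are multiple ways (clauses) to solve a goal;
    # a node with no alternatives stands for a fact (a single solution).
    goal: str
    alternatives: list = field(default_factory=list)

@dataclass
class AndNode:
    # Created when a clause body invokes several conjunctive subgoals.
    subgoals: list = field(default_factory=list)

def solutions(node):
    """Count solutions by sequential traversal of the and-or tree."""
    if isinstance(node, OrNode):
        if not node.alternatives:          # fact: one solution
            return 1
        # Alternatives are independent: candidates for or-parallelism.
        return sum(solutions(alt) for alt in node.alternatives)
    # Conjunctive subgoals: candidates for and-parallelism; the number
    # of combined solutions is the product over the subgoals.
    total = 1
    for g in node.subgoals:
        total *= solutions(g)
    return total

tree = OrNode("p", [
    AndNode([OrNode("q", [OrNode("q1"), OrNode("q2")]),
             OrNode("r", [OrNode("r1"), OrNode("r2")])]),
    OrNode("s", [OrNode("s1"), OrNode("s2")]),
])
print(solutions(tree))   # 2*2 + 2 = 6 solutions for p
```

A parallel system would overlap the traversal of the two alternatives of `p`, or of the conjunctive subgoals `q` and `r`, instead of visiting them one after the other.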
Parallelism allows the exploration of different parts of the and-or tree to overlap (explorations that are instead serialized in a sequential computation). Nevertheless, as mentioned earlier, this does not always translate into an improvement in performance. This happens mainly because:
So far we have intuitively identified the main sources of overhead in a parallel computation--the management of additional data structures, and the need to traverse a complex tree structure.
Overheads in the traversal of the tree can be tackled by simplifying the structure of the computation as much as possible (e.g. by reducing the level of nesting of parallelism), and by collecting information that may guide the traversal (and prune useless parts of the tree). Management of data structures can be made more effective by reusing existing structures whenever possible and by delaying allocation of new structures until they are strictly necessary.
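The data-structure side of these optimizations can be sketched as follows. The fragment below (a minimal illustration; the `NodeFrame` and `FramePool` names are hypothetical, not part of any actual system) shows both ideas at once: frames describing nodes of the tree are recycled through a free list rather than freshly allocated, and a new frame is created only when one is actually needed.

```python
# Hedged sketch (hypothetical names): reusing data structures through a
# free list, and allocating a new structure only when strictly necessary.

class NodeFrame:
    """A mock runtime frame describing a node of the computation tree."""
    def __init__(self):
        self.goal = None

class FramePool:
    """Hands out frames, recycling released ones instead of allocating."""
    def __init__(self):
        self._free = []
        self.allocations = 0      # counts actual new allocations

    def acquire(self, goal):
        # Reuse a released frame if one is available; allocate otherwise.
        frame = self._free.pop() if self._free else self._new()
        frame.goal = goal
        return frame

    def release(self, frame):
        frame.goal = None
        self._free.append(frame)  # make it available for reuse

    def _new(self):
        self.allocations += 1
        return NodeFrame()

pool = FramePool()
for goal in ["q", "r", "s"]:      # three nodes explored one at a time
    f = pool.acquire(goal)        # first call allocates a frame...
    pool.release(f)               # ...which is then recycled
print(pool.allocations)           # 1: a single frame served all nodes
```

In a purely sequential execution only one frame is ever live, so the pool never allocates a second one; the allocation cost of additional frames is paid only when several nodes are explored in parallel and frames are held concurrently.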
These observations can be concretized into various optimizations (some of which have been described in the literature [83, 86, 23]), allowing improvements in performance and reductions in parallel overhead.
In the rest of this section we present a more detailed discussion of the overheads arising in the different forms of parallelism.