One of the major issues in compile-time analysis, which is valid for both or- and and-parallelism, is Granularity Control.
Intuitively, logic programming may offer a great deal of parallelism, but the question is whether it is worth to actually exploit every instance of parallelism present in the execution. Exploiting an instance of parallel execution (i.e., allowing remote execution of a certain piece of computation) introduces an unavoidable amount of overhead. If this overhead is larger than the actual speedup produced by having a parallel execution, then the resulting parallel execution may turn out to be slower than the sequential one. For example, if we consider having a parallel execution with n instances of work, each of size , while the remaining part of the execution requires , then a sequential execution will take while a perfect parallel execution will require where overhead is the overhead required for acquiring and starting one remote execution. What we want is . By simplifying the formulas we can obtain
which gives an idea about how large should be the parallel computation in order to allow a speedup to occur.
This introduces the need of granularity control, that is allowing parallel execution to take place exclusively if the size of the parallel task is sufficiently large to overcome the overhead introduced. For example, in the case of and-parallelism, we would like to modify the parallel annotation in order to include a test on the size of the parallel tasks before spawning them. A modified annotations for the Fibonacci recursive clause will look as follows:
fib(N-1, R1) & fib(N-2,R2)
else
fib(N-1,R1), fib(N-2,R2)
endif
R is R1+R2.
fib(N,Res) if N > 5 then
where parallel execution is not allowed if the argument is below 5.
Introducing granularity control involves two activities: (i) measuring the actual cost of a piece of computation; and (ii) selecting a proper threshold under which no parallel execution will be allowed.
The second activity requires measuring the overhead involved with spawning a parallel computation. This threshold can be either fixed by the programmer, and tuned it by evaluating the performance of the system with different thresholds, or computed automatically by the system, e.g. by running both a sequential and a parallel program and comparing the performances obtained.
The first activity is the more complex one. It requires being able to estimate the cost of computations. It would be desirable to have a compile-time support in doing this. No compiler will ever be able to actually compute the exact cost of a computation, being this an undecidable property. On the other hand, what we really need is not the exact cost, but a reasonable upper bound. And this should be computed efficiently.
Various works regarding the use of terms size to guide the partitioning of programs for parallel execution have been presented in the context of functional programming [62], where the solution is relatively easier. In the context of logic programming two main approaches have been proposed, one proposed by Debray and Lin [28] and based on a direct estimation of the cost of each predicate, and one proposed by Tick [110] and based on an estimation of the distribution of the work between the different subgoals.