One of the major issues in compile-time analysis, relevant to both or- and and-parallelism, is granularity control.
Intuitively, logic programming may offer a great deal of parallelism, but the question is
whether it is worth exploiting every instance of parallelism present in the
execution. Exploiting an instance of parallel execution (i.e., allowing remote execution
of a certain piece of computation) introduces an unavoidable amount of overhead. If this
overhead is larger than the time saved by the parallel execution, then
the resulting parallel execution may turn out to be slower than the sequential one.
For example, consider a parallel execution with $n$ instances of work, each of size $work$, while the remaining part of the execution requires time $seq$. A sequential execution will then take
$$T_{seq} = seq + n \times work$$
while a perfect parallel execution will require
$$T_{par} = seq + work + n \times overhead$$
where $overhead$ is the overhead required for acquiring and starting one remote execution. What we want is $T_{par} < T_{seq}$. By simplifying the formulas we obtain
$$work > \frac{n}{n-1} \times overhead$$
which gives an idea of how large the parallel computation should be in order to allow a speedup to occur.
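To make the inequality concrete, consider the purely illustrative figures of $n = 4$ parallel tasks and a spawning overhead of $200$ time units: each task must then have a size $work > \frac{4}{3} \times 200 \approx 267$ time units for the parallel execution to pay off, and any smaller task is better executed sequentially.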
This introduces the need for granularity control, that is, allowing parallel execution to take place only if the size of the parallel task is sufficiently large to overcome the overhead introduced. For example, in the case of and-parallelism, we would like to modify the parallel annotation in order to include a test on the size of the parallel tasks before spawning them. A modified annotation for the Fibonacci recursive clause would look as follows:
fib(N, Res) :-
    if N > 5 then
        fib(N-1, R1) & fib(N-2, R2)
    else
        fib(N-1, R1), fib(N-2, R2)
    endif,
    Res is R1 + R2.
where parallel execution is allowed only if the argument is greater than 5.
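The same guarded structure can be sketched in plain Prolog using the standard (Cond -> Then ; Else) construct. In the sketch below, par_conj/2 is a hypothetical stand-in for the parallel conjunction &; it simply runs its two goals in sequence so that the code remains executable on any Prolog system, whereas a real implementation would dispatch them to separate workers.

% Hypothetical stand-in for the parallel conjunction &.
par_conj(G1, G2) :- call(G1), call(G2).

fib(0, 0).
fib(1, 1).
fib(N, Res) :-
    N > 1,
    N1 is N - 1, N2 is N - 2,              % evaluate arguments explicitly
    (   N > 5                               % granularity test
    ->  par_conj(fib(N1, R1), fib(N2, R2))  % task large enough: run in parallel
    ;   fib(N1, R1), fib(N2, R2)            % task too small: stay sequential
    ),
    Res is R1 + R2.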
Introducing granularity control involves two activities: (i) measuring the actual cost of a piece of computation; and (ii) selecting a proper threshold under which no parallel execution will be allowed.
The second activity requires measuring the overhead involved in spawning a parallel computation. The threshold can either be fixed by the programmer, who tunes it by evaluating the performance of the system under different thresholds, or be computed automatically by the system, e.g., by running both sequential and parallel versions of a program and comparing the resulting performance.
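One way such an empirical comparison could be organized is sketched below; bench/1 is a hypothetical goal that runs the program of interest with a given granularity threshold, and statistics/2 with the runtime key, available in most Prolog systems, reports CPU time in milliseconds.

% Run Goal once and report its CPU time in milliseconds.
time_goal(Goal, Millis) :-
    statistics(runtime, [T0|_]),
    call(Goal),
    statistics(runtime, [T1|_]),
    Millis is T1 - T0.

% Time the (hypothetical) benchmark goal bench/1 under every candidate
% threshold and return the one with the shortest running time.
best_threshold(Candidates, Best) :-
    findall(Millis-T,
            ( member(T, Candidates),
              time_goal(bench(T), Millis) ),
            Timed),
    msort(Timed, [_-Best|_]).

For instance, best_threshold([2, 5, 10, 20], T) would pick, among the four candidate thresholds, the one under which the benchmark ran fastest.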
The first activity is the more complex one. It requires the ability to estimate the cost of computations, and it would be desirable to have compile-time support for doing so. No compiler will ever be able to compute the exact cost of a computation, since this is an undecidable property. On the other hand, what we really need is not the exact cost but a reasonable upper bound, and this should be computable efficiently.
Various works on the use of term sizes to guide the partitioning of programs
for parallel execution have been presented in the context of functional programming
[62], where the problem is comparatively easier. In the context of logic programming,
two main approaches have been proposed: one by Debray and Lin [28],
based on a direct estimation of the cost of each predicate, and one
by Tick [110], based on an estimation of the distribution of
work among the different subgoals.
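As a simple illustration of the kind of cheap metric such analyses rely on (and not a rendering of the actual methods of [28] or [110]), the size of a term can be computed as the number of symbols it contains, and a task could be spawned in parallel only when the size of its input arguments exceeds the chosen threshold:

% Size of a term, counted as the number of variables, constants and
% function symbols it contains: a crude but cheap proxy for the cost
% of a computation that traverses the term.
term_size(Term, 1) :-
    ( var(Term) ; atomic(Term) ), !.
term_size(Term, Size) :-
    Term =.. [_|Args],
    args_size(Args, 1, Size).

args_size([], Size, Size).
args_size([A|As], Acc, Size) :-
    term_size(A, S),
    Acc1 is Acc + S,
    args_size(As, Acc1, Size).

For example, term_size(f(a, g(b, c)), S) yields S = 5.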