Go to the first, previous, next, last section, table of contents.

Running Prolog in Parallel

This chapter describes what parallel Prolog execution is about, and how Muse can be used as a "prolog accelerator" to make programs run faster in parallel than they do sequentially.

Muse Intro: Introduction.
Run Example
Muse Exe: Main Issues of Parallel Execution.
Visualization Tools
Programming Considerations

Introduction

Logic programs offer many opportunities for the exploitation of parallelism, the main ones being OR-parallelism and AND-parallelism [Conery 83]. Each kind has its own advantages and disadvantages, but we have chosen to focus on OR-parallelism as a first step for a number of reasons. From our perspective, the wide range of potential applications and the very slight execution overhead have been the main reasons. See [Lusk et al. 90] for a fuller discussion about the advantages of OR-parallelism in the context of the Aurora system.

The goal with Muse, as with Aurora, has been to allow multiple processors to exploit the implicit OR-parallelism of Prolog programs with a minimum of overhead, preserving the full Prolog language with its standard semantics.

Running a Parallel Example

In this section we illustrate the parallelism in the well-known "8-queens" problem. This code is reproduced from [Sterling & Shapiro 86] with permission from the authors:

% file: queens.pl

queens(N, Qs) :-
        range(1, N, Ns),
        queens(Ns, [], Qs).

queens([], Qs, Qs).
queens(UnplacedQs, SafeQs, Qs) :-
        select(UnplacedQs, UnplacedQs1, Q),
        not_attack(SafeQs, Q),
        queens(UnplacedQs1, [Q|SafeQs], Qs).

not_attack(Xs, X) :-
        not_attack(Xs, X, 1).

not_attack([], _, _) :- !.
not_attack([Y|Ys], X, N) :-
        X =\= Y+N,  X =\= Y-N,
        N1 is N+1,
        not_attack(Ys, X, N1).

select([X|Xs], Xs, X).
select([Y|Ys], [Y|Zs], X) :- select(Ys, Zs, X).

range(N, N, [N]) :- !.
range(M, N, [M|Ns]) :-
        M < N,
        M1 is M+1,
        range(M1, N, Ns).

run :-
        statistics(walltime, [Start,_]),
        findall(X, queens(10, X), L),
        statistics(walltime, [End,_]),
        length(L, N),
        Time is End - Start,
        format('~d solutions in ~3d seconds.~n', [N,Time]).

Note the use of the walltime key (i.e. real elapsed time) in the call to statistics/2. The runtime key measures CPU time on the worker that happens to be executing, and so normally is not useful.

Below we show the result of running the queens problem first with 5 workers and then with a single worker. In this case the speed-up is 3.8.

% sicstus -P
SICStus -P 3 #0: Mon Oct 3 12:44:31 MET 1994
| ?- compile(queens).
{compiling /src/sicstus/muse/V3/examples/queens.pl...}

yes
| ?- muse_flag(num_workers,_,5).
| ?- run.
724 solutions in 2.760 seconds.

yes
| ?- muse_flag(num_workers,_,1).
| ?- run.
724 solutions in 10.400 seconds.

yes
| ?- Speedup is 10400/2760.

Speedup = 3.7681159420289854 ? 

yes
| ?- halt.

%

Main Issues of Parallel Execution

Muse Exe Model: The Muse Model.
Muse Exe Sched: Scheduling.
Muse Exe Cut: Cut, Side Effects and Suspension.

The Muse Model

The Muse model assumes a multiprocessor system with a number of processors or processes, called workers, with identical local address spaces, and some global address space shared by all workers. We use the term worker to represent a process or processor. The Muse model implements each worker as a sequential Prolog engine with its own WAM stacks in private memory. The Prolog program is stored in shared memory.

During execution, the actual work is to explore a search tree implicitly defined by the Prolog program. The available work is represented as choicepoints containing outstanding execution alternatives on one of the WAM stacks.

The work is performed concurrently by the workers. A distributed scheduler is responsible for dynamic load balancing of available work among available workers. Workers try to maximize the time they spend working and minimize the time they spend scheduling. When a worker is working, it adopts a depth-first left-to-right search strategy.

In the Muse model each worker keeps its own WAM stacks to avoid conflicting variable bindings in common parts of the search tree. Whenever a worker runs Prolog it therefore can use (more or less) the same machinery as the sequential SICStus Prolog implementation. The exceptions are (1) bookkeeping of local work each time a choicepoint is created, (2) testing for signals at predicate calls, and (3) synchronizing with other workers when taking work from a shared (public) choicepoint. The overhead for doing this is some few percent making Muse (nearly) as efficient as sequential SICStus Prolog.

When a worker (Q) runs out of work it must interrupt some busy worker (P) to get work. All of P's private choicepoints are then made public and the difference between P's and Q's state is copied from P to Q. Q can then simply simulate failure to backtrack for work.

Scheduling

The two main functions of the scheduler are to maintain the sequential semantics of Prolog and to match idle workers with the available work with minimal overhead. The sources of overhead in the Muse model include (1) copying a part of worker state, (2) making local choicepoints sharable, and (3) grabbing a piece of work (a task) from a shared choicepoint. Other important scheduling issues are to maximize the grain size of computations, and to prefer "promising" work over less promising work (see section Cut, Side Effects and Suspension).

Cut, Side Effects and Suspension

Cut has a semantics strictly compatible with sequential Prolog. Sometimes a cut may be (partly) invalid due to the fact that the current branch may be pruned by another cut of smaller scope [Hausman 90]. In this case the worker executes as much as possible of the cut operation, saves information in the public search tree on how to later on continue executing the cut operation, and then the worker continues with the current work.

Calls to predicates that produce side-effects are required to synchronize, i.e. to proceed only if they are on the leftmost branch of the entire tree. If a side-effect predicate is reached on a non-leftmost branch, that branch will have to suspend.

Suspending and resuming execution branches can be costly, as these actions involve copying state. The overhead can even outweigh the speedup gained from running in parallel. Therefore, the use of side-effects in parallel computations is discouraged. Also see section Programming Considerations.

The current work may be endangered (i.e. possibly pruned) by some other worker executing cut or raise_exception/1. Such work is called speculative work. If there exists untaken work that is less speculative, then it might be a good idea to suspend the current work and move to the less speculative one [Ali & Karlsson 92]. This functionality (called voluntary suspension) is implemented in Muse and it improves execution of "search for one solution" problems dramatically, possibly from speed-up 1 to nearly linear speed-up. Unfortunately the moving of workers introduces some overhead (typically below 10 percent). Therefore it is possible to turn voluntary suspension off for programs searching for all solutions.

Visualization Tools

The distribution provides two tools that enable one to obtain insight into the parallel behavior of his Prolog program. These tools can help in understanding "performance bugs", situations in which a program is executing correctly but not as fast as anticipated, due to an unforeseen restriction of the degree of parallelism in the program. Once such restrictions are understood they can often be removed by changing the program, resulting in greatly improved performance.

Both of these tools use log files, that is, they do not display graphics while the program is running, but rather display the contents of a file that is created during program execution. This method has several advantages.

A significant log file can be retained and studied for a long time.
The log file can be "played" far more slowly than the original program ran, allowing careful and detailed study.
The graphics tracing program need not run on the same machine as Muse itself.

In this manual we will not discuss the various graphics tracers in detail, but rather we will leave the details of each to its own manual. We will, however, describe how to collect the log files each one needs.

Must
Visandor: Visandor.
Vimust: Vimust.

Must

The Must graphics tracing facility [Karlsson 92] shows the Prolog search tree as it is expanded and contracted in parallel by Muse workers.

Having created a version of Muse with the Must tracing option, the call "muse_trace(Query)" is used for running the query "Query" with tracing. After the query is run, there will be a file (with extension `.mt') containing a trace of the last query execution. The distribution only contains the Sun-4 executable. It can be used both under SunOS 4.x (Solaris 1.x) and SunOS 5.x (Solaris 2.x). The program itself has a dialog box to request the filename of the trace file or you can just specify the file on the command line.

Visandor

The Visandor graphics tracing program was originally developed for the &-Prolog system [Hermenegildo and Greene 90]. It is supplied in binary executable form for Sun-4 computers. More information about that tool and related work is available via WWW from: `http://www.clip.dia.fi.upm.es', or via email from <clip@clip.dia.fi.upm.es>.

The mt2vt tool may be used for converting a Must file to a Visandor file (with extension `.vt').

Vimust

The Must and Visandor tools can be combined into the Vimust tool. If the Must file (with extension `.mt') and its corresponding Visandor file (with extension `.vt') both exist, they can the be viewed together with the vimust tool.

Programming Considerations

It is possible to control the parallelism of a program by means of parallel and sequential declarations (see section Declarations). Sequential declarations may be necessary to prevent infinite loops in certain programming constructs, such as failure driven loops.

Calls of interpreted predicates are synchronized and all interpreted predicates are sequential. The debugger also sequentializes the execution.

All nondeterministic built-in predicates are sequential. However, if a parallel predicate is called by call/1, its execution is still parallel.

We have taken pains to preserve the observable "sequential" semantics during parallel execution. This is achieved through synchronizing all built-in predicates with side-effects. As the foreign predicates may involve side-effects, in principle they should be synchronized, too. We decided not to synchronize the foreign predicates as this often may not be necessary. (The user can call the muse_sync/0 built-in predicate immediately before the foreign predicate to achieve synchronization; see section Muse).

There are application areas where the principle of preserving sequential semantics seriously limits the available parallelism, since communication between parallel execution branches becomes virtually impossible. The principle may even slow down the execution, since suspending and resuming execution branches are costly operations. One way to overcome these limitations is to use the foreign language interface. As an experimental language extension, we have furthermore provided a set of primitives which can be used for non-synchronized communication. These primitives maintain a blackboard, i.e. a mapping from keys to terms, that can be read or written simultaneously by parallel execution branches. See section Blackboard Primitives.

Warning. The blackboard predicates, as well as predicates loaded via the foreign language interface, can be executed even if the current work is speculative. These predicates are cavalier, i.e. a call might be executed in a parallel execution, even if that call would never be reached in a sequential execution, if the code contains cuts (see section The Cut Symbol) or calls to raise_exception/1.

This problem can be partly cured by avoiding cuts while searching for solutions, or by inserting extra validity tests in the program before calling a cavalier primitive, tests which would be redundant in a sequential execution. Remember that the validity tests may themselves succeed out of order if they contain parallelism, so they should also be written without cuts and exceptions, or be explicitly sequentialized.

Ideally, the cavalier predicates should be asynchronous rather than cavalier. This means that they should suspend until they are no longer endangered by cuts or calls to raise_exception/1, but that functionality cannot be provided in the current release, at least.

Other limitations in Muse concern certain built-in predicates and the foreign language interface. The limitations are mentioned when describing these functions. They can be summarized as follows.

During parallel execution, predicates must not be defined (e.g. by assert/1) or abolished (by abolish/(1-2)) on the fly.
Fastcode compilation is not yet available.
The built-ins stream_select/3, stream_interrupt/3, undo/1, break/0 are not available.
Execution profiling (see section Execution Profiling) is not available.
The built-ins save/1, save/2, save_program/1, restore/1, load_foreign_resource/1, unload_foreign_resource/1, and load_foreign_files/2 require that Muse be adjusted to one worker.
The statistics/2 options runtime is not meaningful.
Prolog cannot be called from C.
The hook variable SP_read_hook is not available.

Muse is still an experimental system and is not formally supported.

Go to the first, previous, next, last section, table of contents.