A Couple of Meta-interpreters in Prolog

Motivation

Informally, an interpreter is a computer program that takes as its input other computer programs and performs the steps necessary to evaluate them. The concept of interpretation is pervasive in computer science both from a theoretical (e.g., proving equivalence of expressiveness) and practical perspective. Indeed, one can regard virtually all programs as interpreters for domain-specific languages. For example, a program reading in settings from a configuration file and adjusting itself accordingly is interpreting this "configuration language".

In the context of Prolog, the term meta-interpreter, or MI for short, is often used to denote an interpreter for a language similar or identical to Prolog itself. An MI that can interpret itself is called a meta-circular interpreter. Prolog is exceptionally well-suited for writing MIs due to various reasons: First and most importantly, Prolog programs can be naturally represented as Prolog terms and are hence easily read in, inspected and manipulated using built-in mechanisms. Second, Prolog's implicit computation strategy (chronological backtracking) can be used in interpreters to avoid explicit search (for matching clause heads etc.), allowing for concise specifications. Third, variables from the object-level (the program to be interpreted) can be treated as variables on the meta-level (the interpreter) - therefore, an interpreter can delegate handling of the interpreted program's binding environment to the underlying Prolog engine (in contrast to interpreters written in Lisp).

The following simple Prolog program will serve as a running example throughout this text. I state it here using default Prolog syntax, and we will subsequently consider various different representations for it:
    natnum1(0).
    natnum1(s(X)) :-
        natnum1(X).
Like many popular scripting languages, Prolog can evaluate arbitrary Prolog code dynamically like this:
     ?- Goal = natnum1(X), Goal.

    Goal = natnum1(0)
    X = 0 ;

    Goal = natnum1(s(0))
    X = s(0)

    Yes
You can make the call explicit using the Prolog built-in call/1 predicate, yielding the equivalent query "?- Goal = natnum1(X), call(Goal).". The call/n (n > 1) family of predicates lets you append additional arguments to Goal before it is called.

This mechanism (the so-called meta-call) is important for predicates like sublist/3, maplist/3, if/3 etc., and can be seen as a special case of meta-interpretation, namely interpretation in its coarsest sense: Let the underlying Prolog engine handle everything. Implicitly using features of the underlying engine is called absorption, while making features explicit is called reification. By using call/1, we absorb backtracking, handling of the binding environment, unification, handling of conjunctions etc., indeed, everything. We will see in this text various interpreters that differ in what they absorb and what they reify, and we will explore various extensions over what Prolog offers, implemented using MIs, far beyond what naive interpretation using "eval" can give you in other languages.

Vanilla Meta-interpreters

Let's start by making sure we fully understand the definition of natnum1/1. That definition consists of two clauses, the first of which degenerated into a fact that is implicitly treated as if it were stated like
    natnum1(0) :- true.
The Prolog built-in predicate clause/2 allows us to inspect the definition of the predicate natnum1/1:
    ?- clause(natnum1(Z), Body).

    Z = 0
    Body = true ;

    Z = s(_G254)
    Body = natnum1(_G254) ;

    No
As expected, exactly two solutions (bindings) for Body and Z are found on backtracking, corresponding to the two clauses of the predicate's definition.

Let us have a brief look at the case of more complicated clause bodies:
    complicated_clause(A) :-
        goal1(A),
        goal2(A),
        goal3(A).

    ?- clause(complicated_clause(Z), Body).

    Z = _G187
    Body = goal1(_G187), goal2(_G187), goal3(_G187) ;

    No
The body of complicated_clause/1 is in this case represented using a compound term with functor "," (which is declared as an infix operator, so "(A,B)" is equivalent to "','(A,B)") and arity 2, whose arguments are again either goals or compound terms of similar structure. Here is the precise Prolog definition of what a clause body can look like:
    is_body(true).
    is_body((A,B)) :-
        is_body(A),
        is_body(B).
    is_body(G) :-      % ambiguous, also matches "true" and "(_,_)"
        is_goal(G).



Using this knowledge, we can attempt to define a simple interpreter for pure Prolog programs:
    mi1(true).
    mi1((A,B)) :-
        mi1(A),
        mi1(B).
    mi1(Goal) :-
        Goal \= true,
        Goal \= (_,_),
        clause(Goal, Body),
        mi1(Body).
This is often called the vanilla MI because there's really nothing special to it in its current state. In a broader sense, all MIs that add no new ability to standard pure Prolog are sometimes called vanilla MIs, and we will for now limit our universe of discourse to this kind of MIs. Later, we will use these simple MIs as a basis for adding useful enhancements.

We can use this MI to run our simple example program:
    ?- mi1(natnum1(X)).

    X = 0 ;

    X = s(0) ;

    X = s(s(0))

    Yes
Notice that we had to guard the third clause from matching true/0 or conjunctions, because trying to inspect these entities using clause/2 would yield run-time errors. This is somewhat ugly, but we can make it uglier still using cuts (!/0) in the first two clauses. That would at least prevent useless choice-points from being remembered. However, the most obvious problem with both versions is that they cannot make use of (first-)argument indexing, which would help us select the (at most) one applicable clause for each non-variable term right away. We therefore have to conclude that the default representation that Prolog provides us with for clause bodies is somewhat "faulty" in that it fails to explicitly mark goals as a code element sui generis. As a consequence, we cannot express the test for goals as pattern matching, but rather assume that everything that doesn't unify with true or (_,_) is a goal. This is why the default representation, and any representation that suffers from similar problems, is often called "defaulty".

The obvious fix is using a custom, more suitable representation for clause bodies that explicitly distinguishes goals as such by equipping them with a special top-level functor, for which we arbitrarily choose g. What we want are bodies like this:
    is_body(true).
    is_body((A,B)) :-
        is_body(A),
        is_body(B).
    is_body(g(G)) :-
        is_goal(G).
Let us rewrite the definition of natnum1/1 a bit according to this specification:
    natnum2(0).
    natnum2(s(X)) :-
        g(natnum2(X)).
At this point, we can no longer use standard Prolog to execute a query like "?- natnum2(X).", because that would yield a runtime error (there is no predicate g/1). We can, however, use the following meta-interpreter to interpret the program:
    mi2(true).
    mi2((A,B)) :-
        mi2(A),
        mi2(B).
    mi2(g(G)) :-
        clause(G, Body),
        mi2(Body).
Let us compare the two MIs by using them to prove that 3 is a natural number. SWI-Prolog 5.5.25 tells me it needs 17 inferences for mi1/1:
    ?- X = s(s(s(0))), time(mi1(natnum1(X))).
    % 17 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)

    X = s(s(s(0))) ;

    No
In contrast, only 9 inferences are needed for the second MI (notice the "g"-functor that we use to manually distinguish the initial goal as such):
    ?- X = s(s(s(0))), time(mi2(g(natnum2(X)))).
    % 9 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)

    X = s(s(s(0))) ;

    No


In the following, I will write mi_clause/2 to indicate a predicate that is similar to the standard clause/2, except that it supplies us with an "appropriate" (MI-dependent) representation of clause bodies. For the MI at hand, it is defined like this:
    mi_clause(G, Body) :-
        clause(G, B),
        defaulty_better(B, Body).

    defaulty_better(true, true).
    defaulty_better((A,B), (BA,BB)) :-
        defaulty_better(A, BA),
        defaulty_better(B, BB).
    defaulty_better(G, g(G)) :-
        G \= true,
        G \= (_,_).
For one, using this predicate instead of clause/2 above lets us interpret (a subset of) normal Prolog code with the MI (without the efficiency advantage though, as the defaulty representation still has to be converted at run-time), and second, this predicate can be also used to statically convert normal Prolog code into the more suitable representation that we devised. Most of the time however, mi_clause/2 might be simply defined using a set of facts, like (for this MI):
    mi_clause(natnum(0), true).
    mi_clause(natnum(s(X)), g(natnum(X)).


The 2 MIs presented so far reify conjunction. They absorb unification and backtracking (which is implicit through clause/2). What we can therefore do in this place is changing the very aspect that we have gotten a hold of (i.e., handling of conjunction). Here is an MI that reverses the default execution order of goals:
    mi3(true).
    mi3((A,B)) :-
        mi3(B),    % notice this! intentionally first B, than A
        mi3(A).
    mi3(g(G)) :-
        mi_clause(G, Body),
        mi3(Body).
With minor adjustments to mi_clause/2 (adding the fact "defaulty_better(fail, fail)." and guarding G against the atom fail/0), and after adding the additional clause
    mi3(fail) :-
        fail.
this MI can be used to prove that - declaratively (i.e., taking into account commutativity of logical conjunctions) - the query "?- declarative_fail." fails given the predicate's definition:
    declarative_fail :-
        declarative_fail,
        fail.
Whereas the standard execution order can't derive this (and quickly leads to stack overflow in practice).

Another thing we are able to do at this point is selectively allowing or refusing to execute goals by changing the third clause of the interpreter to something like this:
    mi2(g(G)) :-
    ( safe_goal(G) ->
        mi_clause(G, Body),
        mi2(Body)
    ;
        format("Sorry, you can't execute ~w\n", [G]),
        fail
    ).
Now you can use the interpreter to prove "natnum1(X)" only if you add a fact like
    safe_goal(natnum1(_)).
to the database.



We can obtain an even simpler MI if we use lists of goals to represent conjunctions. In this representation, true/0 becomes the empty list, and our example program can be represented like this:
    natnum_list(0, []).
    natnum_list(s(X), [natnum_list(X)]).
Again, the conversion between runnable Prolog code and this (for our purpose) more convenient representation can be performed automatically using this definition of mi_clause/2:
    mi_clause(G, Ls) :-
        clause(G, Body),
        phrase(body_list(Body), Ls).


    body_list(true) -->
        [].
    body_list((A,B)) -->
        body_list(A),
        body_list(B).
    body_list(G) -->
        { G \= true },
        { G \= (_,_) },
        [G].
The obvious MI for this representation consists of only 2 clauses:
    mi_list1([]).
    mi_list1([G|Gs]) :-
        mi_clause(G, Body),
        mi_list1(Body),
        mi_list1(Gs).
We can easily make it tail-recursive:
    mi_list2([]).
    mi_list2([G0|Gs0]) :-
        mi_clause(G0, Body),
        append(Body, Gs0, Gs),
        mi_list2(Gs).
Look at this example to observe the difference:
    always_infinite :-
        always_infinite.

    ?- mi_list1([always_infinite]).
    ERROR: Out of local stack
    ?- mi_list2([always_infinite]).  % loops, constant stack space
We can further simplify the interpreter if we choose yet another representation for clauses ("normal" Prolog clauses can again be converted automatically of course, but we use facts here right away): Using list differences, appending the yet-to-be-proved goals can go hand in hand with expanding the clause body.
    mi_ldclause(natnum(0), Rest, Rest).
    mi_ldclause(natnum(s(X)), [natnum(X)|Rest], Rest).

    mi_list3([]).
    mi_list3([G0|Gs0]) :-
        mi_ldclause(G0, Remaining, Gs0),
        mi_list3(Remaining).
We will see that this clause representation has its advantages for the MI that reifies backtracking in a moment. For now, an example query:
    ?- mi_list3([natnum(X)]).

    X = 0 ;

    X = s(0) ;

    Yes



Here is an example MI that knows how to handle the two built-in predicates clause/2 and \=/2 in addition to pure Prolog:
    mi_circ(true).
    mi_circ((A,B)) :-
        mi_circ(A),
        mi_circ(B).
    mi_circ(clause(A,B)) :-
        clause(A,B).
    mi_circ(A \= B) :-
        A \= B.
    mi_circ(G) :-
        G \= true,
        G \= (_,_),
        G \= (_\=_),
        G \= clause(_,_),
        clause(G, Body),
        mi_circ(Body).
It can interpret itself and is thus a meta-circular interpreter:
    ?- mi_circ(mi_circ(natnum1(X))).

    X = 0 ;

    X = s(0)

    Yes


To generalize this pattern to all built-in predicates, we can use predicate_property/2 to identify them as such for calling them directly as in the next example:
    provable(true, _) :- !.
    provable((G1,G2), Defs) :- !,
        provable(G1, Defs),
        provable(G2, Defs).
    provable(BI, _) :-
        predicate_property(BI, built_in),
        !,
        call(BI).
    provable(Goal, Defs) :-
        member(Goal-Body, Defs),
        provable(Body, Defs).
provable(Goal, Defs) is true if Goal is provable with respect to the definitions Defs, which is a list of clauses represented as terms of the form Head-Body. Notice, however, that how !/0 (which is also classified as a built-in predicate) is interpreted by this MI does not match its intended meaning, and building an MI that handles cuts correctly requires a bit more work.

With the following additional definitions, we can use this MI to identify redundant facts in some predicate definitions:
    redundant(Functor/Arity, Reds) :-
        functor(Term, Functor, Arity),
        findall(Term-Body, clause(Term, Body), Defs),
        setof(Red, Defs^redundant_(Defs, Red), Reds).

    member_rest(M, [M|Rest], Rest).
    member_rest(M, [E|Es], [E|Rest]) :-
        member_rest(M, Es, Rest).

    redundant_(Defs, Fact) :-
        member_rest(Fact-true, Defs, Rest),
        once(provable(Fact, Rest)).
Given a predicate definition like:
    as([]).
    as([a]).   % redundant
    as([a,a]). % redundant
    as([A|As]) :-
        A = a,       % test built-in =/2
	5 is 2 + 3,  % test built-in is/2
	1 > 0,       % test built-in >/2
        as(As).
we can now ask for facts which are deducible from all (respective) remaining clauses and hence redundant:
?- redundant(as/1, Reds).

Reds = [as([a]), as([a, a])]



Let us reify backtracking now. We need to make explicit all alternative clause bodies that are normally found on backtracking, collecting them deterministically in one sweep using findall/3. A single alternative is represented as a list of goals, and the branches of computation that have yet to be explored are encoded as a list of alternatives. The interface predicate, mi_backtrack/1, takes as its argument a goal and creates the representation of the initial state of computation: A list, consisting of a single alternative, [G0]. Actually, the representation is [G0]-G0, and G0 is also passed as the second argument to the worker predicate for reasons that will become apparent momentarily.
    mi_backtrack(G0) :-
        mi_backtrack_([[G0]-G0], G0).
To perform a single step of the computation, we first collect all clause bodies whose heads unify with the first goal of the first alternative. To all found clause bodies, the remaining goals of that first alternative are appended, thus obtaining new alternatives that are subsequently prepended to the remaining alternatives to give the new state of computation. Using the clause representation that makes use of list differences, all this can be concisely expressed:
    resstep_([Alt|Alts0], Alts) :-
        findall(Gs-G, (Alt = [G0|Rest]-G,mi_ldclause(G0,Gs,Rest)), Gss),
        append(Gss, Alts0, Alts).

(SICStus Prolog provides findall/4, which can be used to combine the goals into one.)

Notice that leaves of the computation, i.e., alternatives that we are done with, automatically vanish from the list of alternatives as there is no goal to be proved for them any more and the unification "Alt = [G0|Rest]-G" thus fails, leading to Gss being unified with "[]" and carrying on with the remaining alternatives exclusively.

The simple glue between the two predicates is this:
    mi_backtrack_([[]-G|_], G).
    mi_backtrack_(Alts0, G) :-
        resstep_(Alts0, Alts1),
        mi_backtrack_(Alts1, G).
The code is straight-forward: If an alternative could be proved, i.e., no goals remain to be proved, a solution for the initial query is found and we can unify the query with that solution to generate the relevant bindings that are subsequently reported at the top-level to the user. This is why we needed to pass around the user's query. The second clause describes how the computation is carried on: The list of alternatives is transformed as described above, and the process continues.

Representing all alternatives explicitly lets us do a few interesting things, for example, guiding the inference process by reordering alternatives (giving higher priority to alternatives that seem more likely to succeed),implementing fair execution strategies like breadth-first computation (by not prepending, but appending new alternatives to the remaining ones) and so on.

Extending Prolog

If you want a feature that out-of-the-box Prolog does not provide, you simply augment one of the vanilla MIs with the desired functionality. For a start, we devise an MI that behaves exactly like standard pure Prolog and builds a proof-tree in parallel that makes explicit the inferences that lead to the proof.
    :- op(750, xfy, =>).  % syntactic sugar

    mi_tree(true, true).
    mi_tree((A,B), (TA,TB)) :-
        mi_tree(A, TA),
        mi_tree(B, TB).
    mi_tree(g(G), TBody => G) :-
        mi_clause(G, Body),
        mi_tree(Body, TBody).
Example query:
    ?- mi_tree(g(natnum1(X)), T).

    X = 0
    T = true=>natnum1(0) ;

    X = s(0)
    T = (true=>natnum1(0))=>natnum1(s(0)) ;

    X = s(s(0))
    T = ((true=>natnum1(0))=>natnum1(s(0)))=>natnum1(s(s(0)))

    Yes

Another group of extensions aims to improve the incomplete default computation strategy (depth-first). We start from an MI that limits the depth of the search tree:
    mi_limit(Goal, Max) :-
        mi_limit(Goal, Max, _).

    mi_limit(true, N, N).
    mi_limit((A,B), N0, N) :-
        mi_limit(A, N0, N1),
        mi_limit(B, N1, N).
    mi_limit(g(G), N0, N) :-
        N0 > 0,
        mi_clause(G, Body),
        N1 is N0 - 1,
        mi_limit(Body, N1, N).
An example query:
    ?- mi_limit(g(natnum1(X)), 3).

    X = 0 ;

    X = s(0) ;

    X = s(s(0)) ;

    No
Notice that we are limiting the depth of the search tree, not the number of solutions. (In this case, 3 solutions are found when I limit the search depth to at most 3, but this is a coincident.) Based on this MI, we can implement a complete search strategy, iterative deepening:
    seq(N, N).
    seq(N0, N) :-
        N1 is N0 + 1,
        seq(N1, N).

    mi_id(Goal) :-
        seq(0, N),
        mi_limit(Goal, N).
Consider the following example program:
    edge(a, b).
    edge(b, a).
    edge(b, c).
    edge(c, d).

    path(A, B, [e(A,B)|Rest], Rest) :-
        edge(A, B).
    path(A, C, [e(A,B)|Es], Rest) :-
        edge(A, B),
        path(B, C, Es, Rest).
And the example queries:
    ?- path(a, d, Es, []).
    ERROR: Out of local stack
        Exception: (32,302) path(a, d, _G193797, []) ?
    Action (h for help) ? abort
    % Execution Aborted


    ?- mi_id(g(path(a, d, Es, []))).

    Es = [e(a, b), e(b, c), e(c, d)]

    Yes
We see that iterative deepening finds a solution, whereas depth-first search simply dies due to lack of stack space (it wouldn't have found a solution if given infinite stack space either).

Some users of Prolog occasionally bemoan the omission of the occurs check in the default unification algorithms of today's common Prolog implementations, which can lead to unsound inference. Consider:
    occ(X, f(X)).
Without occurs check, the query
    ?- occ(A, A).
succeeds. Let us remedy this by changing the vanilla MI to use occurs check for unification of clause heads:
    mi_occ(true).
    mi_occ((A,B)) :-
        mi_occ(A),
        mi_occ(B).
    mi_occ(g(G)) :-
        functor(G, F, Arity),
        functor(H, F, Arity),
        mi_clause(H, Body),
        unify_with_occurs_check(G, H),
        mi_occ(Body).
And we get:
    ?- mi_occ(g(occ(A,A))).

    No
You can also use an MI similar to this one to reify the binding environment of variables. Simply thread the bindings through and add a term of the form "unify(G,H)" to the set of bindings in the third clause. The built-in predicate numbervars/3 can help you get rid of variables, if you want to reify unification.


As a consequence of Prolog's computation strategy, parsing with left-recursive grammars is problematic. We will now define an MI for DCGs that can handle left-recursion. Consider a simple grammar:
    dcgnumber(0).
    dcgnumber(1).

    expr(N) -->
        [N],
        { dcgnumber(N) }.
    expr(A+B) -->
        expr(A),
        [(+)],
        expr(B).
This grammar can be used to (unfairly) enumerate an arbitrary number of strings it describes:
    ?- phrase(expr(E), Ss).

    E = 0
    Ss = [0] ;

    E = 1
    Ss = [1] ;

    E = 0+0
    Ss = [0, +, 0]

    Yes
Parsing ground strings, however, leads to problems:
    ?- phrase(expr(E), [1,+,1]).

    E = 1+1 ;
    ERROR: Out of local stack
       Exception: (33,377) expr(_G100158, [1], _L500700) ? abort
    % Execution Aborted
Let us start by converting the grammar into a more suitable representation:
    dcg_clause(expr(N), [t(N),{dcgnumber(N)}]).
    dcg_clause(expr(A+B), [l,nt(expr(A)),t(+),nt(expr(B))]).
Notice the atom l in the body of the second clause. We use it to indicate that for this clause to apply, there must be at least one (therefore, one l) token left in the string to be parsed. We can use this information to automatically prune the search tree if we run out of tokens. The other terms are more or less self-explanatory: t/1 for terminals, nt/1 for non-terminals, {}/1 for goals.

The interface to the MI is defined like this:
    mi_dcg(NT, String) :-
        length(String, L),
        length(Rest0, L),
        mi_dcg_([nt(NT)], Rest0, _, String, []).
The actual worker predicates are defined like:
    mi_dcg(t(T), Rest, Rest, [T|Ts], Ts).
    mi_dcg({Goal}, Rest, Rest, Ts, Ts) :-
        call(Goal).
    mi_dcg(nt(NT), Rest0, Rest, Ts0, Ts) :-
        dcg_clause(NT, Body),
        mi_dcg_(Body, Rest0, Rest, Ts0, Ts).
    mi_dcg(l, [_|Rest], Rest, Ts, Ts).

    mi_dcg_([], Rest, Rest, Ts, Ts).
    mi_dcg_([G|Gs], Rest0, Rest, Ts0, Ts) :-
        mi_dcg(G, Rest0, Rest1, Ts0, Ts1),
        mi_dcg_(Gs, Rest1, Rest, Ts1, Ts).
We can now use the left-recursive grammar also for parsing as we wanted:
?- mi_dcg(expr(E), [1,+,1,+,1]).

E = 1+ (1+1) ;

E = 1+1+1 ;

No




On the Internet and in the literature, you can find many more MIs that extend Prolog in various ways, and indeed the standard way to prototype new features like exception handling, module systems, selectively delaying goals, checking for various kinds of infinite loops, profiling, debugging, type systems, constraint solving etc. is building a customized MI. This adds a certain amount of overhead (as a rule of thumb, expect one order of magnitude per layer of meta-interpretation) that can often later be compiled away using partial evaluation techniques.

Here is the file I used to test the code on this page: acomip.pl



Written Sept. 14th 2005
Author: Markus Triska (triska@gmx.at)

Main page