A Couple of Meta-interpreters in Prolog
Motivation
Informally, an interpreter is a computer program that takes as its
input other computer programs and performs the steps necessary to
evaluate them. The concept of interpretation is pervasive in computer
science both from a theoretical (e.g., proving equivalence of
expressiveness) and practical perspective. Indeed, one can regard
virtually all programs as interpreters for domain-specific languages.
For example, a program reading in settings from a configuration file
and adjusting itself accordingly is interpreting this "configuration
language".
In the context of Prolog, the term meta-interpreter, or MI for
short, is often used to denote an interpreter for a language similar
or identical to Prolog itself. An MI that can interpret itself is
called a meta-circular interpreter. Prolog is exceptionally
well-suited for writing MIs due to various reasons: First and most
importantly, Prolog programs can be naturally represented as Prolog
terms and are hence easily read in, inspected and manipulated using
built-in mechanisms. Second, Prolog's implicit computation strategy
(chronological backtracking) can be used in interpreters to avoid
explicit search (for matching clause heads etc.), allowing for concise
specifications. Third, variables from the object-level (the program to
be interpreted) can be treated as variables on the meta-level (the
interpreter) - therefore, an interpreter can delegate handling of the
interpreted program's binding environment to the underlying Prolog
engine (in contrast to interpreters written in Lisp).
The following simple Prolog program will serve as a running example throughout this text. I state it here using
default Prolog syntax, and we will subsequently consider various
different representations for it:
natnum1(0).
natnum1(s(X)) :-
natnum1(X).
Like many popular scripting languages, Prolog can evaluate arbitrary
Prolog code dynamically like this:
?- Goal = natnum1(X), Goal.
Goal = natnum1(0)
X = 0 ;
Goal = natnum1(s(0))
X = s(0)
Yes
You can make the call explicit using the Prolog built-in call/1
predicate, yielding the equivalent query "?- Goal = natnum1(X),
call(Goal).". The call/n (n > 1) family of predicates lets
you append additional arguments to Goal before it is called.
This mechanism (the so-called meta-call) is important for
predicates like sublist/3, maplist/3, if/3 etc.,
and can be seen as a special case of meta-interpretation, namely
interpretation in its coarsest sense: Let the underlying Prolog engine
handle everything. Implicitly using features of the underlying engine
is called absorption, while making features explicit is called
reification. By using call/1, we absorb backtracking,
handling of the binding environment, unification, handling of
conjunctions etc., indeed, everything. We will see in this text
various interpreters that differ in what they absorb and what they
reify, and we will explore various extensions over what Prolog offers,
implemented using MIs, far beyond what naive interpretation using
"eval" can give you in other languages.
Vanilla Meta-interpreters
Let's start by making sure we fully understand the definition of
natnum1/1. That definition consists of two clauses, the first
of which degenerated into a fact that is implicitly treated as
if it were stated like
natnum1(0) :- true.
The Prolog built-in predicate clause/2 allows us to inspect the
definition of the predicate natnum1/1:
?- clause(natnum1(Z), Body).
Z = 0
Body = true ;
Z = s(_G254)
Body = natnum1(_G254) ;
No
As expected, exactly two solutions (bindings) for Body and
Z are found on backtracking, corresponding to the two clauses
of the predicate's definition.
Let us have a brief look at the case of more complicated clause bodies:
complicated_clause(A) :-
goal1(A),
goal2(A),
goal3(A).
?- clause(complicated_clause(Z), Body).
Z = _G187
Body = goal1(_G187), goal2(_G187), goal3(_G187) ;
No
The body of complicated_clause/1 is in this case represented
using a compound term with functor "," (which is declared as an infix
operator, so "(A,B)" is equivalent to "','(A,B)") and arity 2, whose
arguments are again either goals or compound terms of similar
structure. Here is the precise Prolog definition of what a clause
body can look like:
is_body(true).
is_body((A,B)) :-
is_body(A),
is_body(B).
is_body(G) :- % ambiguous, also matches "true" and "(_,_)"
is_goal(G).
Using this knowledge, we can attempt to define a simple interpreter
for pure Prolog programs:
mi1(true).
mi1((A,B)) :-
mi1(A),
mi1(B).
mi1(Goal) :-
Goal \= true,
Goal \= (_,_),
clause(Goal, Body),
mi1(Body).
This is often called the vanilla MI because there's really
nothing special to it in its current state. In a broader sense, all
MIs that add no new ability to standard pure Prolog are sometimes
called vanilla MIs, and we will for now limit our universe of
discourse to this kind of MIs. Later, we will use these simple MIs as
a basis for adding useful enhancements.
We can use this MI
to run our simple example program:
?- mi1(natnum1(X)).
X = 0 ;
X = s(0) ;
X = s(s(0))
Yes
Notice that we had to guard the third clause from matching
true/0 or conjunctions, because trying to inspect these
entities using clause/2 would yield run-time errors. This is
somewhat ugly, but we can make it uglier still using cuts (!/0)
in the first two clauses. That would at least prevent useless
choice-points from being remembered. However, the most obvious problem
with both versions is that they cannot make use of (first-)argument
indexing, which would help us select the (at most) one applicable
clause for each non-variable term right away. We therefore have to
conclude that the default representation that Prolog provides us with
for clause bodies is somewhat "faulty" in that it fails to explicitly
mark goals as a code element sui generis. As a consequence, we
cannot express the test for goals as pattern matching, but rather
assume that everything that doesn't unify with true or
(_,_) is a goal. This is why the default representation, and
any representation that suffers from similar problems, is often called
"defaulty".
The obvious fix is using a custom, more suitable representation for
clause bodies that explicitly distinguishes goals as such by equipping
them with a special top-level functor, for which we arbitrarily choose
g. What we want are bodies like this:
is_body(true).
is_body((A,B)) :-
is_body(A),
is_body(B).
is_body(g(G)) :-
is_goal(G).
Let us rewrite the definition of natnum1/1 a bit according to
this specification:
natnum2(0).
natnum2(s(X)) :-
g(natnum2(X)).
At this point, we can no longer use standard Prolog to execute a query
like "?- natnum2(X).", because that would yield a runtime error
(there is no predicate g/1). We can, however, use the following
meta-interpreter to interpret the program:
mi2(true).
mi2((A,B)) :-
mi2(A),
mi2(B).
mi2(g(G)) :-
clause(G, Body),
mi2(Body).
Let us compare the two MIs by using them to prove that 3 is a natural
number. SWI-Prolog 5.5.25 tells me it needs 17 inferences for
mi1/1:
?- X = s(s(s(0))), time(mi1(natnum1(X))).
% 17 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
X = s(s(s(0))) ;
No
In contrast, only 9 inferences are needed for the second MI (notice
the "g"-functor that we use to manually distinguish the initial goal
as such):
?- X = s(s(s(0))), time(mi2(g(natnum2(X)))).
% 9 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
X = s(s(s(0))) ;
No
In the following, I will write mi_clause/2 to indicate a
predicate that is similar to the standard clause/2, except that
it supplies us with an "appropriate" (MI-dependent) representation of
clause bodies. For the MI at hand, it is defined like this:
mi_clause(G, Body) :-
clause(G, B),
defaulty_better(B, Body).
defaulty_better(true, true).
defaulty_better((A,B), (BA,BB)) :-
defaulty_better(A, BA),
defaulty_better(B, BB).
defaulty_better(G, g(G)) :-
G \= true,
G \= (_,_).
For one, using this predicate instead of clause/2 above lets us
interpret (a subset of) normal Prolog code with the MI (without the
efficiency advantage though, as the defaulty representation still has
to be converted at run-time), and second, this predicate can be also
used to statically convert normal Prolog code into the more suitable
representation that we devised. Most of the time however,
mi_clause/2 might be simply defined using a set of facts, like
(for this MI):
mi_clause(natnum(0), true).
mi_clause(natnum(s(X)), g(natnum(X)).
The 2 MIs presented so far reify conjunction. They
absorb unification and backtracking (which is implicit through
clause/2). What we can therefore do in this place is changing
the very aspect that we have gotten a hold of (i.e., handling of
conjunction). Here is an MI that reverses the default execution order
of goals:
mi3(true).
mi3((A,B)) :-
mi3(B), % notice this! intentionally first B, than A
mi3(A).
mi3(g(G)) :-
mi_clause(G, Body),
mi3(Body).
With minor adjustments to mi_clause/2 (adding the fact
"defaulty_better(fail, fail)." and guarding G against the atom
fail/0), and after adding the additional clause
mi3(fail) :-
fail.
this MI can be used to prove that - declaratively (i.e., taking into
account commutativity of logical conjunctions) - the query "?-
declarative_fail." fails given the predicate's definition:
declarative_fail :-
declarative_fail,
fail.
Whereas the standard execution order can't derive this (and quickly
leads to stack overflow in practice).
Another thing we are able to do at this point is selectively allowing
or refusing to execute goals by changing the third clause of the
interpreter to something like this:
mi2(g(G)) :-
( safe_goal(G) ->
mi_clause(G, Body),
mi2(Body)
;
format("Sorry, you can't execute ~w\n", [G]),
fail
).
Now you can use the interpreter to prove "natnum1(X)" only if you add
a fact like
safe_goal(natnum1(_)).
to the database.
We can obtain an even simpler MI if we use lists of goals to represent
conjunctions. In this representation, true/0 becomes the empty
list, and our example program can be represented like this:
natnum_list(0, []).
natnum_list(s(X), [natnum_list(X)]).
Again, the conversion between runnable Prolog code and this (for our
purpose) more convenient representation can be performed automatically
using this definition of mi_clause/2:
mi_clause(G, Ls) :-
clause(G, Body),
phrase(body_list(Body), Ls).
body_list(true) -->
[].
body_list((A,B)) -->
body_list(A),
body_list(B).
body_list(G) -->
{ G \= true },
{ G \= (_,_) },
[G].
The obvious MI for this representation consists of only 2 clauses:
mi_list1([]).
mi_list1([G|Gs]) :-
mi_clause(G, Body),
mi_list1(Body),
mi_list1(Gs).
We can easily make it tail-recursive:
mi_list2([]).
mi_list2([G0|Gs0]) :-
mi_clause(G0, Body),
append(Body, Gs0, Gs),
mi_list2(Gs).
Look at this example to observe the difference:
always_infinite :-
always_infinite.
?- mi_list1([always_infinite]).
ERROR: Out of local stack
?- mi_list2([always_infinite]). % loops, constant stack space
We can further simplify the interpreter if we choose yet another
representation for clauses ("normal" Prolog clauses can again be
converted automatically of course, but we use facts here right away):
Using list differences, appending the yet-to-be-proved goals can go
hand in hand with expanding the clause body.
mi_ldclause(natnum(0), Rest, Rest).
mi_ldclause(natnum(s(X)), [natnum(X)|Rest], Rest).
mi_list3([]).
mi_list3([G0|Gs0]) :-
mi_ldclause(G0, Remaining, Gs0),
mi_list3(Remaining).
We will see that this clause representation has its advantages for the
MI that reifies backtracking in a moment. For now, an example query:
?- mi_list3([natnum(X)]).
X = 0 ;
X = s(0) ;
Yes
Here is an example MI that knows how to handle the two built-in
predicates clause/2 and \=/2 in addition to pure Prolog:
mi_circ(true).
mi_circ((A,B)) :-
mi_circ(A),
mi_circ(B).
mi_circ(clause(A,B)) :-
clause(A,B).
mi_circ(A \= B) :-
A \= B.
mi_circ(G) :-
G \= true,
G \= (_,_),
G \= (_\=_),
G \= clause(_,_),
clause(G, Body),
mi_circ(Body).
It can interpret itself and is thus a meta-circular interpreter:
?- mi_circ(mi_circ(natnum1(X))).
X = 0 ;
X = s(0)
Yes
To generalize this pattern to all built-in predicates, we can
use predicate_property/2 to identify them as such for calling
them directly as in the next example:
provable(true, _) :- !.
provable((G1,G2), Defs) :- !,
provable(G1, Defs),
provable(G2, Defs).
provable(BI, _) :-
predicate_property(BI, built_in),
!,
call(BI).
provable(Goal, Defs) :-
member(Goal-Body, Defs),
provable(Body, Defs).
provable(Goal, Defs) is true if Goal is provable with
respect to the definitions Defs, which is a list of clauses
represented as terms of the form Head-Body. Notice, however,
that how !/0 (which is also classified as a built-in predicate)
is interpreted by this MI does not match its intended meaning, and
building an MI that handles cuts correctly requires a bit more work.
With the following additional definitions, we can use this MI to
identify redundant facts in some predicate definitions:
redundant(Functor/Arity, Reds) :-
functor(Term, Functor, Arity),
findall(Term-Body, clause(Term, Body), Defs),
setof(Red, Defs^redundant_(Defs, Red), Reds).
member_rest(M, [M|Rest], Rest).
member_rest(M, [E|Es], [E|Rest]) :-
member_rest(M, Es, Rest).
redundant_(Defs, Fact) :-
member_rest(Fact-true, Defs, Rest),
once(provable(Fact, Rest)).
Given a predicate definition like:
as([]).
as([a]). % redundant
as([a,a]). % redundant
as([A|As]) :-
A = a, % test built-in =/2
5 is 2 + 3, % test built-in is/2
1 > 0, % test built-in >/2
as(As).
we can now ask for facts which are deducible from all (respective)
remaining clauses and hence redundant:
?- redundant(as/1, Reds).
Reds = [as([a]), as([a, a])]
Let us reify backtracking now. We need to make explicit all
alternative clause bodies that are normally found on backtracking,
collecting them deterministically in one sweep using
findall/3. A single alternative is represented as a list of
goals, and the branches of computation that have yet to be explored
are encoded as a list of alternatives. The interface predicate,
mi_backtrack/1, takes as its argument a goal and creates the
representation of the initial state of computation: A list, consisting
of a single alternative, [G0]. Actually, the representation is
[G0]-G0, and G0 is also passed as the second argument to
the worker predicate for reasons that will become apparent
momentarily.
mi_backtrack(G0) :-
mi_backtrack_([[G0]-G0], G0).
To perform a single step of the computation, we first collect all
clause bodies whose heads unify with the first goal of the first
alternative. To all found clause bodies, the remaining goals of that
first alternative are appended, thus obtaining new alternatives that
are subsequently prepended to the remaining alternatives to give the
new state of computation. Using the clause representation that makes
use of list differences, all this can be concisely expressed:
resstep_([Alt|Alts0], Alts) :-
findall(Gs-G, (Alt = [G0|Rest]-G,mi_ldclause(G0,Gs,Rest)), Gss),
append(Gss, Alts0, Alts).
(SICStus Prolog provides findall/4, which can be used to combine the
goals into one.)
Notice that leaves of the computation, i.e., alternatives that we are
done with, automatically vanish from the list of alternatives as there
is no goal to be proved for them any more and the unification "Alt =
[G0|Rest]-G" thus fails, leading to Gss being unified with "[]"
and carrying on with the remaining alternatives exclusively.
The simple glue between the two predicates is this:
mi_backtrack_([[]-G|_], G).
mi_backtrack_(Alts0, G) :-
resstep_(Alts0, Alts1),
mi_backtrack_(Alts1, G).
The code is straight-forward: If an alternative could be proved, i.e.,
no goals remain to be proved, a solution for the initial query is
found and we can unify the query with that solution to generate the
relevant bindings that are subsequently reported at the top-level to
the user. This is why we needed to pass around the user's query. The
second clause describes how the computation is carried on: The list of
alternatives is transformed as described above, and the process
continues.
Representing all alternatives explicitly lets us do a few interesting
things, for example, guiding the inference process by reordering
alternatives (giving higher priority to alternatives that seem more
likely to succeed),implementing fair execution strategies like
breadth-first computation (by not prepending, but appending new
alternatives to the remaining ones) and so on.
Extending Prolog
If you want a feature that out-of-the-box Prolog does not provide, you
simply augment one of the vanilla MIs with the desired
functionality. For a start, we devise an MI that behaves exactly like
standard pure Prolog and builds a proof-tree in parallel that
makes explicit the inferences that lead to the proof.
:- op(750, xfy, =>). % syntactic sugar
mi_tree(true, true).
mi_tree((A,B), (TA,TB)) :-
mi_tree(A, TA),
mi_tree(B, TB).
mi_tree(g(G), TBody => G) :-
mi_clause(G, Body),
mi_tree(Body, TBody).
Example query:
?- mi_tree(g(natnum1(X)), T).
X = 0
T = true=>natnum1(0) ;
X = s(0)
T = (true=>natnum1(0))=>natnum1(s(0)) ;
X = s(s(0))
T = ((true=>natnum1(0))=>natnum1(s(0)))=>natnum1(s(s(0)))
Yes
Another group of extensions aims to improve the incomplete default
computation strategy (depth-first). We start from an MI that limits
the depth of the search tree:
mi_limit(Goal, Max) :-
mi_limit(Goal, Max, _).
mi_limit(true, N, N).
mi_limit((A,B), N0, N) :-
mi_limit(A, N0, N1),
mi_limit(B, N1, N).
mi_limit(g(G), N0, N) :-
N0 > 0,
mi_clause(G, Body),
N1 is N0 - 1,
mi_limit(Body, N1, N).
An example query:
?- mi_limit(g(natnum1(X)), 3).
X = 0 ;
X = s(0) ;
X = s(s(0)) ;
No
Notice that we are limiting the depth of the search tree, not the
number of solutions. (In this case, 3 solutions are found when I limit
the search depth to at most 3, but this is a coincident.) Based on
this MI, we can implement a complete search strategy,
iterative deepening:
seq(N, N).
seq(N0, N) :-
N1 is N0 + 1,
seq(N1, N).
mi_id(Goal) :-
seq(0, N),
mi_limit(Goal, N).
Consider the following example program:
edge(a, b).
edge(b, a).
edge(b, c).
edge(c, d).
path(A, B, [e(A,B)|Rest], Rest) :-
edge(A, B).
path(A, C, [e(A,B)|Es], Rest) :-
edge(A, B),
path(B, C, Es, Rest).
And the example queries:
?- path(a, d, Es, []).
ERROR: Out of local stack
Exception: (32,302) path(a, d, _G193797, []) ?
Action (h for help) ? abort
% Execution Aborted
?- mi_id(g(path(a, d, Es, []))).
Es = [e(a, b), e(b, c), e(c, d)]
Yes
We see that iterative deepening finds a solution, whereas depth-first
search simply dies due to lack of stack space (it wouldn't have found
a solution if given infinite stack space either).
Some users of Prolog occasionally bemoan the omission of the
occurs check in the default unification algorithms of today's common
Prolog implementations, which can lead to unsound inference. Consider:
occ(X, f(X)).
Without occurs check, the query
?- occ(A, A).
succeeds. Let us remedy this by changing the vanilla MI to use occurs
check for unification of clause heads:
mi_occ(true).
mi_occ((A,B)) :-
mi_occ(A),
mi_occ(B).
mi_occ(g(G)) :-
functor(G, F, Arity),
functor(H, F, Arity),
mi_clause(H, Body),
unify_with_occurs_check(G, H),
mi_occ(Body).
And we get:
?- mi_occ(g(occ(A,A))).
No
You can also use an MI similar to this one to reify the binding
environment of variables. Simply thread the bindings through and
add a term of the form "unify(G,H)" to the set of bindings in the
third clause. The built-in predicate numbervars/3 can help you
get rid of variables, if you want to reify unification.
As a consequence of Prolog's computation strategy, parsing with
left-recursive grammars is problematic. We will now define an MI for
DCGs that can handle left-recursion. Consider a simple grammar:
dcgnumber(0).
dcgnumber(1).
expr(N) -->
[N],
{ dcgnumber(N) }.
expr(A+B) -->
expr(A),
[(+)],
expr(B).
This grammar can be used to (unfairly) enumerate an arbitrary number of strings
it describes:
?- phrase(expr(E), Ss).
E = 0
Ss = [0] ;
E = 1
Ss = [1] ;
E = 0+0
Ss = [0, +, 0]
Yes
Parsing ground strings, however, leads to problems:
?- phrase(expr(E), [1,+,1]).
E = 1+1 ;
ERROR: Out of local stack
Exception: (33,377) expr(_G100158, [1], _L500700) ? abort
% Execution Aborted
Let us start by converting the grammar into a more suitable representation:
dcg_clause(expr(N), [t(N),{dcgnumber(N)}]).
dcg_clause(expr(A+B), [l,nt(expr(A)),t(+),nt(expr(B))]).
Notice the atom l in the body of the second clause. We use it
to indicate that for this clause to apply, there must be at least one
(therefore, one l) token left in the string to be parsed. We
can use this information to automatically prune the search tree if we
run out of tokens. The other terms are more or less self-explanatory:
t/1 for terminals, nt/1 for non-terminals, {}/1 for goals.
The interface to the MI is defined like this:
mi_dcg(NT, String) :-
length(String, L),
length(Rest0, L),
mi_dcg_([nt(NT)], Rest0, _, String, []).
The actual worker predicates are defined like:
mi_dcg(t(T), Rest, Rest, [T|Ts], Ts).
mi_dcg({Goal}, Rest, Rest, Ts, Ts) :-
call(Goal).
mi_dcg(nt(NT), Rest0, Rest, Ts0, Ts) :-
dcg_clause(NT, Body),
mi_dcg_(Body, Rest0, Rest, Ts0, Ts).
mi_dcg(l, [_|Rest], Rest, Ts, Ts).
mi_dcg_([], Rest, Rest, Ts, Ts).
mi_dcg_([G|Gs], Rest0, Rest, Ts0, Ts) :-
mi_dcg(G, Rest0, Rest1, Ts0, Ts1),
mi_dcg_(Gs, Rest1, Rest, Ts1, Ts).
We can now use the left-recursive grammar also for parsing as we wanted:
?- mi_dcg(expr(E), [1,+,1,+,1]).
E = 1+ (1+1) ;
E = 1+1+1 ;
No
On the Internet and in the literature, you can find many more MIs that
extend Prolog in various ways, and indeed the standard way to
prototype new features like exception handling, module systems,
selectively delaying goals, checking for various kinds of infinite
loops, profiling, debugging, type systems, constraint solving etc. is
building a customized MI. This adds a certain amount of overhead (as a
rule of thumb, expect one order of magnitude per layer of
meta-interpretation) that can often later be compiled away using
partial evaluation techniques.
Here is the file I used to test the code on this page: acomip.pl
Written Sept. 14th 2005
Author: Markus Triska (triska@gmx.at)
Main page