This chapter provides a brief introduction to the syntax and semantics of a certain subset of logic (definite clauses, also known as Horn clauses), and indicates how this subset forms the basis of Prolog.
The data objects of the language are called terms. A term is either a constant, a variable or a compound term.
The constants include integers such as
0 1 999 -512
Besides the usual decimal, or base 10, notation, integers may also be written in any base from 2 to 36, of which base 2 (binary), 8 (octal), and 16 (hex) are probably the most useful. Letters A through ! Z (upper or lower case) are used for bases greater than 10. E.g.
15 2'1111 8'17
all represent the integer fifteen.
There is also a special notation for character constants. E.g.
0'A 0'\x41 0'\101
are all equivalent to 65
(the character code for A).
`0'' followed by any character except \ (backslash) is thus
read as an integer. If `0'' is followed by \, the \
denotes the start of an escape sequence with special meaning
(see section Escape Sequences).
Constants also include floats such as
1.0 -3.141 4.5E7 -0.12e+8 12.0e-9
Note that there must be a decimal point in floats written with an exponent, and that there must be at least one digit before and after the decimal point.
Constants also include atoms such as
a void = := 'Algol-68' []
Constants are definite elementary objects, and correspond to proper nouns in natural language. For reference purposes, here is a list of the possible forms which an atom may take:
'can"t'
. Backslashes in the sequence denote escape sequences
(see section Escape Sequences).
{X}
is allowed as an alternative to {}(X)
. The form
[X]
is the normal notation for lists, as an alternative to
.(X,[])
. Variables may be written as any sequence of alphanumeric characters (including _) starting with either a capital letter or _; e.g.
X Value A A1 _3 _RESULT
If a variable is only referred to once in a clause, it does not need to be named and may be written as an anonymous variable, indicated by the underline character _. A clause may contain several anonymous variables; they are all read and treated as distinct variables.
A variable should be thought of as standing for some definite but unidentified object. This is analogous to the use of a pronoun in natural language. Note that a variable is not simply a writable storage location as in most programming languages; rather it is a local name for some data object, cf. the variable of pure LISP and identity declarations in Algol68.
The structured data objects of the language are the compound terms. A
compound term comprises a functor (called the principal functor of
the term) and a sequence of one or more terms called arguments. A
functor is characterized by its name, which is an atom, and its arity
or number of arguments. For example the compound term whose functor is
named point
of arity 3, with arguments X
, Y
and
Z
, is written
point(X, Y, Z)
Note that an atom is considered to be a functor of arity 0.
Functors are generally analogous to common nouns in natural language. One may think of a functor as a record type and the arguments of a compound term as the fields of a record. Compound terms are usefully pictured as trees. For example, the term
s(np(john),vp(v(likes),np(mary)))
would be pictured as the structure
s / \ np vp | / \ john v np | | likes mary
Sometimes it is convenient to write certain functors as operators--2-ary functors may be declared as infix operators and 1-ary functors as prefix or postfix operators. Thus it is possible to write, e.g.
X+Y (P;Q) X<Y +X P;
as optional alternatives to
+(X,Y) ;(P,Q) <(X,Y) +(X) ;(P)
The use of operators is described fully below (see section Operators).
Lists form an important class of data structures in Prolog. They are
essentially the same as the lists of LISP: a list either is the atom
[]
representing the empty list, or is a compound term with
functor .
and two arguments which are respectively the head and
tail of the list. Thus a list of the first three natural numbers is the
structure
. / \ 1 . / \ 2 . / \ 3 []
which could be written, using the standard syntax, as
.(1,.(2,.(3,[])))
but which is normally written, in a special list notation, as
[1,2,3]
The special list notation in the case when the tail of a list is a variable is exemplified by
[X|L] [a,b|L]
representing
. . / \ / \ X L a . / \ b L
respectively.
Note that this notation does not add any new power to the language; it simply makes it more readable. e.g. the above examples could equally be written
.(X,L) .(a,.(b,L))
For convenience, a further notational variant is allowed for lists of integers which correspond to character codes. Lists written in this notation are called strings. E.g.
"SICStus"
which represents exactly the same list as
[83,73,67,83,116,117,115]
As for quoted atoms, if a double quote character is included in the
sequence it must be written twice, e.g. 'can"t'
. Backslashes
in the sequence denote escape sequences (see section Escape Sequences).
A fundamental unit of a logic program is the goal or procedure call. e.g.
gives(tom, apple, teacher) reverse([1,2,3], L) X<Y
A goal is merely a special kind of term, distinguished only by the context in which it appears in the program. The (principal) functor of a goal identifies what predicate the goal is for. It corresponds roughly to a verb in natural language, or to a procedure name in a conventional programming language.
A logic program consists simply of a sequence of statements called sentences, which are analogous to sentences of natural language. A sentence comprises a head and a body. The head either consists of a single goal or is empty. The body consists of a sequence of zero or more goals (i.e. it too may be empty). If the head is not empty, the sentence is called a clause.
If the body of a clause is empty, the clause is called a unit clause, and is written in the form
P.
where P is the head goal. We interpret this declaratively as
Goals matching P are true.
and procedurally as
Goals matching P are satisfied.
If the body of a clause is non-empty, the clause is called a non-unit clause, and is written in the form
P :- Q, R, S.
where P is the head goal and Q, R and S are the goals which make up the body. We can read such a clause either declaratively as
P is true if Q and R and S are true.
or procedurally as
To satisfy goal P, satisfy goals Q, R and S.
A sentence with an empty head is called a directive (see section Directives: Queries and Commands), of which the most important kind is called a query and is written in the form
?- P, Q.
where P and Q are the goals of the body. Such a query is read declaratively as
Are P and Q true?
and procedurally as
Satisfy goals P and Q.
Sentences generally contain variables. Note that variables in different sentences are completely independent, even if they have the same name--i.e. the lexical scope of a variable is limited to a single sentence. Each distinct variable in a sentence should be interpreted as standing for an arbitrary entity, or value. To illustrate this, here are some examples of sentences containing variables, with possible declarative and procedural readings:
employed(X) :- employs(Y,X).
"Any X is employed if any Y employs X."
"To find whether a person X is employed,
find whether any Y employs X." derivative(X,X,1).
"For any X, the derivative of X with respect to X
is 1."
"The goal of finding a derivative for the expression X with
respect to X itself is satisfied by the result 1." ?- ungulate(X), aquatic(X).
"Is it true, for any X, that X is an ungulate and X is
aquatic?"
"Find an X which is both an ungulate and aquatic."
In any program, the predicate for a particular (principal) functor
is the sequence of clauses in the program whose head goals have that
principal functor. For example, the predicate for a 3-ary functor
concatenate/3
might well consist of the two clauses
concatenate([], L, L). concatenate([X|L1], L2, [X|L3]) :- concatenate(L1, L2, L3).
where concatenate(L1,L2,L3)
means "the list
L1 concatenated with the list L2 is the list L3".
Note that for predicates with clauses corresponding to a base case and a
recursive case, the preferred style is to write the base case clause
first.
In Prolog, several predicates may have the same name but different
arities. Therefore, when it is important to specify a predicate
unambiguously, the form name/arity
is used; e.g.
concatenate/3
.
Certain predicates are predefined by built-in predicates supplied by the Prolog system. Such predicates are called built-in predicates.
As we have seen, the goals in the body of a sentence are linked by the operator `,' which can be interpreted as conjunction ("and"). It is sometimes convenient to use an additional operator `;', standing for disjunction ("or"). (The precedence of `;' is such that it dominates `,' but is dominated by `:-'.) An example is the clause
grandfather(X, Z) :- (mother(X, Y); father(X, Y)), father(Y, Z).
which can be read as
For any X, Y and Z, X has Z as a grandfather if either the mother of X is Y or the father of X is Y, and the father of Y is Z.
Such uses of disjunction can always be eliminated by defining an extra predicate--for instance the previous example is equivalent to
grandfather(X,Z) :- parent(X,Y), father(Y,Z). parent(X,Y) :- mother(X,Y). parent(X,Y) :- father(X,Y).
---and so disjunction will not be mentioned further in the following, more formal, description of the semantics of clauses.
The token `|', when used outside a list, is an alias for `;'. The aliasing is performed when terms are read in, so that
a :- b | c.
is read as if it were
a :- b ; c.
Note the double use of the `.' character. On the one hand it is
used as a sentence terminator, while on the other it may be used in a
string of symbols which make up an atom (e.g. the list functor
./2
). The rule used to disambiguate terms is that a `.'
followed by layout-text is regarded as a sentence terminator
(see section Syntax of Tokens as Character Strings).
The semantics of definite clauses should be fairly clear from the informal interpretations already given. However it is useful to have a precise definition. The declarative semantics of definite clauses tells us which goals can be considered true according to a given program, and is defined recursively as follows.
A goal is true if it is the head of some clause instance and each of the goals (if any) in the body of that clause instance is true, where an instance of a clause (or term) is obtained by substituting, for each of zero or more of its variables, a new term for all occurrences of the variable.
For example, if a program contains the preceding procedure for
concatenate/3
, then the declarative semantics tells us that
?- concatenate([a], [b], [a,b]).
is true, because this goal is the head of a certain instance of the
first clause for concatenate/3
, namely,
concatenate([a], [b], [a,b]) :- concatenate([], [b], [b]).
and we know that the only goal in the body of this clause instance is
true, since it is an instance of the unit clause which is the second
clause for concatenate/3
.
Note that the declarative semantics makes no reference to the sequencing of goals within the body of a clause, nor to the sequencing of clauses within a program. This sequencing information is, however, very relevant for the procedural semantics which Prolog gives to definite clauses. The procedural semantics defines exactly how the Prolog system will execute a goal, and the sequencing information is the means by which the Prolog programmer directs the system to execute the program in a sensible way. The effect of executing a goal is to enumerate, one by one, its true instances. Here then is an informal definition of the procedural semantics. We first illustrate the semantics by the simple query
?- concatenate(X, Y, [a,b]).
We find that it matches the head of the first clause for
concatenate/3
, with X instantiated to [a|X1]
.
The new variable X1 is constrained by the new query produced,
which contains a single recursive procedure call:
?- concatenate(X1, Y, [b]).
Again this goal matches the first clause, instantiating X1 to
[b|X2]
, and yielding the new query:
?- concatenate(X2, Y, [])
Now the single goal will only match the second clause, instantiating
both X2 and Y to []
. Since there are no further
goals to be executed, we have a solution
X = [a,b] Y = []
i.e. a true instance of the original goal is
concatenate([a,b], [], [a,b])
If this solution is rejected, backtracking will generate the further solutions
X = [a] Y = [b] X = [] Y = [a,b]
in that order, by re-matching, against the second clause for concatenate, goals already solved once using the first clause.
Thus, in the procedural semantics, the set of clauses
H :- B1, ..., Bm. H' :- B1', ..., Bm'. ...
are regarded as a procedure definition for some predicate H, and in a query
?- G1, ..., Gn.
each Gi is regarded as a procedure call. To execute a query, the system selects by its computation rule a goal, Gj say, and searches by its search rule a clause whose head matches Gj. Matching is done by the unification algorithm (see [Robinson 65] which computes the most general unifier, mgu, of Gj and H. The mgu is unique if it exists. If a match is found, the current query is reduced to a new query
?- (G1, ..., Gj-1, B1, ..., Bm, Gj+1, ..., Gn)mgu.
and a new cycle is started. The execution terminates when the empty query has been produced.
If there is no matching head for a goal, the execution backtracks to the most recent successful match in an attempt to find an alternative match. If such a match is found, an alternative new query is produced, and a new cycle is started.
In SICStus Prolog, as in other Prolog systems, the search rule is simple: "search forward from the beginning of the program".
The computation rule in traditional Prolog systems is also simple: "pick the leftmost goal of the current query". However, SICStus Prolog and other modern implementations have a somewhat more complex computation rule "pick the leftmost unblocked goal of the current query".
A goal can be blocked on one ore more uninstantiated variables, and a variable may block several goals. Thus binding a variable can cause blocked goals to become unblocked, and backtracking can cause currently unblocked goals to become blocked again. Moreover, if the current query is
?- G1, ..., Gj-1, Gj, Gj+1, ..., Gn.
where Gj is the first unblocked goal, and matching Gj against a clause head causes several blocked goals in G1, ..., Gj-1 to become unblocked, then these goals may become reordered. The internal order of any two goals that were blocked on the same variable is retained, however.
Another consequence is that a query may be derived consisting entirely of blocked goals. Such a query is said to have floundered. The top-level checks for this condition. If detected, the outstanding blocked subgoals are printed on the standard error stream along with the answer substitution, to notify the user that the answer (s)he has got is really a speculative one, since it is only valid if the blocked goals can be satisfied.
A goal is blocked if certain arguments are uninstantiated and its predicate definition is annotated with a matching block declaration (see section Block Declarations). Goals of certain built-in may also be blocked if their arguments are not sufficiently instantiated.
When this mechanism is used, the control structure resembles that of coroutines, suspending and resuming different threads of control. When a computation has left blocked goals behind, the situation is analogous to spawning a new suspended thread. When a blocked goal becomes unblocked, the situation is analogous to temporarily suspending the current thread and resuming the thread to which the blocked goal belongs.
It is possible, and sometimes useful, to write programs which unify a variable to a term in which that variable occurs, thus creating a cyclic term. The usual mathematical theory behind Logic Programming forbids the creation of cyclic terms, dictating that an occurs-check should be done each time a variable is unified with a term. Unfortunately, an occurs-check would be so expensive as to render Prolog impractical as a programming language. Thus cyclic terms may be created and may cause loops trying to print them.
SICStus Prolog mitigates the problem by its ability to unify, compare
(see section Comparison of Terms), assert, and copy cyclic terms without looping.
The write_term/2
built-in predicate can optionally handle cyclic
terms; see section Input and Output of Terms. Unification with occurs-check and predicates
testing (a)cyclicity are available in a library package;
see section Term Utilities. Other predicates usually do not handle
cyclic terms well.
Besides the sequencing of goals and clauses, Prolog provides one other
very important facility for specifying control information. This is the
cut symbol, written !
. It is inserted in the program just
like a goal, but is not to be regarded as part of the logic of the
program and should be ignored as far as the declarative semantics is
concerned.
The effect of the cut symbol is as follows. When first encountered as a goal, cut succeeds immediately. If backtracking should later return to the cut, the effect is to fail the parent goal, i.e. that goal which matched the head of the clause containing the cut, and caused the clause to be activated. In other words, the cut operation commits the system to all choices made since the parent goal was invoked, and causes other alternatives to be discarded. The goals thus rendered determinate are the parent goal itself, any goals occurring before the cut in the clause containing the cut, and any subgoals which were executed during the execution of those preceding goals.
For example:
member(X, [X|_]). member(X, [_|L]) :- member(X, L).
This predicate can be used to test whether a given term is in a list. E.g.
| ?- member(b, [a,b,c]).
returns the answer `yes'. The predicate can also be used to extract elements from a list, as in
| ?- member(X, [d,e,f]).
With backtracking this will successively return each element of the list. Now suppose that the first clause had been written instead:
member(X, [X|_]) :- !.
In this case, the above call would extract only the first element of the
list (d
). On backtracking, the cut would immediately fail the whole
predicate.
x :- p, !, q. x :- r.
This is equivalent to
x := if p then q else r;
in an Algol-like language.
It should be noticed that a cut discards all the alternatives since the parent goal, even when the cut appears within a disjunction. This means that the normal method for eliminating a disjunction by defining an extra predicate cannot be applied to a disjunction containing a cut.
A proper use of the cut is usually a major difficulty for new Prolog programmers. The usual mistakes are to over-use cut, and to let cuts destroy the logic. A cut that doesn't destroy the logic is called a green cut; a cut that does is called a red cut. We would like to advise all users to follow these general rules. Also see section Programming Tips and Examples.
Operators in Prolog are simply a notational convenience. For
example, the expression 2+1
could also be written +(2,1)
.
This expression represents the data structure
+ / \ 2 1
and not the number 3. The addition would only be performed if
the structure were passed as an argument to an appropriate predicate
such as is/2
(see section Arithmetic).
The Prolog syntax caters for operators of three main kinds---infix, prefix and postfix. An infix operator appears between its two arguments, while a prefix operator precedes its single argument and a postfix operator is written after its single argument.
Each operator has a precedence, which is a number from 1 to 1200. The precedence is used to disambiguate expressions where the structure of the term denoted is not made explicit through the use of parentheses. The general rule is that it is the operator with the highest precedence that is the principal functor. Thus if `+' has a higher precedence than `/', then
a+b/c a+(b/c)
are equivalent and denote the term +(a,/(b,c))
. Note that the
infix form of the term /(+(a,b),c)
must be written with explicit
parentheses, i.e.
(a+b)/c
If there are two operators in the subexpression having the same highest precedence, the ambiguity must be resolved from the types of the operators. The possible types for an infix operator are
xfx xfy yfx
Operators of type xfx
are not associative: it is a requirement
that both of the two subexpressions which are the arguments of the
operator must be of lower precedence than the operator itself,
i.e. their principal functors must be of lower precedence, unless the
subexpression is explicitly parenthesized (which gives it zero
precedence).
Operators of type xfy
are right-associative: only the first
(left-hand) subexpression must be of lower precedence; the right-hand
subexpression can be of the same precedence as the main operator.
Left-associative operators (type yfx
) are the other way around.
A functor named name is declared as an operator of type Type and precedence precedence by the command
:- op(precedence, type, name).
The argument name can also be a list of names of operators of the same type and precedence.
It is possible to have more than one operator of the same name, so long as they are of different kinds, i.e. infix, prefix or postfix. An operator of any kind may be redefined by a new declaration of the same kind. This applies equally to operators which are provided as standard. Declarations of all the standard operators can be found elsewhere (see section Standard Operators).
For example, the standard operators +
and -
are declared by
:- op( 500, yfx, [ +, - ]).
so that
a-b+c
is valid syntax, and means
(a-b)+c
i.e.
+ / \ - c / \ a b
The list functor . is not a standard operator, but if we declare it thus:
:- op(900, xfy, .).
then a.b.c
would represent the structure
. / \ a . / \ b c
Contrasting this with the diagram above for a-b+c
shows the
difference between yfx
operators where the tree grows to the
left, and xfy
operators where it grows to the right. The tree
cannot grow at all for xfx
operators; it is simply illegal to
combine xfx
operators having equal precedences in this way.
The possible types for a prefix operator are
fx fy
and for a postfix operator they are
xf yf
The meaning of the types should be clear by analogy with those for infix
operators. As an example, if not
were declared as a prefix
operator of type fy
, then
not not P
would be a permissible way to write not(not(P))
. If the type
were fx
, the preceding expression would not be legal, although
not P
would still be a permissible form for not(P)
.
If these precedence and associativity rules seem rather complex, remember that you can always use parentheses when in any doubt.
Note that the arguments of a compound term written in standard syntax
must be expressions of precedence below 1000. Thus it is
necessary to parenthesize the expression P :- Q
in
| ?- assert((P :- Q)).
Note carefully the following syntax restrictions, which serve to remove potential ambiguity associated with prefix operators.
point (X,Y,Z)is invalid syntax.
:-(p;q),r.(where `:-' is the prefix operator) is invalid syntax. The system would try to interpret it as the structure:
, / \ :- r | ; / \ p qThat is, it would take `:-' to be a functor of arity 1. However, since the arguments of a functor are required to be expressions of precedence below 1000, this interpretation would fail as soon as the `;' (precedence 1100) was encountered. In contrast, the term:
:- (p;q),r.is valid syntax and represents the following structure.
:- | , / \ ; r / \ p q
Comments have no effect on the execution of a program, but they are very useful for making programs more readily comprehensible. Two forms of comment are allowed in Prolog: