File : PLSTD.MSS
Author : R.A.O'Keefe
Updated: 23 July 1984
Purpose: SCRIBE manuscript for the draft Prolog standard
Department of Artificial Intelligence
University of Edinburgh
DAI Working Paper No. @Parm(Text)
Date: @Value(Date)
Title: Draft Proposed Standard for Prolog Evaluable Predicates
Author: R. A. O'Keefe.
Abstract
There are several different implementations of Prolog which
resemble Dec-10 Prolog closely enough that it is worth trying
to move programs from one version to another. To facilitate
this, it is important to have a clear specification of the
semantics of the standard predicates (syntax is comparatively
unimportant). This paper presents such a specification for
the term classification, construction, and comparison
predicates, for arithmetic, and for control structures.
The proposed standard extends Dec-10 Prolog in the interests
of portability between future systems.
This is a draft, published for comments.
This work was supported by a Commonwealth Scholarship.
Computing was done on the Edinburgh ICF KL-10 under SERC grant GR/C/06226.
Draft Proposed Standard for Prolog Evaluable Predicates
Introduction
Thanks in no small part to Chris Mellish's and Bill Clocksin's book
"Programming in Prolog", there are quite a few Prolog interpreters
around (and one or two compilers) which vaguely resemble the
Edinburgh Prologs. However, Dec-10 Prolog, like Algol 60, is an
improvement on most of its successors, and it is often pointlessly
difficult to make a correct and reasonable Dec-10 Prolog program
run under one of its imitators. The trouble is that people
implementing new Prolog systems have mistaken "Programming in
Prolog" (which is a textbook describing a "core" Prolog, and
not fully specifying much of that) for a specification of Prolog
in full.
Dec-10 Prolog is about to lose its supremacy. Already we have
PopLog, which makes up with its extensible screen editor and
its host language Pop-11 what it lacks in speed. Bill Clocksin's
and Lawrence Byrd's Prolog-X system, which has been running for
some time, has recently been rewritten in C, and is now the
fastest portable Prolog in the UK.
Bill Clocksin is adding the built in predicates it currently
lacks, and it should not be long before this is the Prolog of
choice on the UK AI machines.
Dave Bowen's NIP system (whose architecture
is virtually identical to that of Prolog-X) is running, and with
any luck should be usable within a few months. With assembly
code support it promises to be a factor of 2 faster than PopLog.
Most of all, Quintus Prolog, which is more efficient than
Dec-10 Prolog, is due to be released this year.
Then of course we have such systems as MU Prolog, which follow
Edinburgh syntax to some extent, but which come even closer to
logic. These systems necessarily redefine the semantics of
conjunction, but there is nothing to be gained by their having
different names or semantics for arithmetic operations.
Since a version of the Dec-10 Prolog parser is readily available
written in Dec-10 Prolog, any syntactic differences between Prolog
versions can be trivially overcome. What we need is a specification
of the semantics of the built in predicates.
A possible objection to such standardisation is that implementors
often think that they have a better idea of what some operation
should do, and don't want a standard telling them not to do their
thing. This objection is easily countered: all that a standard
demands is that operations called so-and-so shall exist doing
such-and-such. A standard does not say that these are the only
operations for doing such things. For example, Dec-10 Prolog has
a predicate read/1 which reads a term in Dec-10 syntax from the
current input stream, and which yields 'end_of_file' when the end
of the current input stream is reached. Some people would prefer
an input predicate which fails when the end of the input stream
is reached. Standard conformance doesn't stop them having such
a predicate, it just stops them calling it read/1. As another
example, the fact that Dec-10 Prolog's name/2 predicate works
for both atoms and integers has the unfortunate consequence that
atom(A) & name(A, S) & name(X, S) -> X = A
is not always true. And an implementor whose host language
supports strings particularly well might prefer it if the
"string" argument of name/2 were a string and not a sequence of
character codes. But neither of these reasons for dissatisfaction
with name/2 is licence to change it. Instead, a new standard
should define more satisfactory operations as well (as this one does).
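To make the name/2 anomaly concrete, start from the atom '2', whose
name is the single character "2" (code 50). In Dec-10 Prolog:

        ?- A = '2', name(A, S), name(X, S).

        A = '2', S = [50], X = 2

Because "2" can be parsed as a number, name/2 converts the character
list back to the integer 2 rather than to the atom '2', so X and A
do not unify, and the implication above is falsified.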
This brings me to another point about this proposed standard.
It includes some things that are not in Dec-10
Prolog (at least, not built in, though they are available in the
library). It is clear that Dec-10 Prolog is not the last word.
It would be a good thing if we could agree on the improvements
to it. The additions I propose here are easily added to any
Prolog. Indeed, if I had the time it would be easy to add them
to Dec-10 Prolog, and I have added some of them to C Prolog.
Prolog versions of some of these new operations are available;
they require the current Dec-10 Prolog facilities.
The purpose of this document is to describe a set of evaluable
predicates which will facilitate the transport of programs
between Prolog implementations resembling Dec-10 Prolog.
Dec-10 Prolog itself does not conform to this standard, but
most Dec-10 Prolog programs will be happy with it.
This document is not yet complete, nor does it attain the rigour
needed by a true standard. I have decided to publish it as it
stands, because (a) I shall have no opportunity to do any further
work on it this year, (b) even in its present state it may be
useful, there being nothing remotely like it, and (c) while I
have put a good deal of thought into most of it, the Prolog
community may not agree with my principles, and we might as well
find that out before I waste any more effort on this project.
Error Reporting
This document does not describe how errors are signalled,
nor how they are handled. That should be standardised as well,
of course. What it does describe is what errors can be
or must be reported.
An error is ignorable if it is possible to tell the Prolog
system to ignore it and do something sensible.
An error is continuable if after it has been reported to
the user it makes sense to proceed.
Simple Failure
The first kind of error is when a question is perfectly well
formed but the answer is "no". An example of this would be
"3 < 1". An implementation must not signal or report
an error when this happens. The specifications below use
fail to indicate this.
Undefined Predicates
The second kind of error is when the program calls a predicate
and there are no clauses for that predicate. Logically, this is
perfectly well defined. The goal should fail, and that is all
there is to it.
However, calling a predicate which has no clauses usually results
from a typing mistake in the source code. At Edinburgh this is not
much of a problem because of the cross-referencers, the syntax
checker in the editor, and the WLIST program. It is quite easy for
us to find most mis-spelled predicates before running the code.
Other Prologs are not so well endowed, and finding an undefined
predicate in a Prolog which is standard conforming but lacks the
Edinburgh environment or even a trace facility is amazingly painful.
It is therefore required of a standard Prolog that there be at
least two modes of operation. In one, calls to undefined predicates
simply fail. In the other, calls to undefined predicates produce an
error message, and this mode must be the default. The exact form
of the error message does not matter, but something like
! Undefined predicate: pljs/3
! pljs(3, 17, _273)
is recommended. It is redundant, but when the culprit call is very
large it may be difficult for the programmer to count the arguments.
As a general matter, it is recommended that all error messages have
the form
! <error phrase>: <detail>
! <the culprit goal>
and that some form of dialogue then be entered. In the case of undefined
predicates, the user must have the option of failing this call,
advising the Prolog system to fail all calls to this predicate in
future, or loading a file. There are many other options that could
usefully be provided.
To tell the system what mode to use, some command such as unknown(Old,New)
or flag(unknown,Old,New) should be used, where Old is unified with the old
mode and the mode is then set to New. The form of the call is not (yet)
part of the standard, but that the key word involved is 'unknown' is
part of the standard, so that the programmer knows where to look in the
index. As the mode where undefined predicates are reported is the default,
the name of that mode is not part of the standard. (And there may be more
than one such mode.) However, the name of the mode where undefined predicates
just fail is defined to be 'fail'.
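For illustration, supposing the form flag(unknown, Old, New) is provided
(the form, to repeat, is not prescribed), a program could run a goal with
undefined predicates failing quietly and restore the previous mode
afterwards:

call_with_unknown_fail(Goal) :-
        flag(unknown, Old, fail),       % enter 'fail' mode, saving the old one
        (   call(Goal) ->
            flag(unknown, _, Old)       % restore the old mode on success
        ;   flag(unknown, _, Old),      % restore it on failure too
            fail
        ).

Like once/1, this commits to the first solution of Goal.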
There is an obvious difference between calling a predicate which has never
had any clauses and calling a predicate which has had clauses but which
happens not to have any just at the moment thanks to the use of 'retract'.
The former is very likely to be a mistake, the latter is likely to be
"normal" data base hacking. It is straightforward for an implementation
to distinguish between these two cases if this problem is considered in the
initial design. It is therefore defined that each predicate starts life as
"undefined", that any sort of 'assert' changes its status to "defined" and
that nothing else does, and that 'abolish' changes its status back to
"undefined" and that nothing else does.
There is a built in predicate in Dec-10 Prolog and C Prolog:
current_predicate(Name, Goal)
(see section 4.6 of the Dec-10 Prolog manual) which recognises user
predicates. Up till now it has meant that the predicate in question
has at least one clause. However, it would be more useful if it
meant that the predicate in question has at some time had clauses, for
then a Prolog interpreter written in Prolog could easily report undefined
predicate errors, the Prolog pretty-printer could distinguish between a
command to print a predicate which currently has no clauses and a
command to print a non-predicate, and so on.
Note that the requirement that only assertions can make a predicate
defined does not preclude a declaration
:- data F/N.
which indicates a predicate which will be dynamically built and changed,
as we can write
declare_data(F, N) :-
functor(Goal, F, N),
( current_predicate(F, Goal)
; assert(Goal), retract(Goal)
), !.
This works with either definition of current_predicate.
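A ':- data F/N' declaration can then simply expand into a call of this
predicate; for example (the expansion machinery itself is left
unspecified):

:- declare_data(count, 2).              % the effect of ":- data count/2."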
Type Failure
The third kind of error is when a predicate is given
arguments of the wrong type. An example of this would be
"X is fred".
If you regard a built in predicate as an
implicit form of a possibly infinite table, it is clear
that logically the response to such questions should
be simple failure. If, for example, we ask "?- plus(X,a,Y)"
it is clear that there are no substitutions for X or Y which
could ever make this true, and the appropriate response is
simply to fail.
However, for debugging it may be more
helpful to forget about Prolog's "totally defined semantics"
and to produce an error report. The specifications below
use type_fail(ArgNo,ExpectedType) to indicate this.
The exact form of the error message is not specified, as a
Prolog program should have no way of getting its hands on
such a message, but it might look like
! Type Error: argument 2 of plus/3 is not an integer:
! plus(_273, a, _275)
An implementation should be able to provide both forms of
behaviour at the programmer's option. This could be a global
flag set by e.g. flag(type_fail,_,fail) or flag(type_fail,_,error).
Or if there is some way of providing handlers for errors, it could
be by letting the programmer supply a handler for type errors
which just fails. There should be a distinction between
continuable errors, where there is an obvious way of proceeding,
and non-continuable errors, where either success or failure would
yield the wrong answers in some model. Type failures are
continuable.
Overflow
The fourth kind of error is arithmetic overflow. Ideally, a
Prolog implementation should support arbitrary precision
integers (bignums) and even rational numbers (ratios).
Furthermore, the distinction between various sizes of integer
should be completely invisible to a Prolog program.
(Thus there should be no smallp/fixp/bigp predicates in Prolog,
only integer/1.) If this is done, arithmetic overflow cannot
happen in integer arithmetic. (Though storage overflow
can.) The only Prolog implementations I know of which support
bignums are those embedded in a Lisp system which already had
them. Dec-10 Prolog, PopLog, Prolog-X, NIP, and David
Warren's "New Engine" all support N-bit 2s-complement arithmetic
only. Some of them can't even detect arithmetic overflow.
In an N-bit 2s-complement system without overflow detection,
it is possible for X and Y to be positive but for X+Y to be
negative. This has in fact been something of a nuisance in
Dec-10 Prolog (where N=18, except during the calculation of
a single expression, where N=36), and is the reason why I wrote
the "LONG" package. (This package is in the public domain.
It implements bignums and ratios in Prolog. A host language
implementation is bound to be more efficient, but LONG is at
least a place to start.)
Any Prolog implementation should support at least 16-bit
integers. C Prolog's 29 bits has proven adequate for most
purposes so far, and the same is true of Prolog-X's and
NIP's 24 bits. However, when bignums (indistinguishable
from small integers) are not available, integer overflow
should be signalled as a fault. (Not as an error, the
program has a perfectly good meaning, but as a fault,
the implementation isn't good enough to run the program.)
The specifications below do not indicate overflow, because
the values to be calculated are perfectly well defined.
If an implementation cannot represent these values, it
should signal overflow.
An implementation which lets the programmer supply handlers
for exceptions might use this as a hook for bignums, e.g.
the programmer might be able to say
handle overflow in plus(X,Y,Z) where nonvar(X)
and nonvar(Y) by calling eval(Z is X+Y).
But getting this right would require similar hooks into
type failures as well, and the whole thing would end up as
an incredible mess. So overflows should not be continuable.
(It would be wrong to fail an overflowed goal, because there
is an answer. It would be equally wrong to succeed,
because any number the implementation can represent would be
wrong.)
Given Knuth volume 2, the UNIX "mint" package, and the LONG
package written in reasonably portable Prolog, there is very
little difficulty in implementing bignums. Bignums have
finally found their way into a Lisp "standard" (Common Lisp);
it is time they found their way into Prolog.
Even so, we are still left with division by zero.
Suppose we ask "X is 7/0". That is, find me an X such that
0*X=7. Clearly, there is no such X, and by the reasoning
used on type failures this error should be continuable and
ignorable, but should also be reported if the programmer
prefers. I like the "totally defined semantics" of Prolog,
and would prefer on aesthetic grounds that division by 0
should simply fail. (See the next subsection for times(0,X,0).)
When using Prolog as a theorem prover, this is certainly the
behaviour one wants. But the default should be to report an
error. The code below uses zero_divide to indicate such
faults.
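As a sketch only (divide/3 and the zero_divide signal are notational
stand-ins, not prescribed forms), an integer division predicate might
behave like this:

divide(X, Y, Z) :-                      % Z is the integer quotient of X and Y
        integer(X),
        integer(Y),
        !,
        (   Y =:= 0 ->
            zero_divide                 % report the fault (or fail, at the
                                        % programmer's option)
        ;   Z is X/Y
        ).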
Arithmetic overflow has another aspect: floating point.
In an imperative or functional programming language, floating
point behaves itself reasonably well, though it is regrettable
that it satisfies so few of the field axioms.
The IEEE floating point standard doesn't just define a family of
storage formats.
More importantly, it defines the arithmetic operations (and square
root) so that IEEE
floating point arithmetic obeys more useful axioms than most.
Any Prolog hardware which offers floating point should certainly
adopt the IEEE standard. The IEEE standard also specifies the
exceptions to be raised.
Floating point exceptions include
domain errors, such as sqrt(-1). These should be handled in the same
way as integer division by 0.
overflows. These should be handled like integer overflows, that is,
they must be detected, and must not be continuable.
underflow. It is common practice to replace an underflowed result
by 0. There are two problems with this. The first is the fact that
the ordered field laws
X > 0 & Y > 0 => X*Y > 0
X < 0 & Y > 0 => X*Y < 0
X*Y = 0 => X = 0 or Y = 0
and so on can be falsified. In a language allegedly based on logic
this would be a pity. The second is that underflow represents a
complete "loss of precision" and indicates a poor formulation of a
calculation problem. As there is a vast literature on numerical
analysis, and a good deal is known about coping with the defects of
floating point, it would be a great pity if this indication of a
badly expressed algorithm were not brought to the programmer's notice.
There may of course be other floating point exceptions, such as
trying to perform arithmetic on an IEEE NaN value.
With the exception of domain errors (like division by 0), these
represent the failure of the implementation to handle a well defined
value, and so should be presented as faults rather than errors.
No faults are continuable.
Instantiation Faults
The fifth kind of error is when a question has too many variables in it.
Now this logically speaking is no error at all. The question "plus(X,X,Y)"
has a perfectly good set of solutions {(X,Y)|X is an integer and Y=2*X}.
As the Herbrand Universe of a set of clauses is countable, it is evident
that any Prolog implementation could in principle backtrack over all
possible solutions, even for "is" (except for floating point). So if an
implementation decides not to do so, its failure to compute solutions
for such questions is an implementation fault and not a logical
error.
There are of course excellent reasons for declining to do all this
backtracking. The implementation cost is very high (especially for
backtracking over all possible ground arithmetic expressions in 'is'),
and the benefit is negligible. If a programmer genuinely wants code
that backtracks over all the integers, it is easy enough to write
a generator. For example:
gen_nat(N) :- % gen-erate nat-ural
nonvar(N), % if we aren't to generate it
!, % demand that it is an integer
integer(N), N >= 0. % and non-negative.
gen_nat(N) :- % otherwise, generate an
gen_nat(0, N). % integer >= 0
gen_nat(L, L).
gen_nat(L, N) :- % generate natural > L
succ(L, M), % M = L+1
gen_nat(M, N). % generate natural >= M
gen_int(I) :- % gen-erate int-eger
nonvar(I), % if we aren't to generate it
!, % demand that it is an integer.
integer(I).
gen_int(0). % generate 0
gen_int(I) :- % generate +/- N for N > 0
gen_nat(1, N),
( I = N
; plus(I, N, 0) % I+N = 0
).
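Thus ?- gen_nat(N) enumerates N = 0, 1, 2, ... and ?- gen_int(I)
enumerates I = 0, 1, -1, 2, -2, ... on backtracking.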
It is also better style to use an explicit generator, so that the
human reader can better grasp what the algorithm is up to.
The basic problem is that when there are infinitely many solutions
to a single goal, blind chronological backtracking such as Prologs
generally use will stick at that goal, when the cause of the failure
may lie elsewhere entirely. Reporting an instantiation fault instead
is a good way to ask the programmer for more help in the control
component of the program.
A system such as MU Prolog should not report instantiation faults,
but should delay such goals until more variables are instantiated.
When such a system deadlocks (because all the goals it might work
on are delayed), that is its equivalent of reporting an instantiation
fault.
If there is enough information in the goal to determine that it cannot
succeed, it should fail. So an instantiation fault should generally
correspond to a question with at least one answer (which the Prolog
doesn't think it can afford to find). If the system supports user
defined error handlers, and if an error handler can generate solutions,
then it might be an idea to allow the user's error handler to propose
solutions and use the built in code to check them. But when the
user's handler/generator fails, it will have failed after a finite
number of solutions, and most instantiation faults correspond to
questions with an infinite number of solutions, so the error should
be re-signalled (by-passing the user's handlers). Instantiation
faults are not failure-continuable, though they are success-continuable.
Instantiation faults are indicated in the specifications below
by instantiation_fault.
Unimplemented data types
This proposed standard describes some data types (rational and complex
numbers) not currently in any Edinburgh-family Prolog, and one (strings)
which is only in the more recent generation of compilers (PopLog, NIP,
Prolog-X). The point of defining some of the predicates which work with
these values is to make sure that no-one uses the names for anything else.
A particularly good example of this is complex numbers. If there is a
reason for having floating point numbers in Prolog, exactly the same
reason can be applied to argue for complex numbers. "The host language
has floats" is not a reason, it is an excuse. If the host language has
floats, Prolog can have complex numbers. The second volume of R. P. van de
Riet's book on formula manipulation in Algol could be consulted to find
out how to implement some of the basic functions with acceptable accuracy,
and there are many other sources which could be consulted. Indeed, a
FORTRAN or Common-Lisp based Prolog would get complex numbers for free.
Complex numbers being a well-defined easily implemented and useful
extension, we must not use the name "complex" when we mean what
standard texts call "compound" terms.
If a Prolog implementation does not include some data type, what is it
to do if a predicate pertaining to such a type is called? In the case
of a recognition predicate, this is already defined. No argument can
be of the required type, so the predicate will fail. But if the
predicate is supposed to construct an object of the desired type, say
if we call number_chars(X,"1.2") in a system which lacks floating point,
what then?
Recall the approach to arithmetic overflow. The case is the same:
there is a well defined result which the machine cannot represent.
Arithmetic overflow and unimplemented types are both instances of
representation faults. (That is, the system admits its inadequacy.
Representation faults are not programmer errors.) By the same
logic, no form of representation fault can be continued, even by
failing.
In some machine ranges the less expensive members of the family
implement floating point arithmetic in software, and the order code
is so arranged that an appropriate interpreter will be called when
a program tries to use an unimplemented instruction. By analogy,
an obvious thing to do would be to allow the user to write handlers
for unimplemented type faults and to let them implement their own
complex arithmetic. This we must not do.
The first reason for this is that there is simply no way that the
user can possibly do it. The user would have to represent the
missing data type using the data types that are available. But that
means that the user cannot construct an object which will be recognised
as belonging to the data type they are trying to supply. Worse than that,
it must be recognised as some other data type. We have found
this to be a problem with the current implementation of rational
arithmetic in Prolog; no system code recognises that rational
numbers are supposed to be atoms, which can lead not only to inefficiency
but to error.
I have elsewhere proposed a mechanism called "seals" for allowing the Prolog
programmer to define their own new atomic types. But that still isn't
good enough, because the standard operations are required either to have
their standard effect or to report an appropriate fault. They may not
produce any error message not explicitly permitted herein, other than
storage overflow. (Which is hereby explicitly permitted anywhere.)
We cannot guarantee this of user code; indeed we can guarantee nothing
about user code. Now a program that calls user code is asking for
trouble, and Prolog can't prevent it getting trouble. But a program
which calls standard predicates is entitled to get the standard effect
and no other.
A third reason that it is undesirable is that it is a waste of time.
If anyone wants a new data type added to the standard, either at
least one implementation should be well known in the computer science
community (which applies to strings, rational numbers, and complex
numbers), or else they should publish the implementation of their
data type in some generally known language (this might apply to
updatable arrays, Shimon Cohen has met this requirement). We need
a Prolog standards group whose function is to distribute references
to this and other public implementation information to would-be implementors.
The principle here is that if you want other people to support your
data structure, you have to tell them how.
A Design Principle
Some built in predicates are control structures: ','/2, ';'/2,
'->'/2, '!'/0, once/1, '\+'/1, forall/2.
Some built in predicates have to do with I/O.
Some built in predicates have to do with the data base.
But many of the rest are implicit representations of perfectly
good logical relations. An example is length/2, which could be
defined in Prolog as
length([], 0).
length([_|Tail], N) :-
length(Tail, M),
succ(M, N).
or, to avoid runaway recursion, as
length(List, Length) :-
nonvar(Length),
!,
length_gen(Length, List).
length(List, Length) :-
length_chk(List, Length).
length_chk([], 0).
length_chk([_|Tail], N) :-
length_chk(Tail, M),
succ(M, N).
length_gen(0, []).
length_gen(N, [_|Tail]) :-
succ(M, N),
length_gen(M, Tail).
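With the second definition, ?- length(L, 2) binds L = [_,_] and then
fails cleanly on re-solution, where the first definition would search
forever for another list of length 2.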
This is a very useful predicate, and to get the utmost speed, it might
be coded in the host language, or even in micro-code. We would like
such an implementation to have the best affordable approximation to
its logical semantics.
The arithmetic predicates are a case where the relation represented
cannot be described in Prolog. If numbers are represented using
successor notation, then Prolog can indeed express all the arithmetic
operations. But otherwise, even for the relation N=M+1, there are
infinitely many values for M, and to describe this relation in Prolog
we would need an infinite number of clauses.
Still, the intended relation is perfectly clear.
The design principle for such built in operations is this:
if there is sufficient information in a question to determine
that it has a unique answer, the implementation should yield
that answer.
if there is sufficient information in a question to determine
that it has NO answer, the implementation should fail.
If neither of these cases holds, but the implementation can tell
that there is a finite number of solutions, it may enumerate them.
But it need not.
If the implementation cannot tell that there is
a finite number of solutions, or if it doesn't check, it should
either adopt some other control strategy (such as delaying) or
it should report an instantiation fault. It must NOT backtrack
over an infinite set of solutions.
Unfortunately, Dec-10 Prolog has not followed this design principle.
For example, the length/2 predicate in Dec-10 Prolog only works one
way: length(X,4) reports an instantiation fault instead of computing
the unique solution X=[_,_,_,_]. However, there is an important
consequence of adopting this principle: it is more complete than
Dec-10 Prolog, so programs which run in Dec-10 Prolog will run in an
implementation which follows this principle. Many "Prologs" are
less complete than Dec-10 Prolog. Another benefit is that this
principle determines uniquely what each predicate should do, so the
implementor and standardiser do not have to make and justify choices
for each of dozens of predicates.
I have tried to make each C-coded predicate in C Prolog follow this
principle. length/2 is coded in Prolog, but in a non-structure-sharing
Prolog it is easy enough to code it in C so that it obeys this rule.
Note that determining that there is no solution may involve detecting
a type failure. For example, "succ(a, X)" has no solution, so the
principle says it must fail, and the discussion on errors says it
may report a type failure on the way.
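By way of illustration, succ/2 might be organised as follows. This is
a sketch only: type_fail/2 and instantiation_fault/0 stand for whatever
signalling mechanism the implementation provides, and in 'fail' mode
they simply fail.

succ(X, Y) :-
        integer(X), X >= 0,
        !,
        Y is X + 1.                     % unique answer: compute it
succ(X, _) :-
        nonvar(X),
        !,
        type_fail(1, natural).          % no answer: fail, perhaps reporting
succ(X, Y) :-
        integer(Y),
        !,
        Y > 0,                          % succ(X, 0) just fails
        X is Y - 1.
succ(_, Y) :-
        nonvar(Y),
        !,
        type_fail(2, natural).
succ(_, _) :-
        instantiation_fault.            % infinitely many answers: ask for help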
Some Naming Conventions
These conventions are recommended in the interests of clarity.
They are followed in this standard.
Code which is not intended for use outside a particular group may
use any conventions you please.
When defining a new data type, pick some reasonably short word to
name it. I'll symbolise it here by D. When used in predicate
and constructor names the word should be entirely in lower case,
and if there is other text in the predicate or function name it
should be set off by underscores. When used in variable names
it should be capitalised, and other text may be set off by
underscores but need not be. For example, having invented a
new type "basket", we might have predicates "is_basket",
"basket_to_bag", "fill_basket", "price_basket", a constructor
"basket" and "empty_basket". We might use variable names
Basket, Basket1, FullBasket, New_Basket, and so on. In published
code abbreviations such as "bkt_size" or "BstLen" should be
avoided. (They might be bucket size or beast length.) But there
is no reason not to use an abbreviation of an English word as the
type name; the objection is to further abbreviations of that. So
bkt_size is ok as a predicate name if the data type is called bkt.
If the code contains explicit type declarations for some predicates,
the variable name convention need not apply to those predicates.
A predicate to recognise objects of type D should either be
called "D", or "is_D". The is_D form is preferred. Unfortunately
it is too late to do anything about the standard term type names.
A recognition predicate should only enumerate objects of the given
type if there are finitely many of them. It is generally a bad idea
for "basic" predicate to be capable of unbounded backtracking. If the
Prolog implementation supports delaying, recognition predicates should
delay when given variables.
If a data type represents some sort of collection, predicates
D_size, D_member, D_memberchk,
check_D, part_D and map_D may be defined, where
D_size(D, Size) :-
bind Size to the number of elements in D, or
if D is a mapping, to the number of elements in
the domain of D.
D_member(Element, D) :-
recognise or enumerate elements of the collection.
D_memberchk(Element, D) :-
recognise elements of the collection, do not
enumerate. May delay if Element is unbound,
but need not.
check_D(Pred, D) :-
check that Pred(X) is true for each element X of D.
part_D(Pred, D, TrueD, FailD) :-
distribute the elements of D into two objects
TrueD/FailD of the same type according as
Pred(Elem) succeeds/fails.
map_D(Pred, Input_D, Output_D) :-
construct a new object of type D,
where corresponding elements I and O of
Input_D and Output_D are related by Pred(I,O).
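Instantiated for the familiar type list, some of these conventions
would read as follows. This is an illustrative sketch, not part of the
standard; call/N is used to apply Pred, though an implementation might
use its own apply mechanism.

list_size(List, Size) :-
        length(List, Size).
list_member(Element, List) :-           % recognise or enumerate
        member(Element, List).
check_list(_, []).
check_list(Pred, [X|Xs]) :-
        call(Pred, X),                  % Pred(X)
        check_list(Pred, Xs).
map_list(_, [], []).
map_list(Pred, [I|Is], [O|Os]) :-
        call(Pred, I, O),               % Pred(I, O)
        map_list(Pred, Is, Os).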
These conventions need thinking about. There could also be some others.
Given D_member, we can define bounded existential quantification as
some_D(Pred, D) :-
D_member(Element, D),
Pred(Element),
!.
Note the presence of the cut. Since there is no "official" way to
find out what Element was involved, the only effect of the cut should
be to eliminate redundant proofs of the same fact. However, if Pred
(which can be a compound term) contains variables, or if it has side
effects, or if the collection D contains variables, the program might
be able to detect the difference. If it matters, the programmer can
easily write the calls to D_member and Pred explicitly, and then
has the successful element as well. Similarly, we can define bounded
universal quantification as
each_D(Pred, D) :-
forall(D_member(Element,D), Pred(Element)).
Again, this will not bind any free variables in Pred or D, and not only
will such bindings not be propagated to the caller, they are not kept
from one Element to the next. In contrast, the check_D predicate does
bind free variables. For example, suppose we have a predicate
divisible_by(X,Y) which means that X is divisible by the non-unit Y. Then
check_list(divisible_by(Y), [72,6,30,1296])
might instantiate Y to 6, 3, and 2, while
each_list(divisible_by(Y), [5,3,7])
might succeed, because each of the numbers has a non-unit factor, leaving
Y uninstantiated.
Some data structures represent finite functions in
various ways. Note that map_D predicates only act on the results
of such functions, i.e. if I represents the function f and Pred encodes the
function g, and map_D(Pred,I,O) succeeds, then O represents the function
g o f.
If D and R are two data types, and there is an obvious mapping from
D to R (such as from a list to a set), the conversion predicate
should be called D_to_R. In particular, it is recommended that
all new data types should have D_to_list and list_to_D conversions
whenever this makes sense. If the conversion is a partial bijection,
so that given an object of type D there is always a unique object of
type R to which it is related, and given an object of type R there is
at most one object of type D to which it is related (but there need not
be any such object), the "to_" may be omitted and the name D_R
used. Note that this is still directional. The user is entitled
to assume that
for all D_obj there is a unique R_obj such that
D_R(D_obj, R_obj)
but not to assume that
* for all R_obj there is a unique D_obj such that
* D_R(D_obj, R_obj)
If this second rule is true, there should also be a predicate R_D.
Why have I fussed about with all these conventions? First, there is a
whole pile of library predicates whose names need rationalising.
Second, this proposed standard has to introduce some new predicates
to go with the new types (particularly strings), and I don't want to
have to defend each name individually, and I particularly don't
want further extensions to have individually defended names. If
these conventions are adopted, there will be little or no choice
for the name of a predicate once we have decided what it does.
Think of the benefit to a Prolog user trying to find out if there
is a predicate to frobboz zeelunks. Since these conventions do not
allow the abbreviation of a type name, the user has only to ask
?- cp "*zeelunk*".
(known as 'apropos' in many Lisps) to find out. We should
further rule that operation names (make, test, check [hence
"chk" in memberchk, as a different operation is meant],
map, size, portray, ...) may not be abbreviated either.
{NB: a language like CLU achieves this effect by having
compound names: <type>$<function> e.g. integer$assign.
We might want to have multi-type operations, such as
T1_T2_sameSize.}
Determining the Type of A Term
Prolog has a number of predicates for classifying terms. These
predicates are called "meta-logical", because they view terms as
data structures, and not as logic would view them. In particular,
they discriminate between variables and other terms, a distinction
which is completely meaningless from a logical point of view, and
can only be understood with reference to the current "state" of a
procedural computation.
This distinction between variables and other terms is important for
simple depth-first strategies such as Dec-10 Prolog, Prolog-X, NIP,
and PopLog use. However, it is very little use in a more complete
interpreter such as MU Prolog. Even in a more complete interpreter,
it is convenient to have, say, a predicate for accepting any atom.
This standard proposes a single new type testing predicate which
such a system can use, which has a logical reading, unlike the
other predicates of this section which have only a procedural reading.
The term "types" can be arranged in a tree, where type A appearing as
an ancestor of type B means that all terms of type B are also of type
A, and types are otherwise incompatible. Here is the tree:
                             true
                               |
               +---------------+---------------+
               |                               |
            nonvar                            var
               |
     +---------+---------+
     |                   |
 compound        atomic=constant
                         |
         +---------------+---------------+
         |               |               |
      number          string        atom=symbol
         |
    +-----+-----+
    |           |
rational     complex
    |           |
 integer      float
Every standard Prolog implementation is required to support the
var, nonvar, compound, atom, and integer types, and to supply all
these type testing predicates. A recognition predicate for an
unsupported type must always simply fail, but the compiler or
other tools may issue a warning when the program is compiled or
otherwise loaded. There may be other atomic types, such as the
data base references of C Prolog and Prolog-X. There may be other
numeric types. There is no standard type for a rational number
which is not an integer, as this may be defined in Prolog by
proper_rational(Q) :-
rational(Q),
\+ integer(Q).
and similarly there is no standard type for a complex number
with a non-zero imaginary part, as this may be defined by
proper_complex(F) :-
complex(F),
\+ float(F).
Var, nonvar
var(X) succeeds if X is a variable, and fails if it is not.
nonvar(X) fails if X is a variable, and succeeds if it is not.
true(X), which always succeeds, and var(X) are the only
type testing predicates which accept variables. All the other
type testing predicates fail for variables. Note that this
means that
...
\+ number(X),
...
number(X)
can actually succeed, provided X is a variable at the time of the
first test and is suitably instantiated at the time of the second.
This sort of behaviour is what makes these predicates "metalogical"
instead of being equivalent to infinite tables.
simple, compound, atomic
A compound term is a term which has arguments. That is
compound(Term) -> exists N. exists A. arg(N, Term, A).
No other terms have arguments. In particular, arg may not
be used to select characters of a string or atom, nor may it be
used to extract the real and/or imaginary parts of a complex number.
A standard Prolog implementation must support compound terms
with up to 200 arguments, and may support any greater arity.
See the section on term construction.
A standard Prolog implementation may not provide a "tuple"
data type. Compound terms suffice for this purpose. If the
implementation is capable of supporting tuples of some length,
it is capable of supporting compound terms of that arity (perhaps
less 1).
A simple term is anything other than a compound term. That
is, simple is exactly defined by
simple(Term) :-
\+ compound(Term).
Dec-10 Prolog uses the name "atomic" to mean a non-variable term
other than a compound term. In a mixed language system this can
be confusing. Logicians normally use the name "constant" for
such terms. For compatibility, a standard Prolog implementation must
supply atomic/1 defined as if by
atomic(Term) :-
nonvar(Term),
\+ compound(Term).
but it is recommended that constant/1 be supplied as well, with
identical meaning.
It is unfortunate that a number of writers have invented new names
for compound terms. 'complex' we've already dealt with as being
needed for complex numbers. Another popular not-invented-here name
for compound terms has been "structures". This is a particularly
unhelpful name, as all kinds of terms are data structures, and
when we are talking about implementations of trees, graphs, and so
on in Prolog we may want to distinguish between a compound term and
the data structure it represents. It is almost impossible to make
the distinction between these two concepts clear if you call them
both structures! And of course there are control structures also
represented by compound terms. Also, there may be data structures
which Prolog regards as atomic because pattern matching cannot look
inside them, but which the host language regards as data structures,
such as strings, arrays, hash tables, streams, ... Denying that a
Lisp array which Prolog has its hands on is a structure helps no-one.
There is not the least need to steal this
useful and harmless word, as "compound term" conveys precisely the
notion of a term which is composed from other terms. Therefore,
in the interests of having a teachable language,
a standard Prolog may not use the predicate structure(X) to
recognise compound terms, and term_type(X, T) may not bind T=structure.
There is nothing to stop a sufficiently foolish programmer putting
structure(X):-compound(X).
in his own programs. What is forbidden is having this in a standard
Prolog system or in the public Prolog library.
Atoms and Strings
An atom is a constant which can serve as the function symbol of
a compound term. This name can be confusing in a mixed language
system, where an "atom" might be anything other than a list, or
in a modern Lisp would be any constant. The name "symbol" is
recommended as an alternative, but "atom" must be provided.
An atom is uniquely identified by its name, which is a sequence
of characters. In a Prolog system which has modules of some sort,
this should still be true; the module system should pertain to
predicates, not to terms. This means that if a Prolog implementation
is embedded in a language (such as ZetaLisp) which has a module
system which pertains to atoms, the name of an atom is the fully
qualified name. Thus 'user:outer-module:middle:inner:final' might be
the name of an atom which could in a suitable context be input as
'final'. An implementation must allow an atom to have up to 255
characters in its name. There is no standard upper bound.
A string is also a sequence of characters. However, it is possible
for two strings to have the same characters and yet be distinguishable
in some fashion, by the host language. A Prolog program must
not be able to distinguish between strings containing the same
characters without calling a host-language routine.
An implementation is not obliged to implement strings at all.
If it does, it must allow a string to have up to 255 characters.
There is no standard upper bound.
Strings may be mutable, that is, it may be possible for the
sequence of characters associated with a string to change. This
would be the case with strings implemented as Pop11 byte vectors,
as Common Lisp strings or byte vectors, or as InterLisp strings.
This fact is the reason why a string cannot be used as the
function symbol of a compound term. Atom names must not be
mutable; an implementation may use the same data structures for
atom names as for strings, but should ensure that routines which
change strings never get their hands on atom names. In extreme
cases this may entail writing a new symbol table just for Prolog.
The availability or otherwise of strings as a distinguishable
data structure is one question. Whether "abc" is a list of
ASCII character codes or something else in another. An implementation
may provide a string data structure and a convenient notation for
lists of character codes both.
The characters in an atom name or a string are in ASCII. An
implementation is not required to support character codes
outside the range 0..127. However, any byte size and any
integer range that includes 0..127 may be used. If it is necessary
to interface Prolog to an environment using the EBCDIC character code,
Prolog must use the 8-bit ASCII character set internally, and translation
in each direction must be done according to the following tables
(taken from document X3J11/83-045):
8-bit ASCII to EBCDIC EBCDIC to 8-bit ASCII
00 00 40 7c 80 20 c0 76 00 00 40 20 80 c3 c0 7b
01 01 41 c1 81 21 c1 77 01 01 41 a0 81 61 c1 41
02 02 42 c2 82 22 c2 78 02 02 42 a1 82 62 c2 42
03 03 43 c3 83 23 c3 80 03 03 43 a2 83 63 c3 43
04 37 44 c4 84 24 c4 8a 04 9c 44 a3 84 64 c4 44
05 2d 45 c5 85 15 c5 8b 05 09 45 a4 85 65 c5 45
06 2e 46 c6 86 06 c6 8c 06 86 46 a5 86 66 c6 46
07 2f 47 c7 87 17 c7 8d 07 7f 47 a6 87 67 c7 47
08 16 48 c8 88 28 c8 8e 08 97 48 a7 88 68 c8 48
09 05 49 c9 89 29 c9 8f 09 8d 49 a8 89 69 c9 49
0a 25 4a d1 8a 2a ca 90 0a 8e 4a d5 8a c4 ca e8
0b 0b 4b d2 8b 2b cb 6a 0b 0b 4b 2e 8b c5 cb e9
0c 0c 4c d3 8c 2c cc 9b 0c 0c 4c 3c 8c c6 cc ea
0d 0d 4d d4 8d 09 cd 9c 0d 0d 4d 28 8d c7 cd eb
0e 0e 4e d5 8e 0a ce 9d 0e 0e 4e 2b 8e c8 ce ec
0f 0f 4f d6 8f 1b cf 9e 0f 0f 4f 7c 8f c9 cf ed
8-bit ASCII to EBCDIC EBCDIC to 8-bit ASCII
10 10 50 d7 90 30 d0 9f 10 10 50 26 90 ca d0 7d
11 11 51 d8 91 31 d1 a0 11 11 51 a9 91 6a d1 4a
12 12 52 d9 92 1a d2 aa 12 12 52 aa 92 6b d2 4b
13 13 53 e2 93 33 d3 ab 13 13 53 ab 93 6c d3 4c
14 3c 54 e3 94 34 d4 ac 14 9d 54 ac 94 6d d4 4d
15 3d 55 e4 95 35 d5 4a 15 85 55 ad 95 6e d5 4e
16 32 56 e5 96 36 d6 ae 16 08 56 ae 96 6f d6 4f
17 26 57 e6 97 08 d7 af 17 87 57 af 97 70 d7 50
18 18 58 e7 98 38 d8 b0 18 18 58 b0 98 71 d8 51
19 19 59 e8 99 39 d9 b1 19 19 59 b1 99 72 d9 52
1a 3f 5a e9 9a 3a da b2 1a 92 5a 21 9a 5e da ee
1b 27 5b ad 9b 3b db b3 1b 8f 5b 24 9b cc db ef
1c 1c 5c e0 9c 04 dc b4 1c 1c 5c 2a 9c cd dc f0
1d 1d 5d bd 9d 14 dd b5 1d 1d 5d 29 9d ce dd f1
1e 1e 5e 9a 9e 3e de b6 1e 1e 5e 3b 9e cf de f2
1f 1f 5f 6d 9f e1 df b7 1f 1f 5f 7e 9f d0 df f3
8-bit ASCII to EBCDIC EBCDIC to 8-bit ASCII
20 40 60 79 a0 41 e0 b8 20 80 60 2d a0 d1 e0 5c
21 5a 61 81 a1 42 e1 b9 21 81 61 2f a1 e5 e1 9f
22 7f 62 82 a2 43 e2 ba 22 82 62 b2 a2 73 e2 53
23 7b 63 83 a3 44 e3 bb 23 83 63 b3 a3 74 e3 54
24 5b 64 84 a4 45 e4 bc 24 84 64 b4 a4 75 e4 55
25 6c 65 85 a5 46 e5 a1 25 0a 65 b5 a5 76 e5 56
26 50 66 86 a6 47 e6 be 26 17 66 b6 a6 77 e6 57
27 7d 67 87 a7 48 e7 bf 27 1b 67 b7 a7 78 e7 58
28 4d 68 88 a8 49 e8 ca 28 88 68 b8 a8 79 e8 59
29 5d 69 89 a9 51 e9 cb 29 89 69 b9 a9 7a e9 5a
2a 5c 6a 91 aa 52 ea cc 2a 8a 6a cb aa d2 ea f4
2b 4e 6b 92 ab 53 eb cd 2b 8b 6b 2c ab d3 eb f5
2c 6b 6c 93 ac 54 ec ce 2c 8c 6c 25 ac d4 ec f6
2d 60 6d 94 ad 55 ed cf 2d 05 6d 5f ad 5b ed f7
2e 4b 6e 95 ae 56 ee da 2e 06 6e 3e ae d6 ee f8
2f 61 6f 96 af 57 ef db 2f 07 6f 3f af d7 ef f9
8-bit ASCII to EBCDIC EBCDIC to 8-bit ASCII
30 f0 70 97 b0 58 f0 dc 30 90 70 ba b0 d8 f0 30
31 f1 71 98 b1 59 f1 dd 31 91 71 bb b1 d9 f1 31
32 f2 72 99 b2 62 f2 de 32 16 72 bc b2 da f2 32
33 f3 73 a2 b3 63 f3 df 33 93 73 bd b3 db f3 33
34 f4 74 a3 b4 64 f4 ea 34 94 74 be b4 dc f4 34
35 f5 75 a4 b5 65 f5 eb 35 95 75 bf b5 dd f5 35
36 f6 76 a5 b6 66 f6 ec 36 96 76 c0 b6 de f6 36
37 f7 77 a6 b7 67 f7 ed 37 04 77 c1 b7 df f7 37
38 f8 78 a7 b8 68 f8 ee 38 98 78 c2 b8 e0 f8 38
39 f9 79 a8 b9 69 f9 ef 39 99 79 60 b9 e1 f9 39
3a 7a 7a a9 ba 70 fa fa 3a 9a 7a 3a ba e2 fa fa
3b 5e 7b c0 bb 71 fb fb 3b 9b 7b 23 bb e3 fb fb
3c 4c 7c 4f bc 72 fc fc 3c 14 7c 40 bc e4 fc fc
3d 7e 7d d0 bd 73 fd fd 3d 15 7d 27 bd 5d fd fd
3e 6e 7e 5f be 74 fe fe 3e 9e 7e 3d be e6 fe fe
3f 6f 7f 07 bf 75 ff ff 3f 1a 7f 22 bf e7 ff ff
But what about other alphabets, such as Hebrew, Greek, Cyrillic, Kanji?
The answer is that there is nothing in this standard to prohibit their
use. The range of character codes must be at least 0..127; it may be
more. In particular, a 16-bit character set may be used. All that
this standard requires is that the codes 0..127 bear their ASCII
meanings. In order to cope with extended alphabets it may be necessary
to have both 8-bit and 16-bit strings; as atom names are immutable they
may be stored in the most economical format.
Numbers
A Prolog implementation must support at least 16-bit integers.
There is no requirement that any other form of number be handled.
It is strongly urged that arbitrary precision integers (bignums)
should be supported, it's easy enough in all conscience. Floating
point numbers should have at least the precision and range of the
host machine's native single precision floating point format if it
has one, or of IEEE single precision if it has not. Common Lisp
"double" is recommended.
A rational and a complex number may never unify or compare equal.
A rational number equal to some integer is that integer. There
is no way for a Prolog program to distinguish between 273 and 273/1,
nor for that matter between 273 and 546/2. Similarly, a complex
number equal to some float (that is, having a zero imaginary part)
is that float. There is no way for a Prolog program to
distinguish between 27.3 and 27.3I0.0.
Other types of Terms
Other types of terms may be added. However, the compound, atom,
var, rational, and complex classes are closed. New types (such as Pop or
Lisp arrays, hash tables, flavours, pointers to compiled code,
data base references, functor pointers, you name it) must be
classified as constants, and may but need not have their own
recognition predicates. It is recommended that Prolog should
not in general be extended to cope with host language data
structures; as long as Prolog can receive such structures in arguments,
recognise that they are constants, and can pass them on in calls to
host language functions or subroutines, greater consistency and
portability is obtained by coding manipulations of such structures
in host language routines and calling them from Prolog. For
example, a Prolog cohabiting with Lisp should not have an "array"
predicate, but should be able to call
lisp(array(X))
Type Testing with Delays
Apart from the var/nonvar distinction, all the types described
above make perfectly good sense considered as infinite tables.
In a Prolog which uses a more flexible execution strategy, it
would be useful to have a way of type testing which will wait
if necessary until the term to be tested is instantiated.
The new predicate term_type/2 accomplishes this.
term_type(Term, Type) :-
var(Term),
wait_for_binding_of(Term),
% or report an instantiation fault
term_type1(Term, Type).
term_type(Term, Type) :-
nonvar(Term),
term_type1(Term, Type).
term_type1(Term, compound) :- compound(Term).
term_type1(Term, atom) :- atom(Term).
term_type1(Term, string) :- string(Term).
term_type1(Term, integer) :- integer(Term).
term_type1(Term, rational) :- rational(Term).
term_type1(Term, float) :- float(Term).
term_type1(Term, complex) :- complex(Term).
term_type1(Term, number) :- number(Term).
term_type1(Term, atomic) :- atomic(Term).
Except for the existence of the predicate term_type1, this is
to be taken literally. If term_type(X, T) is called, and X ends
up bound to 3, term_type is to enumerate the types integer,
rational, number, and atomic in that order. If the table is
extended with new types, more specific types are always to be
enumerated before more general types. Normally, T will be bound
in the call.
This predicate may be used in more conventional Prologs, where it
is to report an instantiation fault if its first argument is
a variable.
Getting at the Print Names of Constants
There are six predicates which can be used on atomic terms to construct
them given their print names or to obtain their print names given the
terms.
string_chars(String, ListOfChars)
if string(String) {
unify ListOfChars with [c1,...,cn] where String has n
characters and their codes are c1,....,cn.
}
if var(String) {
check that ListOfChars is a valid list of character codes
if it is too long, report a string overflow fault
unify String with a new string containing those codes
}
if nonvar(String) and \+string(String) {
type_fail(1, string)
}
To check that X is a valid list of character codes:
if var(X), report an instantiation fault.
if X = [], succeed.
if X = [H|T] {
if var(H), report an instantiation fault.
if \+ integer(H) or H < 0 or H > max_char, type_fail(2,chars).
check that T is a valid list of character codes.
}
otherwise type_fail(2,chars).
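Rendered as Prolog, that check might look like this. Again a sketch:
instantiation_fault/0 and type_fail/2 stand for the implementation's
signalling mechanism, and max_char/1 anticipates the predicate
discussed below.

valid_char_list(Chars) :-
        var(Chars),
        !,
        instantiation_fault.
valid_char_list([]) :- !.
valid_char_list([Char|Chars]) :-
        !,
        valid_char_code(Char),
        valid_char_list(Chars).
valid_char_list(_) :-
        type_fail(2, chars).

valid_char_code(Char) :-
        var(Char),
        !,
        instantiation_fault.
valid_char_code(Char) :-
        integer(Char), Char >= 0,
        max_char(Max), Char =< Max,
        !.
valid_char_code(_) :-
        type_fail(2, chars).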
Note that this follows the naming convention: given any string there is
always exactly one list of character codes which corresponds to it, but
given a list of integers there may not be a corresponding string (the
list might be too long, or some of the integers might be outside the
range of character codes). Note also that the string is new. If
strings are mutable (and if they are not there really isn't the least
point in distinguishing them from atoms) this means that we are
guaranteed that changing this string will not affect any other string,
even if this one happens to have the same characters initially.
There ought to be some way of finding out the range of character codes.
max_char/1 seems like a good name, but we want something that fits in
with a characterisation of the floating point system. Copy Common Lisp?
atom_string(Atom, String)
if var(Atom) and var(String) report an instantiation fault.
if nonvar(Atom) and \+ atom(Atom) type_fail(1,atom).
if nonvar(String) and \+ string(String) type_fail(2,string).
if nonvar(Atom) {
unify String with a new string whose contents are
the character codes of the name of Atom
} else {
unify Atom with the atom whose name is String, creating
the atom if necessary.
}
This definition suggests that it is possible for an atom to have any
name at all that can be represented as a string. However, some Lisps
place different restrictions on strings and atom names. In particular,
many Lisps demand that any constant whose name looks like the name of
a number is that number, e.g. (FIXP (COMPRESS '(#\1))) is
likely to be true. This has not yet been an issue for Prolog, except
that in Dec-10 and C Prolog, the atom '1' which the reader can read
can be built in no other way. It would be particularly unfortunate
to take over Lisp semantics unchanged: there is no reason for a Prolog
programmer to suppose that '11Q' has anything to do with the integer 9.
If a Prolog implementation would return some object other than an atom
because of this or any other restriction, it must not do so, but
must report an implementation fault. That is, atom_string(X, "1")
may report a fault, but under no circumstances may it bind X to anything
other than an atom.
No form of case conversion is permitted. If the implementation can only
support monocase identifiers (since no standard predicate has anything
other than lower case letters and underscores in its name, and since
variable names do not need to be stored as atoms, so monocase atom names
could be used) it must react to upper case letters in a name by reporting
an implementation fault. Such an implementation can offer a case
converting version of atom_string/2 or atom_chars/2 or name/2, but it
must have some other name.
atom_chars(Atom, ListOfChars) :-
nonvar(Atom),
atom_string(Atom, String),
string_chars(String, ListOfChars).
atom_chars(Atom, ListOfChars) :-
var(Atom),
string_chars(String, ListOfChars),
atom_string(Atom, String).
Although atom_chars is defined in terms of atom_string, it is perfectly
permissible to make atom_chars the primitive and define atom_string in
terms of it, or even to omit atom_string entirely. There is a subtle
point in this definition: we only require the ListOfChars to be a ground
list when the Atom is unbound. When we know the Atom, we can use it to
fill in the variables in the ListOfChars. So
atom_chars(append, [97,X,X,101|R])
must succeed and bind X = 112, R = [110,100].
number_string(Number, String)
if var(Number) and var(String) report an instantiation fault.
if nonvar(Number) and \+ number(Number) type_fail(1,number).
if nonvar(String) and \+ string(String) type_fail(2,string).
if nonvar(String) {
if String cannot be parsed as a number, simply fail.
unify Number with the number represented by String
} else {
"print" Number and unify String with a new string of the result
}
number_chars(Number, ListOfChars) :-
var(Number),
string_chars(String, ListOfChars),
number_string(Number, String).
number_chars(Number, ListOfChars) :-
nonvar(Number),
number_string(Number, String),
string_chars(String, ListOfChars).
There are two subtle points here. The first is the same as with
atom_chars; number_chars(12,[X,50]) must succeed and bind X=49.
The second is what happens when the String or ListOfChars is a
valid string or character list, but the characters cannot be parsed
as a number? Calling number_string(X,"1.e") looks like some sort of
mistake. But when both arguments are bound there are two ways of
proceeding. We could take the number, "print" it into an internal
buffer, and compare the result with the given string. Or we could
convert the string to a number, and compare the result with the
given number. In fact it is the former method which is more general.
It lets us determine, for example, that number_string(1,"2/3") must
be false, without any need to implement general rational numbers.
It would be difficult to explain to a novice why number_string(1,"x")
should behave any differently from atom_string(fred,"x"). The
clinching argument in favour of doing nothing special when the string
or character list does not represent a number is that it would be
very difficult indeed to check the syntax of a non-ground list of
characters. For consistency then, "bad syntax" must fail in all cases.
There is a third subtle point which I'd better make clear in the code;
the "unification" of the String with the print name of the number (when
the number is instantiated) should ignore case. This is a tiresome
detail, and only matters if floats are implemented.
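Some illustrative goals, assuming the specification above (these
bindings are just what the text requires):
        ?- number_string(X, "23").      % binds X = 23
        ?- number_string(23, S).        % binds S = "23"
        ?- number_string(X, "1.e").     % simply fails: bad syntax
        ?- number_string(1, "2/3").     % fails: "print" 1, compare with "2/3"
        ?- number_chars(12, [X,50]).    % succeeds, binding X = 49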
Here is the syntax of numbers:
        <number>  ::= <integer>
                   |  <ratio>
                   |  <float>
                   |  <complex>
        <integer> ::= <sign> <digits>
        <sign>    ::= <empty> | -
        <digits>  ::= (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)+
        <ratio>   ::= <integer> / <digits>
        <float>   ::= {as in Common Lisp}
        <complex> ::= <float> I <float>
The first thing to note is that + is not a sign. It would make sense if
it were, but unfortunately
        ?- name(X, "+2"), atom(X).
succeeds in Dec-10 Prolog. Given that
        ?- X is +2.
succeeds in C Prolog, binding X to the integer 2, I feel that this is an
anomaly which should be eliminated.
The next thing to note is that because of the ruling that "bad syntax"
simply fails, number_string does not have to understand the syntax of
number types which are not implemented.
To be compatible with Common Lisp, the denominator in N/D should not be
0. In PRESS, however, we found it very useful to have +infinity (1/0),
-infinity (-1/0), and undefined (0/0) values. Reproducing the behaviour of
the LONG package in a low-level implementation of bignums and ratios is
quite easy. On the other hand, an implementation which tries to share
the world with LISP could find it difficult to do this. So for the
moment this standard does not say whether X/0 must be permitted or not.
It should be the case that
        p(X) :-
                number_chars(X, L),
                number_chars(Y, L),
                Y = X.
succeeds for any number X. In the case of integers and ratios this
is easy, and if you can do it for floats it is easy to do it for
complex numbers. But for floats this is a very stringent requirement,
and it means that the floating point I/O routines supplied by the host
language (whether LISP, Pop, or C) are unlikely to be usable. There's
no helping that: it's a property we must have, or give up any pretence
that floating point numbers can be handled in logic. Admittedly, it can
be very much clearer to round numbers to a few digits before showing them
to a human, and there really ought to be a standard predicate for doing
that. But these predicates must generate maximal precision, and for
some numbers a wee bit more. There is an MC (now CWI) report which gives
Algol 68 routines that could be used here.
A final point about number syntax is that number_string defines an
internal form for numbers. The external syntax of numbers
can be anything you please. Any number of extra leading zeros can
be permitted, and underscores can be allowed in numbers as well as
in atoms. This is a particularly good idea with bignums; it is
much easier to read 123_546_789_652 than 123546789652. And the
external syntax can support bases other than 10. Non-decimal notation
is not defined here because amongst other things name/2 doesn't
understand it; only the read/[1-2] routines of the Edinburgh family of
Prologs understand it.
name(Constant, ListOfChars) :-
        string(Constant),
        !,
        string_chars(Constant, ListOfChars).
name(Constant, ListOfChars) :-
        number(Constant),
        !,
        number_chars(Constant, ListOfChars).
name(Constant, ListOfChars) :-
        atom(Constant),
        !,
        atom_chars(Constant, ListOfChars).
name(Constant, ListOfChars) :-
        var(Constant),
        number_chars(Constant, ListOfChars),
        !.
name(Constant, ListOfChars) :-
        var(Constant),
        !,
        atom_chars(Constant, ListOfChars).
name(Constant, ListOfChars) :-
        type_fail(1, atom-or-number).
As usual, it is the effect only which is being defined, and these
clauses need not be visible. The main reason for defining this
predicate is backwards compatibility. There is a rather strange
aspect of it. It was pointed out earlier that number_chars need
not recognise the syntax of unimplemented number classes. This
means that in an implementation such as Dec-10 Prolog which lacks
floats, name(X,"1.2e3") may bind X to an atom (as in fact it does)
instead of reporting a representation fault. In this case only, this
deviation from the rest of the standard is permitted. But reporting a
representation fault is also permitted. A Prolog embedded in Lisp may
find it convenient to treat name/2 as the fundamental primitive, and
check that the result Lisp gives back has the right type. For example,
name(X, "(QUOTE (F))")
might call the Interlisp function PACKC and then check that the result
is an atom or number, and fail on finding that it isn't. name/2 is
required to work with any integer or unquoted Prolog identifier; a
program that uses it only for those can expect to fit in with any host
language. But new programs should generally use atom_chars or number_chars.
It is extremely useful to have a single predicate for converting a
constant to a list of ASCII codes. Given that name/2 is already strange,
adding the first clause so that it can convert a string to a list of
characters increases its usefulness without increasing its strangeness
too much. It is already the case that
nonvar(X), name(X, L), name(Y, L)
does not imply that X = Y; consider the case X='1'. So the fact that
name can convert a string to a list of characters but not conversely
is not additionally strange.
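For example, with the definition above,
        ?- name(X, "1").                % binds X = 1, the integer
        ?- name('1', L), name(Y, L).    % binds Y = 1, so Y \== '1'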
There are two predicates which "construct" numbers from other numbers.
*rational(Q, N, D)                      % NOT the real definition!
*       if nonvar(Q) and \+rational(Q), type_fail(1, rational).
*       if nonvar(N) and \+integer(N), type_fail(2, integer).
*       if nonvar(D) and \+integer(D), type_fail(3, integer).
*       if nonvar(N) and nonvar(D) {
*           if the implementation cannot handle 1/0, 0/0, -1/0 {
*               if D = 0, zero_divide.
*           }
*           unify Q with the value of N/D
*       } else
*       if nonvar(Q) {
*           let N1,D1 be such that N1/D1 = Q and gcd(N1,D1) = 1.
*           if nonvar(D) {
*               if D = 0 then {
*                   if D1 =/= 0 then fail.
*                   unify N with N1.
*               } else {
*                   if gcd(|D|,D1) =/= D1 then fail.
*                   unify N with N1*D/D1.
*               }
*           } else
*           if nonvar(N) then {
*               if gcd(|N|,|N1|) =/= |N1| then fail.
*               unify D with D1*N/N1.
*           } else {
*               unify N with N1, D with D1.
*           }
*       } else {
*           report an instantiation fault.
*       }
complex(C, R, I)
        if nonvar(C) and \+complex(C), type_fail(1, complex).
        if nonvar(R) and \+float(R), type_fail(2, float).
        if nonvar(I) and \+float(I), type_fail(3, float).
        if nonvar(C) {
            unify R with re(C), I with im(C).
        } else
        if nonvar(R) and nonvar(I) {
            unify C with R+i*I.
        } else {
            report an instantiation fault.
        }
There is no problem with complex/3. We could almost define it in Prolog
as
complex(c(R,I), R, I) :-
        float(R), float(I),
        I \== 0.0.
complex(R, R, 0.0) :-
        float(R).
R and I cannot be rational numbers. If they could be, we'd run into
the problem discussed in the next paragraph.
There is quite a nasty little problem with rational/3.
For any given pair of integers N,D there is exactly one rational
number N/D (possibly including +infinity, -infinity, and undefined
as rational numbers). However, for any given rational number Q
there are infinitely many ways of expressing it as the
quotient of two integers (with the exception of 0/0). According
to the design principle, then, rational(Q,N,D) for var(N) and var(D)
should report an instantiation fault or delay until at least one of
N and D is bound. But according to the description above, it will
succeed uniquely. This has the extremely nasty effect that
rational(1/2, N, D), D = 4
will fail (as rational/3 bound N=1, D=2) but
D = 4, rational(1/2, N, D)
will succeed. It could be argued that this is due to rational/3
trying to be too clever, and when nonvar(Q) it should not take the
other arguments into account. That would make both calls fail.
But that would mean that
rational(Q, 2, 4),
rational(Q, 2, 4)
would fail. If we ever hope to prove anything about Prolog programs,
this sort of behaviour in the primitives is totally unacceptable. The
conclusion I am forced to is that there is no one definition of
rational/3 which is satisfactory. Instead we have to use
divide(Q, N, D)
        if nonvar(Q) and \+rational(Q), type_fail(1, rational).
        if nonvar(N) and \+ integer(N), type_fail(2, integer).
        if nonvar(D) and \+ integer(D), type_fail(3, integer).
        if var(N) or var(D), instantiation fault.
        unify Q with N/D.
rational(Q, N, D)
        if nonvar(Q) and \+rational(Q), type_fail(1, rational).
        if nonvar(N) and \+ integer(N), type_fail(2, integer).
        if nonvar(D) and \+ integer(D), type_fail(3, integer).
        if nonvar(Q) {
            let N1,D1 be such that Q=N1/D1, D1 >= 0, gcd(N1,D1) = 1.
            unify N=N1, D=D1.
        } else
        if nonvar(N) and nonvar(D) {
            if D < 0 or gcd(N,D) =/= 1, simply fail.
            unify Q with N/D.
        } else {
            instantiation fault.
        }
The purpose of divide/3 is to construct a rational number by
performing the division. The purpose of rational/3 is to
construct a rational number by packing together two integers
which already have the right form, or to unpack a rational into
two integers. These definitions are free of the problems which
made the previous definition of rational/3 worthless.
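For instance, in an implementation with rational numbers, these
definitions give
        ?- divide(Q, 2, 4).             % binds Q = 1/2
        ?- rational(Q, 2, 4).           % fails: 2/4 is not in lowest terms
        ?- rational(1/2, N, D).         % binds N = 1, D = 2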
Recall that these predicates may have to report a representation
fault if the complex or rational data types are not implemented.
In the interests of helping people debug their code, a representation
fault may be reported even when the result is representable, that is
        rational(X, X, 1)
        complex(X, X, 0.0)
may report a representation fault.
Constructing Compound Terms
There are three predicates which are used to construct compound terms
and take them apart, in addition, of course, to pattern matching.
They are functor/3, arg/3, and =../2.
functor(Term, Fsymbol, Arity)
        if nonvar(Fsymbol) and \+atomic(Fsymbol), type_fail(2,atomic).
        if nonvar(Arity) and (\+integer(Arity) or Arity < 0),
            type_fail(3,natural).
        if var(Term) {
            if var(Fsymbol) or var(Arity), instantiation fault.
            if Arity = 0 {
                unify Term = Fsymbol.
            } else {
                if Arity > some limit, implementation fault.
                construct a new term with principal functor
                Fsymbol and arity Arity, all of whose
                arguments are distinct unbound new variables
                unify Term = this new term
            }
        } else {
            if atomic(Term) {
                unify Fsymbol = Term, Arity = 0.
            } else {
                compound(Term) is true.
                unify Fsymbol = the principal function symbol of Term
                and Arity = the arity of Term
            }
        }
This looks amazingly complicated, but it mostly follows from the design
principle. If Term is a nonvariable, or Fsymbol and Arity are both
nonvariables, we have enough information to succeed at most once, otherwise
there is an instantiation fault, and instantiation faults should be checked
for before type failures. The Term could be anything. The Arity must be
a non-negative integer, and the call should type fail if it is not.
The Arity argument of functor/3 and the ArgNo argument of arg/3 MUST NOT
be integer expressions. The call MUST type fail if given such erroneous
values.
The inquisitive programmer is entitled to use functor(a(1,2),a,1+1) to
discover what a type failure looks like. When the Term is a variable,
permitting integer expressions here looks like a useful extension.
However, permitting integer expressions here would preclude efficient
compilation of a construct which must be implemented as efficiently
as possible, because although high level code may not use functor/3
much, low level support code uses it extensively. Generally, arithmetic
expressions are the kiss of death to efficiency in Prolog. Further, it
is difficult to come up with a single definition of what it means to have
an arithmetic expression here which makes sense in all cases. If
functor(T, a, 1+1)
were to be allowed, consistency would mean that
functor(a(1,2), a, X)
would have to be able to enumerate X=2, X=1+1, and all the other
integer expressions with value 2. This is obviously silly, and the
extra "generality" of allowing an integer expression when you are
building a term is spurious, as you can always write
        N is Expr,
        functor(Term, F, N)
The one thing which is not obvious is that functor(C,C,0) must succeed
for any constant C, so that if constant(C), functor(C,F,N) binds F=C,N=0,
and functor(T,C,0) binds T=C. Since there are values of N for which
functor(T,72.3,N) succeeds, we can only type fail if the function
symbol is a compound term.
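A few examples, assuming this specification:
        ?- functor(T, foo, 2).          % binds T = foo(_,_)
        ?- functor(foo(a,b), F, N).     % binds F = foo, N = 2
        ?- functor(C, fred, 0).         % binds C = fred
        ?- functor(T, f(1), 2).         % type fails: f(1) is not atomic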
An implementation is permitted to impose an upper bound on the arity
of a term. However, all arities up to 200 must be accepted.
This is the only Prolog primitive which needs to check this limit.
=.. is defined in terms of functor, so no separate check is needed.
Note that the error reported is an implementation fault, not a user
error. functor(T,a,9999) is perfectly well defined; if a Prolog
can't do it, that's its fault, not the programmer's.
arg(ArgNo, Term, Argument)
        if var(ArgNo) or var(Term), instantiation fault.
        if \+integer(ArgNo) or ArgNo < 1, type fail 1.
        if constant(Term) and \+ atom(Term), type fail 2.
        if ArgNo > the arity of Term, simply fail.
        unify Argument with the ArgNoth argument of Term.
Note that arg(1, a, X) simply fails, while arg(1, 0, X) type fails.
This is because 'a' might have come from a call functor(T,a,N)
where N could take on any non-negative value, and it makes sense
to consider it as the "empty" case of a "vector" class. But no
other atomic value has any related compound terms, so it makes
sense to type fail for them. A common pattern for low level code is
foo(Old, New, Context) :-
        var(Old),
        !,
        var_foo(Old, New, Context).
foo(Old, New, Context) :-
        nonvar_foo(Old, New, Context).

nonvar_foo(Old, New, Context) :-
        functor(Old, F, N),
        functor(New, F, N),
        foo(N, Old, New, Context).

foo(0, _, _, _) :- !.
foo(N, Old, New, Context) :-
        succ(M, N),
        arg(N, Old, OldArg),
        foo(OldArg, NewArg, Context),
        arg(N, New, NewArg),
        foo(M, Old, New, Context).
The succ/2 call shields the arg/3 calls from N < 1. If the succ/2
call were moved down, and the cut were not there, the possibility of
arg/3 being called with N=0 would arise, and in this context we
would prefer arg(0, 72.3, X) just to fail. If we habitually use the
template shown (and if the compiler is smart enough the cut can be
dispensed with) it doesn't matter whether arg/3 fails or type fails
for N=0 or for general constants, and the usefulness of picking up
likely errors suggests that type failure is best.
=.. (pronounced "univ") converts a term to a list and vice versa.
It is defined in an appendix. That definition relies on length/2
(as defined above) and functor/3 to generate all appropriate
instantiation faults and type failures. It is interesting to note
that
X =.. [F,A1,...,An],
where the list is spelled out in the code, can be compiled nearly
as efficiently as
X = f(A1,...,An),
certainly without ever generating the actual list. While the general
case of univ is to be avoided as turning over more non-(local stack)
store than the use of functor and arg, this case is quite reasonable.
If it were generally appreciated that this can be compiled efficiently,
some of the demand for "variable functors" might go away, but I suppose
that is too much sense to hope for.
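For illustration only (the appendix definition is the authoritative
one), univ might be written in terms of functor/3, arg/3, and length/2
along these lines; univ/2 and univ_args/4 are names invented for this
sketch:
        univ(Term, [F|Args]) :-         % Term =.. [F|Args], Term known
                nonvar(Term),
                !,
                functor(Term, F, N),
                univ_args(0, N, Term, Args).
        univ(Term, [F|Args]) :-         % Term unknown: build it
                length(Args, N),
                functor(Term, F, N),
                univ_args(0, N, Term, Args).

        univ_args(N, N, _, []) :- !.
        univ_args(I, N, Term, [A|As]) :-
                succ(I, J),
                arg(J, Term, A),
                univ_args(J, N, Term, As).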
While these three predicates are all we actually need in the standard
for term hacking, there are three more that I would like to introduce.
same_functor(T1, T2) :-
        same_functor(T1, T2, _, _).
same_functor(T1, T2, N) :-
        same_functor(T1, T2, _, N).
same_functor(T1, T2, F, N) :-
        nonvar(T1),
        functor(T1, F, N),
        functor(T2, F, N).
same_functor(T1, T2, F, N) :-
        var(T1),
        functor(T2, F, N),
        functor(T1, F, N).
The reason for wanting these predicates is that they are reversible.
With them available, we can write
foo(Old, New, Context) :-
        var(Old),
        !,
        var_foo(Old, New, Context).
foo(Old, New, Context) :-
        nonvar_foo(Old, New, Context).

nonvar_foo(Old, New, Context) :-
        same_functor(Old, New, N),
        foo(N, Old, New, Context).

foo(0, _, _, _) :- !.
foo(N, Old, New, Context) :-
        succ(M, N),
        arg(N, Old, OldArg),
        arg(N, New, NewArg),
        foo(OldArg, NewArg, Context),
        foo(M, Old, New, Context).
and so propagate this reversibility up a level without too much pain.
In Dec-10 Prolog (where amongst other things we can't swap the arg/3
and foo/3 calls in the last clause as I just did) the demands of
"efficiency" force logic out into the cold when one writes low level
term hacking code. This should not be so. A standard conforming
implementation can defeat the point of these predicates by implementing
them exactly as written, but at least reversible code will still run.
The point of putting these predicates in the standard, of course, is to
prevent the names being used for something else.
Term Comparison
Dec-10 Prolog introduced the idea of a standard total ordering of
terms, defined (to quote the Dec-10 Prolog manual) as follows:
variables, in a standard order (roughly, oldest first - the order is
NOT related to the names of variables);
integers, from -"infinity" to +"infinity";
atoms, in alphabetical (i.e. ASCII) order;
compound terms, ordered first by arity, then by the principal
function symbol, then by the arguments in left to right order.
This ordering is defined because sort and keysort need it,
and they are defined because bagof and setof need them. The
existence of a total order on terms permits the implementation
of efficient algorithms for many data structures, such as linear
time set operations, N^2 lg N graph operations, and the like.
The fact that the ordering is defined for variables presents
difficulties. There are logical anomalies, such as the fact that
X @< 1
may be true at one time (because X is unbound) and false at
another (because X is bound to 72). Undoubtedly the most serious
problem is that there is an unstated assumption. So long as we do
not bind any variables, the order of two variables may not change.
The easy way to ensure this is the way C Prolog does it: don't have
a garbage collector. The next easiest way to ensure it is the way
Dec-10 Prolog does it: have a compacting garbage collector for the
global/copy stack, and never mind what the heap does. But PopLog
cannot do either of these things. It has a garbage collector, and it
is a stop and copy one. This means that the addresses of variables
cannot be used to define their order. A third way is to make Prolog
variables into record structures (which PopLog already does) and to
timestamp them. This might seem rather costly, but then an
O(N^2) setof is costly too. Let's put that to one side for a moment.
Another problem is where do the other data types fit into this order?
In particular, strings, floats, and complex numbers?
As elsewhere, my response to this problem is to invent a new principle.
The principle here is the interval principle:
"Major" types should correspond to intervals in the standard order.
In the existing standard order, the major types are thus true, var,
nonvar, compound, atomic, number, atom.
This doesn't uniquely define where things like strings should go.
Strings are more like atoms than they are like any other data type,
so they should be adjacent to atoms in the standard order. But there
is another "data type" which is unnamed: the set of terms T for which
functor(T, F, N), atom(F)
is true. That should be an interval too. So strings cannot go between
atoms and compound terms. The only place left for strings is less than
any atom but greater than any other kind of constant.
The position of other data types (such as bit maps) is even less clear.
One thing which is absolutely clear is that nothing can be less than
variables, or that will spoil the interval property. It is the case in
Dec-10 and C Prolog that
        var(X) :-
                X @< _.
and it really would be a pity to lose this property. (The anonymous
variable is younger than any other variable, and the variable ordering
is by age.) So other data types must be greater than variables and
less than strings. But this doesn't define their order with respect
to numbers. Indeed, they could fall into two groups, one on either
side. However, another unnamed "type" might be of interest, the
"standard" constants, and that means that non-standard constants must
be less than numbers.
There is a problem with complex numbers.
There is no way that we can provide a total
order which is compatible with the standard arithmetic ordering on
complex numbers, as the standard mathematical definition has it that
no ordering on the complex numbers exists. But that needn't stop
us assigning whatever order we like, it just means that it has no
connection with algebraic laws. An obvious ordering which is easy to
implement is lexicographic order, ordering floating point numbers as
if they were complex numbers with 0 imaginary part. (The Common Lisp
manual says that the complex number #C(X,0) is the number X. This
should be part of the Prolog standard as well.)
The extended standard ordering is thus
variables, in some time-invariant order;
all "other" atomic objects, in some time-invariant
implementation-defined order;
integers and floating point numbers, ordered according to
numeric value (but see section 7.4 for 2 -vs- 2.0), and
complex numbers in lexicographic order;
strings, in lexicographic order;
atoms, in lexicographic order;
compound terms, ordered first by arity, then by function
symbol (always an atom), then by arguments from left to right.
The fundamental predicate here is
compare(R, X, Y)
        If X comes before Y in the standard order, unify R with '<'.
        If X is identical to Y, unify R with '='.
        If X comes after Y in the standard order, unify R with '>'.
There are six term comparison predicates defined in terms of compare/3.
X @< Y :-
        compare(<, X, Y).
X @>= Y :-
        \+ compare(<, X, Y).
X == Y :-
        compare(=, X, Y).
X \== Y :-
        \+ compare(=, X, Y).
X @> Y :-
        compare(>, X, Y).
X @=< Y :-
        \+ compare(>, X, Y).
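Some consequences of the extended ordering, assuming the order given
above:
        ?- compare(R, 1, a).            % binds R = (<): numbers before atoms
        ?- compare(R, f(1,2), g(1)).    % binds R = (>): arity compared first
        ?- X @< 1.                      % true while X is unbound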
There are two kinds of Prolog implementation where these predicates
are not desirable. One is implementations like PopLog, where it is
difficult to order variables. The other is implementations like
MU Prolog which have coroutining capabilities, and would rather
wait until they can ascribe a definite order. One point of this
proposed standard is to ensure that certain predicates, even if not
implemented, shall not have their names used to mean something else.
It is therefore incumbent on me to propose some useful alternative
for these other systems so that they shan't make these names do the wrong
things. Therefore I define
term_compare(R, X, Y) :-
        if X and Y are identical, unify R with '='.
        otherwise, find the first place in a left to right scan
        where they differ. If either one has a variable in that
        place, wait for that variable to be bound and try again
        if the implementation supports coroutining, or report an
        instantiation fault if it doesn't.
        call compare(R, X, Y).
This is not quite the same as demanding that X and Y be ground.
term_compare(=,A+B,A+B) should succeed. What is required is that
the call be delayed until X and Y are either identical or clearly
different in a place where the ordering can be determined. When
term_compare eventually returns a result, it returns the same
result that compare would then. The point is that if term_compare
says that the ordering is such and such, it will never change its
mind. (Unlike compare/3 with variables.) The infix relations are
X $< Y :-
        term_compare(<, X, Y).
X $>= Y :-
        term_compare(R, X, Y),
        (R = '=' ; R = '>').
X $> Y :-
        term_compare(>, X, Y).
X $=< Y :-
        term_compare(R, X, Y),
        (R = '=' ; R = '<').
There is a subtle point here. X $>= Y is
not the same thing as not(X $< Y). The latter goal will be delayed
until X and Y are ground, or in a non-coroutining system will report an
instantiation fault if X and Y are not both ground. But the former only
requires them to be sufficiently instantiated to determine the result.
We would also like to have $= and $\=. However, they can proceed even
sooner than term_compare(R,X,Y). Namely, term_compare(R,X,Y) has to
wait until the first possible difference between X and Y is either
made definite or removed by instantiation. We can define $= and $\=
easily enough. A non-coroutining system should report an instantiation
fault instead of delaying.
X $= Y
        unify X=Y. If this fails, fail.
        If no variables were bound, succeed.
        Otherwise, undo the bindings and wait
        until some variable in X or Y is bound.
X $\= Y
        unify X=Y. If this fails, succeed.
        If no variables were bound, fail.
        Otherwise, undo the bindings and wait
        until some variable in X or Y is bound.
This gives us rather a lot of "equality" predicates. X = Y will try
to make X and Y the same. X $= Y will wait until someone else
makes them the same. X == Y wants them to be the same now. And
X =:= Y will evaluate two arithmetic expressions.
A lot of my code uses term comparison. Most of it should be using the
'term_compare' family rather than the 'compare' family. I do not enjoy
knowing that my "efficient" code for ordered sets depends on the
unchecked (though documented) assumption that the elements are "sufficiently"
instantiated. I really do think that the existing comparison family should
be reserved for meta-logical hackery and that most user code should be
using the term_compare family.
Sorting
Dec-10 Prolog has two sorting predicates:
keysort(PairList, SortedPairList) sorts a list of Key-Value pairs
so that the Keys are in standard order, retains duplicates,
and unifies the result with SortedPairList. This is needed for
bagof/3 so that solutions with the same bindings for free variables
can be grouped together. It is vital that this sort be stable.
sort(Random, Sorted) sorts the elements of the Random list into standard order,
eliminates duplicates, and unifies the result with Sorted. This is used by
setof/3 to convert a chronologically ordered list to a set representation,
and is used by the ordered-set library package to convert lists to sets.
C Prolog added two more predicates:
msort(Random, Sorted) sorts a list of terms as sort/2 does, but
does not eliminate duplicates. This is useful for putting bags into
a standard form.
merge(L1, L2, Merged) combines two ordered lists, retaining duplicates.
Now there are many variations on sorting that could be useful. It would
occasionally be useful to sort a list into descending order instead of
sorting it into ascending order and then reversing it,
to use other pairing symbols than '-', such as '=', or to use some
argument of an element other than the first as a key.
Therefore I have chosen to standardise, not the Dec-10/C Prolog routines,
but a more general pair sort/4 and merge/5 in terms of which the current
predicates can be defined. An appendix gives the definitions of these
predicates, and also definitions of sort/2, msort/2, keysort/2, merge/3
in terms of sort/4 and merge/5. The details of the code are not to be
regarded as part of the standard. I have a C version of these routines
which is quite efficient.
sort(Key, Order, Random, Sorted) sorts the list Random according to
the Key and Order specifications, and unifies Sorted with the result.
Key is a non-negative integer. If Key=0, it means that the entire
terms are to be compared. If Key=N>0, it means that the Nth arguments
of the terms are to be compared. Order says whether the list is to be
sorted into ascending (<, =<) or descending (>, >=) order and whether
duplicates are to be retained (=<, >=) or eliminated (<, >). The way
to remember the Order argument is that it is the relation which holds
between adjacent elements in the Sorted result.
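One plausible reading of those definitions, given here only as a
sketch (the appendix text is authoritative, and keysort/2 additionally
assumes that sort/4 is stable for the duplicate-retaining orders):
        sort(Random, Sorted) :-
                sort(0, <, Random, Sorted).
        msort(Random, Sorted) :-
                sort(0, =<, Random, Sorted).
        keysort(Pairs, Sorted) :-
                sort(1, =<, Pairs, Sorted).
        merge(L1, L2, Merged) :-
                merge(0, =<, L1, L2, Merged).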
A warning to implementors: don't use quicksort! Quicksort is
not the worst possible sorting method for Prolog but it comes
pretty close. Merge sort (or some variation on the natural merge)
in fact does fewer comparisons and less work generally; the
reason people believe quicksort to be good is that papers
evaluating it have been concerned only with sorting arrays of
integers. Merge sort is also a lot simpler to code in Prolog,
as the appendix shows.
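To illustrate the point, here is a minimal top-down merge sort sketch
(not the appendix code, and without the Key and Order arguments);
merge_sort/2, split/3, and merge_ordered/3 are names invented for this
example:
        merge_sort([], []) :- !.
        merge_sort([X], [X]) :- !.
        merge_sort(List, Sorted) :-
                split(List, L1, L2),
                merge_sort(L1, S1),
                merge_sort(L2, S2),
                merge_ordered(S1, S2, Sorted).

        split([], [], []).
        split([X|Xs], [X|Ys], Zs) :-
                split(Xs, Zs, Ys).

        merge_ordered([], L, L) :- !.
        merge_ordered(L, [], L) :- !.
        merge_ordered([X|Xs], [Y|Ys], [X|Zs]) :-
                X @=< Y,
                !,
                merge_ordered(Xs, [Y|Ys], Zs).
        merge_ordered(Xs, [Y|Ys], [Y|Zs]) :-
                merge_ordered(Xs, Ys, Zs).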
List Processing
There are four operations on lists which are so very useful
that they ought to be present in a Prolog system even before
any libraries have been loaded. The fact that they are not
so present means that a great many programs have tended to
include their own versions of these predicates, which causes
endless trouble when they are loaded into a Prolog where
they are present. It is time this was stopped.
length/2 has already been described. The others are
append([], L, L).
append([H|T], L, [H|R]) :-
        append(T, L, R).

member(X, [X|_]).
member(X, [_|T]) :-
        member(X, T).

memberchk(X, [X|_]) :- !.
memberchk(X, [_|T]) :-
        memberchk(X, T).
The definitions of member/2 and memberchk/2 are to be taken
literally. Too many programmers have been unable to resist
the temptation to use memberchk(X, L) to add X to the end of
a list with an unbound tail for any other definition to be
acceptable.
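For example, with the clauses above,
        ?- L = [a|_], memberchk(b, L), memberchk(c, L).
succeeds, binding L = [a,b,c|_]: the "membership test" has quietly
grown the list.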
append/3 is another matter. The clauses above define the values
it computes and the order it computes them in. However, it may
report an instantiation fault instead of entering a runaway
recursion, or delay in a coroutining Prolog, and it is strongly
recommended that it should. It may also type fail, while member/2
and memberchk/2 may not.
It is worth noting that if L is bound to a term [T1,...,Tn] (that
is, it is a list and does not end with a variable), then
append(L, X, Y) has at most one solution, whatever X and Y are, and
cannot backtrack at all.
append(X, Y, L) has at most length(L)+1 solutions, whatever X and Y
are, and though it can backtrack over these it cannot run away without
finding a solution. A Prolog implementation is required to find all
the relevant solutions. In this sense append/3 is not primitive, because
a primitive would be allowed to delay or instantiation fault.
append(X, L, Y), however, can backtrack indefinitely if X and Y are
variables. This may be reported as an instantiation fault.
The question does arise: what does functor([_|_], Cons, 2) bind
Cons to, and what does name([], Nil) bind Nil to?
Historically, the answers have been Cons=(.), Nil="[]".
There is little or nothing to gain in a new Prolog by changing
the Cons symbol. Indeed, several new Prolog systems represent
list cells in a special compact fashion where the Cons symbol or
functor is not actually stored. Having the Cons symbol as (.)
means that with a suitable operator declaration a.b.c.[] is an
acceptable way of writing the list [a,b,c], and it would be a
pity to lose this. A Prolog embedded in Lisp might as well
continue the Cons=(.) tradition. And of course there are
programs which ask functor(X,.,2) to test whether X is a
list instead of asking X=[_|_].
The name of [] is another matter. There is a substantial
benefit to Prologs embedded in Lisp or Pop if Nil="NIL"
or Nil="nil" is permitted, or perhaps even Nil="()".
Now there is already something strange about the token [] in
Prolog. There may be any number of spaces between the [ and
the ]; in fact [] is really a pair of tokens and is
recognised by the parser rather than the tokeniser. In Prolog
code there has never been anything to gain by knowing exactly
what atom represented the empty list, provided the reader always maps []
to the same atom. Therefore,
The empty list may be represented by one of the atoms '[]',
'()', 'nil', or 'NIL', the choice made once and for all by the
implementor. It may not be represented by any other atom.
The parser must turn the empty list [ ] into this atom.
So neither the truth nor the falsehood of []='[]' or []=nil
is guaranteed. But the truth of length([],0) is guaranteed
even though that of length(nil,0) is not, and the truth of
[]\='Nil' is guaranteed.
Operations on Strings
Edinburgh Prologs have not had any string operations. This is for an
excellent reason. They have lists, and they have atoms. Since lists
of ASCII codes can be used to represent strings, and since that
makes all the operations available for lists immediately available for
strings, using some other data structure would make strings harder
to use. One recent proposal smacks of the Wisdom of Gotham: it was
suggested that compound terms be used for strings, but that the binary
functor concerned should not be the list functor. The advantage of
that suggestion is that you don't save any space; if there is
special provision for lists (as there is in Quintus Prolog, a list cell
costing only 2 words instead of the 3 that general binary terms
require) you actually lose a substantial amount of space efficiency;
and you not only can't use append/3 to concatenate two strings, you
can't use grammar rules to do general pattern matching! Now the
ability to write SNOBOL-like patterns for string processing using Prolog
grammar rules is well worth keeping. Even if a Prolog implementation
does support strings as a "primitive" data structure, it would be
excessively foolish not to provide for the input of lists of ASCII
codes in some form of string notation.
A Prolog implementation is not required to support strings in any way.
Indeed, implementors are advised not to implement strings; that
way Prolog programmers will get more string processing power and
convenience from the language. However, if a Prolog system is
embedded in some other programming language which already provides
strings, or if it communicates with some other language using remote
procedure calls and has to be able to accept strings, then for the
interface with the other language it is useful to have strings as
a primitive data structure.
Strings can be compared using the term comparison predicates of
section 5.9. Conversion between strings and other objects can be
done with the predicates of section 5.7.
string_size(StringOrAtom, Size)
        if nonvar(Size) and (\+integer(Size) or Size < 0),
            type_fail(2, natural).
        if atom(StringOrAtom) {
            atom_chars(StringOrAtom, L), length(L, Size)
        } else
        if string(StringOrAtom) {
            string_chars(StringOrAtom, L), length(L, Size)
        } else
        if var(StringOrAtom) instantiation_fault
        else type_fail(1, string-or-atom).
string_char(CharNo, StringOrAtom, CharCode)
        if nonvar(CharNo) and (\+ integer(CharNo) or CharNo < 1),
            type_fail(1, natural).
        if nonvar(CharCode) and (\+ integer(CharCode) or CharCode < 0),
            type_fail(3, natural).
        if nonvar(StringOrAtom) and \+string(StringOrAtom)
            and \+atom(StringOrAtom), type_fail(2, string-or-atom).
        if var(CharNo) or var(StringOrAtom), instantiation_fault.
        succ(K, CharNo), length(P, K),
        if atom(StringOrAtom) {
            atom_chars(StringOrAtom, L),
            append(P, [CharCode|_], L)
        } else {
            string_chars(StringOrAtom, L),
            append(P, [CharCode|_], L)
        }
substring(StringOrAtom, Offset, Length, SubString)
        StringOrAtom must be a string or atom,
        Offset must be an integer >= 0,
        Length must be an integer >= 0,
        if bound, SubString must be a string or atom.
        atom(StringOrAtom) -> atom_chars(StringOrAtom, L)
        ;                     string_chars(StringOrAtom, L).
        length(Drop, Offset), length(Keep, Length),
        append(Drop, X, L), append(Keep, _, X),
        atom(StringOrAtom) -> atom_chars(SubString, Keep)
        ;                     string_chars(SubString, Keep).
concat(A, B, C) <->
        A is a string or atom, B is a string, atom or number,
        C is the same sort of constant as A, and the
        corresponding lists of characters satisfy append(La,Lb,Lc).
concat(A, B, C)
        if nonvar(A) and \+(string(A) or atom(A)),
            type_fail(1, string-or-atom).
        if nonvar(C) and \+(string(C) or atom(C)),
            type_fail(3, string-or-atom).
        if var(B), instantiation_fault.
        if \+(string(B) or atom(B) or number(B)),
            type_fail(2, constant).
        string(B) -> string_chars(B, Lb) ; name(B, Lb).
        if nonvar(A) {
            string(A) -> string_chars(A, La) ; atom_chars(A, La).
            append(La, Lb, Lc).
            string(A) -> string_chars(C, Lc) ; atom_chars(C, Lc).
        } else
        if nonvar(C) {
            string(C) -> string_chars(C, Lc) ; atom_chars(C, Lc).
            append(La, Lb, Lc).
            string(C) -> string_chars(A, La) ; atom_chars(A, La).
        } else {
            instantiation_fault.
        }
The first thing to note about these is that they can be applied
to atoms as well as to strings, so a Prolog system which does not
support strings should still support these predicates. In both
substring and concat the type of the result is the same as the type
of the first argument.
The name 'concat' was chosen instead of something with "string"
in it because 'concat' is already in the Prolog library with this
meaning. It allows the second argument to be a number because
'concat' mainly exists for gensym/2; one wants to call
concat(step, 47, StepName) and have StepName bound to step47.
In fact the Dec-10 Prolog version of this predicate lets you
call concat(10,12,X) and X will be bound to the integer 1012.
The specification above rules this out as a type failure. It
certainly isn't a particularly meaningful operation on numbers.
concat/3 is like append/3. Now append(A,B,C) can be used to solve
for any one of A,B,C given the other two. So one might expect that
concat(A,B,C) could be used to solve for any one of A,B,C given the
other two. Not so. If we know A and B, there is a unique C, and
if we know B and C there is at most one A which will do. But if
there is any solution for B given A and C, there are at least two
and may be three. For example, concat('a', B, 'a1') has the three
solutions B=1 {integer(B)}, B='1' {atom(B)}, B="1" {string(B)}.
Now in some particular implementation, it might be the case that
there were no strings, and that the name of an atom could not
resemble the name of a number. In that case B would have a unique
solution. But according to the section on representation faults,
the fact that the solution B="1" cannot be yielded is not because
the logic is different in this implementation, but because the
implementation is deficient. The solution's uniqueness being
accidental in such an implementation, concat('a',_,'a1') must
report an instantiation fault, otherwise a program using it will stop
working in an implementation with more data types.
There are lots of ways of defining substring. We could specify
the number of characters to drop as here, or the index of the
first character to keep (Drop+1). We could specify the number
of characters to keep as here, or the index of the last character
to keep (Drop+Keep). This definition is the simplest, and has
some nice algebraic properties. Any other substring extraction
predicate can easily be programmed in terms of this one. One
which is recommended is part_string:
part_string(StringOrAtom, Neck, Before, After) :-
        string_size(StringOrAtom, Size),
        Keep is Size-Neck,
        substring(StringOrAtom, 0, Neck, Before),
        substring(StringOrAtom, Neck, Keep, After).
NB: the operation is not string_part; that would take a string and
a predicate and part the characters according to some property.
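For instance, under these definitions,
        ?- substring(lamentable, 2, 4, S).      % binds S = ment
        ?- part_string(lamentable, 2, B, A).    % binds B = la, A = mentable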
The argument order of string_char was chosen to resemble arg/3.
Extending these predicates to accept lists of ASCII codes might
seem a useful thing to do. However, that would leave us with
the problem of distinguishing between the empty list [] and the
atom []. So the extension is ruled out.
Simple Integer Arithmetic
The predicates defined in this section are not part of Dec-10
Prolog, though they are available in the Dec-10 Prolog library
(in the file EDXA::UTIL:ARITH.PL). I reluctantly added succ/2 to
C Prolog (copying it from Prolog-X) because of the extreme inefficiency
of 'is' in that interpreter. The inefficiency is not because it is
badly coded. Each individual 'is' call is slowed down by the necessity
for coping with floating point arithmetic and the use of a recursive
procedure in C to walk over the tree, even when there is only a tiny
tree to be walked over. Further, there is a global inefficiency due
to the fact that an arithmetic expression is a compound term, which
in C Prolog does not have to be constructed each time as it would in
say Prolog-X or NIP, but it does tie up global stack space which may
never be recovered. In other Prologs, the tree has to be constructed
and then interpreted.
One problem with 'is' is that in a goal such
as "X is Y+1", Y may at run time be instantiated to an arbitrary
ground arithmetic expression, so even if the compiler generates code
for arithmetic expressions it has to be prepared to call a tree-walker
for the variables in the expressions. The Dec-10 Prolog compiler gets
around this by insisting that in a compiled call to 'is' the
source variables must be bound to integers and may not be bound to
expressions. Only a compiler writer could imagine that this is in
any way justifiable; there is certainly no shadow of support for it
in any approach to the logic involved, and the discrepancy between
compiled and interpreted arithmetic has the usual baneful consequences.
Having introduced succ/2 to C Prolog, I found that I liked its
reversibility quite as much as I liked the fact that length/2
now ran twice as fast. So I added plus/3. And I found that I
liked that too. between/3 has been around for a long time as a
way of generating integers in a bounded range. The other predicates
were added to the set for completeness. divide/4 is particularly
useful.
These predicates have the following characteristics.
They work on integers, not on expressions.
They are invertible (that is succ(M,N) will solve for N given M
or for M given N) and so follow the "design principle" in a way
that "N is M+1" does not.
They can be compiled into efficient code with little effort.
For example, a compiler noticing that Sum is unbound in the
goal plus(X,Y,Sum) can generate the equivalent of "Sum is X+Y"
but without the necessity of invoking a tree walker to
evaluate X and Y. Further, as the arguments can only be constants
or variables (the compiler, detecting anything else, can replace
such a goal by code to report a type failure) there is no need to
build compound terms at run time or put these variables in the
global stack.
They do not seduce beginners into writing "Sum = X+Y", which is a
particularly nasty bug, as if Sum later on appears in an arithmetic
relation or an 'is' it will appear to have the right value.
A clause containing at most one of the comparison predicates and
at most one of the computation predicates is generally clearer
than a clause containing "is". This is of course a matter of
personal taste.
However, there are things you can do with 'is' that you can't do
with these predicates. Most notably, C Prolog, NIP, Prolog-X, and
PopLog all provide floating point. The efficiency of these predicates
is due to the fact that they do not work on floating point numbers,
only on integers. Also, the bitwise operations <<, /\, and so on,
are only available via "is". There is some sort of excuse for this.
Bitwise operations are less "logical" than simple integer arithmetic
(for example, right shifts are not the same as division by powers
of two on most machines, but on some machines they are), and floating
point arithmetic is notoriously ill-behaved.
The reason for including these predicates in this proposed standard is
that there are substantial efficiency gains from using them (in a Prolog
which implements them directly), and it would be a pity if one had to
choose between efficiency and portability (we know what would get
chosen, don't we?). There are also pedagogical advantages from
presenting these predicates first; their syntax and behaviour resembles
the syntax and behaviour of list-processing predicates such as append/3
far more than does "is".
The Dec-10 library file is given in an appendix, so any Prolog implementation
which supports "is" in a standard way can also support these predicates,
albeit inefficiently.
Comparison
lt(X, Y) <->
        integer(X) & integer(Y) & "X is less than Y".
        if nonvar(X) and not integer(X) then type fail.
        if nonvar(Y) and not integer(Y) then type fail.
        if var(X) then instantiation fault.
        if var(Y) then instantiation fault.
        if X @< Y then succeed else fail.
le(X, Y) <->
        integer(X) & integer(Y) & "X is less than or equal to Y".
        if nonvar(X) and not integer(X) then type fail.
        if nonvar(Y) and not integer(Y) then type fail.
        if var(X) then instantiation fault.
        if var(Y) then instantiation fault.
        if Y @< X then fail else succeed.
There are three ways of comparing two numbers.
lt(X, Y) works only for integers, type fails for non-integers,
and reports an instantiation fault if either argument is a
variable and the other is a variable or integer.
X @< Y compares arbitrary terms, and defines an order on
variables even. It generalises micro-PROLOG's "LESS"
predicate, and is indispensable for the implementation of
setof/3, bagof/3, and efficient implementation of sets and
finite maps. The fact that "1 @< b" succeeds is useful, but when
you intend to compare integers only it is even more useful to
say so.
X < Y compares arbitrary ground arithmetic expressions, by
evaluating both and then calling @< on the results. A Prolog
definition for it is given in the second appendix.
The term comparison predicate compare/3 is the fundamental
operation used to define all the comparison predicates. It is
adequately defined in the Dec-10 Prolog manual, except to note
that which ordering is given to variables isn't all that
important. PopLog could for example put a "timestamp" in the
spare word of a variable record and do comparison on that.
It is important to note that the "law"
        le(X, Y) <-> \+ lt(Y, X)
does not hold; lt(a, b) is false (because a and b are not
integers), but that does not make le(b, a) true.
Because apply/2 and call/N for N>1 put the extra arguments at the
end of the goal, we can use lt(X) to express the predicate
"greater than X" and le(X) to express the predicate
"greater than or equal to X". In order to express the predicates
"less than X" and "less than or equal to X" we need
gt(X, Y) :-
        lt(Y, X).
ge(X, Y) :-
        le(Y, X).
These predicates are not otherwise necessary, but it can be clearer
to use them. While they are defined here by Prolog clauses
calling the primitives lt and le, error reports produced by
them should identify gt and ge as the culprits. If there is
any asymmetry in the implementation, this should not be documented,
in order to avoid distorting people's coding styles.
As these predicates are not yet in general use, there is time to
settle on better names for them.
The names that must not be used are <, =<, >, and
>=, because PDP-11 Prolog and PopLog are alone in failing to
evaluate the arguments of those predicates as ground arithmetic
expressions (and a great nuisance it is too, trying to make Dec-10
or C Prolog programs run in PopLog).
When a compiler encounters <comparison>(X, Y), it should do the
following.
if X and Y are both integers, the goal should be replaced by
"true" or "fail" as appropriate. This case is most likely to
arise in automatically generated code such as that produced
by the unfolder.
if X or Y is a non-variable non-integer, the call should be
replaced by a type_fail call.
if X or Y is a variable which must be uninstantiated at this
point, the call should be replaced by an instantiation_fault call.
otherwise the compiler can generate the obvious machine code.
There should be an implementation subroutine with a fast
calling method that dereferences a variable and checks that
the result is an integer; all the arithmetic code can use it.
If anyone wants to make these operators, they are welcome to
do so, but questions of syntax apart from relation name and argument
order are beyond the scope of this document.
Enumeration
between(Lower, Upper, X) <->
        integer(Lower) & integer(Upper) & integer(X) &
        Lower =< X =< Upper.
        if nonvar(Lower) and \+integer(Lower), type_fail(1,integer).
        if nonvar(Upper) and \+integer(Upper), type_fail(2,integer).
        if nonvar(X) and \+integer(X), type_fail(3,integer).
        if nonvar(Lower) and nonvar(Upper) then {
            if Lower > Upper then fail else
            if var(X) then {
                try binding X to each of the integers Lower..Upper
                in turn, but the search order is not specified
            } else {
                if Lower =< X =< Upper then succeed else fail
            }
        } else
        if nonvar(X) then {
            if nonvar(Lower) and Lower > X then fail else
            if nonvar(Upper) and Upper < X then fail else
            instantiation fault
        } else {
            instantiation fault.
        }

range(Lower, X, Upper) :-
        between(Lower, Upper, X).
The pseudo-code for between/3 looks quite complicated, but the Prolog
programmer doesn't have to understand it. All he has to understand
is the logical definition. The code follows uniquely from applying the
design principle to that definition.
The utility of a predicate such as this is obvious. The only thing to
decide on is the name and the order of the arguments. There is no
incentive to place the Upper argument to the left of the Lower one,
so there are only three alternatives:
        foo(X, Lower, Upper)
        foo(Lower, X, Upper)
        foo(Lower, Upper, X)
As with the question of whether it is a good idea to have gt as
well as lt, the ground of the decision is "what is a useful
partially applied form to give maplist/checklist/mapassoc/...?"
The answer is immediate: between(L,U) is a useful combination
and the others are not, because X is the only parameter which has
a finite solution set and can thus be generated.
So between/3 is uniquely determined as the best possible bounded
enumeration predicate. Why then have I included range/3? Just as
a gesture of politeness to Harry Barrow, who is the only Prolog
programmer not at Edinburgh ever to contribute to the Prolog
subroutine library. (Udi Shapiro has contributed the Concurrent
Prolog system, but while it contains a number of utility predicates
they are not split out so that people can get at them; he has
contributed to the program library but not the predicate library.)
Now Harry Barrow's contribution does not include the range/3 predicate,
but his paper on the VERIFY program uses it in a lot of the examples.
I think "range" is a bad name (because it's useful to talk about the
range of a function; if range is taken up this way we'll have to talk
about the co-domain), and while the argument order is every bit as
natural as the argument order of between/3 it is not at all useful
with the higher-order predicates. But if we can't talk him into using
between/3, let's have range/3 in the standard.
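For example, assuming the specification above:
        ?- between(1, 3, X).    % enumerates X = 1, 2, 3 (order unspecified)
        ?- between(1, 3, 2).    % succeeds
        ?- between(1, 3, 9).    % fails
        ?- range(1, X, 3).      % the same solutions as between(1, 3, X)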
It is clearer to write between(1, 100, X) than it is to write
le(1, X), le(X, 100), and when X may be a variable
it is better logic (because X can be solved for). But when X is
not a variable it would be nice to use more specialised code.
A compiler should consider the following cases:
if there is enough information at compile time to determine
the result, generate a "true", "fail", "type fail", or "instantiation
fault" call as appropriate.
if X is known to be uninstantiated (perhaps this is the first mention
of it in the clause), call directly to the enumeration case (which has
to check that Lower and Upper are integers in the right order).
it can happen that X is known to be instantiated. If it is mentioned
in integer(X) or any other arithmetic relation upstream of this point,
it must be instantiated to an integer, and comparison code can be
generated as for the two le-s.
This means that in the clause
        case_shift(Ucase, Lcase) :-
                plus(Ucase, 32, Lcase),
                between(0'A, 0'Z, Ucase).
the call to "between" is known not to be enumerating Ucase,
but to be a simple test. Of course it would be better to write
        case_shift(Ucase, Lcase) :-
                between(0'A, 0'Z, Ucase),
                plus(Ucase, 32, Lcase).
as that can compute all the solutions.
Should we include gen_nat and/or gen_int in some form as standard
generators as well? I'm a bit unhappy about including any form of
unbounded backtracking.
Addition and Subtraction
succ(P, S) <-> integer(P) & integer(S) & S = P+1 & P >= 0.
        if nonvar(P) and not integer(P) then type fail.
        if nonvar(S) and not integer(S) then type fail.
        if integer(P) then {
            if P >= 0 and S unifies with P+1 then succeed else fail
        } else
        if integer(S) then {
            if S > 0 then unify P with S-1 and succeed else fail
        } else
            instantiation fault.
plus(A, B, Sum) <->
        integer(A) & integer(B) & integer(Sum) & A+B = Sum.
        if nonvar(A) and not integer(A) then type fail.
        if nonvar(B) and not integer(B) then type fail.
        if nonvar(Sum) and not integer(Sum) then type fail.
        if var(A) then {
            if var(B) or var(Sum) then instantiation fault.
            unify A with Sum-B and succeed.
        } else
        if var(B) then {
            if var(Sum) then instantiation fault.
            unify B with Sum-A and succeed.
        } else {
            if Sum unifies with A+B then succeed else fail.
        }
Note: if the implementation does not support bignums, these
predicates can report an integer overflow fault.
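A few illustrative goals:
        ?- plus(2, 3, S).       % binds S = 5
        ?- plus(2, X, 5).       % binds X = 3
        ?- succ(X, 1).          % binds X = 0
        ?- succ(X, 0).          % fails: 0 has no natural predecessor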
As I said at the beginning, I originally copied succ/2 from Prolog-X.
But in Prolog-X succ(X,Y) and plus(1,X,Y) were identical. Why the
difference? (Prolog-X has now changed to this definition, I believe.)
The difference is that plus is for doing arithmetic, and succ is
for counting things. If we have plus/3 available, we can use
plus(1) whenever we are interested in adding or subtracting 1 from
an arbitrary integer, and we might as well get some use out of having
succ/2 as well. And the use is that a recursive loop using succ/2
cannot run off to negative infinity.
A common pattern for "counting" loops in Prolog is to count down to 0.
We write, for example,
        p(0, ..., ...) :- !,
                <base case>.
        p(N, ..., ...) :-
                <step case>(..., ...),
                M is N-1,
                p(M, ..., ...).
The reason we need the cut in the first clause is to ensure that
the "is" in the second clause won't count down to -1, -2, -3, ...
Of course we could (and should!) write this without the cut, by
putting a test in the second clause. And that would also get
around the possibility of a call with the output arguments so
instantiated that the base case fails before it gets to the cut.
With 'succ' available, we combine the test and the subtraction,
so that the well behaved code is no longer nor more complex than
the hacky code. And a compiler could perhaps recognise succ as a
form of pattern matching. So we get
        p(0, ..., ...) :-
                <base case>.
        p(N, ..., ...) :-
                succ(M, N),     % N is a successor natural
                <step case>(..., ...),
                p(M, ..., ...).
If p/3 is called with its first argument instantiated to anything
other than a non-negative integer, it will simply fail, without
any need for complicated cuts and tests. It is just like
        p(0, ...) :- ...
        p(s(M), ...) :- ...
in that respect. This strong similarity to the obvious pattern
matching definition of successor is the reason for succ's utility.
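As a concrete instance of the pattern (make_list/3 is a name invented
for this example), the following builds a list of N copies of X when N
is a given non-negative integer, and simply fails when N is negative:
        make_list(0, _, []).
        make_list(N, X, [X|L]) :-
                succ(M, N),
                make_list(M, X, L).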
Multiplication
times(A, B, P) <-> integer(A) & integer(B) & integer(P) & A*B=P.
        if nonvar(A) & not integer(A) then type fail.
        if nonvar(B) & not integer(B) then type fail.
        if nonvar(P) & not integer(P) then type fail.
        if nonvar(A) & nonvar(B) then
            if P unifies with A*B then succeed else fail
        else if nonvar(A) & nonvar(P) then      % B is var
            if A = 0 then (if P = 0 then instantiation fault else fail)
            else if A divides P then unify B with P/A and succeed else fail
        else if nonvar(B) & nonvar(P) then      % A is var
            if B = 0 then (if P = 0 then instantiation fault else fail)
            else if B divides P then unify A with P/B and succeed else fail
        else instantiation fault
As usual, this definition is forced by the decision that the
predicate shall apply to integers and that the design principle
shall be followed. Only the argument order is left. The
argument order is the same as the argument order of plus/3,
which is to say that it has the usual output at the right.
This order is good for mapping, e.g. to multiply a list of
integers by 7 we have only to call
maplist(times(7), SingleFold, SevenFold)
We can also use it to test for divisibility: times(7, _, X)
is true if and only if X is an integer divisible by 7.
As with plus, it doesn't take a very smart compiler to figure
out what special purpose code to generate, e.g. the divisibility
test above notes that the second argument is a void variable,
so it must be the output if any, so the case to consider is
        if var(X) then instantiation fault else
        if not integer(X) then type fail else
        if X is divisible by 7 then succeed else fail
The main special case to watch out for, of course, is when the
third argument is a new variable.
Having two sets of arithmetic relations (the 'is' family and the
'plus' family) may seem like a bad idea. But they cover two
different kinds of problem. With the 'is' family we can express
involved calculations, while with the 'plus' family we can
express arithmetic relations. For example, to test that
all the elements of one list of integers are the same multiple
of some other list of integers (and this might be of interest
in doing calculations on polynomials) we can write
maplist(times(_), BaseList, Multiples)
which would be much harder to write using 'is' and hard to understand
once written. While with 'is' we can write
D is (sqrt(B^2 - 4*A*C) - B)/(2*A)
which cannot be expressed at all with the 'plus' family (which is
restricted to integers in order to have logically simple and
efficiently implementable semantics), and which would be completely
obscure if it could be so expressed.
Note that in the call times(X, Y, 24), although we don't have enough
information to determine a unique solution, we do have enough to
determine all 16: (1,24), (-1,-24), (2,12), (-2,-12), ..., (24,1), (-24,-1).
A Prolog implementation which backtracked over the factors of P in
this way conforms to the standard. So does one which reports an
instantiation fault. And so does one which delays.
Division
Division is a nasty one. Division on the naturals is straight-forward,
but when the divisor and dividend can be negative there are at least
three definitions in use. (There was an article about this in Software
Practice & Experience.) I expect that there will be some disagreement
about this. The Gordian knot can be cut by insisting that divide/4
is only defined for non-negative arguments and must FAIL if given a
negative number in any of its argument positions (perhaps as a type
failure). However, my preference is to adopt the Algol 60 solution:
X div Y = sign(X/Y)*floor(abs(X/Y))
X mod Y = X - (X div Y)*Y
that is, division truncates towards 0.
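For example, under this definition the remainder takes the sign of
the dividend:
     7 div  2 =  3       7 mod  2 =  1
    -7 div  2 = -3      -7 mod  2 = -1
     7 div -2 = -3       7 mod -2 =  1
    -7 div -2 =  3      -7 mod -2 = -1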
divide(A, B, Q, R) <->
integer(A) & integer(B) & integer(Q) & integer(R) &
Q = A div B & R = A mod B
Prolog code for this is given in the Appendix. As usual, if
any argument is a nonvariable noninteger, divide/4 type fails.
If A and B are given, Q and R are uniquely determined.
If B, Q, and R are given, A must be B*Q+R, but it is still
possible for divide/4 to fail, as (B*Q+R) div B is not
necessarily Q. If A, Q, and R are given, we may have a
unique value for B, a failure, or an instantiation fault (see
the Prolog code). Otherwise we have an instantiation fault.
Note that if A is 0, this determines that Q and R are both
0, but it does not determine B at all. We could continue
with B unbound, but the design principle permits us to report
an (in this case continuable) instantiation fault.
If there is no instantiation fault, nor a type failure,
divide/4 will signal zero_divide if B = 0.
Divide is useful because we often want Q and R both.
I find it very irritating when writing
Q is X/Y, R is X mod Y,
in Dec-10 Prolog to know that it will do the division
twice. Of course, if I only want the remainder, I can call
divide(X, Y, _, Rem)
and it is a stupid compiler which can't exploit this.
Optimisation
Most of the predicates discussed in this section are as reversible as
is compatible with not backtracking in the implementation language.
Some of them have quite complicated descriptions. There are two
points to make. One is that the descriptions are if trees, and so
don't run all that slowly. Code using this family of predicates is
as much as twice as fast as code using the irreversible 'is' family
in C Prolog. The other is that most of the generality can be
optimised away by a very simple compiler.
What does the compiler need to know? The main thing it needs to know is
whether it has seen this particular variable before. If a variable has
not appeared earlier in the clause, it must be uninstantiated, and
this information is often sufficient to determine which branch of the
if tree must be followed. For example, in
p(A, B, C, Z) :- % Z is (A-C)*(A-B)
plus(A, X, B),
plus(A, Y, C),
times(X, Y, Z).
this information is all we need to tell which way round to use
the two 'plus' calls. The other information is when a variable is
instantiated. For the purpose of compiling arithmetic expressions,
all we need to know is whether a variable is an integer or not.
After an integer(X) test, or after any of 'plus' family calls,
we know that each of the variables involved is an integer. This
information is all we need to tell that times(X,Y,Z) can be
compiled as "unify Z with the product of X and Y", and as that is
the most efficient way to do it, that's how it should be compiled.
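In other words, a simple compiler may treat the clause above as if
it had been written as follows (a sketch of the effect only; the type
tests implied by the 'plus' and 'times' calls are elided):
    p(A, B, C, Z) :-
        X is B - A,     % from plus(A, X, B): X had not been seen
        Y is C - A,     % from plus(A, Y, C): Y had not been seen
        Z is X*Y.       % X and Y now known to be integers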
The Prolog type checker can be used to check that calls to 'plus'
family predicates only supply integers as arguments.
The result of this sort of optimisation is that we can write
clauses using logical relations, and get code which is every bit
as efficient as if we had used the irreversible 'is' family.
Indeed more efficient, as we don't need to call a tree walker.
Similar optimisation can and should be applied to other primitives,
functor and arg in particular.
Delaying and Presburger Arithmetic
It is interesting to note that if we restrict ourselves to the
relations, to succ, to plus, and to times with one of the first two
arguments bound, a conjunction of such goals is a formula in
Presburger arithmetic, and so is decidable. A system such as MU Prolog
could exploit this, and (at fairly high cost) actually run such
programs as
?- gt(X, 0), % X:int > 0
gt(Y, 0), % Y:int > 0
plus(X, Y, T1), lt(T1, 4), % X+Y < 4
times(2,X, T2), gt(T2, Y), % 2X > Y
times(2,Y, T3), le(T3, X). % 2Y =< X
whose unique solution is X=2, Y=1. Detecting this is considerably
easier than with 'is'. This isn't much use in practical programming,
but it could be important in a "Prolog Technology Theorem Prover"
such as Stickel has suggested.
The 'is' family.
All arithmetic in Dec-10 Prolog is based on arithmetic expressions.
There are actually quite a few predicates which evaluate one or more
arguments:
V is E evaluates E
X =:= Y evaluates X and Y
X =\= Y evaluates X and Y
X =< Y evaluates X and Y
X >= Y evaluates X and Y
X < Y evaluates X and Y
X > Y evaluates X and Y
put(C) evaluates C
ttyput(C) evaluates C
skip(C) evaluates C
ttyskip(C) evaluates C
tab(N) evaluates N
ttytab(N) evaluates N
I/O
The principal reason why [tty]put/1 and [tty]skip/1 evaluate their
argument is that until recently Dec-10 Prolog lacked a notation for
character codes. In order to specify a particular character to be
output, one had either to write the Ascii code (as in put(42))
or to use a single-character string (as in put("*")). The need for
this has diminished considerably with the advent of "radix 0",
as one can now write put(0'*). It was also useful to write
put("0"+N), but the fact that Prolog predicates in general do not
evaluate their arguments has meant that few Prolog programmers other
than the implementors of Dec-10 Prolog have exploited this. I cannot
recall seeing any instances of tab(N) with N an arithmetic expression.
No doubt tab(3*N) would occasionally be useful, but in general one
uses the same argument for tab/1 so often that it is more efficient
not to re-evaluate it each time.
ttyput/1 wrings one more feature out of evaluating its argument.
Under TOPS-10 (and TOPS-10 only), if the high-order bits of N (all
but the bottom 7) are non-zero, the bottom 8 bits are sent directly
to the terminal without operating system intervention. So one
occasionally sees ttyput(256+C). The whole matter of the user
interface needs considerable work, and this solution to one tiny
corner of it is unlikely to be useful under any other operating
system. Certainly its lack in C Prolog has yet to be noticed.
A redesign of the whole I/O component of Prolog is needed. The
most that has ever been claimed by the implementors of Dec-10
Prolog is that the current system is enough to make the Prolog
compiler work. Having written BLISS-10 and MACRO-10 code using
the TOPS-10 monitor calls to do I/O, I can testify that the
TOPS-10 system strongly discourages any experimentation with I/O;
writing even the simplest I/O library is so pointlessly hideous
a task that anyone who has more rewarding things to get on with
(such as implementing TRO) will rightly turn to that as soon as
they may.
Until such a redesign is done, we might as well retain most of
the Dec-10 Prolog predicates as they stand. But we do not
need to retain this peculiar property of [tty]{put,skip,tab}.
A Prolog implementation which supports these predicates, but
which insists that their arguments be instantiated to non-negative
integers (and for put and skip, integers less than 128) should be
considered as "standard". So we arrive at the specifications:
put(C):
if C is a variable, instantiation fault.
if C is an integer in the range 0..127
append the character C to the current output stream
otherwise
the implementation MAY type fail
the implementation MAY evaluate C/\127 as an arithmetic
expression and append the result to the current
output stream
the implementation may NOT do anything else
ttyput(C):
as put(C), but reading "standard output stream"
for "current output stream" throughout.
skip(C):
if C is a variable, instantiation fault.
if C is an integer in the range 0..127
repeat D := next character from current input stream
until D = 26 or D = C
if D =/= C then fail
otherwise
the implementation MAY type fail
the implementation MAY evaluate C/\127 as an
arithmetic expression and call skip() on the result
the implementation may NOT do anything else
ttyskip(C):
as skip(C) reading "standard input stream" for
"current input stream" throughout.
tab(N):
if N is a variable, instantiation fault
if N is an integer then
if N < 0 the implementation MAY type fail
write max(N,0) spaces to the current output stream
otherwise
the implementation MAY type fail
the implementation MAY evaluate N\/0 as an arithmetic
expression and call tab() on the result.
ttytab(N):
as tab(N) but reading "standard output stream" for
"current output stream".
Thus for example, it is not defined whether tab(1+2) writes
three spaces or type fails, but if it succeeds it may not do
anything but write three spaces. Nor is it defined whether
tab(-4) succeeds or type fails, but if it succeeds it must have
no effect.
If an implementation does support the extended forms of these
predicates, so that put("0"+N) does normally work, it is
recommended that when the "type failures produce error messages"
flag is set these predicates should nevertheless produce type
failure error reports. The point of this is that argument
evaluation will make it easier to convert Dec-10 Prolog programs,
but the type failure messages will help you find the places that need
to be changed if the program is to run under other standard-conforming
Prolog systems that choose to type fail.
There is a subtle point in the specifications above: /\ and \/
insist on integer arguments and produce integer results. So even if
a Prolog system supports argument evaluation in these predicates,
it must reject tab(2.7). A Prolog system MAY NOT accept floating
point numbers in these predicates. (Do you round up? down? truncate?)
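As an illustration, a conforming put/1 could be written as follows.
This is a sketch only: '$emit'/1 and '$instantiation_fault'/1 are
hypothetical primitives ('$emit'/1 appending one character code to
the current output stream), and the fault report is abbreviated.
    put(C) :-
        (   var(C) -> '$instantiation_fault'(put(C))
        ;   integer(C), C >= 0, C =< 127 -> '$emit'(C)
        ;   fail    % type fail; MAY instead evaluate C/\127 and emit that
        ).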
Arithmetic Expressions
An arithmetic expression is a ground term built out of integers,
floating point numbers (if you've got them), and at least the
following functors:
+ /1 complex conjugate (as in APL)
+ /2 addition (same as PLUS in Common Lisp)
- /1 negation (same as MINUS in Common Lisp)
- /2 subtraction (same as DIFFERENCE in Common Lisp)
* /2 multiplication (same as TIMES in Common Lisp)
/ /2 This one is a problem. See the discussion in the text.
^ /2 exponentiation (same as in Common Lisp)
div /2 divide(X,Y,Q,R) -> Q is X div Y
mod /2 divide(X,Y,Q,R) -> R is X mod Y
num /1 rational(Q,N,D) -> N is num(Q)
den /1 rational(Q,N,D) -> D is den(Q)
re /1 complex(C,R,I) -> R is re(C)
im /1 complex(C,R,I) -> I is im(C)
i /2 complex(C,R,I) -> C is R i I
/\ /2 bitwise conjunction
\/ /2 bitwise disjunction
\ /1 bitwise complement
<< /2 left shift
>> /2 right shift
The !(X) and $(X) forms of Dec-10 Prolog are peculiar to that
implementation. With bignums available (as they should be),
there is no need for them. The point of these operations was that
Dec-10 Prolog arithmetic used 36 bit 2s complement for calculations,
but could only store 18 bits. (Actually, this isn't quite true,
but how many people know about xwd?)
Dec-10 Prolog does not define +X at all. C Prolog defines +X to
be X for real and integer X. If complex numbers are admitted, we
need a complex conjugate operator. Since the conjugate of a real
number is that same real number, making +X be the complex conjugate
of X agrees with C Prolog on all the values that C Prolog can represent.
Whether complex/3 or rational/3 always give a representation fault or not,
+/1, num/1, den/1, re/1, and im/1 must not. However, if complex numbers
are not implemented XiY may representation fault even when Y=0.0.
Note that arithmetic expression evaluation does whatever coercions are
necessary: 2i3 =:= 2.0i3.0 even though complex(C,2,3) must type
fail. If rational numbers are not implemented, N/D may yield a floating
point number even when N and D are both integers.
If an implementation supports rational numbers, the exponentiation
routine may exploit the fact, and treat X^(1/3) differently from
X^(0.3333...).
The Evaluation of Lists
Dec-10 Prolog and C Prolog define the value of [X] to be the value
of X. (The Dec-10 Prolog manual suggests that this only applies
when X is an integer. In fact [1+2] evaluates to 3 in both the
Dec-10 and C-Prolog interpreters.) This feature is there so that
you can include character codes in your programs. Now that radix
0 exists (in Dec-10 Prolog, C Prolog, Prolog-X, and NIP) this is
no longer necessary. However, there is a lot of code that uses
it, as 0'C is quite recent. Therefore
it is strongly recommended that new Prolog implementations
should evaluate [X] as they evaluate X.
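For example, where the feature is implemented,
    ?- X is "a".        % "a" reads as [97], so X = 97
    ?- X is [32+1].     % X = 33
both succeed; evaluating strings this way is how old code obtained
character codes.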
It is, after all, very cheap to do this (especially at compile
time), and what else would [X] mean anyway? However, an implementor
MAY choose not to implement this feature, but a standard conforming
Prolog MUST not define the value of a list to be anything else.
Division and Floating Point
What should the solidus (/) mean? Dec-10 Prolog uses it to mean div.
There is no problem with that, as Dec-10 Prolog hasn't got floating
point numbers, so div is all that it might mean. The LONG rational
package uses it to mean rational division, and introduced div to
mean integer division. C Prolog uses it to mean floating point division,
which is consistent in approach with LONG's use of it for rational division.
This is also what Algol 60 did. However, it doesn't do wonders for existing
Dec-10 Prolog programs.
There is a flag in C Prolog called "all_float". By turning this off,
the solidus can be made to act like Fortran's division operator.
That is, "X is 1/2" will unify X with 0, while "X is 1.1/2.2"
will unify X with 0.5. This has proven entirely satisfactory in
practice, as Dec-10 Prolog programs can run without change while the
solidi are hunted down and changed to div. This satisfaction is
unlikely to endure once people here start doing floating point
arithmetic in C Prolog, and has a major flaw.
The flaw is that we want to be able to say that 1 < 1.5 and that
1.5 < 2. Since arithmetic comparison is defined in terms of
evaluating both sides and then using term comparison, we want term
comparison to order integers and floating point numbers according to
their values as members of the real number line. This gives us two
choices. Either we consider each floating point number X to represent
its exact value, or to represent its exact value minus epsilon for some
very small epsilon. If we choose
the latter, we can distinguish between 2 and 2.0, and have another
choice to make: either N < N.0 (epsilon < 0) or N > N.0
(epsilon > 0). Either of these would define a consistent ordering
on terms, but neither is more natural than the other. It is difficult
to defend either ordering on any grounds other than distinguishing
integers and floats. So we have to take the first choice, and say
that N =:= N.0 for each integer N. And this leaves us no choice
but to say that N == N.0 for each integer N. And prior to the
introduction of floating point it was an axiom of Prolog that
X == Y -> X = Y
So the natural desire to make "2 =:= 2.0" true forces us to the conclusion
that "2 = 2.0" must be true. (This is perfectly natural on mathematical
grounds as well.) And that in turn forces the conclusion that integer(2.0).
I find some of these conclusions pretty revolting. We are warned in all
good textbooks on numerical analysis that "=" and "=/=" are virtually meaningless
in floating point computations, and the most one should ever use is
abs(X-Y) < epsilon.
C Prolog gets around this problem by being a single language system. The
only way you can get a floating point number is by reading one in, getting
one from name/2, or by doing some arithmetic, and all of these sources
check whether the result is equal to some integer, and if so yield that
integer instead. (This never loses any information.) With this guarantee
that a number has a floating point representation if and only if it is not an
integer, all of these problems go away. The price of this is that the
"Fortran-divide" solution doesn't work: 2.0/4.0 clearly "ought" to be 0.5,
but the numbers will actually be represented as 2 and 4, so the answer 0
will be computed. The only consistent interpretation we can give to
the solidus operator in C Prolog is unconditional floating point division.
I do not know what Prolog-X does. NIP is not yet finished, but its
unification code explicitly checks for integer -vs- floating point
comparisons, and so accomplishes 2 = 2.0 the hard way. No doubt
it will implement integer(X) in a similar fashion. But there are
so many primitives that want one or more integer arguments. Is
name/2 to accept 97.0 as a character code for "a"? Being single
language systems, both these Prologs could adopt C Prolog's solution.
Note that we have four choices:
1. be consistent by forcing floating point results into an integer
representation whenever possible. This requires a check in name/2
and is/2 and nowhere else (if there is a routine for constructing
"boxed" floats, that is the place to put the check).
2. be consistent by converting floating point results whenever an
integer is wanted. This requires checks all over the place, and
is likely to have a substantial performance cost.
3. be consistent by ruling that 2.0 < 2 but there is no number lying
between them.
4. be inconsistent; have 2 == 2.0 but 2 \= 2.0 and not(integer(2.0)).
A mixed language system such as PopLog or a Prolog cohabiting with
PSL (or even a Prolog cohabiting with Fortran or C) cannot
adopt the first solution, because the other language(s) can produce
any floating point value they please. The cost of the second
solution is prohibitive. It is in any case far too easy to miss a
case (such as get0(32.0)) whereupon you're still inconsistent.
Many implementors are likely to prefer the fourth solution, on the
grounds that if Lisp says (EQN 2 2.0) then Prolog should agree. On
the other hand, whenever you have a floating point value 2.0, round-off
error means that the true value might have been 2.0-epsilon, and
really one has no business using =:= or =\= to compare floating point
values at all. Now if N is an integer and X is a floating point
number different from N, the following are all true:
X < N iff X-epsilon < N (since X \= N)
X > N iff X-epsilon > N (since X \= N)
and of course all comparisons of floats with floats are preserved.
Since epsilon is an infinitesimal, X =:= N will never succeed
when it should have failed, and though it might fail when it should
apparently have succeeded, round-off error (perhaps in the input
routine) might have made it fail anyway.
There are many good reasons for distinguishing between integers and
floats. For any integer N, we expect that N+N+N will be an integer
(or that the system will announce an integer overflow) and that
(N+N+N)divN will be 3. But if X is a floating point number
which happens to equal N, X+X+X will not in general be the representation
of an integer value (because of roundoff error), and if X is positive
we can confidently expect that (X+X+X)/X < 3.0 for some values of X.
In order to maximise both efficiency and clarity, then, we have no choice
but to rule that
no floating point value ever unifies with, compares equal to as a term,
or is regarded as having the same numerical value as an integer
An unpleasant property of floating point representations is that
float(N) < N and float(N+1) > N+1
can both be true. So it seems unimportant to me whether N.0 is always
taken to be less than N, greater than N, either depending on the sign,
or either depending on the 17th bit when the moon is full. We're not
going to get anything very pleasant no matter what we do. We should
insist that if X < M and M < N then X < N, but this is
easily accomplished by perturbing each X based on X's value and not on
the value we are comparing it with.
I'm afraid that in this case consistency with the host language
can go hang. If you have a program in Lisp (or FORTRAN) that
depends on N.0 comparing equal to N for some range of integers, you
can be quite sure that it will stop working if you run it on any
other machine.
The final question about floating point is what special functions
should be available and how should they be defined. Answers such as
"copy FORTRAN" make sense. But I would like to make a different
suggestion: copy Common Lisp. That is, any extension to the
range of forms understood by 'is' beyond those described here
should imitate Common Lisp as closely as possible. Given that a
large chunk of the Lisp community has decided to pull together
instead of apart (and even InterLisp-D has added some functions
from Common Lisp, though it has not changed its old ones), it is
difficult to imagine any excuse for making Prolog differ from
Common Lisp where we can make it agree.
Special "constants"
C Prolog adds "pi", "e", "heapsize", "stacksize", and "cputime"
as nullary operations. They return special constants, or information
about the current state of the program. The former use is quite a
good one, and implementation constants such as min_int, max_int,
macheps, and so on (see almost any discussion of software portability)
would naturally be handled in this way. The latter use is not
really satisfactory. It is a substitute for Dec-10 Prolog's "statistics"
predicate, and portability would have been better assisted by something
that looked like "statistics". I find it unpleasant that an arithmetic
expression can have one value at one time and another value at another time,
and it doesn't help program transformation tools (such as the unfolder)
much. There is little to lose by having an implementation-dependent
statistics(Key, ValueOrList)
predicate. In order to make life easier for compilers, other program
manipulation tools, and human readers, I suggest the following design
principle for arithmetic operations:
The value of any arithmetic expression f(X1,...,Xn) may only depend
on the functor f/n and the values of the arguments X1,...,Xn.
This doesn't mean that C Prolog can't have its little instrumentation
functions, only that they aren't standard, and that a code unfolder
is entitled to rearrange code in ways that will change the values of
heapsize, stacksize, and/or cputime. New Prolog systems should provide
a statistics/2 feature, perhaps in a machine dependent module.
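For example, such a feature might look like this (the key names here
are a sketch only, not part of the standard):
    ?- statistics(runtime, [TotalMs, MsSinceLast]).
    ?- statistics(heap, WordsInUse).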
We should be able to agree on a set of basic constants (and machine-
dependent constants) which could usefully be in the standard. There
are currently no standard special constants.
Bitwise Operations
We would like to get the same answers no matter what machine we are
using, provided its integer arithmetic has "enough" bits. The way to
ensure this is to regard each positive integer as a bit string with
infinitely many zeros to the left, and each negative integer as a bit
string with infinitely many ones to the left. This doesn't mean that
the machine has to do 2s complement arithmetic, only that whenever we
perform bitwise operations we get the expected answer. (We could have
defined sign and magnitude, but what happens to the sign bit when you
shift? And sign and magnitude and ones complement both have the
negative 0 problem.)
All the bitwise operations demand integer operands. This definition
of what a number means as a (logically infinite) bit string makes
sense of bitwise operations on arbitrary precision integers.
The result will be stored in the most compact format available, but
this only matters in a mixed language system, since Prolog does
not and MUST not have any way of telling how an integer is stored.
(There's no reason why the "haulong" function of Lisp could not be
provided. But that is defined solely in terms of a number's value.)
X /\ Y does a bitwise and on X and Y.
X \/ Y does a bitwise or on X and Y.
\Y does a bitwise complement.
X << N shifts X left N bits (right if N < 0).
X >> N shifts X right N bits (left if N < 0).
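For example, under this reading:
    ?- X is 5 /\ 3.     % X = 1
    ?- X is 5 \/ 3.     % X = 7
    ?- X is \ 0.        % X = -1 (infinitely many ones to the left)
    ?- X is -1 >> 1.    % X = -1 (an arithmetic shift)
    ?- X is 1 << 4.     % X = 16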
An interesting thing to note about shifts is that if we had chosen
any specific word length (such as 18 bits) we would have had to choose
between "logical" and "arithmetic" shifts. But as we regard integers
as extending infinitely far to the left (and must do so for portability
between machines of different word sizes or to handle arbitrary precision)
both kinds of shift coincide. What we get is arithmetic shifts.
And although the manual doesn't actually say so, that is what Dec-10
Prolog (and C Prolog) have always used.
These bitwise operations are in principle all we need. We could even
make do with (X\Y) meaning \(X/\Y) and >>. (<< can be obtained from
plus/3.) However, it is often convenient to have more operations
around. Common Lisp has an extensive and well thought out range of
bitwise operations, and I suggest that we copy them. That is,
No Prolog implementation is obliged to provide any arithmetic
facilities beyond those described here.
Any additional arithmetic functions should use the names, argument
orders, and semantics of suitable Common Lisp functions, except that
nospread functions may be implemented as binary operations.
What happened to the design principle?
Applying the design principle to predicates which evaluate arithmetic
expressions we find that: if there is any constant in the tree which is
not an integer, a float, or a special constant, or if there is a compound
term whose principal functor has no evaluation procedure, the predicate
in question should type fail. If it doesn't type fail, but the tree
contains any variables, the predicate should report an instantiation
fault or delay. It isn't particularly hard to get this right. One way
is to walk the tree twice, the first time looking for an excuse to type
fail and setting a flag if there is a variable. After this walk, if a
variable was detected, report an instantiation fault. Otherwise walk
the tree again computing its value, possibly reporting overflow.
Of course this is expensive. And if we're going to specify error
reporting to this level of detail, we might as well specify the order
in which the tree is walked, so that if more than one overflow could
be reported, you can guarantee which. That would be a reasonable
thing to do, but it seems more useful to relax the rule and to say that:
the subexpressions of an arithmetic expression may be evaluated in any
order;
if more than one error could be reported about the same arithmetic
expression, exactly one of them must be reported, but it is not defined
which;
because a continuable error in an arithmetic expression does not imply
that there are no non-continuable errors, continuable errors are not
continuable when reported from arithmetic expression evaluation.
Comparison
The six arithmetic comparison predicates are easily defined:
X =:= Y :- foo(X, Y, =).
X =\= Y :- baz(X, Y, =).
X < Y :- foo(X, Y, <).
X >= Y :- baz(X, Y, <).
X > Y :- foo(X, Y, >).
X =< Y :- baz(X, Y, >).
foo(X, Y, R) :-
    A is X,
    B is Y,
    compare(R, A, B).
baz(X, Y, R) :-
    A is X,
    B is Y,
    \+ compare(R, A, B).
foo and baz are to be taken as macros, and do not form part of the
standard.
Note that the signs are "=<" and ">=", not "<=" or "=>". In a
logic based programming language which is often used to manipulate
logical or denotational formulae, or to implement term rewriting
systems, arrows are too precious to waste on comparison. Note
further that these six relational operators must evaluate their
arguments; the failure of some otherwise excellent Prolog systems to
do this makes it disproportionately hard to move Dec-10 Prolog code
to them. (PDP-11 Prolog had the same problem, but then PDP-11
Prolog didn't have negative integers, so one had far worse problems
with "X < 0" than bothering about whether X was evaluated or not.)
It is true that it is useful to have comparison predicates which do
not evaluate their arguments. In fact we have three ways of
comparing two numbers:
lt(X, Y) X and Y must both be integers
X @< Y X and Y may be any terms, but the number order is right
X < Y X and Y are arithmetic expressions
With this range of expression there is no need to restrict '<'.
{Term comparison will be dealt with in another note.}
A pretty extension would be to define these relational operators
as arithmetic functions. The definition would be
if X R Y then (X R Y) evaluates to X.
otherwise the predicate evaluating X R Y fails.
This could be useful in "X is Y-1 > 0" (which binds X to the value
of Y-1 when that value is positive, and fails otherwise), but its
intended use is of course for X < Y < Z where we expect floating
point numbers to be involved.
The main reason for bringing this up is not to suggest that it is
useful enough to go in the standard, but to suggest that it is
useful enough for the standard to rule out any other interpretation
of these symbols as arithmetic operators, in particular to rule out
an interpretation similar to that of C or PL/I (values 0 or 1).
A note on ':='
Some people have suggested that ':=' would be a more "natural"
symbol than 'is'. Since this standard does not define ':=' to be
anything, there is nothing to stop someone providing ':=' as an
alternative to 'is' if they're silly enough. The point of this
section is to say why it would be a bad idea.
The first reason is that ':=' really is not all that natural.
It is only natural to Algol/Simula/Pascal/Ada programmers. To
a Prolog programmer it looks like a mis-typed '=:='. To a POP
programmer, the natural assignment is Expr -> Var. To a
Lisp programmer, setf(Var, Expr) might be more natural.
Given that 'is' is single assignment, not destructive
assignment, an even more "natural" syntax might be
let Var be Expr.
This brings me to the second point. Those programmers who
understand ':=' will be misled by it. In Pascal,
X := 1;
X := 2
is legal, meaningful, and leaves X = 2. In Prolog,
X is 1,
X is 2
is legal, meaningful, and equivalent to fail. In Pascal,
1 := 1
is illegal and meaningless, while in Prolog,
1 is 1
is legal, meaningful, and true. I fail to see what we can
possibly gain by lying to the programmer this way.
The third point is that it really is very stupid to take
away a possibly useful symbol when we already have a perfectly
good name for the operation in question. A Prolog system could
well be augmented with true destructive assignment. For
example, in old Dec-10 Prolog (this compiler is no longer available
to users, it is just used for building the Dec-10 Prolog system
itself), you can write
haulong(N, B) :-
N1 is local, B1 is local,
N1 := N, B1 := 0,
while N1 > 0 do (N1 := N1/2, B1 +:= 1),
B is B1.
Here := and +:= are genuine destructive assignment. A Prolog
system embedded in Lisp or Pop might well have a use for
atom := Expr
as equivalent to
T is Expr, lisp(set(atom,T))
While the use of destructive assignment generally would be a
regrettable retrograde step, it may have a place in low level
interface code, and if it is to be used at all it would be as
well to use a symbol which some people understand.
This is too vague to be a requirement of the standard, so it is
just a recommendation:
':='/2 should be used only for true destructive assignment
The Interface to the Host Language
Compiling arithmetic expressions (where variables can be bound to
arithmetic expressions and not just to integers) is tricky enough.
Recognising arbitrary host language forms is out of the question.
That being the case, it is difficult to justify recognising any
Lisp or Pop forms in arithmetic expressions. A very strong point
against using "is" as the interface between Prolog and a host
language is that in most host languages it won't work. Consider
"X is Y*Z". In InterLisp, * is a comment function. In this
standard, * works on floating point values (as it does in PopLog).
But in the MacLisp family, while * does mean multiplication, it
means integer multiplication.
A Prolog compiler can learn a lot from an arithmetic expression.
The first argument to is/2 must end up bound to a number. Any
other source variables are known to be bound to ground arithmetic
expressions (because otherwise a non-continuable fault would have
been reported), and this fact can be used to avoid re-evaluating
a variable. E.g. if we have
p(Y, Z) :-
Y > 1,
Z is (Y+1)/(Y-1).
the compiler can stash the result of evaluating Y in the Y > 1
literal away in a handy place and use it in the next literal. But if
Y could be bound to an arbitrary host language call, it might have
side effects which make this caching give the wrong answer.
It is therefore expedient to restrict arithmetic expressions to
performing arithmetic.
There are at least three other ways of calling a host language such
as Pop or Lisp. All of them involve an "apply" style of interface,
where Prolog instantiates arguments but does not evaluate
them. This simplifies the question of which parts of a host call
Prolog should just instantiate and which parts it should give to the
host to evaluate. Only the top level gets evaluated. If the host
language has 'eval' (and Pop has 'popval') this is no restriction.
It seems a good principle for writing Prolog compilers that:
a Prolog compiler in a mixed language system should be completely
ignorant of the syntax and special forms of the other languages;
in a mixed language system Prolog should be able to call functions
in the other languages, passing data structures constructed in the
same way as data structures passed to Prolog predicates.
In general, it will not be possible to implement Prolog procedure
calls and host language procedure calls in the same way and an
interface primitive will be needed.
Control Structures.
At first sight, defining Prolog's control structures is supererogatory.
After all, Prolog is based on Horn clause logic, and once you have explained
that clauses are tried from top to bottom and that goals within a clause
are tried from left to right, there is nothing left to explain except the
subtleties of the cut. Unfortunately, it isn't true. Prolog has several
control structures, and some of them look as though they can easily be added
to a basic Prolog system by adding ordinary clauses. Unfortunately, the
definitions it is easy to add are wrong.
Another reason for including this section is that some useful control
structures have been developed which are not in the Dec-10 Prolog manual
(though they are in the library). To stop people inventing the same
control structures over and over again, we need to have agreed names and
semantics for them. It should go without saying, but experience has
regrettably shown that it does not, that one of Prolog's greatest virtues
is its simplicity and the "flat" nature of Prolog code. To introduce a
great many special forms and to nest special forms deeply in a program
are both folly. It is reasonable to expect that a Prolog compiler will
understand most of the forms in this section, and will not understand others.
A third reason for this section is to permit some optimisations in
Prolog interpreters. To this end I suggest the following principle:
"Assert" may rewrite any special form in any way provided that the
transformation is provably equivalent to the original clause without
reference to any user-defined predicate. Calls to user-defined
predicates, and their arguments, may not be rewritten.
The point of this is that if I call
assert((p(X) :- true,(p(X+1),(Y=1,Y=2))))
and later look at the data base using listing or clause,
I have no right to complain if I see
p(X) :- p(X+1), fail.
but I do have a right to complain if I see any of
p(X) :- true, fail.
p(X) :- fail.
p(X) :- Y=1, p(X+Y), Y=2.
p(X) :- p(X+Y), (Y=1,Y=2).
p(X) :- call(p(X+1)), fail.
In particular, conjunctions and disjunctions may be "flattened"
(so that the left argument of (A,B) never has ,/2 as its principal
functor), 'true' need not appear at all, and variables appearing in
a goal position may have call(_) wrapped around them. Further, if
a Prolog system is smart enough to spot that a call to some system
defined (but not user defined) predicate must fail, it may
replace it by a call to some system defined error reporting predicate.
It is important to permit an implementation this freedom so that
efficient methods may be used to store clauses instead of having to
store the literal form that is given to assert. Note that we do
require that the clause head and all calls to user-defined predicates
in the body be preserved intact (apart from the inevitable variable
renaming). In particular, while 'assert' is allowed to optimise
end_of_line(C) :- C is 0'J-64.
to
end_of_line(C) :- C = 10.
it is not allowed to optimise it to
end_of_line(10).
and it is not allowed to optimise a false clause out of existence.
Thus the sequence of clause heads found by clause or listing must
be exactly the same in every implementation.
It could be useful to allow other transformations on clauses.
One that would be particularly pleasant would be to rewrite grammar
rules into ordinary clauses, thus letting a program assert new
grammar rules. However, in the interests of portability,
'Assert' must not transform a clause except as permitted
in the previous boldface paragraph.
Why is this non-restrictive? Because the user doesn't have to
call the assert family directly. If you want to assert grammar
rules, there is nothing to stop you defining
expand_assert(Formula) :-
expand_term(Formula, Clause),
assert(Clause).
and using that instead. This applies to any form of macro expansion.
Using expand_assert instead of providing hooks in the ordinary
assert predicate means that macro expansion has to be implemented
only once, and the user does not have to decide whether to expand
a certain macro at read time or at assert time.
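For example, with the usual grammar rule expansion,
    ?- expand_assert(( noun --> [dog] )).
asserts the ordinary clause which expand_term produces (in Dec-10
Prolog, a clause equivalent to noun([dog|S], S)).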
true
Is defined as if by the clause
true.
and no others. As with all standard predicates, it must be impossible
to change the effect of this predicate, and although it has a clausal
definition, that definition need not be visible to the user and if
visible need not agree with that given here.
'true' is mainly needed so that clause/2 can supply something in its
second argument as the "body" of a unit clause. It is also needed
for conditionals, so that we can write (a->true;b->c;d). This too can
be seen as supplying the body of a unit clause.
Although "true." and "true. true. true." are logically equivalent,
they are not procedurally equivalent. True must succeed exactly
once. However, clause(true, X) might report any number of clauses,
and any of them might potentially succeed. All that is required is
that at most one of them will succeed in any given call, without
producing any side effects.
fail
Is defined as if by the clause
fail :- integer(fred).
Note that there is no guarantee that clause(fail, X) will find no
clauses. It might find any number. But whenever called, none of them
may succeed, and there may be no side effects, not even any output.
(Appearing in a trace is a main effect of the tracer, not a side effect
of fail.)
repeat
Is as if defined by the clauses
repeat.
repeat :-
repeat.
These clauses may or may not be visible to the user. The key
property of repeat is that it must not use up any stack space.
The question ":- repeat, fail." must run until stopped by the user.
Repeat is permitted one side effect: on any "call" (but not on
"redo") it may print an insulting message on the standard error
channel.
repeat(N)
Is as if defined by the clauses
repeat(N) :-
ge(N, 1).
repeat(N) :-
gt(N, 1),
succ(M, N),
repeat(M).
These clauses may or may not be visible to the user. Repeat(N) must
not use up any stack space. If given a variable as argument, it must
report an instantiation fault. Given anything other than a non-negative
integer, it may type fail or it may just fail. Repeat(N) is required
to have one side effect: on every "call" but not on any "redo" it must
print an insulting message on the standard error channel.
The reason for putting this in the proposed standard is that there are
Prolog systems which use this, and we don't want it meaning anything
else. A good definition for repeat(X) might have been
repeat(Goal) :-
( call(Goal), fail
; true
).
i.e. repeat the Goal until it fails. So that programs written for
these other Prologs can be run in a standard conforming Prolog, repeat/1
may not be used for this purpose. That's the only reason that repeat/1
is in this standard, to stop the name being used for any sensible purpose.
There is a better way of getting the same effect, and that is to call
between(1,N,X) for some possibly anonymous variable X. That is why
repeat/1 is required to print an insulting message, so that the code
will be converted to the standard form.
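For reference, between/3 can be defined in terms of the 'plus'
family (a sketch only; a real implementation would do something
more efficient):
    between(L, U, L) :-
        le(L, U).
    between(L, U, X) :-
        lt(L, U),
        succ(L, L1),
        between(L1, U, X).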
call and apply
The fundamental predicate in this group is call/1. For any given
X{Footnote: X is a meta-variable, not a Prolog variable},
call(X) is required to behave as if call/1 had exactly one
clause and that
call(X) :-
X.
Just to make this a wee bit clearer, I do NOT mean that call should
behave as if there were a clause
call(X) :- % non-example
X. % non-example
Many Prolog implementations would translate this to call(X):-call(X)
as they are entitled to. No, I mean that call(p(1,X,Y)) should behave
as if the definition was
call(p(1,X,Y)) :- % example
p(1, X, Y). % example
and that call((p, !, q; r)) should behave as if the definition was
call((p, !, q; r)) :- % example
p, !, q; r. % example
There are two consequences of this which deserve to be spelled out
in full. One is that call/1 is more like LISP's "eval" than like
"apply". That is, the term being called can be an instance of
any control structure at all: a simple predicate call, a forall/2,
an if-then-else, another call/1, and so on. The second is that
while X may contain cuts, those cuts cannot escape from the
call and affect the parent goal. That is call(!) is allowed and
is equivalent to true.
A variable G appearing as a goal is equivalent to call(G).
Note that this is not quite what you would expect from
substitution, and is not what a number of toy Prologs written
in Lisp do. Suppose we know that z/1 has no clauses. Then
?- assert(( z(X) :- repeat, X )),
clause(z(X), Body),
X = !.
binds Body = (repeat,call(!)) which is equivalent to repeat.
But
?- X = !,
assert(( z(X) :- repeat, X )),
clause(z(X), Body).
binds Body = (repeat,!) which is equivalent to (!). Now
any program where this makes a difference is very badly
written. While the fact that a variable goal can be rewritten
by assert into call(G) does spoil some of the substitution
properties, it is actually a very useful thing for programming.
It means that a predicate which calls one of its arguments
does not have to worry about a cut being smuggled in inside a
Trojan Horse. This integrity benefit is well worth having,
whatever the meretricious attractions of micro-PROLOG's
"meta-variable".
This does raise another question. What is a Prolog system to do
if a goal is still a variable, or is some constant other than an
atom? call/1 has to worry about this. Since a constant other than
an atom can never have a definition, it obviously doesn't make
sense, so assert can rewrite a clause, turning 3 into
call(3), or it can simply reject such a clause as ill formed.
So only call/1 has to worry about it. The answer is clearly that
call should be much like other predicates: an unbound variable is
an instantiation fault and a non-atom constant is a type failure.
There is one point though: if it is at all possible the error
message should identify the caller of call/1, as that is where the
error "really" is. So we have the following cases for call(G):
var(G) -> instantiation_fault;
atomic(G), \+ atom(G) -> type_fail(1,compound);
system(G) -> call the system predicate;
current_predicate(G) -> interpret the clauses for G if any, or
call the compiled code if any;
otherwise, G is an undefined predicate.
This needn't stop an implementor using an integer (perhaps the index
of a branch of a case statement, or the address of a procedure) as
the body of a clause in bootstrap code. It only means that such
clauses must never be visible to the user, and it must not be
possible for the user to create such clauses, and if the user
does call(72) it must type fail instead of invoking primitive 72
(even if there is such a primitive). If the bootstrap is such that
there is a need for a predicate like call/1 but which is capable of
invoking primitives this way, fine, but it must have another name.
apply(Pred, ArgList) :-
Pred =.. [F|GivenArgs],
append(GivenArgs, ArgList, AllArgs),
Goal =.. [F|AllArgs],
call(Goal).
This is the predicate which has been used up till now for
applying a computed predicate to a computed argument list.
It has a number of serious disadvantages. First, it is
usually implemented exactly as written above. This is
wasteful. It creates a new list which is never needed again,
and a new compound term which is never needed again. If it
is built in at a low enough level there is no need for these
data structures. Even if apply/2 takes care not to construct
superfluous data, the calling site will still construct an
argument list which isn't really needed. The second problem
is that even when we know what Pred is like, it is difficult
(in the general case, impossible) to type check a call to apply.
Every time I have seen apply used, there has been a specific
number of arguments at the calling site, commonly one or two.
Therefore, I propose a family of predicates.
call(p(X1,...,Xm), Y1,...,Yn) :-
p(X1,...,Xm,Y1,...,Yn).
That is, call(P, X, Y) is equivalent to apply(P, [X,Y])
except that no extra data structures have to be created
and it can easily be type checked. The requirement for
P is the same as for call(P): it must be an atom or a
compound term. For complete consistency, call/N for N>1
should be able to handle control structures too; since
call((A,B)) works so should call(,(A), B). However, it
may be considerably easier to implement this operation
if it only works for system and current predicates, and
does not work for arbitrary control structures. So
call/N may be thought of as "apply", with call/1 being "eval".
apply would have been the preferred name for this operation
if only it hadn't already been used for the list version.
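For instance, call/3 is exactly what is needed to define maplist/3
without building any intermediate terms (a sketch, not part of the
standard):
    maplist(_, [], []).
    maplist(P, [X|Xs], [Y|Ys]) :-
        call(P, X, Y),
        maplist(P, Xs, Ys).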
once
Is defined as if by the clause
once(Body) :-
call(Body),
!.
Since cuts cannot get through "call", a cut which appears in the
Goal is confined to the once(_) form and does not have the effect
of cutting the parent goal.
Once need not be implemented exactly this way. It is quite
straightforward for a Prolog compiler to open-code this form,
but if it does the final cut and any internal cuts must be implemented
in a way which cannot be expressed in Prolog source language.
If the Body contains no cuts, the expansion (Body->true) can be used.
Negation
The form \+ Body is defined by macro-expansion into
( Body -> fail ; true )
Since disjunction and if-then-else are transparent to cuts, so should
negation be. That is, if we write
p :- \+ !.
p.
this should have exactly the same effect as
p :- ( ! -> fail ; true ).
p.
which is to say that every call to p must fail. If however, we write
p :- X = !, \+ X.
p.
this is equivalent to
p :- X = !, ( call(X) -> fail ; true ).
p.
and as the cut cannot escape from the call, every call to p must
succeed (in the second clause).
This is all rather unpleasant. What is particularly unpleasant is that
it makes Dec-10 Prolog non-standard. (On the other hand, it was
non-standard already, because the compiler does not understand if-then-else).
I have tried several times to discover a rule which would enable a user
to predict which special forms were transparent to cuts and which were not.
Making conjunction the only transparent form would make sense, but the
language which results is not powerful enough to write an intelligible
meta-circular interpreter. To do that disjunction has to be transparent
to cuts as well. The cleanest rule I could come up with was that
conjunction, disjunction, and if-then-else should be transparent to cuts
(as they are in Dec-10 Prolog), and that any other form was transparent
if and only if it could be macro expanded into a form involving these
three only, without needing call/1, which is definitely opaque to cuts.
A Prolog implementation which gets ',', ';', and '->'
compatible with the Dec-10 and C Prolog interpreters can easily get
'\+' right by expanding it in assert to precisely this form.
The requirement that \+ X behave exactly like its macro expansion is to
ensure that program processing tools which only understand the basic
three operations can process \+ X correctly by performing the macro
expansion. In general this permits X to be compiled directly instead
of constructing a data structure, and this can be no bad thing.
In fact I would be very surprised if there was any code written by
Prolog users (as opposed to Prolog implementors testing their systems)
which ever has a cut in a negated form. If there is such a cut, the
desired behaviour can easily be obtained by defining an auxiliary
predicate whose body is the form to be negated, and negating a call
to that. Although this definition is different from previous
Prologs, the practical impact is slight, and I think we can get away with it.
not
\+ is part of Dec-10 Prolog, 'not' is not. However, there is a library
predicate of that name which checks that its argument form is ground.
Prolog-X also adopts this convention. It seems like a good one. In
order for this check, and the error report which may follow, to proceed,
it is necessary that the form exist as a data structure. This therefore
involves "call", so cuts cannot escape from "not" even though they can
escape from \+.
A coroutining Prolog should delay not(X) until X is ground.
In fact a smart compiler can do a little better than that. Suppose X
is a disjunction of conjunctions of literals. (We can convert to DNF.)
If each disjunct is ok, the call can proceed, where a conjunct is ok
if the only unbound variables in it are referenced nowhere outside the
conjunction. What this means for a compiler is that in
disjoint(S1, S2) :-
not (member(X, S1), memberchk(X, S2)).
it is safe to proceed with the call provided S1 and S2 are ground.
But in
non_member(X, S) :-
not(member(X, S)).
it is not safe to proceed with the call until X is ground as well as
S, because X can be seen outside. So when you write not(X), a
compiler should handle sound negation by compiling code to check
that those variables of X which appear elsewhere in the clause are
ground.
A sound if-then-else can be defined as well. We might as well copy
MU-Prolog on that one.
forall
Is defined by macro-expansion. forall(Gen, Test) is equivalent to:
\+ (Gen, \+ Test)
which is in turn equivalent to
( Gen, ( Test -> fail ; true) -> fail ; true )
Logically, this is bounded quantification. For example,
subset(Small, Big) :-
forall(member(X, Small), memberchk(X, Big)).
Procedurally it is just another form of failure driven loop,
permitting iteration without recursion. It is only useful
when the only information to leave the loop is the success or
failure indication.
Used sparingly, particularly when bounded quantification is
intended, this can make programs clearer. But in Dec-10 Prolog
it is avoided because the compiler generates a call to the
interpreter, and remembering to put in all the public declarations
can be a pain. One would be tempted to use the expanded form, if
the Dec-10 compiler understood it.
Calls to forall can be open coded quite effectively. However,
as it has this macro definition, we see that a cut inside either
the generator form or the test form can escape and cut the parent
goal. Just like negation, this differs from existing Prologs.
Just like negation, I don't think that matters.
Conjunction and Disjunction
Apart from the fact that disjunction is transparent to cuts, these
are generally understood. However, there is a problem. The punctuation
marks used to represent these operations are not just a matter of syntax:
some functor must be used to represent them in clauses. There'd be
no problem about that, as the Clocksin & Mellish book says quite clearly
what they are.
However, a lot of people would like to see '&' used for conjunction
and '|' used for disjunction. Now '|' is already used in Dec-10 and
C Prolog as a synonym for ';', but this is not entirely satisfactory
as (A|B) is the only instance where functor((A<op>B),<op>,2) fails.
There is no difficulty in defining what Dec-10 Prolog does. But in
a system where the native syntax used '&' and '|' for conjunction and
disjunction it would be difficult to justify ',' and ';' popping up
in clauses. It seems to me that there are two reasonable courses to
pursue. One of them is to say that the internal form must correspond
to Dec-10 internal form, whatever syntax is normally available to users.
That would work, but the "standard" might end up being ignored. The
second alternative is to say that an implementation can use whatever
functors it pleases (even to having say a special functor '$!'/2 with
$!(A,B) meaning A,!,B), but that it must support some standard kit of
predicates for manipulating formulas.
Let me make this absolutely clear: I do not see any point at all in
changing the internal representation from (A,B) for conjunction and
(A;B) for disjunction, with (A->B) being used in if-then-else forms.
I am putting forward the second alternative in this proposed standard
to make it acceptable to a number of UK users whose infatuation with
the beauty of '&' and '|' renders them incapable of seeing that to
use those symbols is to lie to the user.
The suggestion, then, is that there should be some standard predicates
for constructing formulas and taking them apart.
prolog_clause(Clause, Head, Body)
prolog_bounded_quantification(Formula, Generator, Test)
prolog_negation(Formula, PositivePart)
prolog_if_branch(Formula, Hypothesis, Conclusion)
prolog_conjunction(Formula, ListOfConjuncts)
prolog_disjunction(Formula, ListOfDisjuncts)
Versions of these predicates for Dec-10 Prolog are given in an
appendix. The point of taking the conjuncts &c as lists is to
emphasise Prolog's right to normalise these operators, indeed, to
allow for the possibility that the internal forms may be some form of
decorated lists.
prolog_clause works with any formula, as any atom or compound term can
be used as a unit clause. (Indeed, one has to be very careful, as
(a:-b) :- p is a perfectly good clause.) The other predicates,
if their first argument is instantiated, will only succeed when the
formula is the kind of thing they are looking for. This leads to the
anomaly that prolog_conjunction(a,X) fails, but prolog_conjunction(Y,[a])
succeeds, binding Y=a.
Given a Formula, these predicates take it apart and match the pieces
against their other arguments. With a variable in the first argument
position, they construct a formula from their second argument and instantiate
the first argument to it. These predicates are not truly bidirectional.
For a given Formula, if prolog_conjunction(Formula,L) succeeds it does not
necessarily follow that prolog_conjunction(F,L) will instantiate F to
Formula (though it will succeed, and will instantiate F to something
with the same effect as Formula). And for a given List, prolog_conjunction(F,List)
having succeeded (and it might fail, if a top level element of List is say
an integer) does not mean that prolog_conjunction(F,L) will even succeed,
let alone instantiate L to a copy of List. What is required is
something rather weaker:
for given L0
if prolog_conjunction(F1, L0) succeeds
then prolog_conjunction(F1, L1) succeeds
and prolog_conjunction(F2, L1) succeeds
and F2 == F1.
for given F0
if prolog_conjunction(F0, L1) succeeds
and prolog_conjunction(F1, L1) succeeds
then prolog_conjunction(F1, L2) succeeds
and L2 == L1.
with similar axioms for prolog_disjunction and prolog_if_then_else.
Any formula has at most one decomposition; it is just that the
formula might have been initially constructed by some other means, and might
not actually make sense. For example, we might find that
prolog_conjunction((1,2), [1,2]) succeeds
but prolog_conjunction(X, [1,2]) fails.
This section of the draft is the most tentative. All things considered,
it might be much cleaner to have separate compose_THING and decompose_THING
operations.
If-then-else and "cases"
Let us suppose that conjunction and disjunction are well understood.
We can explain Ken Kahn's "cases" construct by introducing updatable
variables into Prolog (for exposition only!) and translating
cases
Hypothesis1 -> Conclusion1
; Hypothesis2 -> Conclusion2
...
; Hypothesisn -> Conclusionn
as follows:
DoneIt is local,
DoneIt := 0,
( Hypothesis1, DoneIt := 1, Conclusion1
; DoneIt =:= 0, Hypothesis2, DoneIt := 1, Conclusion2
...
; DoneIt =:= 0, Hypothesisn, Conclusionn
)
Every case except the first has the test "DoneIt=:=0" before its
hypothesis (it is redundant before the first case). Every case except
the last has the assignment "DoneIt:=1" after its hypothesis (the
variable is dead after the last case). If a case lacks the if-then-else
arrow, the translation retains the test but drops the assignment.
This corresponds almost exactly to the desired logical reading of an
if-then-else:
( H1 and C1
or not H1 and H2 and C2
...
or not H1 and not H2 and... and not Hn-1 and Hn and Cn
)
where the negations are not actually calculated but are implicit from
position. In particular, the hypotheses may backtrack; having
passed an arrow once it thereafter acts just like a comma. Since
tests are generally determinate, this makes little or no practical
difference.
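For concreteness, consider a hypothetical query using member/2 from
the library (the syntax is the C Prolog "cases" syntax shown below):
	?- (cases member(X, [1,2,3]) -> X > 1 ; true -> X = none).
Since the hypothesis may backtrack, this can succeed twice, with
X = 2 and X = 3, and the final branch is never tried. The standard
if-then-else
	?- (member(X, [1,2,3]) -> X > 1 ; X = none).
commits to the first solution X = 1, so the test X > 1 fails and the
whole query fails.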
The "cases" construct is implemented in C Prolog is exactly this fashion.
Given a reasonable Prolog compiler which can handle disjunction, it is
very easy to add the cases construct, whether using this technique or
whether by fiddling with the choice point list directly. Note that
in so far as it can be expressed in Prolog at all, the cases construct
is open coded, so that it is "natural" for it to be transparent to cuts.
Cases and if-then-else are to a large degree a substitute for the cut,
and one might expect that a programmer who used them would not be putting
cuts in them. So it might seem a matter of indifference what a cut inside
a cases or if-then-else does. However, there is at least one Prolog
programmer here whose code includes
p(...) :-
...
( ... -> ! ; true ),
... .
I laughed immoderately when I saw it, then groaned when I realised that
he really meant to keep this in his program and thought it genuinely
useful. He does at least deserve to be told when this will work and
when it will not, while the rest of us continue to look for efficient
alternatives to the cut that make our code even clearer, the Philosopher's
Stone, the Elixir of Life, and a 1 GigaLIPS machine.
The conventional if-then-else form is written in the same way except that
it doesn't have "cases" in front.
( H1 -> C1
; H2 -> C2
...
; Else
)
is almost the same as
(cases
once(H1) -> C1
; once(H2) -> C2
...
; true -> Else
)
except that cuts can escape from the Hi (should anyone be so incredibly
foolish as to put them there) as well as the Ci. The if-then-else
arrow functions exactly as a cut with narrow scope. It is correspondingly
easy to implement.
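As a first approximation (ignoring the fact that the real construct is
open coded, and hence transparent to cuts in the Conclusion and Else
parts), ( H -> C ; E ) behaves like this auxiliary predicate, where the
name if_then_else/3 is illustrative only:
	if_then_else(H, C, _) :-
		call(H),	% the arrow acts as a cut: only H's
		!,		% first solution is ever used
		call(C).
	if_then_else(_, _, E) :-
		call(E).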
My feeling is that for most practical programs it doesn't matter whether
we use the if-then-else form or the cases form, and that beyond that it
is too early to say which is better. There is a fair bit of existing
Dec-10 and C Prolog code which relies on the current if-then-else and
its interaction with cut. So the if-then-else has to be part of
the standard. Someone might yet want to port a program from
LM-Prolog to Standard Prolog, and anyone might want to experiment with
it. To ensure that everyone who implements it does so compatibly, the
"cases" form had better be part of the standard as well.
On the other hand, the implied negations are only sound when the
hypotheses are ground, and in that case it ought not to make any
difference whether we write (A -> B; C) or (not(A) -> C; B). So perhaps we don't
really need "cases" after all.
The syntax I've shown for "cases" is the C Prolog syntax. I don't
particularly care for it, and invite suggestions. That aside, both
should be in the standard, and both should be transparent to cuts.
setof and bagof
These should be as described in David Warren's paper and as coded in
the Prolog library file SETOF.PL. But there is one point which needs
consideration. There is nothing in the specification of these
predicates which says that they should do anything to the data base.
The definitions in SETOF.PL make no permanent change to the data base unless
they are aborted (by 'abort' or ^C and "a"), and are set up so that if they
are aborted the junk thus left behind will not interfere with future
calls to these predicates. However, the working storage is in the data
base while these predicates run. This has two
effects. First, if you interrupt CHAT-80 often enough it can fill up the
data base with junk that the system has no way of detecting should not be
there. Second, while the setof or bagof is proceeding, these working data
are visible in the data base to an inquisitive program and vulnerable to
a malicious one.
The visibility and vulnerability can be taken care of by making the key
used an atom which is not accessible to the user's program. For example,
the SRI versions of C Prolog have a way of hiding an atom by removing it
from the symbol table, so using keys composed from '$' and then hiding '$'
would do the trick.
The problem of filling up the data base with junk if a computation is
interrupted is perhaps not so important. User code is faced with that
risk no matter what we do. However, it would be no bad thing if the
'abort' predicate cleaned out whatever part of the data base was used by
these predicates.
Best of all would be if these predicates didn't use the data base at all.
The code in SETOF.PL uses the recorded data base rather than the clause data
base both to keep this junk out of harm's way, and also to avoid compilation.
A system which compiles clauses when they are asserted makes setof, bagof,
and findall unbelievably slow. These clauses or records are also special
in that we know when we retract them that there is only the one reference
to them (or at least we would like that to be true). It might even be
advisable to have special code for copying these notes into the heap and
pulling them out again without any compilation or indexing.
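As an illustration of the technique (this is a sketch, not the actual
SETOF.PL code), findall/3 can be written with the recorded data base as
follows. The key '$findall' and the marker '$mark' are invented names;
in a real system the key would be one of the hidden atoms described
below, and the marker would be a term no user program could record.
	findall(Template, Generator, _) :-
		recorda('$findall', '$mark', _),	% marks the start of our records
		call(Generator),
		recorda('$findall', Template, _),	% records a copy of each solution
		fail.
	findall(_, _, Instances) :-
		findall_collect([], Instances).

	findall_collect(SoFar, Instances) :-
		recorded('$findall', Term, Ref),	% newest record first
		erase(Ref),
		(   Term == '$mark' ->
		    Instances = SoFar			% SoFar is in solution order
		;   findall_collect([Term|SoFar], Instances)
		).
Note that if the Generator is aborted, the records are left behind,
which is exactly the problem described above.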
The second argument of bagof/3, findall/3, findall/4, or setof/3 is
given to call/1, and so cuts cannot escape from it. This need not
prevent these predicates being compiled open. None of the control
structures mentioned in this section has to appear in traces; any of
them except conjunction may.
current_{atom,functor,predicate}
These predicates are really very strange. current_predicate and
system are useful for writing Prolog interpreters and debuggers.
current_atom(Atom)
if var(Atom) {
try binding Atom to each "currently known" atom in turn;
the order is not specified
} else {
if atom(Atom) then succeed else type_fail(1, atom).
}
It is obvious that this is the analogue of InterLisp's MAPATOMS.
What is not obvious is which atoms are "currently known". Can
the question
?- current_atom(X), X = fred.
ever fail? If we list all the atoms at one time, and list all
the atoms at a later time, must the first lot be a subset of the
second? In Dec-10 Prolog, C Prolog, and Prolog-X the answers
are no and yes. In these Prologs atoms are never forgotten.
Bearing in mind that future Prolog systems will be able to interface
to external data bases, and so may have a much larger turnover of
atoms, all that we can reasonably require is that
Every atom appearing in any clause
or other internal data base record (possibly as the key) and every atom
which currently appears in the stacks is "currently known", but if
all references to and from an atom are lost it may become unknown.
current_atom is easily implemented by searching the symbol table rather
than the data base and stacks. It is up to the garbage collector to
prune the symbol table if it so wishes. A warning to implementors:
be very careful with this predicate if you prune the symbol table.
It is a part of this predicate's definition that no atom is enumerated
twice in the same call. If new atoms are being created while an
enumeration is going on, it is permissible for current_atom to miss
them. It is only required to enumerate those atoms which were current
when it started and which are still current when it finishes, though it
may enumerate more.
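For example, the entire set of current atoms can be listed by
backtracking (a throwaway illustration, not a proposed predicate):
	list_all_atoms :-
		current_atom(Atom),
		write(Atom), nl,
		fail.
	list_all_atoms.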
current_functor(Atom, Term)
if nonvar(Atom) and \+atom(Atom), type_fail(1, atom).
if constant(Term) and \+atom(Term), type_fail(2, compound).
if nonvar(Term) {
unify Atom with the principal function symbol of Term
} else {
current_atom(Atom)
try functor(Term, Atom, N) for each N such that Atom/N
is a "currently known" functor; the order is not defined
}
Here we have an even nastier problem. In Dec-10 Prolog, C Prolog,
Prolog-X, and NIP, each current functor is represented by a physical
block of memory in the symbol table. In Poplog and Quintus Prolog,
however, the functor is implicit. This looks like bad news; we might
have to examine the stacks and data base to find out what functors
are "current". The answer is to redefine what "current" means.
A functor is current if it appears in a clause or internal
data base record (possibly as the key).
This is a minimal requirement; other functors may be "current" as well.
One unpleasant aspect of the Dec-10 definition is that if you have a
"vector" class which you represent say as vector(E1,...,En), a functor
block is left hanging around forever for each size of vector you have
ever used. Such behaviour is permitted by this standard, but not
required. vector/N must show up if there is such a term somewhere in
the data base, but vector-like structures are commonly held only in the
stacks, and this definition permits current_functor to ignore the stacks.
An easy way to implement current_functor is to associate a set of integers
with each atom in the symbol table. asserta/assertz and recorda/recordz have to
update these sets to include all the functors which appear in the data
base (and of course they may appear there differently from the way they
appear in the stacks), and the garbage collector may but need not prune
the sets. The integer 0 need not appear explicitly in these sets, as
current_atom(X)=>current_functor(X,X). The mapping might be represented
as a hash table in any of several ways. In particular, current_functor
is not required to enumerate all the current functors for a given
atom consecutively.
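For example, if the term vector(1,2,3) occurs in some clause, then the
query
	?- current_functor(vector, T).
might enumerate T = vector(_,_,_) and T = vector (the latter because,
as just noted, every current atom is a current functor of arity 0),
in either order, possibly with other arities in between.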
current_predicate(Atom, Goal) :-
current_functor(Atom, Goal),
"there have been clauses for Goal".
current_predicate(Goal) :-
current_predicate(_, Goal).
The Dec-10 Prolog manual is actually very clear on this point. It
says "it only generates functors corresponding to predicates for which
there currently exists an interpreted procedure". In David Warren's
papers, "procedure" means "clause". So what this means is that we
could define the Dec-10 version of current_predicate as
current_predicate(Atom, Goal) :-
current_functor(Atom, Goal),
\+ \+ clause(Goal, _).
Now if we want that definition, we can easily program it. I have
argued earlier in this document that a more useful definition would
be that a predicate is "current" if and only if there has been at
least one 'assert' for that predicate since its last 'abolish'.
One way to accomplish this is to have a mapping from atoms and arities
to "procedure headers", where this mapping is void for undefined or
abolished predicates, but "retract" leaves the header intact even
when it removes the last clause.
system(Atom, Goal) :-
current_functor(Atom, Goal),
"Goal has compiled code and compiled code only".
system(Goal) :-
system(_, Goal).
The distinction between system/[1-2] and current_predicate/[1-2] is
that a "system" predicate has only a compiled definition. There
is nothing to stop a system maintaining clauses and compiled code
both. However, if there are clauses for a predicate it must be a
current_predicate and not a system predicate.
'Assert' and 'retract' must report an error when applied to a system
predicate. It is less obvious that 'clause' should do so. However,
clause(Head,Body) obviously cannot succeed when system(Head) is true,
as by definition there are no clauses. But it should not fail either,
as a program might take that as an indication that the predicate in
question is undefined, which is not so. This is not restrictive, as
the programmer can always define
safe_clause(Head, Body) :-
current_predicate(Head),
clause(Head, Body).
which has the additional benefit of being able to search the whole
data base. (clause will report an instantiation fault if Head is
unbound.) The fact that 'assert' in Dec-10 Prolog does not
report an error but quietly ignores additions to system predicates
is one of Dec-10 Prolog's less helpful quirks.
system(X) or current_predicate(X) must be true for each of the predicates
defined in this document, but the implementor is free to choose which.
However, there must be a set of predicates which are system predicates
and in terms of which the others are defined so that a Prolog in Prolog
interpreter can bottom out somewhere.
Evaluable Predicates Not Described
The following predicates are part of Dec-10 Prolog but are not covered
in this document, or not adequately covered.
abolish/2, assert/1, assert/2, asserta/1, asserta/2, assertz/1,
assertz/2, clause/2, clause/3, compile/1,
consult/1, erase/1, instance/2, listing/0,
listing/1, reconsult/1, recorda/3, recorded/3, recordz/3, retract/1.
These predicates are concerned with the data base. There are a
number of thorny problems here. Should the recorded data base be
considered as part of the standard? It is there in Dec-10 Prolog,
C Prolog, Prolog X, Quintus Prolog, and even in PopLog. On the
other hand, I have a design for a better indexed data base. And
on the third hand {Footnote: A kind of vyce familiar to carpenters} this
should really be replaced by a general indexed data base mechanism.
But there are largish programs that use it. Should data base
references be described? They are constants in C Prolog and Prolog-X.
But they are undocumented compound terms in Dec-10 Prolog. What about
the interaction of retract with running code? It may be that Quintus
can come up with a reinterpretation of this nuisance that permits
immediate reclamation of retracted clauses; I know they're trying to.
abort, break, debug, debugging, halt, leash/1, nodebug, nospy/1,
spy/1, trace/0, unknown/2
These predicates are concerned with debugging.
ancestors/1, compile/1, depth/1, gc/0, gcguide/3, incore/1,
log/0, maxdepth/1, nogc/0, nolog/0, plsys/1, reinitialise/0,
restore/1, revive/2, save/1, save/2, statistics/0, statistics/2,
subgoal_of/1, trimcore/0, version/0, version/1, 'LC'/0, 'NOLC'/0
These are obsolete or implementation dependent. 'NOLC' mode
is most definitely obsolete. trimcore/0 may be used to cause some
sort of garbage collection, which might be confined to the Prolog
stacks. gc/0 merely sets a flag. A general "flag assignment"
mechanism should be used in future instead of multitudes of little
0 or 2 argument predicates for individual flags.
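Such a mechanism might look like this (the name prolog_flag is purely
illustrative, not part of this proposal):
	?- prolog_flag(gc, Old, on).	% enable garbage collection,
					% noting the old setting
with one flag name for each of the toggles listed above.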
close/1, current_op/3, display/1, file_exists/1 (exists/1 in C Prolog),
fileerrors/0, get/1, get0/1, nl/0, nofileerrors/0,
op/3, portray/1, print/1, prompt/2, put/1, read/1, read/2,
rename/2, see/1, seeing/1 (and seeing/2 in C Prolog), seen/0,
skip/1, tab/1, tell/1, telling/1 (and telling/2 in C Prolog), told/0,
ttyflush/0, ttyget/1, ttyget0/1, ttynl/0, ttyput/1, ttyskip/1, ttytab/1,
write/1, writeq/1
These predicates are concerned with input/output. read/1 and read/2
are in the Prolog library, so don't need to be defined here.
display/1, print/1, write/1, and writeq/1 will be placed in the Prolog
library, so again don't need defining here. The other predicates
should certainly be properly defined, but there are aspects of the current
file system which are, um, regrettable, and the standard should improve
on current practice. In particular, I would like to see input redirected
by
with_input_from_file(FileName, Body)
with_input_from_chars(ListOfChars, Body)
with_input_from_string(String, Body)
with_input_from_call(Pred, Body)
and output redirected by
with_output_to_file(FileName, Body)
with_output_to_chars(ListOfChars, Body)
with_output_to_string(String, Body)
with_output_to_call(Pred, Body)
and of course, for interacting with the user,
with_io_to_terminal(Body)
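For example, under this proposal one might write (the file name is of
course hypothetical)
	?- with_output_to_file('fred.lst', listing(fred)).
instead of the current telling/tell/told dance.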
But none of this redesign is anywhere near being well enough
thought out to include in a standard, and it does require new
machinery in an implementation. (For example, it would be
extremely difficult to support with*call/2 in C Prolog.)
'C'/3, expand_term/2, phrase/2.
These are concerned with grammar rules. Grammar rule preprocessors
for DCGs, DCSGs, and MSGs are in the Prolog library, and a preprocessor
for XGs has been published.
numbervars/3
This is in the Prolog library (file METUTL.PL or /usr/lib/prolog/metalog).
Dec-10 Prolog code for the 'plus' family
@Include(Util:Arith.Pl)
Dec-10 Prolog code for 'univ'
Term =.. [FunctionSymbol|Args] :-		% decomposition: Term is known
	nonvar(Term),
	!,
	functor(Term, FunctionSymbol, Arity),
	$argument_list(0, Arity, Term, Args).
Term =.. [Term] :-				% construction: a constant by itself
	atomic(Term),
	!.
Term =.. [FunctionSymbol|Args] :-		% construction: the general case
	atom(FunctionSymbol),
	length(Args, Arity),
	functor(Term, FunctionSymbol, Arity),
	$argument_list(0, Arity, Term, Args).

$argument_list(N, N, _, []) :- !.
$argument_list(I, N, Term, [Arg|Args]) :-	% Args is arguments I+1..N of Term
	J is I+1,
	arg(J, Term, Arg),
	$argument_list(J, N, Term, Args).
Dec-10 Prolog code for repeat/1
repeat(N) :-
telling(Old), tell(user),
write('It is pointlessly stupid to use repeat/1.'), nl,
write('Why don''t you use between/3 instead, fathead?'), nl,
tell(Old),
between(1, N, _).
Code for the sorting routines
@Include(Util:Sorts.Pl)
Code for prolog_conjunction and friends
@Include(Util:DeCons.Pl)