File : PLSTD.MSS Author : R.A.O'Keefe Updated: 23 July 1984 Purpose: SCRIBE manuscript for the draft Prolog standard

Department of Artificial Intelligence

University of Edinburgh

DAI Working Paper No. @Parm(Text) Date: @Value(Date)

Title: Draft Proposed Standard for Prolog Evaluable Predicates
Author: R. A. O'Keefe.

Abstract

There are several different implementations of Prolog which resemble Dec-10 Prolog closely enough that it is worth trying to move programs from one version to another. To facilitate this, it is important to have a clear specification of the semantics of the standard predicates (syntax is comparatively unimportant). This paper presents such a specification for term classification, construction, and comparison, for arithmetic, and for control structures. the term classification and construction and the arithmetic evaluable predicates of a Prolog system. The proposed standard extends Dec-10 Prolog in the interests of portability between future systems.

This is a draft, published for comments.

This work was supported by a Commonwealth Scholarship. Computing was done on the Edinburgh ICF KL-10 under SERC grant GR/C/06226.

Draft Proposed Standard for Prolog Evaluable Predicates

Introduction

Thanks in no small part to Chris Mellish's and Bill Clocksin's book "Programming in Prolog", there are quite a few Prolog interpreters around (and one or two compilers) which vaguely resemble the Edinburgh Prologs. However, Dec-10 Prolog, like Algol 60, is an improvement on most of its successors, and it is often pointlessly difficult to make a correct and reasonable Dec-10 Prolog program run under one of its imitators. The trouble is that people implementing new Prolog systems have mistaken "Programming in Prolog" (which is a textbook describing a "core" Prolog, and not fully specifying much of that) for a specification of Prolog in full.

Dec-10 Prolog is about to lose its supremacy. Already we have PopLog, which makes up with its extensible screen editor and its host language Pop-11 what it lacks in speed. Bill Clocksin's and Lawrence Byrd's Prolog-X system, which has been running for some time, has recently been rewritten in C, and is now the fastest portable Prolog in the UK. Bill Clocksin is adding the built in predicates it currently lacks, and it should not be long before this is the Prolog of choice on the UK AI machines. Dave Bowen's NIP system (whose architecture is virtually identical to that of Prolog-X) is running, and with any luck should be usable within a few months. With assembly code support it promises to be a factor of 2 faster than PopLog. Most of all, Quintus Prolog, which is more efficient than Dec-10 Prolog, is due to be released this year.

Then of course we have such systems as MU Prolog, which follow Edinburgh syntax to some extent, but which come even closer to logic. These systems necessarily redefine the semantics of conjunction, but there is nothing to be gained by their having different names or semantics for arithmetic operations.

Since a version of the Dec-10 Prolog parser is readily available written in Dec-10 Prolog, any syntactic differences between Prolog versions can be trivially overcome. What we need is a specification of the semantics of the built in predicates.

A possible objection to such standardisation is that implementors often think that they have a better idea of what some operation should do, and don't want a standard telling them not to do their thing. This objection is easily countered: all that a standard demands is that operations called so-and-so shall exist doing such-and-such. A standard does not say that these are the only operations for doing such things. For example, Dec-10 Prolog has a predicate read/1 which reads a term in Dec-10 syntax from the current input stream, and which yields 'end_of_file' when the end of the current input stream is reached. Some people would prefer an input predicate which fails when the end of the input stream is reached. Standard conformance doesn't stop them having such a predicate, it just stops them calling it read/1. As another example, the fact that Dec-10 Prolog's name/2 predicate works for both atoms and integers has the unfortunate consequence that
	atom(A) & name(A, S) & name(X, S) -> X = A
is not always true. And an implementor whose host language supports strings particularly well might prefer it if the "string" argument of name/2 were a string and not a sequence of character codes. But neither of these reasons for dissatisfaction with name/2 is licence to change it. Instead, a new standard should define more satisfactory operations as well (as this one does).

This brings me to another point about this proposed standard. It includes some things that are not in Dec-10 Prolog (at least, not built in, though they are available in the library). It is clear that Dec-10 Prolog is not the last word. It would be a good thing if we could agree on the improvements to it. The additions I propose here are easily added to any Prolog. Indeed, if I had the time it would be easy to add them to Dec-10 Prolog, and I have added some of them to C Prolog. Prolog versions of some of these new operations are available; they require the current Dec-10 Prolog facilities.

The purpose of this document is to describe a set of evaluable predicates which will facilitate the transport of programs between Prolog implementations resembling Dec-10 Prolog. Dec-10 Prolog itself does not conform to this standard, but most Dec-10 Prolog programs will be happy with it.

This document is not yet complete, nor does it attain the rigour needed by a true standard. I have decided to publish it as it stands, because (a) I shall have no opportunity to do any further work on it this year, (b) even in its present state it may be useful, there being nothing remotely like it, and (c) while I have put a good deal of thought into most of it, the Prolog community may not agree with my principles, and we might as well find that out before I waste any more effort on this project.

Error Reporting

This document does not describe how errors are signalled, nor how they are handled. That should be standardised as well, of course. What it does described is what errors can be or must be reported.

An error is ignorable if it is possible to tell the Prolog system to ignore it and do something sensible.

An error is continuable if after it has been reported to the user it makes sense to proceed.

Simple Failure

The first kind of error is when a question is perfectly well formed but the answer is "no". An example of this would be "3 < 1". An implementation must not signal or report an error when this happens. The specifications below use fail to indicate this.

Undefined Predicates

The second kind of error is when the program calls a predicate and there are no clauses for that predicate. Logically, this is perfectly well defined. The goal should fail, and that is all there is to it.

However, calling a predicate which has no clauses usually results from a typing mistake in the source code. At Edinburgh this is not much of a problem because of the cross-referencers, the syntax checker in the editor, and the WLIST program. It is quite easy for us to find most mis-spelled predicates before running the code. Other Prologs are not so well endowed, and finding an undefined predicate in a Prolog which is standard conforming but lacks the Edinburgh environment or even a trace facility is amazingly painful.

It is therefore required of a standard Prolog that there be at least two modes of operation. In one, calls to undefined predicates simply fail. In the other, calls to undefined predicates produce an error message, and this mode must be the default. The exact form of the error message does not matter, but something like
	! Undefined predicate: pljs/3
	! pljs(3, 17, _273)
is recommended. It is redundant, but when the culprit call is very large it may be difficult for the programmer to count the arguments. As a general matter, it is recommended that all error messages have the form
	! <error phrase>: <detail>
	! <the culprit goal>
and that some form of dialogue then be entered. In the case of undefined predicates, the user must have the option of failing this call, advising the Prolog system to fail all calls to this predicate in future, or loading a file. There are many other options that could usefully be provided.

To tell the system what mode to use, some command such as unknown(Old,New) or flag(unknown,Old,New) should be used, where Old is unified with the Old mode and the mode is then set to New. The form of the call is not (yet) part of the standard, but that the key word involved is 'unknown' is part of the standard, so that the programmer knows where to look in the index. As the mode where undefined predicates are reported is the default, the name of that mode is not part of the standard. (And there may be more than one such mode.) However, the name of the mode where undefined predicates just fail is defined to be 'fail'.

There is an obvious difference between calling a predicate which has never had any clauses and calling a predicate which has had clauses but which happens not to have any just at the moment thanks to the use of 'retract'. The former is very likely to be a mistake, the latter is likely to be "normal" data base hacking. It is straightforward for an implementation to distinguish between these two cases if this problem is considered in the initial design. It is therefore defined that each predicate starts life as "undefined", that any sort of 'assert' changes its status to "defined" and that nothing else does, and that 'abolish' changes its status back to "undefined" and that nothing else does.

There is a built in predicate in Dec-10 Prolog and C-Prolog:
	current_predicate(Name, Goal)
(see section 4.6 of the Dec-10 Prolog manual) which recognises user predicates. Up till now it has meant that the predicate in question has at least one clause. However, it would be more useful if it meant that the predicate in question has at some time had clauses, for then a Prolog interpreter written in Prolog could easily report undefined predicate errors, the Prolog pretty-printer could distinguish between a command to print a predicate which currently has no clauses and a command to print a non-predicate, and so on.

Note that the requirement that only assertions can make a predicate defined does not preclude a declaration
	:- data F/N.
which indicates a predicate which will be dynamically built and changed, as we can write
	declare_data(F, N) :-
		functor(Goal, F, N),
		(   current_predicate(F, Goal)
		;   assert(Goal), retract(Goal)
		), !.
This works with either definition of current_predicate.

Type Failure

The third kind of error is when a predicate is given arguments of the wrong type. An example of this would be "X is fred".

If you regard a built in predicate as an implicit form of a possibly infinite table, it is clear that logically the response to such questions should be simple failure. If for example, we ask "?- plus(X,a,Y)" it is clear that there are no substitutions for X or Y which could ever make this true, and the appropriate response is simply to fail.

However, for debugging it may be more helpful to forget about Prolog's "totally defined semantics" and to produce an error report. The specifications below use type_fail(ArgNo,ExpectedType) to indicate this. The exact form of the error message is not specified, as a Prolog program should have no way of getting its hands on such a message, but it might look like
	! Type Error: argument 2 of plus/3 is not an integer:
	! plus(_273, a, _275)

An implementation should be able to provide both forms of behaviour at the programmer's option. This could be a global flag set by e.g. flag(type_fail,_,fail) or flag(type_fail,_,error). Or if there is some way of providing handlers for errors, it could be by letting the programmer supply a handler for type errors which just fails. There should be a distinction between continuable errors, where there is an obvious way of proceeding, and non-continuable errors, where either success or failure would yield the wrong answers in some model. Type failures are continuable.

Overflow

The fourth kind of error is arithmetic overflow. Ideally, a Prolog implementation should support arbitrary precision integers (bignums) and even rational numbers (ratios). Furthermore, the distinction between various sizes of integer should be completely invisible to a Prolog program. (Thus there should be no smallp/fixp/bigp predicates in Prolog, only integer/1.) If this is done, arithmetic overflow cannot happen in integer arithmetic. (Though storage overflow can.) The only Prolog implementations I know of which support bignums are those embedded in a Lisp system which already had them. Dec-10 Prolog, PopLog, Prolog-X, NIP, and David Warren's "New Engine" all support N-bit 2s-complement arithmetic only. Some of them can't even detect arithmetic overflow. In an N-bit 2s-complement system without overflow detection, it is possible for X and Y to be positive but for X+Y to be negative. This has in fact been something of a nuisance in Dec-10 Prolog (where N=18 exception during the calculation of a single expression where N=36), and is the reason why I wrote the "LONG" package. (This package is in the public domain. It implements bignums and ratios in Prolog. A host language implementation is bound to be more efficient, but LONG is at least a place to start.)

Any Prolog implementation should support at least 16-bit integers. C Prolog's 29 bits has proven adequate for most purposes so far, and the same is true of Prolog-X's and NIP's 24 bits. However, when bignums (indistinguishable from small integers) are not available, integer overflow should be signalled as a fault. (Not as an error, the program has a perfectly good meaning, but as a fault, the implementation isn't good enough to run the program.)

The specifications below do not indicate overflow, because the values to be calculated are perfectly well defined. If an implementation cannot represent these values, it should signal overflow.

An implementation which lets the programmer supply handlers for exceptions might use this as a hook for bignums, e.g. the programmer might be able to say
	handle overflow in plus(X,Y,Z) where nonvar(X)
	and nonvar(Y) by calling eval(Z is X+Y).
But getting this right would require similar hooks into type failures as well, and the whole thing would end up as an incredible mess. So overflows should not be continuable. (It would be wrong to fail an overflowed goal, because there is an answer. It would be equally wrong to succeed, because any number the implementation can represent would be wrong.)

Given Knuth volume 2, the UNIX "mint" package, and the LONG package written in reasonably portable Prolog, there is very little difficulty in implementing bignums. Bignums have finally found their way into a Lisp "standard" (Common Lisp); it is time they found their way into Prolog.

Even so, we are still left with division by zero. Suppose we ask "X is 7/0". That is, find me an X such that 0*X=7. Clearly, there is no such X, and by the reasoning used on type failures this error should be continuable and ignorable, but should also be reported if the programmer prefers. I like the "totally defined semantics" of Prolog, and would prefer on aesthetic grounds that division by 0 should simply fail. (See the next subsection for times(0,X,0).) When using Prolog as a theorem prover, this is certainly the behaviour one wants. But the default should be to report an error. The code below uses zero_divide to indicate such faults.

Arithmetic overflow has another aspect: floating point. In an imperative or functional programming language, floating point behaves itself reasonably well, though it is regrettable that it satisfies so few of the field axioms. The IEEE floating point standard doesn't just define a family of storage formats. More importantly, it defines the arithmetic operations (and square root) so that IEEE floating point arithmetic obeys more useful axioms than most. Any Prolog hardware which offers floating point should certainly adopt the IEEE standard. The IEEE standard also specifies the exceptions to be raised.

Floating point exceptions include
domain errors, such as sqrt(-1). These should be handled in the same way as integer division by 0.

overflows. These should be handled like integer overflows, that is, they must be detected, and must not be continuable.

underflow. It is common practice to replace an underflowed result by 0. There are two problems with this. The first is the fact that the ordered field laws
X > 0 & Y > 0 => X*Y > 0
X < 0 & Y > 0 => X*Y < 0
X*Y = 0 => X = 0 or Y = 0
and so on can be falsified. In a language allegedly based on logic this would be a pity. The second is that underflow represents a complete "loss of precision" and indicates a poor formulation of a calculation problem. As there is a vast literature on numerical analysis, and a good deal is known about coping with the defects of floating point, it would be a great pity if this indication of a badly expressed algorithm were not brought to the programmer's notice.
There may of course be other floating point exceptions, such as trying to perform arithmetic on an IEEE NaN value.

With the exception of domain errors (like division by 0), these represent the failure of the implementation to handle a well defined value, and so should be presented as faults rather than errors. No faults are continuable.

Instantiation Faults

The fifth kind of error is when a question has too many variables in it. Now this logically speaking is no error at all. The question "plus(X,X,Y)" has a perfectly good set of solutions {(X,Y)|X is an integer and Y=2*X}.

As the Herbrand Universe of a set of clauses is countable, it is evident that any Prolog implementation could in principle backtrack over all possible solutions, even for "is" (except for floating point). So if an implementation decides not to do so, its failure to compute solutions for such questions is an implementation fault and not a logical error.

There are of course excellent reasons for declining to do all this backtracking. The implementation cost is very high (especially for backtracking over all possible ground arithmetic expressions in 'is'), and the benefit is negligible. If a programmer genuinely wants code that backtracks over all the integers, it is easy enough to write a generator. For example:
	gen_nat(N) :-			% gen-erate nat-ural
		nonvar(N),		% if we aren't to generate it
		!,			% demand that it is an integer
		integer(N), N >= 0.	% and non-negative.
	gen_nat(N) :-			% otherwise, generate an
		gen_nat(0, N).		% integer >= 0

gen_nat(L, L). gen_nat(L, N) :- % generate natural > L succ(L, M), % M = L+1 gen_nat(M, N). % generate natural >= M

gen_int(I) :- % gen-erate int-eger nonvar(I), % if we aren't to generate it !, % demand that it is an integer. integer(I). gen_int(0). % generate 0 gen_int(I) :- % generate +/- N for N > 0 gen_nat(1, N), ( I = N ; plus(I, N, 0) % I+N = 0 ).
It is also better style to use an explicit generator, so that the human reader can better grasp what the algorithm is up to.

The basic problem is that when there are infinitely many solutions to a single goal, blind chronological backtracking such as Prologs generally use will stick at that goal, when the cause of the failure may lie elsewhere entirely. Reporting an instantiation fault instead is a good way to ask the programmer for more help in the control component of the program.

A system such as MU Prolog should not report instantiation faults, but should delay such goals until more variables are instantiated. When such a system deadlocks (because all the goals it might work on are delayed), that is its equivalent of reporting an instantiation fault.

If there is enough information in the goal to determine that it cannot succeed, it should fail. So an instantiation fault should generally correspond to a question with at least one answer (which the Prolog doesn't think it can afford to find). If the system supports user defined error handlers, and if an error handler can generate solutions, then it might be an idea to allow the user's error handler to propose solutions and use the built in code to check them. But when the user's handler/generator fails, it will have failed after a finite number of solutions, and most instantiation faults correspond to questions with an infinite number of solutions, so the error should be re-signalled (by-passing the user's handlers). Instantiation faults are not failure-continuable, though they are success-continuable.

Instantiation faults are indicated in the specifications below by instantiation_fault.

Unimplemented data types

This proposed standard describes some data types (rational and complex numbers) not currently in any Edinburgh-family Prolog, and one (strings) which is only in the more recent generation of compilers (PopLog, NIP, Prolog-X). The point of defining some of the predicates which work with these values is to make sure that no-one uses the names for anything else.

A particularly good example of this is complex numbers. If there is a reason for having floating point numbers in Prolog, exactly the same reason can be applied to argue for complex numbers. "The host language has floats" is not a reason, it is an excuse. If the host language has floats, Prolog can have complex numbers. The second volume of R.P.van de Riet's book on formula manipulation in Algol could be consulted to find out how to implement some of the basic functions with acceptable accuracy, and there are many other sources which could be consulted. Indeed, a FORTRAN or Common-Lisp based Prolog would get complex numbers for free. Complex numbers being a well-defined easily implemented and useful extension, we must not use the name "complex" when we mean what standard texts call "compound" terms.

If a Prolog implementation does not include some data type, what is it to do if a predicate pertaining to such a type is called? In the case of a recognition predicate, this is already defined. No argument can be of the required type, so the predicate will fail. But if the predicate is supposed to construct an object of the desired type, say if we call number_chars(X,"1.2") in a system which lacks floating point, what then?

Recall the approach to arithmetic overflow. The case is the same: there is a well defined result which the machine cannot represent. Arithmetic overflow and unimplemented types are both instances of representation faults. (That is, the system admits its inadequacy. Representation faults are not programmer errors.) By the same logic, no form of representation fault can be continued, even by failing.

In some machine ranges the less expensive members of the family implement floating point arithmetic in software, and the order code is so arranged that an appropriate interpreter will be called when a program tries to use an unimplemented instruction. By analogy, an obvious thing to do would be to allow the user to write handlers for unimplemented type faults and to let ia implement ia's own complex arithmetic. This we must not do.

The first reason for this is that there is simply no way that the user can possibly do it. The user would have to represent the missing data type using the data types are are available. But that means that ia cannot construct an object which will be recognised as belonging to the data type ia is trying to supply. Worse than that, it must be recognised as some other data type. We have found this to be a problem with the current implementation of rational arithmetic in Prolog; no system code recognises that rational numbers are supposed to be atoms, which can lead not only to inefficiency but to error.

I have elsewhere proposed a mechanism called "seals" for allowing the Prolog programmer to define ia's own new atomic types. But that still isn't good enough, because the standard operations are required either to have their standard effect or to report an appropriate fault. They may not produce any error message not explicitly permitted herein, other than storage overflow. (Which is hereby explicitly permitted anywhere.) We cannot guarantee this of user code; indeed we can guarantee nothing about user code. Now a program that calls user code is asking for trouble, and Prolog can't prevent it getting trouble. But a program which calls standard predicates is entitled to get the standard effect and no other.

A third reason that it is undesirable is that it is a waste of time. If anyone wants a new data type added to the standard, either at least one implementation should be well known in the computer science community (which applies to strings, rational numbers, and complex numbers), or else ia should publish the implementation of ia's data type in some generally known language (this might apply to updatable arrays, Shimon Cohen has met this requirement). We need a Prolog standards group whose function is to distribute references to this and other public implementation information to would-be implementors. The principle here is that if you want other people to support your data structure, you have to tell them how.

A Design Principle

Some built in predicates are control structures. ','/2, ';'/2, '->'/2, '!'/0, once/1, '\+'/1, forall/2.

Some built in predicates have to do with I/O.

Some built in predicates have to do with the data base.

But many of the rest are implicit representations of perfectly good logical relations. An example is length/2, which could be defined in Prolog as
	length([], 0).
	length([_|Tail], N) :-
		length(Tail, M),
		succ(M, N).
or, to avoid runaway recursion, as
	length(List, Length) :-
		nonvar(Length),
		!,
		length_gen(Length, List).
	length(List, Length) :-
		length_chk(List, Length).

length_chk([], 0). length_chk([_|Tail], N) :- length_chk(Tail, M), succ(M, N).

length_gen(0, []). length_gen(N, [_|Tail]) :- succ(M, N), length_gen(M, Tail).
This is a very useful predicate, and to get the utmost speed, it might be coded in the host language, or even in micro-code. We would like such an implementation to have the best affordable approximation to its logical semantics.

The arithmetic predicates are a case where the relation represented cannot be described in Prolog. If numbers are represented using successor notation, then Prolog can indeed express all the arithmetic operations. But otherwise, even for the relation N=M+1, there are infinitely many values for M, and to describe this relation in Prolog we would need an infinite number of clauses.

Still, the intended relation is perfectly clear.

The design principle for such built in operations is this:
if there is sufficient information in a question to determine that it has a unique answer, the implementation should yield that answer.

if there is sufficient information in a question to determine that it has NO answer, the implementation should fail.

If neither of these cases holds, but the implementation can tell that there is a finite number of solutions, it may enumerate them. But it need not.

If the implementation cannot tell that there is a finite number of solutions, or if it doesn't check, it should either adopt some other control strategy (such as delaying) or it should report an instantiation fault. It must NOT backtrack over an infinite set of solutions.

Unfortunately, Dec-10 Prolog has not followed this design principle. For example, the length/2 predicate in Dec-10 Prolog only works one way: length(X,4) reports an instantiation fault instead of computing the unique solution X=[_,_,_,_]. However, there is an important consequence of adopting this principle: it is more complete than Dec-10 Prolog, so programs which run in Dec-10 Prolog will run in an implementation which follows this principle. Many "Prologs" are less complete than Dec-10 Prolog. Another benefit is that this principle determines uniquely what each predicate should do, the implementor and standardiser to not have to make and justify choices for each of dozens of predicates.

I have tried to make each C-coded predicate in C Prolog follow this principle. length/2 is coded in Prolog, but in a non-structure-sharing Prolog it is easy enough to code it in C so that it obeys this rule.

Note that determining that there is no solution may involve detecting a type failure. For example, "succ(a, X)" has no solution, so the principle says it must fail, and the discussion on errors says it may report a type failure on the way.

Some Naming Conventions

These conventions are recommended in the interests of clarity. They are followed in this standard. Code which is not intended for use outside a particular group may use any conventions you please.

When defining a new data type, pick some reasonably short word to name it. I'll symbolise it here by D. When used in predicate and constructor names the word should be entirely in lower case, and if there is other text in the predicate or function name it should be set off by underscores. When used in variable names it should be capitalised, and other text may be set off by underscores but need not be. For example, having invented a new type "basket", we might have predicates "is_basket", "basket_to_bag", "fill_basket", "price_basket", a constructor "basket" and "empty_basket". We might use variable names Basket, Basket1, FullBasket, New_Basket, and so on. In published code abbreviations such as "bkt_size" or "BstLen" should be avoided. (They might be bucket size or beast length.) But there is no reason not to use an abbreviation of an English word as the type name; the objection is to further abbreviations of that. So bkt_size is ok as a predicate name if the data type is called bkt. If the code contains explicit type declarations for some predicates, the variable name convention need not apply to those predicates.

A predicate to recognise objects of type D should either be called "D", or "is_D". The is_D form is preferred. Unfortunately it is too late to do anything about the standard term type names. A recognition predicate should only enumerate objects of the given type if there are finitely many of them. It is generally a bad idea for "basic" predicate to be capable of unbounded backtracking. If the Prolog implementation supports delaying, recognition predicates should delay when given variables.

If a data type represents some sort of collection, predicates D_size, D_member, D_memberchk, check_D, part_D and map_D may be defined, where
	D_size(D, Size) :-
		bind Size to the number of elements in D, or
		if D is a mapping, to the number of elements in 
		the domain of D.

D_member(Element, D) :- recognise or enumerate elements of the collection.

D_memberchk(Element, D) :- recognise elements of the collection, do not enumerate. May delay if Element is unbound, but need not.

check_D(Pred, D) :- check that Pred(X) is true for each element X of D.

part_D(Pred, D, TrueD, FailD) :- distribute the elements of D into two objects TrueD/FailD of the same type according as Pred(Elem) succeeds/fails.

map_D(Pred, Input_D, Output_D) :- construct a new object of type D, where corresponding elements I and O of Input_D and Output_D are related by Pred(I,O).
These conventions need thinking about. There could also be some others. Given D_member, we can define bounded existential quantification as
	some_D(Pred, D) :-
		D_member(Element, D),
		Pred(Element),
		!.
Note the presence of the cut. Since there is no "official" way to find out what Element was involved, the only effect of the cut should be to eliminate redundant proofs of the same fact. However, if Pred (which can be a compound term) contains variables, or if it has side effects, or if the collection D contains variables, the program might be able to detect the difference. If it matters, the programmer can easily write the calls to D_member and Pred explicitly, and ia then has the successful element as well. Similarly, we can define bounded universal quantification as
	each_D(Pred, D) :-
		forall(D_member(Element,D), Pred(Element)).
Again, this will not bind any free variables in Pred or D, and not only will such bindings not be propagated to the caller, they are not kept from one Element to the next. In contrast, the check_D predicate does bind free variables. For example, suppose we have a predicate divisible_by(X,Y) which means that X is divisible by the non-unit Y. Then
	check_list(divisible_by(Y), [72,6,30,1296])
might instantiate Y to 6, 3, and 2, while
	each_list(divisible_by(Y), [5,3,7])
might succeed, because each of the numbers has a non-unit factor, leaving Y uninstantiated.

Some data structures represent finite functions in various ways. Note that map_D predicates only act on the results of such functions, i.e. if I represents the function f and Pred encodes the function g, and map_D(Pred,I,O) succeeds, then O represents the function g o f.

If D and R are two data types, and there is an obvious mapping from D to R (such as from a list to a set), the conversion predicate should be called D_to_R. In particular, it is recommended that all new data types should have D_to_list and list_to_D conversions whenever this makes sense. If the conversion is a partial bijection, so that given an object of type D there is always a unique object of type R to which it is related, and given an object of type R there is at most one object of type D to which it is related (but there need not be any such object), the "to_" may be omitted and the name D_R used. Note that this is still directional. The user is entitled to assume that
	for all D_obj there is a unique R_obj such that
	    D_R(D_obj, R_obj)
but not to assume that
*	for all R_obj there is a unique D_obj such that
*	    D_R(D_obj, R_obj)
If this second rule is true, there should also be a predicate R_D.

Why have I fussed about with all these conventions? First, there is a whole pile of library predicates whose names need rationalising. Second, this proposed standard has to introduce some new predicates to go with the new types (particularly strings), and I don't want to have to defend each name individually, and I particularly don't want further extensions to have individually defended names. If these conventions are adopted, there will be little or no choice for the name of a predicate once we have decided what it does. Think of the benefit to a Prolog user trying to find out if there is a predicate to frobboz zeelunks. Since these conventions do not allow the abbreviation of a type name, the user has only to ask
	?- cp "*zeelunk*".
(known as 'apropos' in many Lisps) to find out. We should further rule that operation names (make, test, check [hence "chk" in memberchk, as a different operation is meant], map, size, portray, ...) may not be abbreviated as well. {NB: a language like CLU achieves this effect by having compound names: <type>$<function> e.g. integer$assign. We might want to have multi-type operations, such as T1_T2_sameSize.)

Determining the Type of A Term

Prolog has a number of predicates for classifying terms. These predicates are called "meta-logical", because they view terms as data structures, and not as logic would view them. In particular, they discriminate between variables and other terms, a distinction which is completely meaningless from a logical point of view, and can only be understood with reference to the current "state" of a procedural computation.

This distinction between variables and other terms is important for simple depth-first strategies such as Dec-10 Prolog, Prolog-X, NIP, and PopLog use. However, it is very little use in a more complete interpreter such as MU Prolog. Even in a more complete interpreter, it is convenient to have say a predicate for accepting any atom. This standard proposes a single new type testing predicate which such a system can use, which has a logical reading, unlike the other predicates of this section which have only a procedural reading.

The term "types" can be arranged in a tree, where type A appearing as an ancestor of type B means that all terms of type B are also of type A, and types are otherwise incompatible. Here is the tree:

			true
			  |
		+---------+---------+
		|		    |
	     nonvar		   var
		|
      +---------+--------------------------+
      |					   |
  compound			     atomic=constant
					   |
		+----------------+---------+---------+
		|		 |		     |
	     number	      string		 atom=symbol
		|
	+-------+-------+
	|		|
    rational	     complex
	|		|
     integer	      float

Every standard Prolog implementation is required to support the var, nonvar, compound, atom, and integer types, and to supply all these type testing predicates. A recognition predicate for an unsupported type must always simply fail, but the compiler or other tools may issue a warning when the program is compiled or otherwise loaded. There may be other atomic types, such as the data base references of C Prolog and Prolog-X. There may be other numeric types. There is no standard type for a rational number which is not an integer, as this may be defined in Prolog by
proper_rational(Q) :-
	rational(Q),
	\+ integer(Q).
and similarly there is no standard type for a complex number with a non-zero imaginary part, as this may be defined by
proper_complex(F) :-
	complex(F),
	\+ float(F).

Var, nonvar

var(X) succeeds if X is a variable, and fails if it is not.

nonvar(X) fails if X is a variable, and succeeds if it is not.

true(X), which always succeeds, and var(X) are the only type testing predicates which accept variables. All the other type testing predicates fail for variables. Note that this means that
	...
	\+ number(X),
	...
	number(X)
can actually succeed, provided X is a variable at the time of the first test and is suitably instantiated at the time of the second. This sort of behaviour is what makes these predicates "metalogical" instead of being equivalent to infinite tables.

simple, compound, atomic

A compound term is a term which has arguments. That is
compound(Term) -> exists N. exists A. arg(N, Term, A).
No other terms have arguments. In particular, arg may not be used to select characters of a string or atom, not may it be used to extract the real and/or imaginary parts of a complex number.

A standard Prolog implementation must support compound terms with up to 200 arguments, and may support any greater arity. See the section on term construction.

A standard Prolog implementation may not provide a "tuple" data type. Compound terms suffice for this purpose. If the implementation is capable of supporting tuples of some length, it is capable of supporting compound terms of that arity (perhaps less 1).

A simple term is anything other than a compound term. That is, simple is exactly defined by
simple(Term) :-
	\+ compound(Term).

Dec-10 Prolog uses the name "atomic" to mean a non-variable term other than a compound term. In a mixed language system this can be confusing. Logicians normally use the name "constant" for such terms. For compatibility, a standard Prolog implementation must supply atomic/1 defined as if by
	atomic(Term) :-
		nonvar(Term),
		\+ compound(Term).
but it is recommended that constant/1 be supplied as well, with identical meaning.

It is unfortunate that a number of writers have invented new names for compound terms. 'complex' we've already dealt with as being needed for complex numbers. Another popular not-invented-here name for compound terms has been "structures". This is a particularly unhelpful name, as all kinds of terms are data structures, and when we are talking about implementations of trees, graphs, and so on in Prolog we may want to distinguish between a compound term and the data structure it represents. It is almost impossible to make the distinction between these two concepts clear if you call them both structures! And of course there are control structures also represented by compound terms. Also, there may be data structures which Prolog regards as atomic because pattern matching cannot look inside them, but which the host language regards as data structures, such as strings, arrays, hash tables, streams, ... Denying that a Lisp array which Prolog has its hands on is a structure helps no-one.

There is not the least need to steal this useful and harmless word, as "compound term" conveys precisely the notion of a term which is composed from other terms. Therefore, in the interests of having a teachable language, a standard Prolog may not use the predicate structure(X) to recognise compound terms, and term_type(X, T) may not bind T=structure.

There is nothing to stop a sufficiently foolish programmer putting
structure(X):-compound(X).
in his own programs. What is forbidden is having this in a standard Prolog system or in the public Prolog library.

Atoms and Strings

An atom is a constant which can serve as the function symbol of a compound term. This name can be confusing in a mixed language system, where an "atom" might be anything other than a list, or in a modern Lisp would be any constant. The name "symbol" is recommended as an alternative, but "atom" must be provided.

An atom is uniquely identified by its name, which is a sequence of characters. In a Prolog system which has modules of some sort, this should still be true; the module system should pertain to predicates, not to terms. This means that if a Prolog implementation is embedded in a language (such as ZetaLisp) which has a module system which pertains to atoms, the name of an atom is the fully qualified name. Thus 'user:outer-module:middle:inner:final' might be the name of an atom which could in a suitable context be input as 'final'. An implementation must allow an atom to have up to 255 characters in its name. There is no standard upper bound.

A string is also a sequence of characters. However, it is possible for two strings to have the same characters and yet be distinguishable in some fashion, by the host language. A Prolog program must not be able to distinguish between strings containing the same characters without calling a host-language routine.

An implementation is not obliged to implement strings at all. If it does, it must allow a string to have up to 255 characters. There is no standard upper bound.

Strings may be mutable, that is, it may be possible for the sequence of characters associated with a string to change. This would be the case with strings implemented as Pop11 byte vectors, as Common Lisp strings or byte vectors, or as InterLisp strings. This fact is the reason why a string cannot be used as the function symbol of a compound term. Atom names must not be mutable; an implementation may use the same data structures for atom names as for strings, but should ensure that routines which change strings never get their hands on atom names. In extreme cases this may entail writing a new symbol table just for Prolog.

The availability or otherwise of strings as a distinguishable data structure is one question. Whether "abc" is a list of ASCII character codes or something else in another. An implementation may provide a string data structure and a convenient notation for lists of character codes both.

The characters in an atom name or a string are in ASCII. An implementation is not required to support character codes outside the range 0..127. However, any byte size and any integer range that includes 0..127 may be used. If it is necessary to interface Prolog to an environment using the EBCDIC character code, Prolog must use the 8-bit ASCII character set internally, and translation in each direction must be done according to the following tables (taken from document X3J11/83-045):

    8-bit ASCII to EBCDIC		    EBCDIC to 8-bit ASCII

00 00 40 7c 80 20 c0 76 00 00 40 20 80 c3 c0 7b 01 01 41 c1 81 21 c1 77 01 01 41 a0 81 61 c1 41 02 02 42 c2 82 22 c2 78 02 02 42 a1 82 62 c2 42 03 03 43 c3 83 23 c3 80 03 03 43 a2 83 63 c3 43 04 37 44 c4 84 24 c4 8a 04 9c 44 a3 84 64 c4 44 05 2d 45 c5 85 15 c5 8b 05 09 45 a4 85 65 c5 45 06 2e 46 c6 86 06 c6 8c 06 86 46 a5 86 66 c6 46 07 2f 47 c7 87 17 c7 8d 07 7f 47 a6 87 67 c7 47 08 16 48 c8 88 28 c8 8e 08 97 48 a7 88 68 c8 48 09 05 49 c9 89 29 c9 8f 09 8d 49 a8 89 69 c9 49 0a 25 4a d1 8a 2a ca 90 0a 8e 4a d5 8a c4 ca e8 0b 0b 4b d2 8b 2b cb 6a 0b 0b 4b 2e 8b c5 cb e9 0c 0c 4c d3 8c 2c cc 9b 0c 0c 4c 3c 8c c6 cc ea 0d 0d 4d d4 8d 09 cd 9c 0d 0d 4d 28 8d c7 cd eb 0e 0e 4e d5 8e 0a ce 9d 0e 0e 4e 2b 8e c8 ce ec 0f 0f 4f d6 8f 1b cf 9e 0f 0f 4f 7c 8f c9 cf ed

8-bit ASCII to EBCDIC EBCDIC to 8-bit ASCII

10 10 50 d7 90 30 d0 9f 10 10 50 26 90 ca d0 7d 11 11 51 d8 91 31 d1 a0 11 11 51 a9 91 6a d1 4a 12 12 52 d9 92 1a d2 aa 12 12 52 aa 92 6b d2 4b 13 13 53 e2 93 33 d3 ab 13 13 53 ab 93 6c d3 4c 14 3c 54 e3 94 34 d4 ac 14 9d 54 ac 94 6d d4 4d 15 3d 55 e4 95 35 d5 4a 15 85 55 ad 95 6e d5 4e 16 32 56 e5 96 36 d6 ae 16 08 56 ae 96 6f d6 4f 17 26 57 e6 97 08 d7 af 17 87 57 af 97 70 d7 50 18 18 58 e7 98 38 d8 b0 18 18 58 b0 98 71 d8 51 19 19 59 e8 99 39 d9 b1 19 19 59 b1 99 72 d9 52 1a 3f 5a e9 9a 3a da b2 1a 92 5a 21 9a 5e da ee 1b 27 5b ad 9b 3b db b3 1b 8f 5b 24 9b cc db ef 1c 1c 5c e0 9c 04 dc b4 1c 1c 5c 2a 9c cd dc f0 1d 1d 5d bd 9d 14 dd b5 1d 1d 5d 29 9d ce dd f1 1e 1e 5e 9a 9e 3e de b6 1e 1e 5e 3b 9e cf de f2 1f 1f 5f 6d 9f e1 df b7 1f 1f 5f 7e 9f d0 df f3

8-bit ASCII to EBCDIC EBCDIC to 8-bit ASCII

20 40 60 79 a0 41 e0 b8 20 80 60 2d a0 d1 e0 5c 21 5a 61 81 a1 42 e1 b9 21 81 61 2f a1 e5 e1 9f 22 7f 62 82 a2 43 e2 ba 22 82 62 b2 a2 73 e2 53 23 7b 63 83 a3 44 e3 bb 23 83 63 b3 a3 74 e3 54 24 5b 64 84 a4 45 e4 bc 24 84 64 b4 a4 75 e4 55 25 6c 65 85 a5 46 e5 a1 25 0a 65 b5 a5 76 e5 56 26 50 66 86 a6 47 e6 be 26 17 66 b6 a6 77 e6 57 27 7d 67 87 a7 48 e7 bf 27 1b 67 b7 a7 78 e7 58 28 4d 68 88 a8 49 e8 ca 28 88 68 b8 a8 79 e8 59 29 5d 69 89 a9 51 e9 cb 29 89 69 b9 a9 7a e9 5a 2a 5c 6a 91 aa 52 ea cc 2a 8a 6a cb aa d2 ea f4 2b 4e 6b 92 ab 53 eb cd 2b 8b 6b 2c ab d3 eb f5 2c 6b 6c 93 ac 54 ec ce 2c 8c 6c 25 ac d4 ec f6 2d 60 6d 94 ad 55 ed cf 2d 05 6d 5f ad 5b ed f7 2e 4b 6e 95 ae 56 ee da 2e 06 6e 3e ae d6 ee f8 2f 61 6f 96 af 57 ef db 2f 07 6f 3f af d7 ef f9

8-bit ASCII to EBCDIC EBCDIC to 8-bit ASCII

30 f0 70 97 b0 58 f0 dc 30 90 70 ba b0 d8 f0 30 31 f1 71 98 b1 59 f1 dd 31 91 71 bb b1 d9 f1 31 32 f2 72 99 b2 62 f2 de 32 16 72 bc b2 da f2 32 33 f3 73 a2 b3 63 f3 df 33 93 73 bd b3 db f3 33 34 f4 74 a3 b4 64 f4 ea 34 94 74 be b4 dc f4 34 35 f5 75 a4 b5 65 f5 eb 35 95 75 bf b5 dd f5 35 36 f6 76 a5 b6 66 f6 ec 36 96 76 c0 b6 de f6 36 37 f7 77 a6 b7 67 f7 ed 37 04 77 c1 b7 df f7 37 38 f8 78 a7 b8 68 f8 ee 38 98 78 c2 b8 e0 f8 38 39 f9 79 a8 b9 69 f9 ef 39 99 79 60 b9 e1 f9 39 3a 7a 7a a9 ba 70 fa fa 3a 9a 7a 3a ba e2 fa fa 3b 5e 7b c0 bb 71 fb fb 3b 9b 7b 23 bb e3 fb fb 3c 4c 7c 4f bc 72 fc fc 3c 14 7c 40 bc e4 fc fc 3d 7e 7d d0 bd 73 fd fd 3d 15 7d 27 bd 5d fd fd 3e 6e 7e 5f be 74 fe fe 3e 9e 7e 3d be e6 fe fe 3f 6f 7f 07 bf 75 ff ff 3f 1a 7f 22 bf e7 ff ff

But what about other alphabets, such as Hebrew, Greek, Cyrillic, Kanji? The answer is that there is nothing in this standard to prohibit their use. The range of character codes must be at least 0..127; it may be more. In particular, a 16-bit character set may be used. All that this standard requires is that the codes 0..127 bear their ASCII meanings. In order to cope with extended alphabets it may be necessary to have both 8-bit and 16-bit strings; as atom names are immutable they may be stored in the most economical format.

Numbers

A Prolog implementation must support at least 16-bit integers. There is no requirement that any other form of number be handled. It is strongly urged that arbitrary precision integers (bignums) should be supported, it's easy enough in all conscience. Floating point numbers should have at least the precision and range of the host machine's native single precision floating point format if it has one, or of IEEE single precision if it has not. Common Lisp "double" is recommended.

A rational and a complex number may never unify or compare equal.

A rational number equal to some integer is that integer. There is no way for a Prolog program to distinguish between 273 and 273/1, nor for that matter between 273 and 546/2. Similarly, a complex number equal to some float (that is, having a zero imaginary part) is that float. There is no way for a Prolog program to distinguish between 27.3 and 27.3I0.0.

Other types of Terms

Other types of terms may be added. However, the compound, atom, var rational, and complex classes are closed. New types (such as Pop or Lisp arrays, hash tables, flavours, pointers to compiled code, data base references, functor pointers, you name it) must be classified as constants, and may but need not have their own recognition predicates. It is recommended that Prolog should not in general be extended to cope with host language data structures; as long as Prolog can receive such structures in arguments, recognise that they are constants, and can pass them on in calls to host language functions or subroutines, greater consistency and portability is obtained by coding manipulations of such structures in host language routines and calling them from Prolog. For example, a Prolog cohabiting with Lisp should not have an "array" predicate, but should be able to call
	lisp(array(X))

Type Testing with Delays

Apart from the var/nonvar distinction, all the types described above make perfectly good sense considered as infinite tables. In a Prolog which uses a more flexible execution strategy, it would be useful to have a way of type testing which will wait if necessary until the term to be tested is instantiated. The new predicate term_type/2 accomplishes this.

term_type(Term, Type) :-
	var(Term),
	wait_for_binding_of(Term),
	% or report an instantiation fault
	term_type1(Term, Type).
term_type(Term, Type) :-
	nonvar(Term),
	term_type1(Term, Type).

term_type1(Term, compound) :- compound(Term). term_type1(Term, atom) :- atom(Term). term_type1(Term, string) :- string(Term). term_type1(Term, integer) :- integer(Term). term_type1(Term, rational) :- rational(Term). term_type1(Term, float) :- float(Term). term_type1(Term, complex) :- complex(Term). term_type1(Term, number) :- number(Term). term_type1(Term, atomic) :- atomic(Term).

Except for the existence of the predicate term_type1, this is to be taken literally. If term_type(X, T) is called, and X ends up bound to 3, term_type is to enumerate the types integer, rational, and atomic in that order. If the table is extended with new types, more specific types are always to be enumerated before more general types. Normally, T will be bound in the call.

This predicate may be used in more conventional Prologs, where it is to report an instantiation fault if its first argument is a variable.

Getting at the Print Names of Constants

There are six predicates which can be used on atomic terms to construct them given their print names or to obtain their print names given the terms.

string_chars(String, ListOfChars)
    if string(String) {
	unify ListOfChars with [c1,...,cn] where String has n
	characters and their codes are c1,....,cn.
    }
    if var(String) {
	check that ListOfChars is a valid list of character codes
	if it is too long, report a string overflow fault
	unify String with a new string containing those codes
    }
    if nonvar(String) and \+string(String) {
	type_fail(1, string)
    }

To check that X is a valid list of character codes:
    if var(X), report an instantiation fault.
    if X = [], succeed.
    if X = [H|T] {
	if var(H), report an instantiation fault.
	if \+ integer(H) or H < 0 or H > max_char, type_fail(2,chars).
	check that T is a valid list of character codes.
    }
    otherwise type_fail(2,chars).

Note that this follows the naming convention: given any string there is always exactly one list of character codes which corresponds to it, but given a list of integers there may not be a corresponding string (the list might be too long, or some of the integers might be outside the range of character codes). Note also that the string is new. If strings are mutable (and if they are not there really isn't the least point in distinguishing them from atoms) this means that we are guaranteed that changing this string will not affect any other string, even if this one happens to have the same characters initially.

There ought to be some way of finding out the range of character codes. max_char/1 seems like a good name, but we want something that fits in with a characterisation of the floating point system. Copy Common Lisp?

atom_string(Atom, String)
    if var(Atom) and var(String) report an instantiation fault.
    if nonvar(Atom) and \+ atom(Atom) type_fail(1,atom).
    if nonvar(String) and \+ string(String) type_fail(2,string).
    if nonvar(Atom) {
	unify String with a new string whose contents are
	the character codes of the name of Atom
    } else {
	unify Atom with the atom whose name is String, creating
	the atom if necessary.
    }

This definition suggests that it is possible for an atom to have any name at all that can be represented as a string. However, some Lisps place different restrictions on strings and atom names. In particular, many Lisps demand that any constant whose name looks like the name of a number is that number, e.g. (FIXP (COMPRESS '(#\1))) is likely to be true. This has not yet been an issue for Prolog, except that in Dec-10 and C Prolog, the atom '1' which the reader can read can be built in no other way. It would be particularly unfortunate to take over Lisp semantics unchanged: there is no reason for a Prolog programmer to suppose that '11Q' has anything to do with the integer 9. If a Prolog implementation would return some object other than an atom because of this or any other restriction, it must not do so, but must report an implementation fault. That is, atom_string(X, "1") may report a fault, but under no circumstances may it bind X to anything other than an atom.

No form of case conversion is permitted. If the implementation can only support monocase identifiers (since no standard predicate has anything other than lower case letters and underscores in its name, and since variable names do not need to be stored as atoms, so monocase atom names could be used) it must react to upper case letters in a name by reporting an implementation fault. Such an implementation can offer a case converting version of atom_string/2 or atom_chars/2 or name/2, but it must have some other name.

atom_chars(Atom, ListOfChars) :-
	nonvar(Atom),
	atom_string(Atom, String),
	string_chars(String, ListOfChars).
atom_chars(Atom, ListOfChars) :-
	var(Atom),
	string_chars(String, ListOfChars),
	atom_string(Atom, String).

Although atom_chars is defined in terms of atom_string, it is perfectly permissible to make atom_chars the primitive and define atom_string in terms of it, or even to omit atom_string entirely. There is a subtle point of this definition: we only require the ListOfChars to be a ground list when the Atom is unbound. When we know the Atom, we can use it to fill in the variables in the ListOfChars. So
	atom_chars(append, [97,X,X,101|R])
must succeed and bind X = 112, R = [110,100].

number_string(Number, String)
    if var(Number) and var(String) report an instantiation fault.
    if nonvar(Number) and \+ number(Number) type_fail(1,number).
    if nonvar(String) and \+ string(String) type_fail(2,string).
    if nonvar(String) {
	if String cannot be parsed as a number, simply fail.
	unify Number with the number represented by String
    } else {
	"print" Number and unify String with a new string of the result
    }

number_chars(Number, ListOfChars) :- var(Number), string_chars(String, ListOfChars), number_string(Number, String). number_chars(Number, ListOfChars) :- nonvar(Number), number_string(Number, String), string_chars(String, ListOfChars).

There are two subtle points here. The first is the same as with atom_chars; number_chars(12,[X,50]) must succeed and bind X=49. The second is what happens when the String or ListOfChars is a valid string or character list, but the characters cannot be parsed as a number? Calling number_string(X,"1.e") looks like some sort of mistake. But when both arguments are bound there are two ways of proceeding. We could take the number, "print" it into an internal buffer, and compare the result with the given string. Or we could convert the string to a number, and compare the result with the given number. In fact it is the former method which is more general. It lets us determine, for example, that number_string(1,"2/3") must be false, without any need to implement general rational numbers. It would be difficult to explain to a novice why number_string(1,"x") should behave any differently from atom_string(fred,"x"). The clinching argument in favour of doing nothing special when the string or character list does not represent a number is that it would be very difficult indeed to check the syntax of an non-ground list of characters. For consistency then, "bad syntax" must fail in all cases. There is a third subtle point which I'd better make clear in the code; the "unification" of the String with the print name of the number (when the number is instantiated) should ignore case. This is a tiresome detail, and only matters if floats are implemented.

Here is the syntax of numbers:
<number>    ::= <integer>
	     |  <ratio>
	     |  <float>
	     |  <complex>

<integer> ::= <sign> <digits>

<sign> ::= <empty> | - + <digits> ::= (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)

<ratio> ::= <integer> / <digits>

<float> ::= {as in Common Lisp}

<complex> ::= <float> I <float>

The first thing to note is that + is not a sign. It would make sense if it was, but unfortunately
	?- name(X, "+2"), atom(X).
succeeds in Dec-10 Prolog. Given that
	?- X is +2.
succeeds in C Prolog, binding X to the integer 2, I feel that this is an anomaly which should be eliminated.

The next thing to note is that because of the ruling that "bad syntax" simply fails, number_string does not have to understand the syntax of number types which are not implemented.

To be compatible with Common Lisp, the denominator in N/D should not be 0. In PRESS, however, we found it very useful to have +infinity (1/0), -infinity/0), and undefined (0/0) values. Reproducing the behaviour of the LONG package in a low-level implementation of bignums and ratios is quite easy. On the other hand, an implementation which tries to share the world with LISP could find it difficult to do this. So for the moment this standard does not say whether X/0 must be permitted or not.

It should be the case that
	p(X) :-
		number_chars(X, L),
		number_chars(L, Y),
		Y = X.
succeeds for any number X. In the case of integers and ratios this is easy, and if you can do it for floats it is easy to do it for complex numbers. But for floats this is a very stringent requirement, and it means that the floating point I/O routines supplied by the host language (whether LISP, Pop, or C) are unlikely to be usable. There's no helping that, it's a property we must have or give up any pretence that floating point numbers can be handled in logic. Admittedly, it can be very much clearer to round numbers to a few digits before showing them to a human, and there really ought to be a standard predicate for doing that. But these predicates must generate maximal precision, and for some numbers a wee bit more. There is an MC (now CWI) report which gives Algol 68 routines that could be used here.

A final point about number syntax is that number_string defines an internal form for numbers. The external syntax of numbers can be anything you please. Any number of extra leading zeros can be permitted, and underscores can be allowed in numbers as well as in atoms. This is a particularly good idea with bignums; it is much easier to read 123_546_789_652 than 123546789652. And the external syntax can support bases other than 10. Non-decimal notation is not defined here because amongst other things name/2 doesn't understand it; only the read/[1-2] routines of the Edinburgh family Prologs understand it.

name(Constant, ListOfChars) :-
	string(Constant),
	!,
	string_chars(Constant, ListOfChars).
name(Constant, ListOfChars) :-
	number(Constant),
	!,
	number_chars(Constant, ListOfChars).
name(Constant, ListOfChars) :-
	atom(Constant),
	!,
	atom_chars(Constant, ListOfChars).
name(Constant, ListOfChars) :-
	var(Constant),
	number_chars(Constant, ListOfChars),
	!.
name(Constant, ListOfChars) :-
	var(Constant),
	!,
	atom_chars(Constant, ListOfChars). 
name(Constant, ListOfChars) :-
	type_fail(1, atom-or-number).

As usual, it is the effect only which is being defined, and these clauses need not be visible. The main reason for defining this predicate is backwards compatibility. There is a rather strange aspect of it. It was pointed out earlier that number_chars need not recognise the syntax of unimplemented number classes. This means that in an implementation such as Dec-10 Prolog which lacks floats, name(X,"1.2e3") may bind X to an atom (as in fact it does) instead of reporting a representation fault. In this case only, this deviation from the rest of standard is permitted. But reporting a representation fault is also permitted. A Prolog embedded in Lisp may find it convenient to treat name/2 as the fundamental primitive, and check that the result Lisp gives back has the right type. For example,
	name(X, "(QUOTE (F))")
might call the Interlisp function PACKC and then check that the result is an atom or number, and fail on finding that it isn't. name/2 is required to work with any integer or unquoted Prolog identifier; a program that uses it only for those can expect to fit in with any host language. But new programs should generally use atom_chars or number_chars.

It is extremely useful to have a single predicate for converting a constant to a list of ASCII codes. Given that name/2 is already strange, adding the first clause so that it can convert a string to a list of characters increases its usefulness without increasing its strangeness too much. It is already the case that
	nonvar(X), name(X, L), name(Y, L)
does not imply that X = Y; consider the case X='1'. So the fact that name can convert a string to a list of characters but not conversely is not additionally strange. There are two predicates which "construct" numbers from other numbers.

*rational(Q, N, D)		% NOT the real definition!
*   if nonvar(Q) and \+rational(Q), type_fail(1, rational).
*   if nonvar(N) and \+integer(N), type_fail(2, integer).
*   if nonvar(D) and \+integer(D), type_fail(3, integer).
*   if nonvar(N) and nonvar(D) {
*	if the implementation cannot handle 1/0,0/0,-1/0 {
*	    if D = 0, zero_divide.
*	}
*	unify Q with the value of N/D
*   } else
*   if nonvar(Q) {
*	let N1,D1 be such that N1/D1 = Q and gcd(N1,D1) = 1.
*	if nonvar(D) {
*	    if D = 0 then {
*		if D1 =/= 0 then fail.
*		unify N with N1.
*	    } else {
*		if gcd(|D|,D1) =/= D1 then fail.
*		unify N with N1*D/D1.
*	    }
*	} else
*	if nonvar(N) then {
*	    if gcd(|N|,|N1|) =/= |N1| then fail.
*	    unify D with D1*N/N1.
*	} else {
*	    unify N with N1, D with D1.
*	}
*   } else {
*	report an instantiation fault.
*   }

complex(C, R, I) if nonvar(C) and \+complex(C), type fail(1, complex). if nonvar(R) and \+float(R), type fail(2, float). if nonvar(I) and \+float(I), type fail(3, float). if nonvar(C) { unify R with re(C), I with im(C). } else if nonvar(R) and nonvar(I) { unify C with R+i*I. } else { report an instantiation fault. }

There is no problem with complex/3. We could almost define it in Prolog as
	complex(c(R,I), R, I) :-
		float(R), float(I),
		I \== 0.0.
	complex(R, R, 0.0) :-
		float(R).
R and I cannot be rational numbers. If they could be, we'd run into the problem discussed in the next paragraph.

There is quite a nasty little problem with rational/3. For any given pair of integers N,D there is exactly one rational number N/D (possibly including +infinity, -infinity, and undefined as rational numbers). However, for any given rational number Q there are infinitely many ways that of expressing it as the quotient of two integers (with the exception of 0/0). According to the design principle, then, rational(Q,N,D) for var(N) and var(D) should report an instantiation fault or delay until at least one of N and D is bound. But according to the description above, it will succeed uniquely. This has the extremely nasty effect that
	rational(1/2, N, D), D = 4
will fail (as rational/3 bound N=1, D=2) but
	D = 4, rational(1/2, N, D)
will succeed. It could be argued that this is due to rational/3 trying to be too clever, and when nonvar(Q) it should not take the other arguments into account. That would make both calls fail. But that would mean that
	rational(Q, 2, 4),
	rational(Q, 2, 4)
would fail. If we ever hope to prove anything about Prolog programs, this sort of behaviour in the primitives is totally unacceptable. The conclusion I am forced to is that there is no one definition of rational/3 which is satisfactory. Instead we have to use

divide(Q, N, D)
    if nonvar(Q) and \+rational(Q), type_fail(1, rational).
    if nonvar(N) and \+ integer(N), type_fail(2, integer).
    if nonvar(D) and \+ integer(D), type_fail(3, integer).
    if var(N) or var(D), instantiation fault.
    unify Q with N/D.

rational(Q, N, D) if nonvar(Q) and \+rational(Q), type_fail(1, rational). if nonvar(N) and \+ integer(N), type_fail(2, integer). if nonvar(D) and \+ integer(D), type_fail(3, integer). if nonvar(Q) { let N1,D1 be such that Q=N1/D1, D1 >= 0, gcd(N1,D1) = 1. unify N=N1, D=D1. } else if nonvar(N) and nonvar(D) { if D1 < 0 or gcd(N1,D1) =/= 1, simply fail. unify Q with N1/D1. } else { instantiation fault. }

The purpose of divide/3 is to construct a rational number by performing the division. The purpose of rational/3 is to construct a rational number by packing together two integers which already have the right form, or to unpack a rational into two integers. These definitions are free of the problems which made the previous definition of rational/3 worthless.

Recall that these predicates may have to report a representation fault if the complex or rational data types are not implemented. In the interests of helping people debug their code, a representation fault may be reported even when the result is representable, that is
	rational(X, X, 1)
	complex(X, X, 0.0)
may report a representation fault.

Constructing Compound Terms

There are three predicates which are used to construct compound terms and take them apart, in addition, of course, to pattern matching. They are functor/3, arg/3, and =../2.

functor(Term, Fsymbol, Arity)
    if nonvar(Fsymbol) and \+atomic(Fsymbol), type_fail(2,atomic).
    if nonvar(Arity) and (\+integer(Arity) or Arity < 0),
	type_fail(3,natural).
    if var(Term) {
	if var(Fsymbol) or var(Arity), instantiation fault.
	if Arity = 0 {
	    unify Term = Fsymbol.
	} else {
	    if Arity > some limit, implementation fault.
	    construct a new term with principal functor
		Fsymbol and arity Arity, all of whose
		arguments are distinct unbound new variables
	    unify Term = this new term
	}
    } else {
	if atomic(Term) {
	    unify Fsymbol = Term, Arity = 0.
	} else {
	    compound(Term) is true.
	    unify Fsymbol = the principal function symbol of Term
	      and Artiy = the arity of Term
	}
    }

This looks amazingly complicated, but it mostly follows from the design principle. If Term is a nonvariable, or Fsymbol and Arity are both nonvariables, we have enough information to succeed at most once, otherwise there is an instantiation fault, and instantiation faults should be checked for before type failures. The Term could be anything. The Arity must be a non-negative integer, and the call should type fail if it is not.
The Arity argument of functor/3 and the ArgNo argument of arg/3 MUST NOT be integer expressions. The call MUST type fail if given such erroneous values.
The inquisitive programmer is entitled to use functor(a(1,2),a,1+1) to discover what a type failure looks like. When the Term is a variable, permitting integer expressions here looks like a useful extension. However, permitting integer expressions here would preclude efficient compilation of a construct which must be implemented as efficiently as possible, because although high level code may not use functor/3 much, low level support code uses it extensively. Generally, arithmetic expressions are the kiss of death to efficiency in Prolog. Further, it is difficult to come up with a single definition of what it means to have an arithmetic expression here which makes sense in all cases. If
	functor(T, a, 1+1)
were to be allowed, consistency would mean that
	functor(a(1,2), a, X)
would have to be able to enumerate X=2, X=1+1, and all the other integer expressions with value 2. This is obviously silly, and the extra "generality" of allowing an integer expression when you are building a term is spurious, as you can always write
	N is Expr,
	functor(Term, F, N)

The one thing which is not obvious is that functor(C,C,0) must succeed for any constant C, so that if constant(C), functor(C,F,N) binds F=C,N=0, and constant(T,C,0) binds T=C. Since there are values of N for which functor(T,72.3,N) succeeds, we can only type fail if the function symbol is a compound term.

An implementation is permitted to impose an upper bound on the arity of a term. However, all arities up to 200 must be accepted. This is the only Prolog primitive which needs to check this limit. =.. is defined in terms of functor, so no separate check is needed. Note that the error reported is an implementation fault, not a user error. functor(T,a,9999) is perfectly well defined, if a Prolog can't do it that's its fault, not the programmer's.

arg(ArgNo, Term, Argument)
    if var(ArgNo) or var(Term), instantiation fault.
    if \+integer(ArgNo) or ArgNo < 1, type fail 1.
    if constant(Term) and \+ atom(Term), type fail 2.
    if ArgNo > the arity of Term, simply fail.
    unify Argument with the ArgNoth argument of Term.

Note that arg(1, a, X) simply fails, while arg(1, 0, X) type fails. This is because 'a' might have come from a call functor(T,a,N) where N could take on any non-negative value, and it makes sense to consider it as the "empty" case of a "vector" class. But no other atomic value has any related compound terms, so it makes sense to type fail for them. A common pattern for low level code is
foo(Old, New, Context) :-
	var(Old),
	!,
	var_foo(Old, New, Context).
foo(Old, New, Context) :-
	nonvar_foo(Old, New, Context).
foo(Old, New, Context) :-
	functor(Old, F, N),
	functor(New, F, N),
	foo(N, Old, New, Context).

foo(0, _, _, _) :- !. foo(N, Old, New, Context) :- succ(M, N), arg(N, Old, OldArg), foo(OldArg, NewArg, Context), arg(N, New, NewArg), foo(M, Old, New, Context).
The succ/2 call shields the arg/3 calls from N < 1. If the succ/2 call were moved down, and the cut were not there, the possibility of arg/3 being called with N=0 would arise, and in this context we would prefer arg(0, 72.3, X) just to fail. If we habitually use the template shown (and if the compiler is smart enough the cut can be dispensed with) it doesn't matter whether arg/3 fails or type fails for N=0 or for general constants, and the usefulness of picking up likely errors suggests that type failure is best.

=.. (pronounced "univ") converts a term to a list and vice versa. It is defined in an appendix. That definition relies on length/2 (as defined above) and functor/3 to generate all appropriate instantiation faults and type failures. It is interesting to note that
	X =.. [F,A1,...,An],
where the list is spelled out in the code, can be compiled nearly as efficiently as
	X = f(A1,...,An),
certainly without ever generating the actual list. While the general case of univ is to be avoided as turning over more non-(local stack) store than the use of functor and arg, this case is quite reasonable. If it were generally appreciated that this can be compiled efficiently, some of the demand for "variable functors" might go away, but I suppose that is too much sense to hope for.

While these three predicates are all we actually need in the standard for term hacking, there are three more that I would like to introduce.
same_functor(T1, T2) :-
	same_functor(T1, T2, _, _).

same_functor(T1, T2, N) :- same_functor(T1, T2, _, N).

same_functor(T1, T2, F, N) :- nonvar(T1), functor(T1, F, N), functor(T2, F, N). same_functor(T1, T2, F, N) :- var(T1), functor(T2, F, N), functor(T1, F, N).
The reason for wanting these predicates is that they are reversible. With them available, we can write
foo(Old, New, Context) :-
	var(Old),
	!,
	var_foo(Old, New, Context).
foo(Old, New, Context) :-
	nonvar_foo(Old, New, Context).
foo(Old, New, Context) :-
	same_functor(Old, New, N),
	foo(N, Old, New, Context).

foo(0, _, _, _) :- !. foo(N, Old, New, Context) :- succ(M, N), arg(N, Old, OldArg), arg(N, New, NewArg), foo(OldArg, NewArg, Context), foo(M, Old, New, Context).
and so propagate this reversibility up a level without too much pain. In Dec-10 Prolog (where amongst other things we can't swap the arg/3 and foo/3 calls in the last clause as I just did) the demands of "efficiency" force logic out into the cold when one writes low level term hacking code. This should not be so. A standard conforming implementation can defeat the point of these predicates by implementing them exactly as written, but at least reversible code will still run. The point of putting these predicates in the standard, of course, is to prevent the names being used for something else.

Term Comparison

Dec-10 Prolog introduced the idea of a standard total ordering of terms, defined (to quote the Dec-10 Prolog manual) as follows:
variables, in a standard order (roughly, oldest first - the order is NOT related to the names of variables);

integers, from -"infinity" to +"infinity";

atoms, in alphabetical (i.e. ASCII) order;

compound terms, ordered first by arity, then by the principal function system, then by the arguments in left to right order).

This predicate is defined because sort and keysort need it, and they are defined because bagof and setof need them. The existence of a total order on terms permits the the implementation of efficient algorithms for many data structures, such as linear time set operations, N^2lgN graph operations, and the like.

The fact that the ordering is defined for variables presents difficulties. There are logical anomalies, such as the fact that
	X @< 1
may be true at one time (because X is unbound) and false at another (because X is bound to 72). Undoubtedly the most serious problem is that there is an unstated assumption. So long as we do not bind any variables, the order of two variables may not change. The easy way to ensure this is the way C Prolog does it: don't have a garbage collector. The next easier way to ensure it is the way Dec-10 Prolog does it: have a compacting garbage collector for the global/copy stack, and never mind what the heap does. But PopLog cannot do either of these things. It has a garbage collector, and it is a stop and copy one. This means that the addresses of variables cannot be used to define their order. A third way is to make Prolog variables into record structures (which PopLog already does) and to timestamp them. This might seem rather costly, but then so is an O(N^2) setof costly. Let's put that to one side for a moment.

Another problem is where do the other data types fit into this order? In particular, strings, floats, and complex numbers? As elsewhere, my response to this problem is to invent a new principle. The principle here is the interval principle:
"Major" types should correspond to intervals in the standard order.
In the existing standard order, the major types are thus true, var, nonvar, compound, atomic, number, atom.

This doesn't uniquely define where things like strings should go. Strings are more like atoms than they are like any other data type, so they should be adjacent to atoms in the standard order. But there is another "data type" which is unnamed: the set of terms T for which
	functor(T, F, N), atom(F)
is true. That should be an interval too. So strings cannot go between atoms and compound terms. The only place left for strings is less than any atom but greater than any other kind of constant.

The position of other data types (such as bit maps) is even less clear. One thing which is absolutely clear is that nothing can be less than variables, or that will spoil the interval property. It is the case in Dec-10 and C Prolog that
var(X) :-
	X @< _.
and it really would be a pity to lose this property. (The anonymous variable is younger than any other variable, and the variable ordering is by age.) So other data types must be greater than variables and less than strings. But this doesn't define their order with respect to numbers. Indeed, they could fall into two groups, one on either side. However, another unnamed "type" might be of interest, the "standard" constants, and that means that non-standard constants must be less than numbers.

There is a problem with complex numbers. There is no way that we can provide a total order which is compatible with the standard arithmetic ordering on complex numbers, as the standard mathematical definition has it that no ordering on the complex numbers exists. But that needn't stop us assigning whatever order we like, it just means that it has no connection with algebraic laws. An obvious ordering which is easy to implement is lexicographic order, ordering floating point numbers as if they were complex numbers with 0 imaginary part. (The Common Lisp manual says that the complex number #C(X,0) is the number X. This should be part of the Prolog standard as well.)

The extended standard ordering is thus
variables, in some time-invariant order;

all "other" atomic objects, in some time-invariant implementation-defined order;

integers and floating point numbers, ordered according to numeric value (but see section 7.4 for 2 -vs- 2.0), and complex numbers in lexicographic order;

strings, in lexicographic order;

atoms, in lexicographic order;

compound terms, ordered first by arity, then by function symbol (always an atom), then by arguments from left to right.

The fundamental predicate here is
compare(R, X, Y)
    If X comes before Y in the standard order, unify R with '<'.
    If X is identical to Y, unify R with '='.
    If X comes after Y in the standard order, unify R with '>'.

There are six term comparison predicates defined in terms of compare/3.
X @< Y :-
	compare(<, X, Y).

X @>= Y :- \+ compare(<, X, Y).

X == Y :- compare(=, X, Y).

X \== Y :- \+ compare(=, X, Y).

X @> Y :- compare(>, X, Y).

X @=< Y :- \+ compare(>, X, Y).

There are two kinds of Prolog implementation where these predicates are not desirable. One is implementations like PopLog, where it is difficult to order variables. The other is implementations like MU Prolog which have coroutining capabilities, and would rather wait until they can ascribe a definite order. One point of this proposed standard is to ensure that certain predicates, even if not implemented, shall not have their names used to mean something else. It is therefore incumbent on me to propose some useful alternative for these other systems so that they shan't make these names do the wrong things. Therefore I define

term_compare(R, X, Y) :-
    if X and Y are identical, unify R with '='.
    otherwise, find the first place in a left to right scan
    where they differ.  If either one has a variable in that
    place, wait for that variable to be bound and try again
    if the implementation supports coroutining or report an
    instantiation fault it it doesn't.
    call compare(R, X, Y).
This is not quite the same as demanding that X and Y be ground. term_compare(=,A+B,A+B) should succeed. What is required is that the call be delayed until X and Y are either identical or clearly different in a place where the ordering can be determined. When term_compare eventually returns a result, it returns the same result that compare would then. The point is that if term_compare says that the ordering is such and such, it will never change its mind. (Unlike compare/3 with variables.) The infix relations are

X $< Y :-
	term_compare(<, X, Y).

X $>= Y :- term_compare(R, X, Y), (R = '=' ; R = '>').

X $> Y :- term_compare(>, X, Y).

X $=< Y :- term_compare(R, X, Y), (R = '=' ; R = '<').

There is a subtle point here. X $>= Y is not the same thing as not(X $< Y). The latter goal will be delayed until X and Y are ground, or in a non-coroutining system will report an instantiation fault if X and Y are not both ground. But the former only requires them to be sufficiently instantiated to determine the result.

We would also like to have $= and $\=. However, they can proceed even sooner than term_compare(R,X,Y). Namely, term_compare(R,X,Y) has to wait until the first possible difference between X and Y is either made definite or removed by instantiation. We can define $= and $\= easily enough. A non-coroutining system should report an instantiation fault instead of delaying.

X $= Y
    unify X=Y.  If this fails, fail.
    If no variables were bound, succeed.
    Otherwise, undo the bindings and wait
    until some variable in X or Y is bound.

X $\= Y unify X=Y. If this fails, succeed. If no variables were bound, fail. Otherwise, undo the bindings and wait until some variable in X or Y is bound.

This gives us rather a lot of "equality" predicates. X = Y will try to make X and Y the same. X $= Y will wait until someone else makes them the same. X == Y wants them to be the same now. And X =:= Y will evaluate two arithmetic expressions.

A lot of my code uses term comparison. Most of it should be using the 'term_compare' family rather than the 'compare' family. I do not enjoy knowing that my "efficient" code for ordered sets depends on the unchecked (though documented) assumption that the elements are "sufficiently" instantiated. I really do think that the existing comparison family should be reserved for meta-logical hackery and that most user code should be using the term_compare family.

Sorting

Dec-10 Prolog has two sorting predicates:
keysort(PairList, SortedPairList) sorts a list of Key-Value pairs so that the Keys are in standard order, retains duplicates, and unifies the result with SortedPairList. This is needed for bagof/3 so that solutions with the same bindings for free variables can be grouped together. It is vital that this sort be stable.

sort(Random, Sorted) sorts the elements of the Random list into standard order, eliminates duplicates, and unifies the result with Sorted. This is used by setof/3 to convert a chronologically ordered list to a set representation, and is used by the ordered-set library package to convert lists to sets.

C Prolog added two more predicates:
msort(Random, Sorted) sorts a list of terms as sort/2 does, but does not eliminate duplicates. This is useful for putting bags into a standard form.

merge(L1, L2, Merged) combines two ordered lists, retaining duplicates.

Now there are many variations on sorting that could be useful. It would occasionally be useful to sort a list into descending order instead of sorting it into ascending order and then reversing it, to use other pairing symbols than '-', such as '=', or to use some argument of an element other than the first as a key.

Therefore I have chosen to standardise, not the Dec-10/C Prolog routines, but a more general pair sort/4 and merge/5 in terms of which the current predicates can be defined. An appendix gives the definitions of these predicates, and also definitions of sort/2, msort/2, keysort/2, merge/3 in terms of sort/4 and merge/5. The details of the code are not to be regarded as part of the standard. I have a C version of these routines which is quite efficient.

sort(Key, Order, Random, Sorted) sorts the list Random according to the Key and Order specifications, and unifies Sorted with the result. Key is a non-negative integer. If Key=0, it means that the entire terms are to be compared. If Key=N>0, it means that the Nth arguments of the terms are to be compared. Order says whether the list is to be sorted into ascending (<, =<) or descending (>, >=) order and whether duplicates are to be retained (=<, >=) or eliminated (<, >). The way to remember the Order argument is that it is the relation which holds between adjacent elements in the Sorted result.

A warning to implementors: don't use quicksort! Quicksort is not the worst possible sorting method for Prolog but it comes pretty close. Merge sort (or some variation on the natural merge) in fact does fewer comparisons and less work generally; the reason the people believe quicksort to be good is that papers evaluating it have been concerned only with sorting arrays of integers. Merge sort is also a lot simpler to code in Prolog, as the appendix shows.

List Processing

There are four operations on lists which are so very useful that they ought to be present in a Prolog system even before any libraries have been loaded. The fact that they are not so present means that a great many programs have tended to include their own versions of these predicates, which causes endless trouble when they are loaded into a Prolog where they are present. It is time this was stopped.

length/2 has already been described. The others are

append([], L, L).
append([H|T], L, [H|R]) :-
	append(T, L, R).

member(X, [X|_]). member(X, [_|T]) :- member(X, T).

memberchk(X, [X|_]) :- !. memberchk(X, [_|T]) :- memberchk(X, T).

The definitions of member/2 and memberchk/2 are to be taken literally. Too many programmers have been unable to resist the temptation to use memberchk(X, L) to add X to the end of a list with an unbound tail for any other definition to be acceptable.

append/3 is another matter. The clauses above define the values it computes and the order it computes them in. However, it may report an instantiation fault instead of entering a runaway recursion, or delay in a coroutining Prolog, and it is strongly recommended that it should. It may also type fail, while member/2 and memberchk/2 may not.

It is worth noting that if L is bound to a term [T1,...,Tn] (that is, it is a list and does not end with a variable, then
append(L, X, Y) has at most one solution, whatever X and Y are, and cannot backtrack at all.

append(X, Y, L) as at most length(L)+1 solutions, whatever X and Y are, and though it can backtrack over these it cannot run away without finding a solution. A Prolog implementation is required to find all the relevant solutions. In this sense append/3 is not primitive, because a primitive would be allowed to delay or instantiation fault.

append(X, L, Y), however, can backtrack indefinitely if X and Y are variables. This may be reported as an instantiation fault.

The question does arise: what does functor([_|_], Cons, 2) bind Cons to, and what does name([], Nil) bind Nil to?

Historically, the answers have been Cons=(.), Nil="[]". There is little or nothing to gain in a new Prolog by changing the Cons symbol. Indeed, several new Prolog systems represent list cells in a special compact fashion where the Cons symbol or functor is not actually stored. Having the Cons symbol as (.) means that with a suitable operator declaration a.b.c.[] is an acceptable way of writing the list [a,b,c], and it would be a pity to lose this. A Prolog embedded in Lisp might as well continue the Cons=(.) tradition. And of course there are programs which ask functor(X,.,2) to test whether X is a list instead of asking X=[_|_].

The name of [] is another matter. There is a substantial benefit to Prologs embedded in Lisp or Pop if Nil="NIL" or Nil="nil" is permitted, or perhaps even Nil="()". Now there is already something strange about the token [] in Prologt. There may be any number of spaces between the [ and the ]; in fact [] is really a pair of tokens and is recognised by the parser rather than the tokeniser. In Prolog code there has never been anything to gain by knowing exactly what atom represented the empty list, provided the reader always maps [] to the same atom. Therefore,

The empty list may be represented by one of the atoms '[]', '()', 'nil', or 'NIL', the choice made once and for all by the implementor. It may not be represented by any other atom. The parser must turn the empty list [ ] into this atom.

So neither the truth nor the falsehood of []='[]' or []=nil is guaranteed. But the truth of length([],0) is guaranteed even though that of length(nil,0) is not, and the truth of []\='Nil' is guaranteed.

Operations on Strings

Edinburgh Prologs have not had any string operations. This is for an excellent reason. They have lists, and they have atoms. Since lists of ASCII codes can be used to represent strings, and since that makes all the operations available for lists immediately available for strings, using some other data structure would make strings harder to use. One recent proposal smacks of the Wisdom of Gotham: it was suggested that compound terms be used for strings, but that the binary functor concerned should not be the list functor. The advantage of that suggestion is that you don't save any space; if there is special provision for lists (as there is in Quintus Prolog, a list cell costing only 2 words instead of the 3 that general binary terms require) you actually lose a substantial amount of space efficiency; and you not only can't use append/3 to concatenate two strings, you can't use grammar rules to do general pattern matching! Now the ability to write SNOBOL-like patterns for string processing using Prolog grammar rules is well worth keeping. Even if a Prolog implementation does support strings as a "primitive" data structure, it would be excessively foolish not to provide for the input of lists of ASCII codes in some form of string notation.

A Prolog implementation is not required to support strings in any way. Indeed, implementors are advised not to implement strings; that way Prolog programmers will get more string processing power and convenience from the language. However, if a Prolog system is embedded in some other programming language which already provides strings, or if it communicates with some other language using remote procedure calls and has to be able to accept strings, then for the interface with the other language it is useful to have strings as a primitive data structure.

Strings can be compared using the term comparison predicates of section 5.9. Conversion between strings and other objects can be done with the predicates of section 5.7.

string_size(StringOrAtom, Size)
    if nonvar(Size) and (\+integer(Size) or Size < 0),
	type_fail(2, natural).
    if atom(StringOrAtom) {
	atom_chars(StringOrAtom, L), length(L, Size)
    } else
    if string(StringOrAtom) {
	string_chars(StringOrAtom, L), length(L, Size)
    } else
    if var(StringOrAtom) instantiation_fault 
    else type_fail(1, string-or-atom).

string_char(CharNo, StringOrAtom, CharCode) if nonvar(CharNo) and (\+ integer(CharNo) or CharNo < 1), type_fail(1, natural). if nonvar(CharCode) and (\+ integer(CharCode) or CharCode < 0), type_fail(3, natural). if nonvar(StringOrAtom) and \+string(StringOrAtom) and \+atom(StringOrAtom), type_fail(2, string-or-atom). if var(CharNo) or var(StringOrAtom), instantiation_fault. succ(K, CharNo), length(P, K), if atom(StringOrAtom) { atom_chars(StringOrAtom, L), append(P, [CharCode|_], L) } else { string_chars(StringOrAtom, L), append(P, [CharCode|_], L) }

substring(StringOrAtom, Offset, Length, SubString) StringOrAtom must be a string or Atom Offset must be an integer >= 0 Length must be an integer >= 0 if bound, SubString must be a string or atom. atom(StringOrAtom) -> atom_chars(StringOrAtom, L) ; string_chars(StringOrAtom, L). length(Drop, Offset), length(Keep, Length), append(Drop, X, L), append(Keep, _, X), atom(StringOrAtom) -> atom_chars(SubString, Keep) ; string_chars(SubString, Keep).

concat(A, B, C) <-> A is a string or atom, B is a string, atom or number, C is the same sort of constant as A, and the corresponding lists of characters satisfy append(La,Lb,Lc).

concat(A, B, C) if nonvar(A) and \+(string(A) or atom(A)), type_fail(1, string-or-atom). if nonvar(C) and \+(string(C) or atom(C)), type_fail(3, string-or-atom). if var(B), instantiation_fault. if \+(string(B) or atom(B) or number(B)), type_fail(2, constant). string(B) -> string_chars(B, Lb) ; name(B, Lb). if nonvar(A) { string(A) -> string_chars(A, La) ; atom_chars(A, La). append(La, Lb, Lc). string(A) -> string_chars(C, Lc) ; atom_chars(C, Lc). } else if nonvar(C) { string(C) -> string_chars(C, Lc) ; atom_chars(C, Lc). append(La, Lb, Lc). string(C) -> string_chars(A, La) ; atom_chars(A, La). } else { instantiation_fault. }

The first thing to note about these is that they can be applied to atoms as well as to strings, so a Prolog system which does not support strings should still support these predicates. In both substring and concat the type of the result is the same as the type of the first argument.

The name 'concat' was chosen instead of something with "string" in it because 'concat' is already in the Prolog library with this meaning. It allows the second argument to be a number because 'concat' mainly exists for gensym/2; one wants to call concat(step, 47, StepName) and have StepName bound to step47. In fact the Dec-10 Prolog version of this predicate lets you call concat(10,12,X) and X will be bound to the integer 1012. The specification above rules this out as a type failure. It certainly isn't a particularly meaningful operation on numbers.

concat/3 is like append/3. Now append(A,B,C) can be used to solve for any one of A,B,C given the other two. So one might expect that concat(A,B,C) could be used to solve for any one of A,B,C given the other two. Not so. If we know A and B, there is a unique C, and if we know B and C there is at most one A which will do. But if there is any solution for B given A and C, there are at least two and may be three. For example, concat('a', B, 'a1') has the three solutions B=1 {integer(B)}, B='1' {atom(B)}, B="1" {string(B)}. Now in some particular implementation, it might be the case that there were no strings, and that the name of an atom could not resemble the name of a number. In that case B would have a unique solution. But according to the section on representation faults, the fact that the solution B="1" cannot be yielded is not because the logic is different in this implementation, but because the implementation is deficient. The solution's uniqueness being accidental in such an implementation, concat('a',_,'a1') must instantiation fault, otherwise a program using it will stop working in an implementation with more data types.

There are lots of ways of defining substring. We could specify the number of characters to drop as here, or the index of the first character to keep (Drop+1). We could specify the number of characters to keep as here, or the index of the last character to keep (Drop+Keep). This definition is the simplest, and has some nice algebraic properties. Any other substring extraction predicate can easily be programmed in terms of this one. One which is recommended is part_string:
part_string(StringOrAtom, Neck, Before, After) :-
	string_size(StringOrAtom, Size),
	Keep is Size-Neck,
	substring(StringOrAtom, 0, Neck, Before),
	substring(StringOrAtom, Neck, Keep, After).
NB: the operation is not string_part; that would take a string and a predicate and part the characters according to some property.

The argument order of string_char was chosen to resemble arg/3.

Extending these predicates to accept lists of ASCII codes might seem a useful thing to do. However, that would leave us with the problem of distinguishing between the empty list [] and the atom []. So the extension is ruled out.

Simple Integer Arithmetic

The predicates defined in this section are not part of Dec-10 Prolog. Though they are available in the Dec-10 Prolog library (in the file EDXA::UTIL:ARITH.PL). I reluctantly added succ/2 to C Prolog (copying it from Prolog-X) because of the extreme inefficiency of 'is' in that interpreter. The inefficiency is not because it is badly coded. Each individual 'is' call is slowed down by the necessity for coping with floating point arithmetic and the use of a recursive procedure in C to walk over the tree, even when there is only a tiny tree to be walked over. Further, there is a global inefficiency due to the fact that an arithmetic expression is a compound term, which in C Prolog does not have to be constructed each time as it would in say Prolog-X or NIP, but it does tie up global stack space which may never be recovered. In other Prologs, the tree has to be constructed and then interpreted.

One problem with 'is' is that in a goal such as "X is Y+1", Y may at run time be instantiated to an arbitrary ground arithmetic expression, so even if the compiler generates code for arithmetic expressions it has to be prepared to call a tree-walker for the variables in the expressions. The Dec-10 Prolog compiler gets around this by insisting that in a compiled call to 'is' the source variables must be bound to integers and may not be bound to expressions. Only a compiler writer could imagine that this is in any way justifiable; there is certainly no shadow of support for it in any approach to the logic involved, and the discrepancy between compiled and interpreted arithmetic has the usual baneful consequences.

Having introduced succ/2 to C Prolog, I found that I liked its reversibility quite as much as I liked the fact that length/2 now ran twice as fast. So I added plus/3. And I found that I liked that too. between/3 has been around for a long time as a way of generating integers in a bounded range. The other predicates were added to the set for completeness. divide/4 is particularly useful.

These predicates have the following characteristics.
They work on integers, not on expressions.

They are invertible (that is succ(M,N) will solve for N given M or for M given N) and so follow the "design principle" in a way that "N is M+1" does not.

They can be compiled into efficient code with little effort. For example, a compiler noticing that Sum is unbound in the goal plus(X,Y,Sum) can generate the equivalent of "Sum is X+Y" but without the necessity of invoking a tree walker to evaluate X and Y. Further, as the arguments can only be constants or variables (the compiler, detecting anything else, can replace such a goal by code to report a type failure) there is no need to build compound terms at run time or put these variables in the global stack.

They do not seduce beginners into writing "Sum = X+Y", which is a particularly nasty bug, as if Sum later on appears in an arithmetic relation or an 'is' it will appear to have the right value.

A clause containing at most one of the comparison predicates and at most one of the computation predicates is generally clearer than a clause containing "is". This is of course a matter of personal taste.

However, there are things you can do with 'is' that you can't do with these predicates. Most notably, C Prolog, NIP, Prolog-X, and PopLog all provide floating point. The efficiency of these predicates is due to the fact that they do not work on floating point numbers, only on integers. Also, the bitwise operations <<, /\, and so on, are only available via "is". There is some sort of excuse for this. Bitwise operations are less "logical" than simple integer arithmetic (for example, right shifts are not the same as division by powers of two on most machines, but on some machines they are), and floating point arithmetic is notoriously ill-behaved.

The reason for including these predicates in this proposed standard is that there are substantial efficiency gains from using them (in a Prolog which implements them directly), and it would be a pity if one had to choose between efficiency and portability (we know what would get chosen, don't we?). There are also pedagogical advantages from presenting these predicates first; their syntax and behaviour resembles the syntax and behaviour of list-processing predicates such as append/3 far more than does "is".

The Dec-10 library file is given in an appendix, so any Prolog implementation which supports "is" in a standard way can also support these predicates, albeit inefficiently.

Comparison

lt(X, Y) <->
	integer(X) & integer(Y) & "X is less than Y".

if nonvar(X) and not integer(X) then type fail. if nonvar(Y) and not integer(Y) then type fail. if var(X) then instantiation fault. if var(Y) then instantiation fault. if X @< Y then succeed else fail.

le(X, Y) <-> integer(X) & integer(Y) & "X is less than or equal to Y".

if nonvar(X) and not integer(X) then type fail. if nonvar(Y) and not integer(Y) then type fail. if var(X) then instantiation fault. if var(Y) then instantiation fault. if Y @< X then fail else succeed.

There are three ways of comparing two numbers.
lt(X, Y) works only for integers, fails for non integers, and reports an instantiation fault if either argument is a variable and the other is a variable or integer.

X @< Y compares arbitrary terms, and defines an order on variables even. It generalises micro-PROLOG's "LESS" predicate, and is indispensable for the implementation of setof/3, bagof/3, and efficient implementation of sets and finite maps. The fact that "1 @< b" is useful, but when you intend to compare integers only it is even more useful to say so.

X < Y compares arbitrary ground arithmetic expressions, by evaluating both and then calling @< on the results. A Prolog definition for it is given in the second appendix.

The term comparison predicate compare/3 is the fundamental operation used to define all the comparison predicates. It is adequately defined in the Dec-10 Prolog manual, except to note that which ordering is given to variables isn't all that important. PopLog could for example put a "timestamp" in the spare word of a variable record and do comparison on that.

It is important to note that the "law"
	le(X, Y) <-> \+ lt(X, Y)
does not hold; lt(a, b) is false (because a and b are not integers), but that does not make le(b, a) true.

Because apply/2 and call/N for N>1 put the extra arguments at the end of the goal, we can use lt(X) to express the predicate "greater than X" and le(X) to express the predicate "greater than or equal to X". In order to express the predicates "less than X" and "less than or equal to X" we need
	gt(X, Y) :-
		lt(Y, X).

ge(X, Y) :- le(X, Y).
These predicates are not otherwise necessary, but it can be clearer to use them. While they are defined here by Prolog clauses calling the primitives lt and le, error reports produced by them should identify gt and ge as the culprits. If there is any asymmetry in the implementation, this should not be documented, in order to avoid distorting people's coding styles.

As these predicates are not yet in general use, there is time to settle on better names for them. The names that must not be used are <, =<, >, and =, because PDP-11 Prolog and PopLog are alone in failing to evaluate the arguments of those predicates as ground arithmetic expressions (and a great nuisance it is trying to make Dec-10 or C Prolog programs run in PopLog it is too).

When a compiler encounters <comparison>(X, Y), it should do the following.
if X and Y are both integers, the goal should be replaced by "true" or "fail" as appropriate. This case is most likely to arise in automatically generated code such as that produced by the unfolder.

if X or Y is a non-variable non-integer, the call should be replaced by a type_fail call.

if X or Y is a variable which must be uninstantiated at this point, the call should be replaced by an instantiation_fault call.

otherwise the compiler can generate the obvious machine code. There should be an implementation subroutine with a fast calling method that dereferences a variable and checks that the result is an integer; all the arithmetic code can use it.

If anyone wants to make these operators, they are welcome to do so, but questions of syntax apart from relation name and argument order are beyond the scope of this document.

Enumeration

between(Lower, Upper, X) <->
	integer(Lower) & integer(Upper) & integer(X) &
	Lower =< X =< Upper.

if nonvar(Lower) and \+integer(Lower), type_fail(1,integer). if nonvar(Upper) and \+integer(Upper), type_fail(2,integer). if nonvar(X) and \+integer(X), type_fail(3,integer). if nonvar(Lower) and nonvar(Upper) then if Lower > Upper then fail else if var(X) then { try binding X to each of the integers Lower..Upper in turn, but the search order is not specified } else { if Lower =< X =< Upper then succeed else fail } else if nonvar(X) then { if nonvar(Lower) and Lower > X then fail else if nonvar(Upper) and Upper < X then fail else instantiation fault } else { instantiation fault. }

range(Lower, X, Upper) :- between(Lower, Upper, X).

The pseudo-code for between/3 looks quite complicated, but the Prolog programmer doesn't have to understand it. All ia has to understand is the logical definition. The code follows uniquely from applying the design principle to that definition.

The utility of a predicate such as this is obvious. The only thing to decide on is the name and the order of the arguments. There is no incentive to place the Upper argument to the left of the Lower one, so there are only three alternatives:
	foo(X, Lower, Upper)
	foo(Lower, X, Upper)
	foo(Lower, Upper, X)
As with the question of whether it is a good idea to have gt as well as lt, the ground of the decision is "what is a useful partially applied form to give maplist/checklist/mapassoc/...?" The answer is immediate: between(L,U) is a useful combination and the others are not, because X is the only parameter which has a finite solution set and can thus be generated.

So between/3 is uniquely determined as the best possible bounded enumeration predicate. Why then have I included range/3? Just as a gesture of politeness to Harry Barrow, who is the only Prolog programmer not at Edinburgh ever to contribute to the Prolog subroutine library. (Udi Shapiro has contributed the Concurrent Prolog system, but while it contains a number of utility predicates they are not split out so that people can get at them; he has contributed to the program library but not the predicate library.) Now Harry Barrow's contribution does not include the range/3 predicate, but his paper on the VERIFY program uses it in a lot of the examples. I think "range" is a bad name (because it's useful to talk about the range of a function; if range is taken up this way we'll have to talk about the co-domain), and while the argument order is every bit as natural as the argument order of between/3 it is not at all useful with the higher-order predicates. But if we can't talk him into using between/3, let's have range/3 in the standard.

It is clearer to write between(1, 100, X) than it is to write le(1, X), le(X, 100), and when X may be a variable it is better logic (because X can be solved for). But when X is not a variable it would be nice to use more specialised code. A compiler should consider the following cases:
if there is enough information at compile time to determine the result, generate a "true", "fail", "type fail", or "instantiation fault" call as appropriate.

if X is known to be uninstantiated (perhaps this is the first mention of it in the clause), call directly to the enumeration case (which has to check that Lower and Upper are integers in the right order).

it can happen that X is known to be instantiated. If it is mentioned in integer(X) or any other arithmetic relation upstream of this point, it must be instantiated to an integer, and comparison code can be generated as for the two le-s.

This means that in the clause
	case_shift(Ucase, Lcase) :-
		plus(Ucase, 32, Lcase),
		between(0'A, 0'Z, Ucase).
the call to "between" is known not to be enumerating Ucase, but to be a simple test. Of course it would be better to write
	case_shift(Ucase, Lcase) :-
		between(0'A, 0'Z, Ucase),
		plus(Ucase, 32, Lcase).
as that can compute all the solutions.

Should we include gen_nat and/or gen_int in some form as standard generators as well? I'm a bit unhappy about including any form of unbounded backtracking.

Addition and Subtraction

succ(P, S) <-> integer(P) & integer(S) & S = P+1 & P >= 0.

if nonvar(P) and not integer(P) then type fail. if nonvar(S) and not integer(S) then type fail. if integer(P) then { if P >= 0 and S unifies with P+1 then succeed else fail } else if integer(S) then { if S > 0 then unify P with S-1 and succeed else fail } else instantiation fault.

plus(A, B, Sum) <-> integer(A) & integer(B) & integer(Sum) & A+B = Sum.

if nonvar(A) and not integer(A) then type fail. if nonvar(B) and not integer(B) then type fail. if nonvar(Sum) and not integer(Sum) then type fail. if var(A) then { if var(B) or var(Sum) then instantiation fault. unify A with Sum-B and succeed. } else if var(B) then { if var(Sum) then instantiation fault. unify B with Sum-A and succeed. } else { if Sum unifies with A+B then succeed else fail. }

Note: if the implementation does not support bignums, these predicates can report an integer overflow fault.

As I said at the beginning, I originally copied succ/2 from Prolog-X. But in Prolog-X succ(X,Y) and plus(1,X,Y) were identical. Why the difference? (Prolog-X has now changed to this definition, I believe.)

The difference is that plus is for doing arithmetic, and succ is for counting things. If we have plus/3 available, we can use plus(1) whenever we are interested in adding or subtracting 1 from an arbitrary integer, and we might as well get some use out of having succ/2 as well. And the use is that a recursive loop using succ/2 cannot run off to negative infinity.

A common pattern for "counting" loops in Prolog is to count down to 0. We write, for example,
	p(0, , ) :- !,
		<base case>.
	p(N, , ) :-
		<step case>(, ),
		M is N-1,
		p(M, , ).
The reason we need the cut in the first clause is to ensure that the "is" in the second clause won't count down to -1, -2, -3, ... Of course we could (and should!) write this without the cut, by putting a test in the second clause. And that would also get around the possibility of a call with the output arguments so instantiated that the base case fails before it gets to the cut. With 'succ' available, we combine the test and the subtraction, so that the well behaved code is no longer nor more complex than the hacky code. And a compiler could perhaps recognise succ as a form of pattern matching. So we get
	p(0, , ) :-
		<base case>.
	p(N, , ) :-
		succ(M, N),	% N is a successor natural
		<step case>(, ),
		p(M, , ).
If p/3 is called with its first argument instantiated to anything other than a non-negative integer, it will simply fail, without any need for complicated cuts and tests. It is just like
	p(0, ...) :- ...
	p(s(M), ...) :- ...
in that respect. This strong similarity to the obvious pattern matching definition of successor is the reason for succ's utility.

Multiplication

times(A, B, P) <-> integer(A) & integer(B) & integer(P) & A*B=P.

if nonvar(A) & not integer(A) then type fail. if nonvar(B) & not integer(B) then type fail. if nonvar(P) & not integer(P) then type fail. if nonvar(A) & nonvar(B) then if P unifies with A*B then succeed else fail if nonvar(A) & nonvar(P) then % B is var if A = 0 then if P = 0 then instantiation fault else fail else if A divides P then unify B with P/A and succeed else fail if nonvar(B) & nonvar(P) then % A is var if B = 0 then if P = 0 then instantiation fault else fail else if B divides P then unify A with P/B and succeed else fail else instantiation fault

As usual, this definition is forced by the decision that the predicate shall apply to integers and that the design principle shall be followed. Only the argument order is left. The argument order is the same as the argument order of plus/3, which is to say that it has the usual output at the right. This order is good for mapping, e.g. to multiply a list of integers by 7 we have only to call
	maplist(times(7), SingleFold, SevenFold)
We can also use it to test for divisibility: times(7, _, X) is true if and only if X is an integer divisible by 7.

As with plus, it doesn't take a very smart compiler to figure out what special purpose code to generate, e.g. the divisibility test above notes that the second argument is a void variable, so it must be the output if any, so the case to consider is
	if var(X) then instantiation fault else
	if not integer(X) then type fail else
	if X is divisible by 7 then succeed else fail
The main special case to watch out for, of course, is when the third argument is a new variable.

Having two sets of arithmetic relations (the 'is' family and the 'plus' family) may seem like a bad idea. But they cover two different kinds of problem. With the 'is' family we can express involved calculations, while with the 'plus' family we can express arithmetic relations. For example, to test that all the elements of one list of integers are the same multiple of some other list of integers (and this might be of interest in doing calculations on polynomials) we can write
	maplist(times(_), BaseList, Multiples)
which would be much harder to write using 'is' and hard to understand once written. While with 'is' we can write
	D is (sqrt(B^2 - 4*A*C) - B)/(2*A)
which cannot be expressed at all with the 'plus' family (which is restricted to integers in order to have logically simple and efficiently implementable semantics), and which would be completely obscure if it could be so expressed.

Note that in the call times(X, Y, 24), although we don't have enough information to determine a unique solution, we do have enough to determine all 16 (1,24, -1-24, 2,12, -2-12, ..., 24,1, -24,-1). A Prolog implementation which backtracked over the factors of P in this way conforms to the standard. So does one which reports an instantiation fault. And so does one which delays.

Division

Division is a nasty one. Division on the naturals is straight-forward, but when the divisor and dividend can be negative there are at least three definitions in use. (There was an article about this in Software Practice & Experience.) I expect that there will be some disagreement about this. The Gordian knot can be cut by insisting that divide/4 is only defined for non-negative arguments and must FAIL if given a negative number in any of its argument positions (perhaps as a type failure). However, my preference is to adopt the Algol 60 solution:
	X div Y = sign(X/Y)*floor(abs(X/Y))
	X mod Y = X - (X div Y)*Y
that is, division truncates towards 0.

divide(A, B, Q, R) <->
	integer(A) & integer(B) & integer(Q) & integer(R) &
	Q = A div B & R = A mod B

Prolog code for this is given in the Appendix. As usual, if any argument is a nonvariable noninteger, divide/4 type fails. If A and B are given, Q and R are uniquely determined. If B, Q, and R are given, A must be B*Q+R, but it is still possible for divide/4 to fail, as (B*Q+R)divB is not necessarily Q. If A, Q, and R are given, we may have a unique value for B, a failure, or an instantiation fault (see the Prolog code). Otherwise we have an instantiation fault. Note that if A is 0, this determines that Q and R are both 0, but it does not determine B at all. We could continue with B unbound, but the design principle permits us to report an (in this case continuable) instantiation fault. If there is no instantiation fault, nor a type failure, divide/4 will signal zero_divide if B = 0.

Divide is useful because we often want Q and R both. I find it very irritating when writing
	Q is X/Y, R is X mod Y,
in Dec-10 Prolog to know that it will do the division twice. Of course, if I only want the remainder, I can call
	divide(X, Y, _, Rem)
and it is a stupid compiler which can't exploit this.

Optimisation

Most of the predicates discussed in this section are as reversible as is compatible with not backtracking in the implementation language. Some of them have quite complicated descriptions. There are two points to make. One is that the descriptions are if trees, and so don't run all that slowly. Code using this family of predicates is as much as twice as fast as code using the irreversible 'is' family in C Prolog. The other is that most of the generality can be optimised away by a very simple compiler.

What does the compiler need to know? The main thing it needs to know is whether it has seen this particular variable before. If a variable has not appeared earlier in the clause, it must be uninstantiated, and this information is often sufficient to determine which branch of the if tree must be followed. For example, in
	p(A, B, C, Z) :-	% Z is (A-C)*(A-B)
		plus(A, X, B),
		plus(A, Y, C),
		times(X, Y, Z).
this information is all we need to tell which way round to use the two 'plus' calls. The other information is when a variable is instantiated. For the purpose of compiling arithmetic expressions, all we need to know is whether a variable is an integer or not. After an integer(X) test, or after any of 'plus' family calls, we know that each of the variables involved is an integer. This information is all we need to tell that times(X,Y,Z) can be compiled as "unify Z with the product of X and Y", and as that is the most efficient way to do it, that's how it should be compiled. The Prolog type checker can be used to check that calls to 'plus' family predicates only supply integers as arguments.

The result of this sort of optimisation is that we can write clauses using logical relations, and get code which is every bit as efficient as if we had used the irreversible 'is' family. Indeed more efficient, as we don't need to call a tree walker.

Similar optimisation can and should be applied to other primitives, functor and arg in particular.

Delaying and Presburger Arithmetic

It is interesting to note that if we restrict ourselves to the relations, to succ, to plus, and to times with one of the first two arguments bound, a conjunction of such goals is a formula in Presburger arithmetic, and so is decidable. A system such as MU Prolog could exploit this, and (at fairly high cost) actually run such programs as
?- ge(X, 0),		 	% X:int >= 0
   ge(Y, 0),			% Y:int >= 0
   plus(X, Y, T1), lt(T1, 4),	% X+Y < 4
   times(2,X, T2), gt(T2, Y),	% 2X > Y
   times(2,Y, T3), le(T3, X)..	% 2Y =< X
whose unique solution is X=2, Y=1. Detecting this is considerably easier than with 'is'. This isn't much use in practical programming, but it could be important in a "Prolog Technology Theorem Prover" such as Stickel has suggested.

The 'is' family.

All arithmetic in Dec-10 Prolog is based on arithmetic expressions. There are actually quite a few predicates which evaluate one or more arguments:

V is E	evaluates E

X =:= Y evaluates X and Y X =\= Y evaluates X and Y X =< Y evaluates X and Y X >= Y evaluates X and Y X < Y evaluates X and Y X > Y evaluates X and Y

put(C) evaluates C ttyput(C) evaluates C

skip(C) evaluates C ttyskip(C) evaluates C

tab(N) evaluates N ttytab(N) evaluates N

I/O

The principal reason why [tty]put/1 and [tty]skip/1 evaluate their argument is that until recently Dec-10 Prolog lacked a notation for character codes. In order to specify a particular character to be output, one had either to write the Ascii code (as in put(42)) or to use a single-character string (as in put("*")). The need for this has diminished considerably with that advent of "radix 0", as one can now write put(0'*). It was also useful to write put("0"+N), but the fact that Prolog predicates in general do not evaluate their arguments has meant that few Prolog programmers other than the implementors of Dec-10 Prolog have exploited this. I cannot recall seeing any instances of tab(N) with N an arithmetic expression, no doubt tab(3*N) would occasionally be useful, but in general one uses the same argument for tab/1 so often that it is more efficient not to re-evaluate it each time.

ttyput/1 wrings one more feature out of evaluating its argument. Under TOPS-10 (and TOPS-10 only), if the high-order bits of N (all but the bottom 7) are non-zero, the bottom 8 bits are sent directly to the terminal without operating system intervention. So one occasionally sees ttyput(256+C). The whole matter of the user interface needs considerable work, and this solution to one tiny corner of it is unlikely to be useful under any other operating system. Certainly its lack in C Prolog has yet to be noticed.

A redesign of the whole I/O component of Prolog is needed. The most that has ever been claimed by the implementors of Dec-10 Prolog is that the current system is enough to make the Prolog compiler work. Having written BLISS-10 and MACRO-10 code using the TOPS-10 monitor calls to do I/O, I can testify that the TOPS-10 system strongly discourages any experimentation with I/O, writing even the simplest I/O library is so pointlessly hideous a task that anyone who has more rewarding things to get on with (such as implementing TRO) will rightly turn to that as soon as ia may.

Until such a redesign is done, we might as well retain most of the Dec-10 Prolog predicates as they stand. But we do not need to retain this peculiar property of [tty]{put,skip,tab}. A Prolog implementation which supports these predicates, but which insists that their arguments be instantiated to non-negative integers (and for put and skip, integers less than 128) should be considered as "standard". So we arrive at the specifications:

put(C):
	if C is a variable, instantiation fault.
	if C is an integer in the range 0..127
	    append the character C to the current output stream
	otherwise
	    the implementation MAY type fail
	    the implementation MAY evaluate C/\127 as an arithmetic
		expression and append the result to the current
		output stream
	    the implementation may NOT do anything else

ttyput(C): as put(C), but reading "standard output stream" for "current output stream" throughout.

skip(C): if C is a variable, instantiation fault. if C is an integer in the range 0..127 repeat D := next character from current input stream until D = 26 or D = C if D =/= C then fail otherwise the implementation MAY type fail the implementation MAY evaluate C/\127 as an arithmetic expression and call skip() on the result the implementation may NOT do anything else

ttyskip(C): as skip(C) reading "standard input stream" for "current input stream" throughout.

tab(N): if N is a variable, instantiation fault if N is an integer then if N < 0 the implementation MAY type fail write max(N,0) spaces to the current output stream otherwise the implementation MAY type fail the implementation MAY evaluate N\/0 as an arithmetic expression and call tab() on the result.

ttytab(N): as tab(N) but reading "standard output stream" for "current output stream".

Thus for example, it is not defined whether tab(1+2) writes three spaces or type fails, but if it succeeds it may not do anything but write three spaces. Nor is it defined whether tab(-4) succeeds or type fails, but if it succeeds it must have no effect.

If an implementation does support the extended forms of these predicates, so that put("0"+N) does normally work, it is recommended that when the "type failures produce error messages" flag is set these predicates should nevertheless produce type failure error reports. The point of this is that argument evaluation will make it easier to convert Dec-10 Prolog programs, but the type failure messages will help you find the places that need to be changed if the program is to run under other standard-conforming Prolog systems that choose to type fail.

There is a subtle point in the specifications above: /\ and \/ insist on integer arguments and produce integer results. So even if a Prolog system supports argument evaluation in these predicates, it must reject tab(2.7). A Prolog system MAY NOT accept floating point numbers in these predicates. (Do you round up? down? truncate?)

Arithmetic Expressions

An arithmetic expression is a ground term built out of integers, floating point numbers (if you've got them), and at least the following functors:

+ /1	complex conjugate (as in APL)

+ /2 addition (same as PLUS in Common Lisp)

- /1 negation (same as MINUS in Common Lisp)

- /2 subtraction (same as DIFFERENCE in Common Lisp)

* /2 multiplication (same as TIMES in Common Lisp)

/ /2 This one is a problem. See the discussion in the text.

^ /2 exponentiation (same as in Common Lisp)

div /2 divide(X,Y,Q,R) -> Q is X div Y

mod /2 divide(X,Y,Q,R) -> R is X mod Y

num /1 rational(Q,N,D) -> N is num(Q)

den /1 rational(Q,N,D) -> D is den(Q)

re /1 complex(C,R,I) -> R is re(C)

im /1 complex(C,R,I) -> I is im(C)

i /2 complex(C,R,I) -> C is R i I

/\ /2 bitwise conjunction

\/ /2 bitwise disjunction

\ /1 bitwise complement

<< /2 left shift

> /2 right shift

The !(X) and $(X) forms of Dec-10 Prolog are peculiar to that implementation. With bignums available (as they should be), there is no need for them. The point of these operations was that Dec-10 Prolog arithmetic used 36 bit 2s complement for calculations, but could only store 18 bits. (Actually, this isn't quite true, but how many people know about xwd?)

Dec-10 Prolog does not define +X at all. C Prolog defines +X to be X for real and integer X. If complex numbers are admitted, we need a complex conjugate operator. Since the conjugate of a real number is that same real number, making +X be the complex conjugate of X agrees with C Prolog on all the values that C Prolog can represent. Whether complex/3 or rational/3 always give a representation fault or not, +/1, num/1, den/1, re/1, and im/1 must not. However, if complex numbers are not implemented XiY may representation fault even when Y=0.0. Note that arithmetic expression evaluation does whatever coercions are necessary: 2i3 =:= 2.0i3.0 even though complex(C,2,3) must type fail. If rational numbers are not implemented, N/D may yield a floating point number even when N and D are both integers.

If an implementation supports rational numbers, the exponentiation routine may exploit the fact, and treat X^(1/3) differently from X^(0.3333...).

The Evaluation of Lists

Dec-10 Prolog and C Prolog define the value of [X] to be the value of X. (The Dec-10 Prolog manual suggests that this only applies when X is an integer. In fact [1+2] evaluates to 3 in both the Dec-10 and C-Prolog interpreters.) This feature is there so that you can include character codes in your programs. Now that radix 0 exists (in Dec-10 Prolog, C Prolog, Prolog-X, and NIP) this is no longer necessary. However, there is a lot of code that uses it, as 0'C is quite recent. Therefore
it is strongly recommended that new Prolog implementations should evaluate [X] as they evaluate X.
It is, after all, very cheap to do this (especially at compile time), and what else would [X] mean anyway? However, an implementor MAY choose not to implement this feature, but a standard conforming Prolog MUST not define the value of a list to be anything else.

Division and Floating Point

What should the solidus (/) mean? Dec-10 Prolog uses it to mean div. There is no problem with that, as Dec-10 Prolog hasn't got floating point numbers, so div is all that it might mean. The LONG rational package uses it to mean rational division, and introduced div to mean integer division. C Prolog uses it to mean floating point division, which is consistent in approach with LONG's use of it for rational division. This is also what Algol 60 did. However, it doesn't do wonders for existing Dec-10 Prolog programs.

There is a flag in C Prolog called "all_float". By turning this off, the solidus can be made to act like Fortran's division operator. That is, "X is 1/2" will unify X with 0, while "X is 1.1/2.2" will unify X with 0.5. This has proven entirely satisfactory in practice, as Dec-10 Prolog programs can run without change while the solidii are hunted down and changed to div. This satisfaction is unlikely to endure once people here start doing floating point arithmetic in C Prolog, and has a major flaw.

The flaw is that we want to be able to say that 1 < 1.5 and that 1.5 < 2. Since arithmetic comparison is defined in terms of evaluating both sides and then using term comparison, we want term comparison to order integers and floating point numbers according to their values as members of the real number line. This gives us two choices. Either we consider each floating point number X to represent its exact value, or to represent its exact value minus epsilon for some very small epsilon. If we choose the latter, we can distinguish between 2 and 2.0, and have another choice to make: either N < N.0 (epsilon < 0) or N > N.0 (epsilon > 0). Either of these would define a consistent ordering on terms, but neither is more natural than the other. It is difficult to defend either ordering on any grounds other than distinguishing integers and floats. So we have to take the first choice, and say that N =:= N.0 for each integer N. And this leaves us no choice but to say that N == N.0 for each integer N. And prior to the introduction of floating point it was an axiom of Prolog that
	X == Y	->  X = Y
So the natural desire to make "2 =:= 2.0" true forces us to the conclusion that "2 = 2.0" must be true. (This is perfectly natural on mathematical grounds as well.) And that in turn forces the conclusion that integer(2.0).

I find some of these conclusions pretty revolting. We are warned in all good textbooks on numerical analysis "=" and "=/=" are virtually meaningless in floating point computations, and the most one should ever use is abs(X-Y) < epsilon.

C Prolog gets around this problem by being a single language system. The only way you can get a floating point number is by reading one in, getting one from name/2, or by doing some arithmetic, and all of these sources check whether the result is equal to some integer, and if so yield that integer instead. (This never loses any information.) With this guarantee that a number has a floating point representation if and only if it is not an integer, all of these problems go away. The price of this is that the "Fortran-divide" solution doesn't work: 2.0/4.0 clearly "ought" to be 0.5, but the numbers will actually be represented as 2 and 4, so the answer 0 will be computed. The only consistent interpretation we can give to the solidus operator in C Prolog is unconditional floating point division.

I do not know what Prolog-X does. NIP is not yet finished, but its unification code explicitly checks for integer -vs- floating point comparisons, and so accomplishes 2 = 2.0 the hard way. No doubt it will implement integer(X) in a similar fashion. But there are so many primitives that want one or more integer arguments. Is name/2 to accept 97.0 as a character code for "a"? Being single language systems, both these Prologs could adopt C Prolog's solution. Note that we have three choices:
be consistent by forcing floating point results into an integer representation whenever possible. This requires a check in name/2 and is/2 and nowhere else (if there is a routine for constructing "boxed" floats, that is the place to put the check).

be consistent by converting floating point results whenever an integer is wanted. This requires checks all over the place, and is likely to have a substantial performance cost.

be consistent by ruling that 2.0 < 2 but there is no number lying between them

be inconsistent; have 2 == 2.0 but 2 \= 2.0 and not(integer(2.0))
A mixed language system such as PopLog or a Prolog cohabiting with PSL (or even a Prolog cohabiting with Fortran or C) cannot adopt the first solution, because the other language(s) can produce any floating point value they please. The cost of the second solution is prohibitive. It is in any case far too easy to miss a case (such as get0(32.0)) whereupon you're still inconsistent.

Many implementors are likely to prefer the fourth solution, on the grounds that if Lisp says (EQN 2 2.0) then Prolog should agree. On the other hand, whenever you have a floating point value 2.0, round-off error means that the true value might have been 2.0-epsilon, and really one has no business using =:= or =\= to compare floating point values at all. Now if N is an integer and X is a floating point number different from N, the following are all true:
	X < N iff X-epsilon < N
	X > N iff X-epsilon > N (since X \= N)
	X > N iff X-epsilon > N
	X < N iff X-epsilon < N (since X \= N)
and of course all comparisons of floats with floats are preserved. Since epsilon is an infinitesimal, X =:= N will never succeed when it should have failed, and though it might fail when it should apparently have succeeded, round-off error (perhaps in the input routine) might have made it fail anyway.

There are many good reasons for distinguishing between integers and floats. For any integer N, we expect that N+N+N will be an integer (or that the system will announce an integer overflow) and that (N+N+N)divN will be 3. But if X is a floating point number which happens to equal N, X+X+X will not in general be the representation of an integer value (because of roundoff error), and if X is positive we can confidently expect that (X+X+X)/X < 3.0 for some values of X.

In order to maximise both efficiency and clarity, then, we have no choice but to rule that
no floating point value ever unifies with, compares equal to as a term, or is regarded as having the same numerical value as an integer

An unpleasant property of floating point representations is that
	float(N) < N and float(N+1) > N+1
can both be true. So it seems unimportant to me whether N.0 is always taken to be less than N, greater than N, either depending on the sign, or either depending on the 17th bit when the moon is full. We're not going to get anything very pleasant no matter what we do. We should insist that if X < and M < N then X < N but this is easily accomplished by perturbing each X based on X's value and not on the value we are comparing it with.

I'm afraid that in this case consistency with the host language can go hang. If you have a program in Lisp (or FORTRAN) that depends on N.0 comparing equal to N for some range of integers, you can be quite sure that it will stop working if you run it on any other machine.

The final question about floating point is what special functions should be available and how should they be defined. Answers such as "copy FORTRAN" make sense. But I would like to make a different suggestion: copy Common Lisp. That is, any extension to the range of forms understood by 'is' beyond those described here should imitate Common Lisp as closely as possible. Given that a large chunk of the Lisp community has decided to pull together instead of apart (and even InterLisp-D has added some functions from Common Lisp, though it has not changed its old ones), it is difficult to imagine any excuse for making Prolog differ from Common Lisp where we can make it agree.

Special "constants"

C Prolog adds "pi", "e", "heapsize", "stacksize", and "cputime" as nullary operations. They return special constants, or information about the current state of the program. The former use is quite a good one, and implementation constants such as min_int, max_int, macheps, and so on (see almost any discussion of software portability) would naturally be handled in this way. The latter use is not really satisfactory. It is a substitute for Dec-10 Prolog's "statistics" predicate, and portability would have been better assisted by something that looked like "statistics". I find it unpleasant that an arithmetic expression can have one value at one time and other value at another time, and it doesn't help program transformation tools (such as the unfolder) much. There is little to lose by having an implementation-dependent
	statistics(Key, ValueOrList)
predicate. In order to make life easier for compilers, other program manipulation tools, and human readers, I suggest the following design principle for arithmetic operations:

The value of any arithmetic expression f(X1,...,Xn) may only depend on the functor f/n and the values of the arguments X1,...,Xn.

This doesn't mean that C Prolog can't have its little instrumentation functions, only that they aren't standard, and that a code unfolder is entitled to rearrange code in ways that will change the values of heapsize, stacksize, and/or cputime. New Prolog systems should provide a statistics/2 feature, perhaps in a machine dependent module.

We should be able to agree on a set of basic constants (and machine- dependent constants) which could usefully be in the standard. There are currently no standard special constants.

Bitwise Operations

We would like to get the same answers no matter what machine we are using, provided its integer arithmetic has "enough" bits. The way to ensure this is to regard each positive integer as a bit string with infinitely many zeros to the left, and each negative integer as a bit string with infinitely many ones to the left. This doesn't mean that the machine has to do 2s complement arithmetic, only that whenever we perform bitwise operations we get the expected answer. (We could have defined sign and magnitude, but what happens to the sign bit when you shift? And sign and magnitude and ones complement both have the negative 0 problem.)

This definition also makes sense of bitwise operations on arbitrary precision integers.

All the bitwise operations demand integer operands. This definition of what a number means as a (logically infinite) bit string makes sense of bitwise operations on arbitrary precision integers. The result will be stored in the most compact format available, but this only matters in a mixed language system, since Prolog does not and MUST not have any way of telling how an integer is stored. (There's no reason why the "haulong" function of Lisp could not be provided. But that is defined solely in terms of a number's value.)

X /\ Y does a bitwise and on X and Y.
X \/ Y does a bitwise or on X and Y.
\Y does a bitwise complement.
X << N shifts X left N bits (right if N < 0).
X >> N shifts X right N bits (left if N < 0).

An interesting thing to note about shifts is that if we had chosen any specific word length (such as 18 bits) we would have had to choose between "logical" and "arithmetic" shifts. But as we regard integers as extending infinitely far to the left (and must do so for portability between machines of different word sizes or to handle arbitrary precision) both kinds of shift coincide. What we get is arithmetic shifts. And although the manual doesn't actually say so, that is what Dec-10 Prolog (and C Prolog) have always used.

These bitwise operations are in principle all we need. We could even make do with (X\Y) meaning \(X/\Y) and >>. (<< can be obtained from plus/3.) However, it is often convenient to have more operations around. Common Lisp has an extensive and well thought out range of bitwise operations, and I suggest that we copy them. That is,

No Prolog implementation is obliged to provide any arithmetic facilities beyond those described here.

Any additional arithmetic functions should use the names, argument orders, and semantics of suitable Common Lisp functions, except that nospread functions may be implemented as binary operations.

What happened to the design principle?

Applying the design principle to predicates which evaluate arithmetic expressions we find that: if there any constant in the tree which is not an integer, a float, or a special constant, or if there is an compound term whose principal functor has no evaluation procedure, the predicate in question should type fail. If it doesn't type fail, but the tree contains any variables, the predicate should report an instantiation fault or delay. It isn't particularly hard to get this right. One way is to walk the tree twice, the first time looking for an excuse to type fail and setting a flag if there is a variable. After this walk, if a variable was detected, report an instantiation fault. Otherwise walk the tree again computing its value, possible reporting overflow.

Of course this is expensive. And if we're going to specify error reporting to this level of detail, we might as well specify the order in which the tree is walked, so that if more than one overflow could be reported, you can guarantee which. That would be a reasonable thing to do, but it seems more useful to relax the rule and to say that
the subexpressions of an arithmetic expression may be evaluated in any order

if more than one error could be reported about the same arithmetic expression, exactly one of them must be reported, but it is not defined which

because a continuable error in an arithmetic expression does not imply that there are no non-continuable errors, continuable errors are not continuable when reported from arithmetic expression evaluation

Comparison

The six arithmetic comparison predicates are easily defined:
X =:= Y :- foo(X, Y, =).
X =\= Y :- baz(X, Y, =).

X < Y :- foo(X, Y, <). X >= Y :- baz(X, Y, <).

X > Y :- foo(X, Y, >). X =< Y :- baz(X, Y, >).

foo(X, Y, R) :- A is X, B is Y, compare(R, X, Y).

baz(X, Y, R) :- A is X, B is Y, \+ compare(R, X, Y).

foo and baz are to be taken as macros, and do not form part of the standard.

Note that the signs are "=<" and ">=", not "<=" or "=>". In a logic based programming language which is often used to manipulate logical or denotational formulae, or to implement term rewriting systems, arrows are too precious to waste on comparison. Note further that these six relational operators must evaluate their arguments; the failure of some otherwise excellent Prolog systems to do this makes it disproportionately hard to move Dec-10 Prolog code to them. (PDP-11 Prolog had the same problem, but then PDP-11 Prolog didn't have negative integers, so one had far worse problems with "X < 0" than bothering about whether X was evaluated or not.) It is true that it is useful to have comparison predicates which do not evaluate their arguments. In fact we have three ways of comparing two numbers:
  lt(X, Y)	X and Y must both be integers
  X @< Y        X and Y may be any terms, but the number order is right
  X < Y		X and Y are arithmetic expressions
With this range of expression there is no need to restrict '<'.

{Term comparison will be dealt with in another note.}

A pretty extension would be to define these relational operators as arithmetic functions. The definition would be
if X R Y then	(X R Y) evaluates to X.
otherwise the predicate evaluating X R Y fails.
This could be useful in "X is Y-1 > 0", but its intended use is of course for X < Y < Z where we expect floating point numbers to be involved.

The main reason for bringing this up is not to suggest that it is useful enough to go in the standard, but to suggest that it is useful enough for the standard to rule out any other interpretation of these symbols as arithmetic operators, in particular to rule out an interpretation similar to that of C or PL/I (values 0 or 1).

A note on ':='

Some people have suggested that ':=' would be a more "natural" symbol than 'is'. Since this standard does not define ':=' to be anything, there is nothing to stop someone providing ':=' as an alternative to 'is' if they're silly enough. The point of this section is to say why it would be a bad idea.

The first reason is that ':=' really is not all that natural. It is only natural to Algol/Simula/Pascal/Ada programmers. To a Prolog programmer it looks like a mis-typed '=:='. To a POP programmer, the natural assignment is Expr -> Var. To a Lisp programmer, setf(Var, Expr) might be more natural. Given that 'is' is single assignment, not destructive assignment, an even more "natural" syntax might be let Var be Expr.

This brings me to the second point. Those programmers who understand ':=' will be misled by it. In Pascal,
	X := 1;
	X := 2
is legal, meaningful, and leaves X = 2. In Prolog,
	X is 1,
	X is 2
is legal, meaningful, and equivalent to fail. In Pascal,
	1 := 1
is illegal and meaningless, while in Prolog,
	1 is 1
is legal, meaningful, and true. I fail to see what we can possibly gain by lying to the programmer this way.

The third point is that it really is very stupid to take away a possibly useful symbol when we already have a perfectly good name for the operation in question. A Prolog system could well be augmented with true destructive assignment. For example, in old Dec-10 Prolog (this compiler is no longer available to users, it is just used for building the Dec-10 Prolog system itself), you can write
haulong(N, B) :-
	N1 is local, B1 is local,
	N1 := N, B1 := 0,
	while N1 > 0 do (N1 := N1/2, B1 +:= 1),
	B is B1.
Here := and +:= are genuine destructive assignment. A Prolog system embedded in Lisp or Pop might well have a use for
	atom := Expr
as equivalent to
	T is Expr, lisp(set(atom,Expr))	
While the use of destructive assignment generally would be a regrettable retrograde step, it may have a place in low level interface code, and if it is to be used at all it would be as well to use a symbol which some people understand.

This is too vague to be a requirement of the standard, so it is just a recommendation:
':='/2 should be used only for true destructive assignment

The Interface to the Host Language

Compiling arithmetic expressions (where variables can be bound to arithmetic expressions and not just to integers) is tricky enough. Recognising arbitrary host language forms is out of the question. That being the case, it is difficult to justify recognising any Lisp or Pop forms in arithmetic expressions. A very strong point against using "is" as the interface between Prolog and a host language is that in most host languages it won't work. Consider "X is Y*Z". In InterLisp, * is a comment function. In this standard, * works on floating point values (as it does in PopLog). But in the MacLisp family, while * does mean multiplication, it means integer multiplication.

A Prolog compiler can learn a lot from an arithmetic expression. The first argument to is/2 must end up bound to a number. Any other source variables are known to be bound to ground arithmetic expressions (because otherwise a non-continuable fault would have been reported), and this fact can be used to avoid re-evaluating a variable. E.g. if we have
	p(Y, Z) :-
		Y > 1,
		Z is (Y+1)/(Y-1).
the compiler can stash the result of evaluating Y in the Y > 1 literal away in a handy place in use it in the next literal. But if Y could be bound to an arbitrary host language call, it might have side effects which make this caching give the wrong answer.

It is therefore expedient to restrict arithmetic expressions to performing arithmetic.

There are at least three other ways of calling a host language such as Pop or Lisp. All of them involve an "apply" style of interface, where Prolog instantiates arguments but does not evaluate them. This simplifies the question of which parts of a host call Prolog should just instantiate and which parts it should give to the host to evaluate. Only the top level gets evaluated. If the host language has 'eval' (and Pop has 'popval') this is no restriction.

It seems a good principle for writing Prolog compilers that
a Prolog compiler in a mixed language system should be completely ignorant of the syntax and special forms of the other languages

in a mixed language system Prolog should be able to call functions in the other languages, passing data structures constructed in the same way as data structures passed to Prolog predicates
In general, it will not be possible to implement Prolog procedure calls and host language procedure calls in the same way and an interface primitive will be needed.

Control Structures.

At first sight, defining Prolog's control structures is supererogatory. After all, Prolog is based on Horn clause logic, and once you have explained that clauses are tried from top to bottom and that goals within a clause are tried from left to right, there is nothing left to explain except the subtleties of the cut. Unfortunately, it isn't true. Prolog has several control structures, and some of them look as though they can easily be added to a basic Prolog system by adding ordinary clauses. Unfortunately, the definitions it is easy to add are wrong.

Another reason for including this section is that some useful control structures have been developed which are not in the Dec-10 Prolog manual (though they are in the library). To stop people inventing the same control structures over and over again, we need to have agreed names and semantics for them. It should go without saying, but experience has regrettably shown that it does not, that one of Prolog's greatest virtues is its simplicity and the "flat" nature of Prolog code. To introduce a great many special forms and to nest special forms deeply in a program are both folly. It is reasonable to expect that a Prolog compiler will understand most of the forms in this section, and will not understand others.

A third reason for this section is to permit some optimisations in Prolog interpreters. To this end I suggest the following principle:
"Assert" may rewrite any special form in any way provided that the transformation is provably equivalent to the original clause without reference to any user-defined predicate. Calls to user-defined predicates, and their arguments, may not be rewritten.

The point of this is that if I call
	assert((p(X) :- true,(p(X+1),(Y=1,Y=2))))
and later look at the data base using listing or clause, I have no right to complain if I see
	p(X) :- p(X+1), fail.
but I do have a right to complain if I see any of
	p(X) :- true, fail.
	p(X) :- fail.
	p(X) :- Y=1, p(X+Y), Y=2.
	p(X) :- p(X+Y), (Y=1,Y=2).
	p(X) :- call(p(X+1)), fail.
In particular, conjunctions and disjunctions may be "flattened" (so that the left argument of (A,B) never has ,/2 as its principal functor), 'true' need no appear at all, and variables appearing in a goal position may have call(_) wrapped around them. Further, if a Prolog system is smart enough to spot that a call to some system defined (but not user defined) predicate must fail, it may replace it by a call to some system defined error reporting predicate.

It is important to permit an implementation this freedom so that efficient methods may be used to store clauses instead of having to store the literal form that is given to assert. Note that we do require that the clause head and all calls to user-defined predicates in the body be preserved intact (apart from the inevitable variable renaming). In particular, while 'assert' is allowed to optimise
	end_of_line(C) :- C is 0'J-64.
to
	end_of_line(C) :- C = 10.
it is not allowed to optimise it to
	end_of_line(10).
and it is not allowed to optimise a false clause out of existence. Thus the sequence of clause heads found by clause or listing must be exactly the same in every implementation.

It could be useful to allow other transformations on clauses. One that would be particularly pleasant would be to rewrite grammar rules into ordinary clauses, thus letting a program assert new grammar rules. However, in the interests of portability,
'Assert' must not transform a clause except as permitted in the previous boldface paragraph.
Why is this non-restrictive? Because the user doesn't have to call the assert family directly. If you want to assert grammar rules, there is nothing to stop you defining
	expand_assert(Formula) :-
		expand_term(Formula, Clause),
		assert(Clause).
and using that instead. This applies to any form of macro expansion. Using expand_assert instead of providing hooks in the ordinary assert predicate means that macro expansion has to be implemented only once, and the user does not have to decide whether to expand a certain macro at read time or at assert time.

true

Is defined as if by the clause
	true.
and no others. As with all standard predicates, it must be impossible to change the effect of this predicate, and although it has a clausal definition, that definition need not be visible to the user and if visible need not agree with that given here.

'true' is mainly needed so that clause/2 can supply something in its second argument as the "body" of a unit clause. It is also needed for conditionals, so that we can write (a->true;b->c;d). This too can be seen as supplying the body of a unit clause.

Although "true." and "true. true. true." are logically equivalent, they are not procedurally equivalent. True must succeed exactly once. However, clause(true, X) might report any number of clauses, and any of them might potentially succeed. All that is required is that at most one of them will succeed in any given call, without producing any side effects.

fail

Is defined as if by the clause
	fail :- integer(fred).

Note that there is no guarantee that clause(fail, X) will find no clauses. It might find any number. But whenever called, none of them may succeed, and there may be no side effects, not even any output. (Appearing in a trace is a main effect of the tracer, not a side effect of fail.)

repeat

Is as if defined by the clauses
	repeat.
	repeat :-
		repeat.

These clauses may or may not be visible to the user. The key property of repeat is that it must not use up any stack space. The question ":- repeat, fail." must run until stopped by the user. Repeat is permitted one side effect: on any "call" (but not on "redo") it may print an insulting message on the standard error channel.

repeat(N)

Is as if defined by the clauses
	repeat(N) :-
		ge(N, 1).
	repeat(N) :-
		gt(N, 1),
		succ(M, N),
		repeat(M).
These clauses may or may not be visible to the user. Repeat(N) must not use up any stack space. If given a variable as argument, it must report an instantiation fault. Given anything other than a non-negative integer, it may type fail or it may just fail. Repeat(N) is required to have one side effect: on every "call" but not on any "redo" it must print an insulting message on the standard error channel.

The reason for putting this in the proposed standard is that there are Prolog systems which use this, and we don't want it meaning anything else. A good definition for repeat(X) might have been
	repeat(Goal) :-
		( call(Goal), fail
		; true
		).
i.e. repeat the Goal until it fails. So that programs written for these other Prologs can be run in a standard conforming Prolog, repeat/1 may not be used for this purpose. That's the only reason that repeat/1 is in this standard, to stop the name being used for any sensible purpose.

There is a better way of getting the same effect, and that is to call between(1,N,X) for some possibly anonymous variable X. That is why repeat/1 is required to print an insulting message, so that the code will be converted to the standard form.

call and apply

The fundamental predicate in this group is call/1. For any given X{Footnote: X is a meta-variable, not a Prolog variable}, call(X) is required to behave as if call/1 had exactly one clause and that
call(X) :-
	X.
Just to make this a wee bit clearer, I do NOT mean that call should behave as if there were a clause
call(X) :-		% non-example
	X.		% non-example
Many Prolog implementations would translate this to call(X):-call(X) as they are entitled to. No, I mean that call(p(1,X,Y)) should behave as if the definition was
call(p(1,X,Y)) :-	% example
	p(1, X, Y).	% example
and that call((p, !, q; r)) should beahve as if the definition was
call((p, !, q; r)) :-	% example
	p, !, q; r.	% example
There are two consequences of this which deserve to be spelled out in full. One is that call/1 is more like LISP's "eval" than like "apply". That is, the term being called can be an instance of any control structure at all: a simple predicate call, a forall/2, an if-then-else, another call/1, and so on. The second is that while X may contain cuts, those cuts cannot escape from the call and affect the parent goal. That is call(!) is allowed and is equivalent to true.

A variable G appearing as a goal is equivalent to call(G). Note that this is not quite what you would expect from substitution, and is not what a number of toy Prologs written in Lisp do. Suppose we know that z/1 has no clauses. Then
	?- assert(( z(X) :- repeat, X )),
	   clause(z(X), Body),
	   X = !.
binds Body = (repeat,call(!)) which is equivalent to repeat. But
	?- X = !,
	   assert(( z(X) :- repeat, X )),
	   clause(z(X), Body).
binds Body = (repeat,!) which is equivalent to (!). Now any program where this makes a difference is very badly written. While the fact that a variable goal can be rewritten by assert into call(G) does spoil some of the substitution properties, it is actually a very useful thing for programming. It means that a predicate which calls one of its arguments does not have to worry about a cut being smuggled in inside a Trojan Horse. This integrity benefit is well worth having, whatever the meretricious attractions of micro-PROLOG's "meta-variable".

This does raise another question. What is a Prolog system to do if a goal is still a variable, or is some constant other than an atom? call/1 has to worry about this. Since a constant other than an atom can never have a definition, it obviously doesn't make sense, so assert can rewrite a clause, turning 3 into call(3) or it can simply reject such a clause as ill formed. So only call/1 has to worry about it. The answer is clearly that call should be much like other predicates: an unbound variable is an instantiation fault and a non-atom constant is a type failure. There is one point though: if it is at all possible the error message should identify the caller of call/1, as that is where the error "really" is. So we have the following cases for call(G):
var(G) -> instantiation_fault;

atomic(G), \+ atom(G) -> type_fail(1,compound);

system(G) -> call the system predicate;

current_predicate(G) -> interpret the clauses for G if any, or call the compiled code if any;

otherwise, G is an undefined predicate.

This needn't stop an implementor using an integer (perhaps the index of a branch of a case statement, or the address of a procedure) as the body of a clause in bootstrap code. It only means that such clauses must never be visible to the user, and it must not be possible for the user to create such clauses, and if the user does call(72) it must type fail instead of invoking primitive 72 (even if there is such a primitive). If the bootstrap is such that there is a need for a predicate like call/1 but which is capable of invoking primitives this way, fine, but it must have another name.

apply(Pred, ArgList) :-
	Pred =.. [F|GivenArgs],
	append(GivenArgs, ArgList, AllArgs),
	Goal =.. [F|AllArgs],
	call(Goal).

This is the predicate which has been used up till now for applying a computed predicate to a computed argument list. It has a number of serious disadvantages. First, it is usually implemented exactly as written above. This is wasteful. It creates a new list which is never needed again, and a new compound term which is never needed again. If it is built in at a low enough level there is no need for these data structures. Even if apply/2 takes care not to construct superfluous data, the calling site will still construct an argument list which isn't really needed. The second problem is that even when we know what Pred is like, it is difficult (in the general case, impossible) to type check a call to apply. Every time I have seen apply used, there has been a specific number of arguments at the calling site, commonly one or two. Therefore, I propose a family of predicates.

call(p(X1,...,Xm), Y1,...,Yn) :-
	p(X1,...,Xm,Y1,...,Yn).

That is, call(P, X, Y) is equivalent to apply(P, [X,Y]) except that no extra data structures have to be created and it can easily be type checked. The requirement for P is the same as for call(P): it must be an atom or a compound term. For complete consistency, call/N for N>1 should be able to handle control structures too; since call((A,B)) works so should call(,(A), B). However, it may be considerably easier to implement this operation if it only works for sytem and current predicates, and does not work for arbitrary control structures. So call/N may be thought of as "apply", with call/1 being "eval". apply would have been the preferred name for this operation if only it hadn't already been used for the list version.

once

Is defined as if by the clause
	once(Body) :-
		call(Body),
		!.
Since cuts cannot get through "call", a cut which appears in the Goal is confined to the once(_) form and does not have the effect of cutting the parent goal.

Once need not be implemented exactly this way. It is quite straightforward for a Prolog compiler to open-code this form, but if it does the final cut and any internal cuts must be implemented in a way which cannot be expressed in Prolog source language. If the Body contains no cuts, the expansion (Body->true) can be used.

Negation

The form \+ Body is defined by macro-expansion into
	( Body -> fail ; true )
Since disjunction and if-then-else are transparent to cuts, so should negation be. That is, if we write
	p :- \+ !.
	p.
this should have exactly the same effect as
	p :- ( ! -> fail ; true ).
	p.
which is to say that every call to p must fail. If however, we write
	p :- X = !, \+ X.
	p.
this is equivalent to
	p :- X = !, ( call(X) -> fail ; true ).
	p.
and as the cut cannot escape from the call, every call to p must succeed (in the second clause).

This is all rather unpleasant. What is particularly unpleasant is that it makes Dec-10 Prolog non-standard. (On the other hand, it was non-standard already, because the compiler does not understand if-then-else). I have tried several times to discover a rule which would enable a user to predict which special forms were transparent to cuts and which were not. Making conjunction the only transparent form would make sense, but the language which results is not powerful enough to write an intelligible meta-circular interpreter. To do that disjunction has to be transparent to cuts as well. The cleanest rule I could come up with was that conjunction, disjunction, and if-then-else should be transparent to cuts (as they are in Dec-10 Prolog), and that any other form was transparent if and only if it could be macro expanded into a form involving these three only, without needing call/1, which is definitely opaque to cuts. A Prolog implementation which gets ',', ';', and '->' compatible with the Dec-10 and C Prolog interpreters can easily get '\+' right by expanding it in assert to precisely this form.

The requirement that \+ X behave exactly like its macro expansion is to ensure that program processing tools which only understand the basic three operations can process \+ X correctly by performing the macro expansion. In general this permits X to be compiled directly instead of constructing a data structure, and this can be no bad thing.

In fact I would be very surprised if there was any code written by Prolog users (as opposed to Prolog implementors testing their systems) which ever has a cut in a negated form. If there is such a cut, the desired behaviour can easily be obtained by defining an auxiliary predicate whose body is the form to be negated, and negating a call to that. Although this definition is different from previous Prologs, the practical impact is slight, and I think we can get away with it.

not

\+ is part of Dec-10 Prolog, 'not' is not. However, there is a library predicate of that name which checks that its argument form is ground. Prolog-X also adopts this convention. It seems like a good one. In order for this check, and the error report which may follow, to proceed, it is necessary that the form exist as a data structure. This therefore involves "call", so cuts cannot escape from "not" even though they can escape from \+.

A coroutining Prolog should delay not(X) until X is ground.

In fact a smart compiler can do a little better than that. Suppose X is a disjunction of conjunctions of literals. (We can convert to DNF.) If each disjunct is ok, the call can proceed, where a conjunct is ok if the only unbound variables in it are referenced nowhere outside the conjunction. What this means for a compiler is that in
disjoint(S1, S2) :-
	not (member(X, S1), memberchk(X, S2)).
it is safe to proceed with the call provided S1 and S2 are ground. But in
non_member(X, S) :-
	not(member(X, S)).
it is not safe to proceed with the call until X is ground as well as S, because X can be seen outside. So a compiler should handle sound negation by compiling code to check that the variables which appear elsewhere in the clause are ground. If you write not(X).

A sound if-then-else can be defined as well. We might as well copy MU-Prolog on that one.

forall

Is defined by macro-expansion. forall(Gen, Test) is equivalent to:
	\+ (Gen, \+ Test)
which is in turn equivalent to
	( Gen, ( Test -> fail ; true) -> fail ; true )
Logically, this is bounded quantification. For example,
	subset(Small, Big) :-
		forall(member(X, Small), memberchk(X, Big)).
Procedurally it is just another form of failure driven loop, permitting iteration without recursion. It is only useful when the only information to leave the loop is the success or failure indication.

Used sparingly, particularly when bounded quantification is intended, this can make programs clearer. But in Dec-10 Prolog it is avoided because the compiler generates a call to the interpreter, and remembering to put in all the public declarations can be a pain. One would be tempted to use the expanded form, if the Dec-10 compiler understood it.

Calls to forall can be open coded quite effectively. However, as it has this macro definition, we see that a cut inside either the generator form or the test form can escape and cut the parent goal. Just like negation, this differs from existing Prologs. Just like negation, I don't think that matters.

Conjunction and Disjunction

Apart from the fact that disjunction is transparent to cuts, these are generally understood. However, there is a problem. The punctuation marks use to represent these operations are not just a matter of syntax, some functor must be used to represent them in clauses. There'd be no problem about that, as the Clocksin & Mellish book says quite clearly what they are.

However, a lot of people would like to see '&' used for conjunction and '|' used for disjunction. Now '|' is already used in Dec-10 and C Prolog as a synonym for ';', but this is not entirely satisfactory as (A|B) is the only instance where functor((A<op>B),<op>,2) fails.

There is no difficulty in defining what Dec-10 Prolog does. But in a system where the native syntax used '&' and '|' for conjunction and disjunction it would be difficult to justify ',' and ';' popping up in clauses. It seems to me that there are two reasonable courses to pursue. One of them is to say that the internal form must correspond to Dec-10 internal form, whatever syntax is normally available to users. That would work, but the "standard" might end up being ignored. The second alternative is to say that an implementation can use whatever functors it please (even to having say a special functor '$!'/2 with $!(A,B) meaning A,!,B), but that it must support some standard kit of predicates for manipulating formulas.

Let me make this absolutely clear: I do not see any point at all in changing the internal representation from (A,B) for conjunction and (A;B) for disjunction, with (A->B) being used in if-then-else forms. I am putting forward the second alternative in this proposed standard to make it acceptable to a number of UK users whose infatuation with the beauty of '&' and '|' renders them incapable of seeing that to use those symbols is to lie to the user.

The suggestion, then, is that there should be some standard predicates for constructing formulas and taking them apart.

prolog_clause(Clause, Head, Body)

prolog_bounded_quantification(Formula, Generator, Test)

prolog_negation(Formula, PositivePart)

prolog_if_branch(Formula, Hypothesis, Conclusion)

prolog_conjunction(Formula, ListOfConjuncts)

prolog_disjunction(Formula, ListOfDisjuncts)
Versions of these predicates for Dec-10 Prolog are given in an appendix. The point of taking the conjuncts &c as lists is to emphasise Prolog's right to normalise these operators, indeed, to allow for the possibility that the internal forms may be some form of decorated lists.

prolog_clause works with any formula, as any atom or compound term can be used as a unit clause. (Indeed, one has to be very careful, as (a:-b) :- p is a perfectly good clause.) The other predicates, if their first argument is instantiated, will only succeed when the formula is the kind of thing they are looking for. This leads to the anomaly that prolog_conjunction(a,X) fails, but prolog_conjunction(Y,[a]) succeeds, binding Y=1a.

Given a Formula, these predicates take it apart and match the pieces against their other arguments. With a variable in the first argument position, they construct a formula from their second argument and instantiate the first argument to it. These predicates are not truly bidirectional. For a given Formula, if prolog_conjunction(Formula,L) succeeds it does not necessarily follow that prolog_conjunction(F,L) will instantiate F to Formula (though it will succeed, and will instantiate F to something with the same effect as F). And for a given List, prolog_conjunction(F,List) having succeeded (and it might fail, if a top level element of List is say an integer) does not mean that prolog_conjunction(F,L) will even succeed, let alone instantiate L to a copy of List. What is required is something rather weaker:
for given L0
if	prolog_conjunction(F1, L0) succeeds
then	prolog_conjunction(F1, L1) succeeds
and	prolog_conjunction(F2, L1) succeeds
and	F2 == F1.

for given F0 if prolog_conjunction(F0, L1) succeeds and prolog_conjunction(F1, L1) succeeds then prolog_conjunction(F1, L2) succeeds and L2 == L1.
with similar axioms for prolog_disjunction and prolog_if_then_else. Any formula has a unique decomposition, if any, it is just that the formula might have been initially constructed by some other means, and might not actually make sense. For example, we might find that
	prolog_conjunction((1,2), [1,2])	succeeds
but	prolog_conjunction(X, [1,2])		fails.

This section of the draft is the most tentative. All things considered, it might be much cleaner to have separate compose_THING and decompose_THING operations.

If-then-else and "cases"

Let us suppose that conjunction and disjunction are well understood. We can explain Ken Kahn's "cases" construct by introducing updatable variables into Prolog (for exposition only!) and translating
	cases
	    Hypothesis1 -> Conclusion1
	;   Hypothesis2 -> Conclusion2
	...
	;   Hypothesisn -> Conclusionn
as follows:
	DoneIt is local,
	Doneit := 0,
	(   Hypothesis1, DoneIt := 1, Conclusion1
	;   DoneIt =:= 0, Hypothesis2, DoneIt := 1, Conclusion2
	...
	;   DoneIt =:= 0, Hypothesisn, Conclusionn
	)
Every case except the first has the test "DoneIt=:=0" before its hypothesis (it is redundant before the first case). Every case except the last has the assignment "DoneIt:=1" after its hypothesis (the variable is dead after the last case). If a case lacks the if-then-else arrow, the translation retains the test but drops the assignment.

This corresponds almost exactly to the desired logical reading of an if-then-else:
	(   H1 and C1
	or  not H1 and H2 and C2
	...
	or  not H1 and not H2 and... and not Hn-1 and Hn and Cn
	)
where the negations are not actually calculated but are implicit from position. In particular, the hypotheses may backtrack; having passed an arrow once it thereafter acts just like a comma. Since tests are generally determinate, this makes little or no practical difference.

The "cases" construct is implemented in C Prolog is exactly this fashion. Given a reasonable Prolog compiler which can handle disjunction, it is very easy to add the cases construct, whether using this technique or whether by fiddling with the choice point list directly. Note that in so far as it can be expressed in Prolog at all, the cases construct is open coded, so that it is "natural" for it to be transparent to cuts. Cases and if-then-else are to a large degree a substitute for the cut, and one might expect that a programmer who used them would not be putting cuts in them. So it might seem a matter of indifference what a cut inside a cases or if-then-else does. However, there is at least one Prolog programmer here whose code includes
	p(...) :-
		...
		( ... -> ! ; true ),
		... .
I laughed immoderately when I saw it, then groaned when I realised that he really meant to keep this in his program and thought it genuinely useful. He does at least deserve to be told when this will work and when it will not, while the rest of us continue to look for efficient alternatives to the cut that make our code even clearer, the Philosopher's Stone, the Elixir of Life, and a 1 GigaLIPS machine.

The conventional if-then-else form is exactly the same except that it doesn't have "cases" in front.
	(   H1 -> C1
	    H2 -> C2
	...
	;   Else
	)
is almost the same as
	(cases
	    once(H1) -> C1
	;   once(H2) -> C2
	...
	;   true -> Else
	)
except that cuts can escape from the Hi (should anyone be so incredibly foolish as to put them there) as well as the Ci. The if-then-else arrow functions exactly as a cut with narrow scope. It is correspondingly easy to implement.

My feeling is that for most practical programs it doesn't matter whether we use the if-then-else form or the cases form, and that beyond that it is too early to say which is better. There is a fair bit of existing Dec-10 and C Prolog code which relies on the current if-then-else and its interaction with cut. So the if-then-else has to be part of the standard. Someone might yet want to port a program from LM-Prolog to Standard Prolog, and anyone might want to experiment with it. To ensure that everyone who implements it does so compatibly, the "cases" form had better be part of the standard as well. On the other hand, the implied negations are only sound when they are ground, and in that case it ought not make any difference whether we write (A -> B; C) or (not(A) -> C; B). So perhaps we don't really need "cases" after all.

The syntax I've shown for "cases" is the C Prolog syntax. I don't particularly care for it, and invite suggestions. That aside, both should be in the standard, and both should be transparent to cuts.

setof and bagof

These should be as described in David Warren's paper and as coded in the Prolog library file SETOF.PL. But there is one point which needs consideration. There is nothing in the specification of these predicates which says that they should do anything to the data base. The definitions in SETOF.PL make no permanent change to the data base unless they are aborted (by 'abort' or ^C and "a"), and are set up so that if they are aborted the junk thus left behind will not interfere with future calls to these predicates. However, it is there. This has two effects. First, if you interrupt CHAT-80 often enough it can fill up the data base with junk that the system has no way of detecting should not be there. Second, while the setof or bagof is proceeding, these working data are visible in the data base to an inquisitive program and vulnerable to a malicious one.

The visibility and vulnerability can be taken care of by making the key used an atom which is not accessible to the user's program. For example, the SRI versions of C Prolog have a way of hiding an atom by removing it from the symbol table, so using keys composed from '$' and then hiding '$' would do the trick.

The problem of filling up the data base with junk if a computation is interrupted is perhaps not so important. User code is faced with that risk no matter what we do. However, it would be no bad thing if the 'abort' predicate cleaned out whatever part of the data base was used by these predicates.

Best of all would be if these predicates didn't use the data base at all. The code in SETOF.PL uses the recorded data base rather than the clause data base both to keep this junk out of harm's way, and also to avoid compilation. A system which compiles clauses when they are asserted makes setof, bagof, and findall unbelievably slow. These clauses or records are also special in that we know when we retract them that there is only the one reference to them (or at least we would like that to be true). It might even be advisable to have special code for copying these notes into the heap and pulling them out again without any compilation or indexing.

The second argument of bagof/3, findall/3, findall/4, or setof/3 is given to call/1, and so cuts cannot escape from it. This need not prevent these predicates being compiled open. None of the control structures mentioned in this section has to appear in traces, any of them except conjunction may.

current_{atom,functor,predicate}

These predicates are really very strange. current_predicate and system are useful for writing Prolog interpreters and debuggers.

current_atom(Atom)
    if var(Atom) {
	try binding Atom to each "currently known" atom in turn;
	the order is not specified
    } else {
	if atom(Atom) then succeed else type_fail(1, atom).
    }
It is obvious that this is the analgoue of InterLisp's MAPATOMS. What is not obvious is which atoms are "currently known". Can the question
	?- current_atom(X), X = fred.
ever fail? If we list all the atoms at one time, and list all the atoms at a later time, must the first lot be a subset of the second? In Dec-10 Prolog, C Prolog, and Prolog-X the answers are no and yes. In these Prologs atoms are never forgotten.

Bearing in mind that future Prolog systems will be able to interface to external data bases, and so may have a much larger turnover of atoms, all that we can reasonably require is that
Every atom appearing in any clause or other internal data base record (possibly as the key) and every atom which currently appears in the stacks is "currently known", but if all references to and from an atom are lost it may become unknown.

current_atom is easily implemented by searching the symbol table rather than the data base and stacks. It is up to the garbage collector to prune the symbol table if it so wishes. A warning to implementors: be very careful with this predicate if you prune the symbol table. It is a part of this predicate's definition that no atom is enumerated twice in the same call. If new atoms are being created while an enumeration is going on, it is permissible for current_atom to miss them. It is only required to enumerate those atoms which were current when it started and which are still current when it finishes, though it may enumerate more.

current_functor(Atom, Term)
    if nonvar(Atom) and \+atom(Atom), type_fail(1, atom).
    if constant(Term) and \+atom(Term), type_fail(2, compound).
    if nonvar(Term) {
	unify Atom with the principal function symbol of Term
    } else {
	current_atom(Atom)
	try functor(Term, Atom, N) for each N such that Atom/N
	is a "currently known" functor; the order is not defined
    }

Here we have an even nastier problem. In Dec-10 Prolog, C Prolog, Prolog-X, and NIP, each current functor is represented by a physical block of memory in the symbol table. In Poplog and Quintus Prolog, however, the functor is implicit. This looks like bad news; we might have to examine the stacks and data base to find out what functors are "current". The answer is to redefine what "current" means.

A functor is current if it appears in a clause or internal data base record (possibly as the key).

This is a minimal requirement; other functors may be "current" as well. One unpleasant aspect of the Dec-10 definition is that if you have a "vector" class which you represent say as vector(E1,...,En), a functor block is left hanging around forever for each size of vector you have ever used. Such behaviour is permitted by this standard, but not required. vector/N must show up if there is such a term somewhere in the data base, but vector-like structures are commonly held only in the stacks, and this definition permits current_functor to ignore the stacks.

An easy way to implement current_functor is to associate a set of integers with each atom in the symbol table. Assert{|a|z} and Record{|a|z} have to update these sets to include all the functors which appear in the data base (and of course they may appear there differently from the way they appear in the stacks), and the garbage collector may but need not prune the sets. The integer 0 need not appear explicitly in these sets, as current_atom(X)=>current_functor(X,X). The mapping might be represented as a hash table in any of several ways. In particular, current_functor is not required to enumerate all the current functors for a given atom consecutively.

current_predicate(Atom, Goal) :-
	current_functor(Atom, Goal),
	"there have been clauses for Goal".

current_predicate(Goal) :- current_predicate(_, Goal).

The Dec-10 Prolog manual is actually very clear on this point. It says "it only generates functors corresponding to predicates for which there currently exists an interpreted procedure". In David Warren's papers, "procedure" means "clause". So what this means is that we could define the Dec-10 version of current_predicate as
current_predicate(Atom, Goal) :-
	current_functor(Atom, Goal),
	\+ \+ clause(Goal, _).

Now if we want that definition, we can easily program it. I have argued earlier in this document that a more useful definition would be that a predicate is "current" if and only if there has been at least one 'assert' for that predicate since its last 'abolish'. One way to accomplish this is to have a mapping from atoms and arities to "procedure headers", where this mapping is void for undefined or abolished predicates, but "retract" leaves the header intact even when it removes the last clause.

system(Atom, Goal) :-
	current_functor(Atom, Goal),
	"Goal has compiled code and compiled code only".

system(Goal) :- system(_, Goal).

The distinction between system/[1-2] and current_predicate/[1-2] is that a "system" predicate has only a compiled definition. There is nothing to stop a system maintaining clauses and compiled code both. However, if there are clauses for a predicate it must be a current_predicate and not a system predicate.

'Assert' and 'retract' must report an error when applied to a system predicate. It is less obvious that 'clause' should do so. However, clause(Head,Body) obviously cannot succeed when system(Head) is true, as by definition there are no clauses. But it should not fail either, as a program might take that as an indication that the predicate in question is undefined, which is not so. This is not restrictive, as the programmer can always define
safe_clause(Head, Body) :-
	current_predicate(Head),
	clause(Head, Body).
which has the additional benefit of being able to search the whole data base. (clause will report an instantiation fault if Head is unbound.) The fact that 'assert' in Dec-10 Prolog does not report an error but quietly ignores additions to system predicates is one of Dec-10 Prolog's less helpful quirks.

system(X) or current_predicate(X) must be true for each of the predicates defined in this document, but the implementor is free to choose which. However, there must be a set of predicates which are system predicates and in terms of which the others are defined so that a Prolog in Prolog interpreter can bottom out somewhere.

Evaluable Predicates Not Described

The following predicates are part of Dec-10 prolog but are not covered in this document, or not adequately covered.

abolish/2, assert/1, assert/2, asserta/1, asserta/2, assertz/1, assertz/2, clause/2, clause/3, compile/1, consult/1, erase/1, instance/2, listing/0, listing/1, reconsult/1, recorda/3, recorded/3, recordz/3, retract/1.

These predicates are concerned with the data base. There are a number of thorny problems here. Should the recorded data base be considered as part of the standard? It is there in Dec-10 Prolog, C Prolog, Prolog X, Quintus Prolog, and even in PopLog. On the other hand, I have a design for a better indexed data base. And on the third hand {Footnote: A kind of vyce familiar to carpenters} this should really be replaced by a general indexed data base mechanism. But there are largish programs that use it. Should data base references be described? They are constants in C Prolog and Prolog-X. But they are undocumented compound terms in Dec-10 Prolog. What about the interaction of retract with running code? It may be that Quintus can come up with a reinterpretation of this nuisance that permits immediate reclamation of retracted clauses, I know they're trying to.

abort, break, debug, debugging, halt, leash/1, nodebug, nospy/1, spy/1, trace/0, unknown/2

These predicates are concerned with debugging.

ancestors/1, compile/1, depth/1, gc/0, gcguide/3, incore/1, log/0, maxdepth/1, nogc/0, nolog/0, plsys/1, reinitialise/0, restore/1, revive/2, save/1, save/2, statistics/0, statistics/2, subgoal_of/1, trimcore/0, version/0, version/1, 'LC'/0, 'NOLC'/0

These are obsolete or implementation dependent. 'NOLC' mode is most definitely obsolete. trimcore/0 may be used to cause some sort of garbage collection, which might be confined to the Prolog stacks. gc/0 merely sets a flag. A general "flag assignment" mechanism should be used in future instead of multitudes of little 0 or 2 argument predicates for individual flags.

close/1, current_op/3, display/1, file_exists/1 (exists/1 in C Prolog), fileerrors/0, get/1, get0/1, nl/0, nofileerrors/0, op/3, portray/1, print/1, prompt/2, put/1, read/1, read/2, rename/2, see/1, seeing/1 (and seeing/2 in C Prolog), seen/0, skip/1, tab/1, tell/1, telling/1 (and telling/2 in C Prolog), told/0, ttyflush/0, ttyget/1, ttyget0/1, ttynl/0, ttyput/1, ttyskip/1, ttytab/1, write/1, writeq/1

These predicates are concerned with input/output. read/1 and read/2 are in the Prolog library, so don't need to be defined here. display/1, print/1, write/1, and writeq/1 will be placed in the Prolog library, so again don't need defining here. The other predicates should certainly be properly defined, but there are aspects of the current file system which are, um, regrettable, and the standard should improve on current practice. In particular, I would like to see input redirected by
	with_input_from_file(FileName, Body)
	with_input_from_chars(ListOfChars, Body)
	with_input_from_string(String, Body)
	with_input_from_call(Pred, Body)
and output redirected by
	with_output_to_file(FileName, Body)
	with_output_to_chars(ListOfChars, Body)
	with_output_to_string(String, Body)
	with_output_to_call(Pred, Body)
and of course, for interacting with the user,
	with_io_to_terminal(Body)
But none of this redesign is anywhere near being well enough thought out to include in a standard, and it does require new machinery in an implementation. (For example, it would be extremely difficult to support with*call/2 in C Prolog.)

'C'/3, expand_term/2, phrase/2.

These are concerned with grammar rules. Grammar rule preprocessors for DCGs, DCSGs, and MSGs are in the Prolog library, and a preprocessor for XGs has been published.

numbervars/3

This is in the Prolog library (file METUTL.PL or /usr/lib/prolog/metalog).

Dec-10 Prolog code for the 'plus' family

@Include(Util:Arith.Pl)

Dec-10 Prolog code for 'univ'

Term =.. [FunctionSymbol|Args] :-
	nonvar(Term),
	!,
	functor(Term, FunctionSymbol, Arity),
	$argument_list(0, Arity, Term, Args).
Term =.. [Term] :-
	atomic(Term),
	!.
Term =.. [FunctionSymbol|Args] :-
	atom(FunctionSymbol),
	length(Args, Arity),
	functor(Term, FunctionSymbol, Arity),
	$argument_list(0, Arity, Term, Args).

$argument_list(N, N, _, []) :- !. $argument_list(I, N, Term, [Arg|Args]) :- J is I+1, arg(J, Term, Arg), $argument_list(J, N, Term, Args).

Dec-10 Prolog code for repeat/1

repeat(N) :-
	telling(Old), tell(user),
	write('It is pointlessly stupid to use repeat/1.'), nl,
	write('Why don''t you use between/3 instead, fathead?'), nl,
	tell(Old),
	between(1, N, _).

Code for the sorting routines

@Include(Util:Sorts.Pl)

Code for prolog_conjunction and friends

@Include(Util:DeCons.Pl)