An Elementary Prolog Library

DRAFT 3

Section 0. Introduction

The ISO Prolog standard does not adequately support the needs of Prolog users. This may seem unbelievable, and it is certainly inexcusable, but the standard does not even define append/3, length/2, or member/2 (although an example on page 33 shows clauses for append/3 and an example on page 85 shows clauses for member/2).

Many areas are crying out for standardisation, including

Basic data structures
POSIX.1 interface
SGML/XML input/output
Interfacing with relational data bases
Graphic User Interfaces
Network protocols
You name it

This is an attempt to deal with basic data structures. Perhaps more important than the specific data structures and operations dealt with is the stress placed on consistent naming and argument order, and the use of projection as a means of ensuring that argument order is consistent.

0.1. Notation.

I provide three kinds of declaration for the "base" predicates:

The first specifies the intended types of the arguments. The notation comes from the DEC-10 Prolog type checker, by way of Mercury. The effect of the predicate is not specified outside those types.
The second is copied from NU-Prolog. I still find NU-Prolog when-declarations the most intuitive way to specify when a predicate should make progress. (If a goal is suspended, when should it be resumed? When the when declaration is satisfied.) Even in a non-coroutining Prolog (like ISO Prolog) the information is still useful: it describes the modes in which it makes sense to call the predicate.
The third is loosely adapted from Mercury mode declarations. The "mode" det would mean that such a query has a single solution and cannot fail; semidet means that there is at most one solution. I intend something stronger here: when a call has the described form, the goal should either fail in a finite amount of time or should terminate in a finite amount of time with the Prolog system "aware" that no choices remain to be explored. The bounded "mode", not found in Mercury, means that such a goal has a finite number of solutions (possibly zero) and that such a goal must be able to report all solutions in a finite amount of time and then fail in a finite amount of time without infinite backtracking. The argument instantiation test list_skel(Var) is satisfied when Var is a proper list; the elements of the list are not constrained.

The sample code uses a convention taken from DEC-10 Prolog via C Prolog: the dollar sign is treated as a lower case letter, and predicates whose names contain a dollar sign are "implementation detail" not visible to the user. Such predicates are only for specification purposes, and need not conform to quality requirements for visible predicates such as steadfastness.

Perhaps the most important notational issue is that I give no indication in the running text of the module a type or predicate is defined in. It is clear that the four "basic" list processing predicates append/3, length/2, member/2, and memberchk/2 are so useful and so used that they had better be treated as part of the language. It would be quite appropriate if the others were all in a module named 'lists'. See plface.pl for one distribution.

I assume the following predefined types:

:- type list(T)
   ---> [] | [T|list(T)].
:- type integer.
:- type float.
:- type number.
:- type atom.
:- type char = atom.       % such that atom_codes(It, [_])
:- type chars = list(char).
:- type code = integer.    % such that between(0, 16'10FFFF, It).
:- type codes = list(code).

Other types will be introduced where they are used.

1. Characters.

1.1. Face reality, adopt Unicode.

With Prolog being used in e-business and stuff like that, we have to deal with HTML 4 (defined in terms of Unicode) and XML (defined in terms of Unicode). If we can't portably deal with Unicode, we can't portably deal with the Web. We can no longer afford to put up with Prolog systems that don't handle Unicode cleanly. Quintus Prolog handled 16-bit character sets via JIS and EUC nearly two decades ago. It was obvious then that the byte-at-a-time way that QP did so was a makeshift with a use-by date in the near future (now the rather less near past).

The ISO Prolog standard was published in 1995. In 1983, when I started work on an informal Prolog standard, it was obvious that 8-bit character sets were dead and Prolog would have to support 16-bit character sets. By 1993, it was obvious that there was a single "wide" character set to use in future language standards: ISO 10646/Unicode. Unicode 1.0.0 came out in October 1991. The reconciliation between ISO 10646 and Unicode resulted in Unicode 1.0.1, which came out in June 1992. This left the ISO Prolog committee with 3 years to reflect on the fact that there was now one ISO character set with round-trip conversion between the main national character set standards and itself.

This was certainly obvious to others. For example, the X/Open technical study E401, "Universal Multiple-Octet Coded Character Set coexistence and Migration" had been published the previous year. Of even greater relevance: a great deal of work had gone into updating Ada for what became Ada 95. By 1993, I already had a pile of Ada 95 background documentation and proposals in my office about 50cm high, and Unicode was clearly going to be the character basis of Ada 95 then.

The main reason why certain people at Quintus were not as hostile to the introduction of the single-character-atom design botch as they should have been was concern for EBCDIC on IBM mainframes. However, I have studied the current z/Architecture Principles of Operation, and note that current IBM mainframes have hardware support for Unicode>.

With this background, we can see that the ISO Prolog committee made some questionable decisions:

The decision not to base Prolog on Unicode. The decision to base Ada 95 on Unicode has worked extremely well for Ada. It has of course worked surpassingly well for Java, and the ISO C++ and ISO C99 standards have followed suit. This decision means that there is no portable means whatsoever of using a variable name such as Æneas or of including my sister-in-law's name "Chéri Ford in a string literal, in marked contrast to Ada, C99, and C++.
The decision to violate every standard of scholarship and common decency and change the semantics of atom_chars/2 and number_chars/2, in the interests of solving a problem which the adoption of Unicode as the character basis of Prolog would have completely exorcised without the need for such wholesale breakage of existing working code.
The decision to leave the programmer with the whole burden of classifying characters, in some unspecified character set.
The decision to leave the programmer with the whole burden of converting between internal (binary) and external (decimal) forms of numbers, using other digits than the ASCII ones.
The decision to impose on implementors a novel predicate which was, to put it politely, both ineffective and dangerous.

1.2. `char_conversion` must go!

"This is not negotiable", for the reason that no sane programmer who is acquainted with the facts would wish to retain char_conversion.

That's strong language. Let's summarise what char_conversion/2 and current_char_conversion/2 do.

There is a function Conv_C which maps single characters in the implementation-defined character set to single characters in the same character set. This function starts out as the identity function; it is not, in general, expected to be a bijection.
char_conversion>('X', 'Y') replaces Conv_C with µ(Conv_C,x,y), where x is the character code of X and y is the character code of Y.
current_char_conversion(A, B) is true when A and B are single-character atoms and b=Conv_C(a) where a and b are the character codes of the first characters of the names of A and B respectively, provided (and this is the most confusing bit) a and b are different.
When a term is read, every unquoted character is effectively replaced by its image under Conv_C.

Concerning the excessive strength of char_conversion/2, consider the goal char_conversion('.', ','). If you give that as a query to the interactive top level, you haven't shot yourself in the foot, but in the head. There is no way to recover from this.

In fact it's worse than that. Section 5.5.3 reads "A processor may support some other initial value of Conv_C ...", which means that there is no portable Prolog syntax at all. This implementation permission should have required all Prolog processors to start with Conv_C mapping the characters used in the Core Prolog syntax to themselves. Indeed, it might have been advisable to decree that char_conversion should not be allowed to change the identity mappings for the core syntax characters.

The :- char_conversion(From, To) directive is unusable in portable programs, even programs intended to be used within a single locale using the same coded character set. The reason is that it is explicitly not specified whether char_conversion directives in a source file are limited in effect (the way it was desired by many that op directives should be) or unlimited in effect (the way op directives were in DEC-10 Prolog and C Prolog). More precisely, a program using this feature can be portable only if every source file spells out the same set of (typically 190) mappings: the fact that it is not specified whether the effect is limited or not means that a program cannot simply consult a single mapping file into each source file.

Concerning the weakness of char_conversion/2, we have to start by looking at the rationale:

"The rationale for providing this facility is because (sic.) some extended character sets (for example, Japanese JIS character sets) are used with the basic character set and contain set (sic.) characters equivalent to those in the basic character set with different encoding. In such cases, users will often wish the meaning of characters in Prolog data and Prolog text to be the same regardless of the encoding."

I take this to refer to the "full width" (zenkaku) and "half width" (hankaku) copies of ASCII present in some Japanese character sets (and therefore also present in Unicode).

The char_conversion/2 machinery does not satisfy the explicit requirement in the rationale that it should apply to Prolog data. For example, it does not apply to atom_chars/2, and there is no analogous built-in predicate which does character conversion. "Words" read from a file are only converted to a normal form by Conv_C mapping if they are read in Prolog syntax using one of the read family. There should surely have been

converted_atom_chars(Atom, Chars) :-
    (   atom(Atom) ->
        atom_chars(Atom, Name),
        $convert_chars(Name, Chars)
    ;   var(Atom) ->
	$convert_chars(Chars, Name),
	atom_chars(Atom, Name)
    ;   $error
    ).

$convert_chars([], []).
$convert_chars([Char|Chars], [Conv|Convs]) :-
    (   current_char_conversion(Char, Conv) -> true	
    ;   Conv = Char
    ),
    $convert_chars(Chars, Convs).

An even worse way in which this misfeature fails to satisfy its explicit rationale is that character conversion does not apply to number_chars/2, and there is no analogous built-in predicate which does character conversion. Numbers read from a file are only converted to a normal form by Conv_C mapping if they are read in Prolog syntax using one of the read family.

The failure to deal with run-time number conversion is a particular nuisance for people using Arabic or Indic scripts. As a preliminary sketch, it would be useful if something like the following predicates were adopted:

:- pred integer_codes(integer, codes, integer, code).

integer_codes(Integer, Codes, Base, Zero) :-
    Delta is Zero - 0'0,
    (   integer(Integer) ->
        integer_codes(Integer, ASCII, Base),
        $map(ASCII, Codes, Delta, 0'., 0'E)
    ;   var(Integer) ->
	$map(ASCII, Codes, Delta, 0'., 0'E),
        integer_codes(Integer, ASCII, Base)
    ;   $error
    ).

integer_codes(Integer, Codes, Base) :-
    between(Base, 2, 36),
    like number_codes/2 but only for integers
    and using the first Base digits from "0..9A..Z".

integer_codes(Integer, Codes) :-
    integer_codes(Integer, Codes, 10).

:- type float_format
   ---> e(integer) | e | f(integer) | f | g(integer) | g
      | k(integer) | k | p.
:- pred float_codes(float, codes, float_format, code, code, code).

float_codes(Float, Codes, Format, Zero, Decimal, Exponent) :-
    Delta is Zero - 0'0,
    (   float(Float) ->
	float_codes(Float, ASCII, Format),
	$map(ASCII, Codes, Delta, Decimal, Exponent)
    ;   var(Float) ->
	$map(ASCII, Codes, Delta, Decimal, Exponent),
	float_codes(Float, ASCII, Format)
    ;   $error

float_codes(Float, Codes, Format, Zero, Decimal) :-
    float_codes(Float, Codes, Format, Zero, Decimal, 0'e).

float_codes(Float, Codes, Format, Zero) :-
    float_codes(Float, Codes, Format, Zero, 0'., 0'e).

float_codes(Float, Codes, Format) :-
    like number_codes/2 but only for floats,
    (   Format = e(P), integer(P), P >= 0 ->
        generate or recognise %.Pe format
    ;   Format = e ->
        generate: use default P, recognise: accept any %e
    ;   Format = f(P), integer(P), P >= 0 ->
        generate or recognise %.Pf format
    ;   Format = f ->
        generate: use default P, recognise: accept any %f
    ;   Format = g(P), integer(P), P >= 0 ->
        generate or recognise %.Pg format
    ;   Format = g ->
        generate: use default P, recognise: accept any float
    ;   Format = k(P), integer(P, P >= 0 ->
        like e(P), but with 1..3 digits before the decimal
        point so that the exponent is a multiple of 3; a very
        useful format when you have it
    ;   Format = k ->
        generate: use default P, recognise: accept any P
        but still allow only results of k(_) formatting
    ;   Format = p -> % precise
        generate: same as e(P) with P large enough that full
        precision is generated, enough digits for round trip
        conversion to be one-to-one
    ;   $error
    ).

$map([0'+|Xs], [0'+|Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'-|Xs], [0'-|Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'.|Xs], [ D |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'e|Xs], [ E |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'E|Xs], [ E |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([ N |Xs], [ M |Ys], Z, D, E) :- !,
    plus(Z, N, M),
    between(0'0, 0'9, N),
    $map(Xs, Ys, Z, D, E).

With such predicates, we then define

:- pred number_codes(number, codes, code).

number_codes(Number, Codes, Zero) :-
    (   integer(Number) -> integer_codes(Number, Codes, Zero)
    ;   float(Number) -> float_codes(Number, Codes, p, Zero)
    ;   var(Number) >
	(   integer_codes(Number, Codes, Zero) -> true
	;   float_codes(Number, Codes, p, Zero)
	)
    ;   $error
    ).

number_codes(Number, Codes) :-
    number_codes(Number, Codes, 0'0).

At last, 10 years late, this would give people using Arabic or Indic scripts (amongst others) the ability to convert numbers using their own characters, and it would, 10 years late, give Europeans who like a rather ambiguous use of commas to convert floating-point numbers using their preferred decimal point character.

Is there any other way char_conversion/2 fails? Yes, there is. I shall take personal examples. The only languages that really matter to me personally (other than mattering because I think other people should be able to use their own language and script) are English, Latin, Greek, Hebrew, Serbo-Croatian, and Maori.

Writing in English, using ISO Latin 1, I would like to be able to use my father's name, Æneas. The letter ash is not one of the normal Prolog characters, so I would have to map it to two characters: A, e. But I can't do that. I cannot have the atom hæme mapped to haeme either, for the same reason. Since Latin1 includes the characters "¼", "½", and "¾", which are part of the number syntax of my native locale, I would like to use them in numbers, so that ¼=0.25, 2½=2.5, and 10¾=10.75. This could be done if I could map "¼" to ".25" and so on. But with char_conversion/2 so limited, I can't. If char_conversion/2 can't even handle ISO Latin 1, what good is it?

I suppose I should have said "Croatian" rather than "Serbo-Croatian", but there was a generation of New Zealanders who knew that their grandparents were from Yugoslavia who had to look the natal village up on a map to figure out which side they would have been on when Yugoslavia broke up. As it turned out, my relatives were all on the side labelled "refugee", and we have lost touch; the natal village is a heap of rubble. The NZ branch of the family figured out which to call themselves by looking at letters my grandfather's brother had written: they used the Latin script, not the Cyrillic script. Except that Croatian doesn't quite use the Latin script. There are four letters (DZ, DZ with caron, LJ, and NJ) which look like two English letters but are regarded as single letters in the script. It would probably be a good idea to map DZ to d,z; Dz to D,z; and dz to d,z. But I can't do that with char_conversion/2.

New Zealand has two official languages, English and Maori. For historical reasons, there are three spelling conventions for Maori:

Spell long and short vowels the same. This is the convention the missionaries used. It is the oldest convention, and is used in the word "Maori" itself.
Spell long vowels by writing a long vowel as two copies of the corresponding short vowel. This is a Maaori invention.
Spell long vowels by writing a macron over the corresponding short vowel. This is the convention adopted by the United Tribes, on the grounds that it's the least change to the oldest convention and is therefore most respectful to the ancestors.
Spell long vowels by writing an umlaut over the corresponding short vowel. This is not an official convention for Mäori, but it's what you are most likely to see if you look at web pages containing Mäori text. The reason is that instead of adopting a character set containing macronised vowels, people said "let's use our Windows/MacRoman/Latin1 characters but with different fonts so the umlauts look like macrons."

Accordingly, in processing Maaori text in which vowel length is marked, the best scheme would be to map U+0100 (capital A with macron) and U+00C4 (capital A with daieresis) to A,A, and so on for the other 9 upper and lower case vowels. But I can't do that with char_conversion.

Do I need to mention the Unicode compatibility mappings which map one character to several? Maybe I had better. Take the character U+00A0 "DIAERESIS", for example. Its compatibility mapping is "<compat> 0020 0308". A quick tally of the number of replacement characters in the Unicode 3.1.1 database yielded

Replacement Number of
length occurrences
1 3005
2 1585
3 360
4 62
5 15
6 2
8 1
18 1

Replacement	Number of
length	occurrences
1	3005
2	1585
3	360
4	62
5	15
6	2
8	1
18	1

This is of course a lower bound; the replacement rules in that file have to be applied recursively, so the average replacement length is somewhat longer.

In conclusion, the entire char_conversion/2 framework is too narrowly conceived, too little applied, and too dangerous in effect.

In order to deal effectively with Unicode (and it was the plain responsibility of the ISO Prolog committee to address this), some other means entirely will have to be found.

1.3 Normalisation

Unicode Standard Annex #15 defines four normalisation forms. Canonical Equivalence in Applications defines two more. We also need a "verbatim" normalisation form which acts as the identity function.

Name Code Source What it is
v Here No change
NFD d TR15 Canonical decomposition
NFC c TR15 Canonical decomposition then canonical composition
FCD fcd TN5 "Fast C or D" decomposition
FCC fcd TN5 FCC form
NFKD kd TR15 Compatibility decomposition
NFKC kc TR15 Compatibility decomposition then canonical composition

Name	Code	Source	What it is
	v	Here	No change
NFD	d	TR15	Canonical decomposition
NFC	c	TR15	Canonical decomposition then canonical composition
FCD	fcd	TN5	"Fast C or D" decomposition
FCC	fcd	TN5	FCC form
NFKD	kd	TR15	Compatibility decomposition
NFKC	kc	TR15	Compatibility decomposition then canonical composition

Canonical decomposition and (re)composition apply, in principle, to any character set containing both precomposed accented letters and floating diacriticals. Unicode was by no means the first such character set.

FCC and FCD forms are intermediate between NFC and NFD; they make sense whenever NFC and NFD make sense.

Compatibility decomposition applies, in principle, to any character set containing multiple representations of the "same" character. In particular, the Japanese zenkaku and hankaku are just such characters. In fact, compatibility decomposition is pretty much exactly what char_conversion/2 was supposed to be all about, except that it is by no means true that all compatibility mappings in all such character sets have single-character replacements.

Basically, to really do the job that code_conversion/2 was supposed to do but doesn't, we need Normalisation Form KD or even better, Normalisation Form KC.

We need two modules, one for each representation of characters. (Unicode-hacking life would be so much easier if we only had to deal with integers in the range 0..16'10FFFF. For example, while '\xD800\xDC00' is, according to Unicode rules, the encoding of one character, it is not a 'char' according to ISO Prolog rules.)

The following type is shared by codes and chars:

:- type normalisation_form
   ---> v | c | d | fcc | fcd | kc | kd.

It is a matter for debate which normalisation forms should be supported by all Prolog systems. It is a practical fact that some normalisation forms should be supported by most Prolog systems. Whatever set of normalisation forms is eventually agreed on, implementors ought to be allowed to extend the set.

:- module(chars, [...]).
:- pred normalise(normalisation_form, chars, chars, chars).
:- normalise(NF, Xs, Ys0, Ys) when NF, Xs.
:- normalise(NF, Xs, Ys0, Ys) is semidet
   when ground(NF), ground(Xs).

normalise(NF, Xs, Ys0, Ys) :-
    Ys0\Ys is Xs in normalisation form NF.

normalise(NF, Xs, Ys) :-
    normalise(NF, Xs, Ys, []).

normalise(Xs, Ys) :-
    normalise(kd, Xs, Ys).

normalised_compare(NF, R, Xs, Ys) :-
    normalise(NF, Xs, Xn),
    normalise(NF, Ys, Yn),
    compare(R, Xn, Yn).

normalised_compare(R, Xs, Ys) :-
    normalised_compare(kd, R, Xs, Ys).

normalised(NF, Xs) :-
    normalise(NF, Xs, Xs).

normalised(Xs) :-
    normalise(Xs, Xs).
...

Please remember that these are specifications, not implementations. It is possible to implement normalised_compare/[3,4] so that they work in a single pass over their arguments without building any intermediate list. Similarly, it is possible to implement normalised/[1,2] so that they work in a single pass over their argument without building any intermediate list.

:- module(codes, [...]).
:- pred normalise(normalisation_form, codes, codes, codes).
:- normalise(NF, Xs, Ys0, Ys) when NF, Xs.
:- normalise(NF, Xs, Ys0, Ys) is semidet
   when ground(NF), ground(Xs).

normalise(NF, Xs, Ys0, Ys) :-
    Ys0\Ys is Xs in normalisation form NF.

normalise(NF, Xs, Ys) :-
    normalise(NF, Xs, Ys, []).

normalise(Xs, Ys) :-
    normalise(kd, Xs, Ys).

normalised_compare(NF, R, Xs, Ys) :-
    normalise(NF, Xs, Xn),
    normalise(NF, Ys, Yn),
    compare(R, Xn, Yn).

normalised_compare(R, Xs, Ys) :-
    normalised_compare(kd, R, Xs, Ys).

normalised(NF, Xs) :-
    normalise(NF, Xs, Xs).

normalised(Xs) :-
    normalise(Xs, Xs).
...

There needs to be a stream property normalisation_form(NF) which can be used to ensure that characters delivered to a Prolog application are already normalised, and that characters generated by a Prolog application are appropriately normalised before being delivered to their destination.

1.4. Character classification.

In the good old days, it was sufficient to have a library(ctypes) module. With ISO's thrice-accursed single-character-atom representation for characters, we need two modules, as noted in the previous section. So we might as well put the character classification predicates in 'chars' and 'codes'. The classification predicates in each module will have the same name and basically the same meaning, but different argument types.

The ISO Prolog standard uses the following character groups:

layout characters (6.5.4), see is_space/1;
solo characters (6.5.3), see is_solo/1;
meta characters (6.5.5), see is_quote/2;
alphanumeric characters (6.5.2), see is_csym/1;
decimal digit characters (6.5.2), see is_digit/[1,2];
alpha characters (6.5.2), see is_csymf/1;
letter characters (6.5.2), see is_alpha/1;
small letter characters (6.5.2), see is_lower/1;
capital letter characters (6.5.2), see is_upper/1;
graphic characters (6.5.1), see is_symbol/1.

The term "graphic character" is very poorly chosen; one expects it either to have something to do with box drawing or to be strongly related to isgraph() in the C standards. Instead it refers to what was previously known as "symbol characters" or "operator characters".

That's a hint. We want to be able to do what C does, plus we want to be able to cover the categories used in Prolog itself.

Fortunately, there's a freely downloadable table classifying many tens of thousands of characters, actively maintained by someone else. Even more fortunately, the categories it uses can be mapped onto the categories we want quite easily. The table is part of the Unicode data base.

is_unicode/2 and is_digit/2

I propose a single base predicate, from which the others are easily derived.

:- type unicode_category
   ---> lu | ll | lt | lm | lo
      | mn | mc | me
      | nd | nl | no
      | pc | pd | ps | pe | pi | pf | po
      | sm | sc | sk | so
      | zs | zl | zp
      | cc | cf | cs | co | cn.

These codes are simply the Unicode category names converted to lower case for convenience in Prolog.

:- pred is_unicode(code, unicode_category).
:- is_unicode(C, K) when C ; K.
:- is_unicode(C, K) is semidet when ground(C).
:- is_unicode(C, K) is bounded when true.

is_unicode(C, K) :-
    between(0, 16'10FFFF, C),
    UnicodeData.txt says C has category K,
    K \== cn. % not assign

:- pred is_digit(code, integer).
:- is_digit(C, V) when C ; V.
:- is_digit(C, V) is semidet when ground(C).
:- is_digit(C, V) is bounded when true.

is_digit(C, K) :-
    is_unicode(C, nd),
    UnicodeData.txt gives V as the decimal value of C.

The Unicode standard and associated reports explain implementation techniques for this. Even with just one byte per character, a simple implementation would take 1.1 megabytes. Using multilevel tables can compress this dramatically, not least because only 3 of the 17 planes currently have any characters at all.

Derived predicates

is_unicode(C) :-
    is_unicode(C, _).

is_latin1(C, K) :-
    between(0, 16'FF, C),
    is_unicode(C, K).

is_latin1(C, K) :-
    is_latin1(C, _).

is_ascii(C, K) :-
    between(0, 16'7F, C),
    is_unicode(C, K).

is_ascii(C) :-
    is_ascii(C, _).

is_alnum(C) :-
    is_unicode(C, K),
    memberchk(K, [nd,lu,ll,lt,lm,lo]).

is_alpha(C) :-
    is_unicode(C, K),
    memberchk(K, [lu,ll,lt,lm,lo]).

is_blank(C) :- % known as is_white/1 in Quintus Prolog
    is_unicode(C, zs).

is_bracket(L, R) :-
    is_unicode(L, ps),
    is_unicode(R, pe),
    R is the closing bracket corresponding to L.

is_cntrl(C) :- % C0 and C1 controls + formats
    is_unicode(C, K),
    memberchk(C, [cc,cf]).

is_csym(C) :-
    is_unicode(C, K),
    memberchk(C, [pc,nd,lu,ll,lt,lm,lo]).

is_csymf(C) :-
    is_unicode(C, K),
    memberchk(C, [pc,lu,ll,lt,lm,lo]).

is_digit(C) :-
    is_unicode(C, nd).

is_endfile(-1).

is_endline(C) :-
    is_unicode(C, K),
    memberchk(K, [zl,zp]).

is_graph(C) :-
    is_unicode(C, K),
    nonmember(K, [zs,zl,zp,cc,cf]).

is_layout(C) :-

is_lower(C) :-
    is_unicode(C, ll).

is_print(C) :-
    is_unicode(C, K),
    nonmember(K, [zl,zp,cc,cf]).

is_punct(C) :-
    is_unicode(C, K),
    memberchk(K, [pc,pd,ps,pe,pi,pf,po,sm,sc,sk,so]).

is_title(C) :-
    is_unicode(C, lt).

is_quote(0'', 0'').
is_quote(0'", 0'").
is_quote(0'`, 0'`).
is_quote(L, R) :-
    is_unicode(L, pi),
    is_unicode(R, pf),
    R is the final quote corresponding to L.

is_space(C) :-
    is_unicode(C, K),
    memberchk(K, [zs,zl,zp]).

is_upper(C) :-
    is_unicode(C, lu).

is_xdigit(Base, C, V) :-
    between(2, 36, Base),
    Base1 is Base - 1,
    between(0, Base1, V),
    (   V >= 10 ->
        ( C is "A" - 10 + V ; C is "a" - 10 + V )
    ;   is_digit(C, V)
    ).

is_xdigit(C, V) :-
    is_xdigit(16, C, V).

is_xdigit(C) :-
    is_xdigit(16, C, _).

There are two debatable decisions here. In this draft I have decided that the "Other, Format" characters of Unicode should be grouped with the "Other, Control" characters for is_cntrl/1, on the grounds that these things fairly obviously are control characters and that if you just want the C0 and C1 controls it is easy enough to test for 'cc' yourself. Indeed, is_ascii(C, cc) will check for the ASCII (C0) controls. I have also decided that the "csymf" characters should include all the "Punctuation, Connector" characters, not just the LOW LINE "_".

i have read through rather more material about wide character classification in C than was enjoyable, and found stunningly little guidance about how non-ASCII characters should be classified, so I must emphasise that the definitions above are a draft.

The is_bracket/2 and is_quote/2 predicates do not correspond to anything in C, but I found them very useful in Quintus Prolog and equivalents very useful in a text editor. Unicode 3.1.1 adds only single quotes, double quotes, single guillemets, double guillemets, and single and double high reversed-9 quotes to the Pi category, so is_quote/2 isn't much of a burden. For the brackets we'd need a somewhat larger table, but there are plenty of nice looking brackets in Unicode it would be nice to recognise.

The "solo" characters in ISO Prolog are "!,;|%()[]{}". What is the appropriate generalisation for Unicode?

In ASCII, "()[]{}" is the full set of brackets. So Unicode solo characters should include all Unicode brackets. The other characters are all in the Po category, but there are many characters in the Po category which should be symbol characters, not solo characters. The simplest rule seems to be that a non-bracket character should count as a solo character if and only if it is in the Po category and is a script or compatibility variant of one of the existing non-bracket solo characters. In Unicode 3.1.1, that's just the following 24 characters:

U+0021 = EXCLAMATION MARK
U+0025 = PERCENT SIGN
U+002C = COMMA
U+003B = SEMICOLON
U+055C = ARMENIAN EXCLAMATION MARK
U+055D = ARMENIAN COMMA
U+060C = ARABIC COMMA
U+061B = ARABIC SEMICOLON
U+066A = ARABIC PERCENT SIGN
U+1363 = ETHIOPIC COMMA
U+1364 = ETHIOPIC SEMICOLON
U+1802 = MONGOLIAN COMMA
U+1808 = MONGOLIAN MANCHU COMMA
U+3001 = IDEOGRAPHIC COMMA
U+FE50 = SMALL COMMA
U+FE51 = SMALL IDEOGRAPHIC COMMA
U+FE54 = SMALL SEMICOLON
U+FE57 = SMALL EXCLAMATION MARK
U+FE6A = SMALL PERCENT SIGN
U+FF01 = FULLWIDTH EXCLAMATION MARK
U+FF05 = FULLWIDTH PERCENT SIGN
U+FF0C = FULLWIDTH COMMA
U+FF1B = FULLWIDTH SEMICOLON
U+FF64 = HALFWIDTH IDEOGRAPHIC COMMA

It is possible to recognise this set of characters by pattern matching (carefully!) on the names, but since I do not propose access to the names, this has to be expressed by means of a table.

is_solo(C) :-
    is_unicode(C, K),
    memberchk(K, [po,ps,pe]),
    (   K = po -> 
        memberchk(C, [16'0021,16'0025,16'002C,16'003B,16'055C,16'055D,
                      16'060C,16'061B,16'066A,16'1363,16'1364,16'1802,
                      16'1808,16'3001,16'FE50,16'FE51,16'FE54,16'FE57,
		      16'FE6A,16'FF01,16'FF05,16'FF0C,16'FF1B,16'FF64])
    ;   true
    ).

The so-called "meta" characters in ISO Prolog are the ASCII umlaut, acute accent, grave accent, and reverse solidus. This is a rather puzzling grouping. Three of them are used for quoting string-like tokens, one of them isn't. Unlike Lisp, the reverse solidus has no "meta" semantics in Prolog except as part of string-like tokens; when it occurs unquoted it is just a plain symbol character.

Since we already have is_quote/2, which covers the characters that begin string-like tokens, I do not think that an *is_meta/1 character classification predicate is appropriate.

The confusingly named "graphic" characters of ISO Prolog are "#$&*+-./:<=>?@^~". This is especially confusing, because what we want to know is not "which characters stand for themselves in quoted tokens" (the answer is: anything except the corresponding closing quote, the reverse solidus, or a Zl or Zp character), but "which characters can be used in unquoted quoted literals that look vaguely like mathematical operators. That is, we need a character class that includes these characters and also includes the reverse solidus, for the sake of ISO Prolog atoms such as \=, \==, =\=, \+, /\, \/, and \ . This is in fact the set "graphic token char" of section 6.4.2.

If we look these characters up in the Unicode data base, we find that they fall into four groups:

Pd (Punctuation, Dash): -
Po (Punctuation, Other): # & * . | : ? \ @
Sc (Symbol, Currency): $
Sm (Symbol, Mathematical): + - < = > ~

These groupings actually fit our intuitions about which characters should be allowed in unquoted atoms very well; when extended to Latin 1 they yield exactly the set of characters my Prolog-in-C parser has accepted since Latin 1 came out. The only problem is that some Po characters are not symbol characters, but we already have a recogniser for those, so:

is_symbol(C) :-
    is_unicode(C, K),
    memberchk(K, [pd,po,sc,sm]),
    \+ is_solo(C).

Locales

If Prolog uses Unicode internally, then we don't have to worry much about locales for character classification, so I'll leave this section empty for a while.

1.5. Case conversion

In ASCII, every letter was either a lower case letter with a unique context-independent upper case letter or an upper case letter with a unique context-independent lower case letter, and this was the only variation between letter forms.

In the ISO 8859 family of 8-bit character sets, there could be letters which were neither upper nor lower case, and there could be letters which did belong to a case but did not have an opposite case equivalent (there are two of those in Latin-1), and there could be letters with positional variants (there are five of those in 8859-8). Worse still, there could be letters whose opposite case form was not the same length (ß).

Of course it's well known that Greek has two lower case forms for the letter "s" both corresponding to the same upper case form.

Interestingly enough, the Unicode 3.1.1 database defines case mappings, and its case mappings are one-to-one. the letter "ß" is handled by not mapping it to anything else at all, and the Greek letters are handled by mapping the capital "S" to the medial (not the final) letter "s", regardless of the context.

The Unicode standard points out that case mapping is locale-sensitive, and gives the example of Turkish.

Accordingly, we do need some kind of locale objects. This is a very preliminary draft here!

:- type character_locale.  % not specified

:- pred character_locale(atom, character_locale).
:- character_locale(Name, Locale) when Name.
:- character_locale(Name, Locale) is semidet when ground(Name).

character_locale(Name, Locale) :-
    Name is the name of a locale and
    Locale is a private representation of its character facet.

One special locale object is 'unicode', which represents the Unicode case conversion tables. We might also need 'posix' and/or 'C' locales for this. A locale name like 'en_NZ' might or might not be acceptable as a locale object at the implementor's choice.

We get two sets of case conversion predicates. None of them takes a single character as argument or delivers a single character as result. We have, in general, no right to expect that lc(x++y)=lc(x)++lc(y).

:- pred to_lower(character_locale, codes, codes, codes).
:- to_lower(L, Xs, Ys0, Ys) when L, Xs.
:- to_lower(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs).

locale_to_lower(Locale, Xs, Ys0, Ys) :-
    Locale is a character locale and Xs is a string,
    Ys0\Ys is Xs converted to lower case according to Locale.

locale_to_lower(Locale, Xs, Ys) :-
    to_lower(Locale, Xs, Ys, []).

to_lower(Xs, Ys0, Ys) :-
    locale_to_lower(unicode, Xs, Ys0, Ys).

to_lower(Xs, Ys) :-
    to_lower(Xs, Ys, []).

:- pred to_title(character_locale, codes, codes, codes).
:- to_title(L, Xs, Ys0, Ys) when L, Xs.
:- to_title(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs).

locale_to_title(Locale, Xs, Ys0, Ys) :-
    Locale is a character locale and Xs is a string,
    Ys0\Ys is Xs converted to title case according to Locale.

locale_to_title(Locale, Xs, Ys) :-
    locale_to_title(Locale, Xs, Ys, []).

to_title(Xs, Ys0, Ys) :-
    locale_to_title(unicode, Xs, Ys0, Ys).

to_title(Xs, Ys) :-
    to_title(Xs, Ys, []).

:- pred to_upper(character_locale, codes, codes, codes).
:- to_upper(L, Xs, Ys0, Ys) when L, Xs.
:- to_upper(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs).

locale_to_upper(Locale, Xs, Ys0, Ys) :-
    Locale is a character locale and Xs is a string,
    Ys0\Ys is Xs converted to upper case according to Locale.

locale_to_upper(Locale, Xs, Ys) :-
    locale_to_upper(Locale, Xs, Ys, []).

to_upper(Xs, Ys0, Ys) :-
    locale_to_upper(unicode, Xs, Ys0, Ys).

to_upper(Xs, Ys) :-
    to_upper(Xs, Ys, []).

1.6. Text Element Boundaries

The title of this section refers to the corresponding section of the Unicode Standard, where rules are given for finding

character,
word,
sentence, and
line boundaries.

In Latin1, there is a one to one correspondence between the thing you might call a `char' in a program and the thing a normal person looking at a printed text might call a `character'. The imaginary rock band ``Death-Töngue'' (from ``Bloom County'') clearly has 12 whatsits in its name on either construal. If we step through the Prolog list (of Latin1 codes) "Death-Töngue" one element at a time, we shall visit one written character at a time.

This neat equivalence did not hold in ASCII, although that is not widely known. In ASCII, it was legitimate to encode <ö> as <o,BS,"> or <",BS,O>. This accounts for some of the compromise character shapes: " has to serve for opening double quote, closing double quote, and diaresis, ' has to serve for apostrophe, right quote, and acute accent, , has to serve for comma and cedilla, and so on. The result is that except for US English, the relationship between a sequence of characters and a sequence of chars was neither one to one nor unique.

Unicode brings back the complexities of ASCII, doubled and redoubled. There is in principle no upper bound to the number of diacritical characters which may be attached to a base character, hence no upper bound on the number of Unicode code elements which may be required to represent a single scripteme.

I propose the following predicates for inclusion in both the 'chars' module and the 'codes' module. I illustrate only the 'codes' versions, but the 'chars' versions are obvious.

:- pred next_character(normalisation_form, codes, codes, codes).
:- next_character(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_character(NF, Xs, Ys0, Ys) is semidet
   when ground(NF), ground(Xs), ground(Ys0).
:- next_character(NF, Xs, Ys0, Ys) is semidet
   when NF==v, ground(Ys0).
:- next_character(NF, Xs, Ys0, Ys) is bounded
   when ground(Ys0).

next_character(NF, Xs, Ys0, Ys) :-
    append(Zs, Ys, Ys0),
    Zs \== [],
    normalised_compare(NF, =, Xs, Zs),
    Zs is the longest prefix of Ys0 not containing
      a character boundary.

next_character(Xs, Ys0, Ys) :-
    next_character(v, Xs, Ys0, Ys).

next_character(Xs, Ys0) :-
    next_character(Xs, Ys0, Ys).

:- pred next_word(normalisation_form, codes, codes, codes).
:- next_word(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_word(NF, Xs, Ys0, Ys) is semidet
   when ground(NF), ground(Xs), ground(Ys0).
:- next_word(NF, Xs, Ys0, Ys) is semidet
   when NF==v, ground(Ys0).
:- next_word(NF, Xs, Ys0, Ys) is bounded
   when ground(Ys0).

next_word(NF, Xs, Ys0, Ys) :-
    append(Zs, Ys, Ys0),
    Zs \== [],
    normalised_compare(NF, =, Xs, Zs),
    Zs is the longest prefix of Ys0 not containing
      a word boundary.

next_word(Xs, Ys0, Ys) :-
    next_word(v, Xs, Ys0, Ys).

next_word(Xs, Ys0) :-
    next_word(Xs, Ys0, Ys).

:- pred next_sentence(normalisation_form, codes, codes, codes).
:- next_sentence(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_sentence(NF, Xs, Ys0, Ys) is semidet
   when ground(NF), ground(Xs), ground(Ys0).
:- next_sentence(NF, Xs, Ys0, Ys) is semidet
   when NF==v, ground(Ys0).
:- next_sentence(NF, Xs, Ys0, Ys) is bounded
   when ground(Ys0).

next_sentence(NF, Xs, Ys0, Ys) :-
    append(Zs, Ys, Ys0),
    Zs \== [],
    normalised_compare(NF, =, Xs, Zs),
    Zs is the longest prefix of Ys0 not containing
      a sentence boundary.

next_sentence(Xs, Ys0, Ys) :-
    next_sentence(v, Xs, Ys0, Ys).

next_sentence(Xs, Ys0) :-
    next_sentence(Xs, Ys0, Ys).

:- pred next_line(normalisation_form, codes, codes, codes).
:- next_line(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_line(NF, Xs, Ys0, Ys) is semidet
   when ground(NF), ground(Xs), ground(Ys0).
:- next_line(NF, Xs, Ys0, Ys) is semidet
   when NF==v, ground(Ys0).
:- next_line(NF, Xs, Ys0, Ys) is bounded
   when ground(Ys0).

next_line(NF, Xs, Ys0, Ys) :-
    append(Zs, Ys, Ys0),
    Zs \== [],
    normalised_compare(NF, =, Xs, Zs),
    Zs is the longest prefix of Ys0 not containing
      a line boundary.

next_line(Xs, Ys0, Ys) :-
    next_line(v, Xs, Ys0, Ys).

next_line(Xs, Ys0) :-
    next_line(Xs, Ys0, Ys).

Section 2. Higher-order operations.

2.1. `call/N`

The ISO Prolog standard defines call/1 in section 7.8.3. Looking at the definition is like a slap in the face; the committee rejected my denotational semantics for Prolog on the grounds that it was written in Pascal, then when I pointed out that on the contrary, it was written in a pure functional language that wasn't anything like Pascal, it was rejected on the grounds that it was too "implementation-oriented". If that doesn't describe the definition of call/1, I don't know what does.

If call/1 was not in the language, we could define it thus:

    :- dynamic $call/1.

    call(Goal) :-
        asserta((
            $call(Goal) :-
                retract(($call(_) :- _)),
                !,
                Goal
	)),
	$call(Goal).

In the Prolog type checking paper that Alan Mycroft and I wrote, we proposed the call/N family, in order to permit type-checking Prolog higher-order code. Since then, some Prologs have implemented this feature in the library (notably DEC-10 Prolog, C Prolog, Quintus Prolog) with an upper bound on N, and some Prologs have built it into their language (SWI Prolog and, I believe, NU Prolog).

Roughly speaking, one expects

call(p(X1,X2,X3)) :- p(X1, X2, X3).

call(p(X1,X2), X3) :- p(X1, X2, X3).

call(p(X1), X2, X3) :- p(X1, X2, X3).

call(p, X1, X2, X3) :- p(X1, X2, X3).

Indeed, one way to implement the call/N family is to start with a skeleton

    call(P, ...) :- \+ callable(P), !, error(uncallable term).
    [special purpose code for control structures if N <= 2]
    [gap]
    call(P, ...) :- error(undefined predicate).

and then whenever a predicate p/N is defined, automatically add clauses to call/1, ..., call/N+1 in the appropriate gaps. In a WAM-based system, this can be done by exploiting switch-on-term without any choice-points being created, so call/N in the normal case is just a hash table look-up, a bit of argument shuffling, and a jump. The space requirement is O(P.A²), where P is the number of predicates and A is the arity. Since arities are usually fairly low, this is in effect linear in the size of the program. I have tried this, and found it to be a pleasantly efficient (as well as obvious) way to implement call/N.

There are other ways to implement call/N which are nearly as efficient, and have only O(P) space overhead, the hash table needed to map from a predicate's functor to its address.

This family of predicates has proven its usefulness. It's time they were in the community standard.

:- pred call(void(T1), T1).
:- pred call(void(T1,T2), T1, T2).
:- pred call(void(T1,T2,T3), T1, T2, T3).
:- pred call(void(T1,T2,T3,T4), T1, T2, T3, T4).
:- pred call(void(T1,T2,T3,T4,T5), T1, T2, T3, T4, T5).
:- pred call(void(T1,T2,T3,T4,T5,T6), T1, T2, T3, T4, T5, T6).

% Sample implementation.
% Beware: this is highly inefficient.  Don't do it this way.

call(P, Y1) :-
    P =.. L0,
    append(L0, [Y1], L1),
    Q =.. L1,
    call(Q).

call(P, Y1, Y2) :-
    P =.. L0,
    append(L0, [Y1,Y2], L2),
    Q =.. L2,
    call(Q).

call(P, Y1, Y2, Y3) :-
    P =.. L0,
    append(L0, [Y1,Y2,Y3], L3),
    Q =.. L3,
    call(Q).

call(P, Y1, Y2, Y3, Y4) :-
    P =.. L0,
    append(L0, [Y1,Y2,Y3,Y4], L4),
    Q =.. L4,
    call(Q).

call(P, Y1, Y2, Y3, Y4, Y5) :-
    P =.. L0,
    append(L0, [Y1,Y2,Y3,Y4,Y5], L5),
    Q =.. L5,
    call(Q).

call(P, Y1, Y2, Y3, Y4, Y5, Y6) :-
    P =.. L0,
    append(L0, [Y1,Y2,Y3,Y4,Y5,Y6], L6),
    Q =.. L6,
    call(Q).

2.2. `meta_predicate`

As soon as you have call/1, you have a problem. Cross-referencers don't work. You can fix this for the specific cases of call/1, findall/3, bagof/3, setof/3, and a few others, but as soon as a Prolog programmer writes

gcc(Goal) :- \+ \+ Goal. % garbage-collecting call

the cross-referencer stops working. The C call graph tool on my workstation is similarly limited; if you have a function which is passed to another function and not called directly, it thinks that function is never called. That's not what I call a usable call graph tool; it is worse than having no call graph at all.

Characteristically, the ISO committee missed the point. They thought that meta_predicate has something to do with modules. It does, but only accidentally. The real function of meta_predicate declarations is to provide just enough type information for a cross-referencer to work.

The idea of meta_predicate declarations was to start with existing mode declarations (so the arguments would be +, -, or ?) and change some of the arguments to something else to say "this argument is called".

Had call/1 been the only game in town, something like 'call' would have been adequate, so we'd have had

:- meta_predicate
    findall(?, call, -),    % not final version
    gcc(call).              % not final version

But the call/N family came into Prolog about the time the DEC-10 cross-referencer was written, and replaced prior code that did things like

all(P, [], []).
all(P, [X|Xs], [Y|Ys]) :-
    apply(P, [X,Y]),
    all(P, Xs, Ys).

Very often, when an argument is passed to a meta-predicate, it is a closure, not a goal; the callee will supply more arguments (at the right) before calling it. This is information that a cross-referencer needs.

So, quite independently of any module system, we need meta-predicate declarations of this form:

meta_predicate_directive -->
    [(:-),(meta_predicate)],
    meta_predicate_declarator,
    rest_meta_predicate_declarators.

rest_meta_predicate_declarators --> ['.'].
rest_meta_predicate_declarators --> [','],
    meta_predicate_declarator,
    rest_meta_predicate_declarators.
    
meta_predicate_declarator --> atom(_).
meta_predicate_declarator --> atom(_), ['('],
    meta_predicate_argument,
    rest_meta_predicate_arguments.

rest_meta_predicate_arguments --> [')'].
rest_meta_predicate_arguments --> [','],
    meta_predicate_argument,
    rest_meta_predicate_arguments.

meta_predicate_argument --> [+].
meta_predicate_argument --> [-].
meta_predicate_argument --> [?].
meta_predicate_argument --> integer(N), {N >= 0}.

The mode alternatives, + - ?, are ignored by the system, but may be processed by a mode inference program like Chris Mellish's, and indicate the intended use of the corresponding arguments. As a mode, the meta alternative N has the same force as +. But N tells the cross-referencer that the corresponding argument is to be a goal that is missing its rightmost N arguments.

With the aid of this declaration, we can write

:- meta_predicate
    findall(?, 0, ?),
    gcc(0),
    all(2, ?, ?).

all(_, [], []).
all(P, [X|Xs], [Y|Ys]) :-
    call(P, X, Y),
    all(P, Xs, Ys).

The ISO committee not only got hold of the wrong end of the stick, they broke it, in these ways:

They changed the name to metapredicate, which is downright ugly.
They forbade +, -, and ? as arguments in declarators, and insisted on *.
They thought it was about modules, so kept it out of the base standard, where it is desperately needed.
They omitted the whole flipping point of the ruddy thing, which was the ability to state the number of missing arguments.

The result is that the ISO metapredicate directives are useless for cross-referencers. They are best regarded as obsolete.

But what if you don't care about cross-referencing? I guess you've never used Masterscope for Lisp, or cscope(1) for C, or the Smalltalk IDE, or ... if you can ask such a question. Come to think of it, if you haven't used a reasonably reliable cross-referencer for Prolog, you don't realise what a useful debugging tool it can be. (I leave you to make your own inferences about the ISO Prolog committtee.) But let's suppose you don't care about cross-referencing, and you don't care about the added documentation value from stating the number of missing arguments. Is there any reason why you should care about meta_predicate?

Yes. From its inception (which I can speak with authority about, because I invented it), the intention was that the meta_predicate directive would licence a nonstandard translation of meta-calls. Take all/3 for example:

    all(append(Front), Xs, Ys)

can be translated as

    all(<{address of append/3}, Front>, Xs, Ys)

where <Address,Arg1,...> is a special kind of term that looks like a compound term, but has an address where the functor should be. This permits an implementation of call/3 which spreads the arguments and jumps straight to the given address. This eliminates the space overhead for meta-calls (there is no longer a need for a table of the addresses of all predicates; the compiler has arranged for the addresses of just the meta-called predicates to be available directly).

All right, all right; so that would also make cross-module meta-calls as efficient as any other meta-calls.

There is another thing that meta_predicate does, and it is related to the possibility of unusual implementations of meta-calls, but it has a rather more profound theoretical reason. If T1 and T2 are two terms representing goals, then T1 = T2 is, strictly speaking, a second-order unification (or worse). But it isn't implemented that way. In order to avoid the obvious soundness and completeness problems, we want to check at compile time that two meta-terms are unified if and only if at least one of them is a variable. This means that a meta-argument of a predicate should only be a variable in the head, and that it should only be unified with another variable, and that it should only be passed to a meta-argument of some predicate. This is a type checking problem, and it is again a very useful debugging tool. But you can't even get started on it without meta_predicate; the ISO metapredicate abortion is quite useless for the job.

So even if we don't include modules in the community standard, and even if we don't include call/N, we still need either meta_predicate or a richer type declaration/checking system.

2.3. Lambda

We don't strictly speaking need lambda-expressions in Prolog; ordinary closures suffice quite often, and when they don't we can always write an extra predicate. But then, lambda-lifting shows that you don't strictly speaking need lambda-expressions in functional languages either; you can always move the thing out to top level, give it a name, and partially apply it at the original place of use.

Suppose we had a form something like

    all((\(X, Y) :- Body), Xs, Ys)

A compiler could implement this by finding the intersection of (free variables of Body \ free variables of \(X,Y)) with the free variables of the rest of the clause that contains it, and generating

p(FreeVars, X, Y) :- Body.
...
    all(p(FreeVars), Xs, Ys)

The interesting point is that it isn't possible to do this at run time. If we simply implemented

call((\(X1,X2) :- Body), X1, X2) :- call(Body).

then the first call to the lambda-expression would "use it up". We couldn't use this in a call to all/3, unless all of the elements of Xs were equal and so were all of the elements of Ys. If on the other hand we implemented

call((\(X1,X2):-Body), Y1, Y2) :-
    copy_term(foo(X1,X2,Body), foo(Y1,Y2,Goal)),
    call(Goal).

then

common_prefix(Front, Xs, Ys) :-
    all((\(X, Y) :- append(Front, Xs, Ys))).

wouldn't work, because Front would be copied, and it shouldn't be.

With compiler assistance (as in Mercury), lambda-expressions are fine. Without compiler assistance, lambda-expressions either don't work or require more, and more error-prone, annotation than I think the average programmer would be happy with. I do not want to recommend anything which requires compiler support or even that there be a compiler.

2.4. Recommendations

At a minimum, meta_predicate directives with 0 as a meta-argument annotation distinct from ':'. This is of great practical use. A non-module compiler can simply ignore such directives. A module compiler that cannot exploit the extra information can simply treat them as alternative syntax for metapredicate.
At least the first seven members of the call/N family; they are trivial to implement (I do not say trivial to implement efficiently, but even that is not particularly hard) and very useful.
Do not have any kind of lambda-expression.

Section 3. Lists.

We can derive many of the operations in the Quintus Prolog library (which is a revision and extension of the DEC-10 Prolog library) by projection from a small number of predicates. These "base" predicates are specified in some detail, the specifications of the "derived" predicates are then obvious consequences. I do not claim that the "base" predicates are "basic" in any sense, only that they are "universal relations" from which a range of more useful "views" can be projected.

"Base" predicates have been chosen solely for convenience in specification. I do not say that they are "basic" in any way. They are generally harder to understand than some of their derived predicates. They are not, as a rule, a good implementation basis. (For example, it would be very silly to implement append/3 the way it is specified here.) What makes a predicate a "base" predicate is that it makes a good base for specification, so that details of types and so on have to be repeated as little as possible.

The notion of "projection" is closely related to the notion of "bootstrapping" (did Mrs Malaprop choose that word?) in the ISO Prolog standard. Many (but by no means all) of the "bootstrapped" predicates in the ISO Prolog standard are projections. Predicate P is a projection of predicate Q if and only if P is not only a special case of Q, but has its arguments in the same order as Q. The idea is that the strong semantic link between P and Q should make it easy for a programmer to memorise the argument order of Q and thereafter be able to correctly predict the argument order of P without any additional memory burden.

`append_append_length/7`

If any predicates have universal acceptance in Prolog, they must be append/3, member/2, and length/2. They are all projections of a single predicate.

:- pred append_append_length(
            list(X), list(X), list(X),
            list(Y), list(Y), list(Y),
            integer).

:- append_append_length(A, B, AB, C, D, CD, N)
   when A ; AB ; C ; CD ; N.

:- append_append_length(A, B, AB, C, D, CD, N) is semidet
   when list_skel(A) ; list_skel(C) ; nonvar(N).
:- append_append_length(A, B, AB, C, D, CD, N) is bounded
   when list_skel(AB) ; list_skel(CD).        

%  Sample implementation.

append_append_length(A, B, AB, C, D, CD, N) :-
    (   var(N) ->
        $append_append_length(A, B, AB, C, D, CD, 0, N)
    ;   integer(N) ->        
        N >= 0,
        $append_append_length(A, B, AB, C, D, CD, N)
    ;   $error
    ).

$append_append_length([], B, B, [], D, D, N, N).
$append_append_length([X|A], B, [X|AB], [Y|C], D, [Y|CD], N0, N) :-
    N1 is 1 + N0,
    $append_append_length(A, B, AB, C, D, CD, N1, N).

$append_append_length(A, B, AB, C, D, CD, N0) :-
    (   N0 =:= 0 ->
        A = [], B = AB, C = [], D = CD
    ;   N1 is N0 - 1,
        A = [X|A1], AB = [X|AB1],
        C = [Y|C1], CD = [Y|CD1],
        $append_append_length(A1, B, AB1, C1, D, CD1, N1)
    ).

Semantics: append_append_length(A, B, AB, C, D, CD, N) is true when length(A, N) and length(C, N) and append(A, B, AB) and append(C, D, CD). In Mercury, where the compiler reorders code according to data flow, we could write

append_append_length(A, B, AB, C, D, CD, N) :-
    length(A, N),
    length(C, N),
    append(A, B, AB),
    append(C, D, CD).

In ISO Prolog, however, this implementation is quite unsatisfactory.

?- append_append_length(A, "yz", "xyz", [X], R, "abc", N).

will find the (unique) solution, but if that is rejected, will then backtrack forever, trying ever longer bindings for A.

There is, in fact, no fixed ordering of these goals which is satisfactory for all queries, where "satisfactory" is defined as finding all solutions and terminating in a finite amount of time, for a query having a finite number of solutions.

Derived predicates

append_append(A, B, AB, C, D, CD) :-
    append_append_length(A, B, AB, C, D, CD, _).

append_length(A, B, AB, D, AD, N) :-
    append_append_length(A, B, AB, A, D, AD, N).

append(A, B, AB, D, AD) :-
    append_length(A, B, AB, D, AD, _).

append_length(A, B, AB, N) :-
    append_length(A, B, AB, A, B, AB, N).

append(A, B, AB) :-
    append_length(A, B, AB, _).

same_length(A, C, N) :-
    append(A, [], A, C, [], C, N).

same_length(A, C) :-
    same_length(A, C, _).

one_longer(A, C) :-
    same_length(A, [_|C]).

shorter_list(A, C) :-
    append([_|A], [], [_|A], _, _, C)  

cons(H, T, L) :-
    append([H], T, L).

The following predicates come from library(list_parts) in the Quintus library. However, in order that the argument order of these predicates should be consistent with the other list processing predicates, it is necessary to switch the argument order from Whole,Part to Part,Whole, because that's the way append/3 is.

head(H, L) :-
    cons(H, _, L).

tail(T, L) :-
    cons(_, T, L).

prefix(P, L) :-
    append(P, _, L).

proper_prefix(P, L) :-
    append(P, [_|_], L).

suffix(S, L) :-
    append(_, S, L).

proper_suffix(S, L) :-
    append([_|_], S, L).

The next group of predicates are already fully consistent with append_append_length/7.

length(X, N) :-
    append_length(A, [], A, N).

member(X, L) :-
    append(_, [X|_], L).

memberchk(X, L) :-
    member(X, L),
    !.

:- nonmember(X, L)
   when ground(X), ground(L).

:- nonmember(X, L) is semidet
   when ground(X), ground(L).

nonmember(X, L) :-
    \+ member(X, L). 

select(X, L, R) :-
    append(_, [X|T], L, T, R).

selectchk(X, L, R) :-
    select(X, L, R),
    !.

select(X, Xs, Y, Ys) :-
    append(_, [X|Xs], L, [Y|Ys], R).

selectchk(X, L, Y, R) :-
    select(X, L, Y, R),
    !.

last(ButLast, X, L) :-
    append(ButLast, [X], L).

last(X, L) :-
    last(_, X, L).

nextto(X, Y, L) :-
    append(_, [X,Y|_], L).

This predicate corresponds to the old correspond/4 predicate, but has a consistent argument order:

member_member(X, Xs, Y, Ys) :-
    append(_, [X|_], Xs, _, [Y|_], Ys).

One operation which is commonly asked for is finding the Nth element of a list. Here, unfortunately, we have two clashing archetypes: arg/3 and the append_append_length/7 family. The Quintus library assimilated "Nth element" to arg/3, with predicates

  nth0(N, List, Member)
  nth0(N, List, Member, Residue)
  nth1(N, List, Member)
  nth1(N, List, Member, Residue)

The clash with member(Member, List) and select(Member, List, Residue) is unfortunate. The fact that 0 and 1 have other meanings in the Quintus library (and other meanings in connection with lists) is also unfortunate.

I am driven to the conclusion that the analogy with arg(N, Term, Arg) is bogus. In particular, arg/3 only works one way around: it does not solve for N (although at the price of notably worse generated code, it could) and it cannot solve for Term. This means that its first two arguments are, on any understanding of the term, "inputs", and by the inputs-before-outputs principle, should come first.

The "Nth element" predicates do not share arg/3's input/output restrictions. Indeed, they are more likely to be used to find N for a known Member than the other way around; it is tolerable for linear search to take linear time, but not for indexing.

I use the name component "_offset" to refer to a number of elements to skip. You can think of this as 0-origin indexing if you like, but it is actually more fruitful to think of it quite literally as the length of some list segment. Indeed, "length" could be used to be consistent with the other "_length" predicates, except that in this context it might be confusing.

I use the name component "_index" to refer to a position in a list. Like argument positions in a term, member positions in a list always start with 1.

The Nth element predicates are

member_member_offset(X, Xs, Y, Ys, Offset) :-
    append_length(_, [X|_], Xs, _, [Y|_], Offset).

member_member_index(X, Xs, Y, Ys, Index) :-
    member_member_index(X, [X|Xs], Y, [Y|Ys], Index),
    Index =\= 0.

member_offset(Member, List, Offset) :-
    append_length(_, [Member|_], List, Offset).

member_index(Member, List, Index) :-
    append_length(_, [Member|_], [Member|List], Index),
    Index =\= 0.

select_offset(X, Xs, Y, Ys, Offset) :-
    append_length(_, [X|T], Xs, [Y|T], Ys, Offset).

select_index(X, Xs, Y, Ys, Index) :-
    select_offset(X, [X|Xs], Y, [Y|Ys], Index),
    Index =\= 0.

select_offset(Member, List, Residue, Offset) :-
    append_length(_, [Member|Tail], List, Tail, Residue, Offset).

select_index(Member, List, Residue, Index) :-
    append_length(_, [Member|Tail], [Member|List], Tail, [Member|Residue], Index),
    Index =\= 0.

reverse_append_length/4

Prolog has traditionally had reverse/2. Common Lisp also provides (REVAPPEND - -), and such a predicate is commonly part of the implementation of reverse/2.

:- type reverse_append_length(list(T), list(T), list(T), integer).
:- reverse_append_length(L, B, RB, N)
   when L ; RB ; N.
:- reverse_append_length(L, B, RB, N) is semidet
   when list_skel(L) ; integer(N).
:- reverse_append_length(L, B, RB, N) is bounded
   when list_skel(RB).

%  Sample implementation.

reverse_append_length(L, B, RB, N) :-
    append_length(L, [], L, _, B, RB, N),
    $reverse(L, B, RB).
    
$reverse([], B, B).
$reverse([X|L], B, RB) :-
    $reverse(L, [X|B], RB).

Semantics: reverse_append(L, B, RB, N) is true when RB = reverse(L)++B and length(L) = N.

Notation: There is no new notation here, but I feel that I must repeat the warning that the "sample implementations" of "base" predicates are crafted for exposition and specification. This is not the way that I would implement this predicate for practical use. What implementor would use a two-pass algorithm when a one-pass algorithm could be used?

Except that L is to be reversed, the argument order of this predicate is identical to the argument order of append_length/4, as the name suggests.

Derived predicates

reverse_append(L, B, RB) :-
    reverse_append_length(L, B, RB, _).
    
reverse_length(L, R, N) :-
    reverse_append_length(L, [], R, N).

reverse(L, R) :-
    reverse_length(L, R, _).

`permutation_length`

:- pred permutation_length(list(T), list(T), integer).
:- permutation_length(A, B, N) is bounded
   when list_skel(A) ; list_skel(B) ; nonvar(N).

%  Sample implementation

permutation_length(A, B, N) :-
    same_length(A, B, N),
    $perm(A, B).

$perm([], []).
$perm([X|Xs], Ys1) :-
    $perm(Xs, Ys),
    append(_, R, Ys, [X|R], Ys1).

Derived predicates

permutation(A, B) :-
    permutation(A, B, _).

`pairs_keys_values/3`

Given the great practical utility of keysort/2, it is useful to be able to build and dismantle the lists it requires and produces.

I am not satisfied with the name that was used in the Quintus library (keys_and_values/3, because it does not make clear which argument is which. Nor is there any way to appeal to 'inputs before outputs' because the predicate is used both ways around. I am tentatively calling it pairs_keys_values/3. Whatever it's called, it is a surprisingly useful predicate.

:- pred pairs_keys_values(list(pair(K,V)), list(K), list(V)).
:- pairs_keys_values(Pairs, Keys, Values)
   when Pairs ; Keys ; Values.
:- pairs_keys_values(Pairs, Keys, Values) is semidet
   when list_skel(Pairs) ; list_skel(Keys) ; list_skel(Values).

%  Sample implementation

pairs_keys_values([], [], []).
pairs_keys_values([K-V|Pairs], [K|Keys], [K|Values]) :-
    pairs_keys_values(Pairs, Keys, Values).

Note that the sample implementation doesn't actually satisfy the requirements. In a traditional first-argument-indexing Prolog, if the first argument is a variable but one of the others is a list skeleton, the determinism of the query will not be detected.

Derived predicates

I'm not sure it's worth bothering with these.

pairs_keys(Pairs, Keys) :-
    pairs_keys_values(Pairs, Keys, _).

pairs_values(Pairs, Values) :-
    pairs_keys_values(Pairs, _, Values).

Deletion

The definition of the "delete" predicate is something I struggled with for years. Part of the problem is that if list R does not contain X, there are infinitely many lists L such that deleting X from L yields R, so the predicate really only works as a function. That tells us that L and X should precede R in the argument order. Should it be delete(Member, List, Residue) to be consistent with select/3? Or, because delete/3 is dangerously different from select/3, is it actually a good thing if they are different?

Prolog has three versions of equality: (=)/2, (==)/2, and (=:=)/2. Which of them should be used? Or should we have three predicates, the way Scheme used to have DELQ!, DELV!, and DELETE!?

Should the item to be deleted be a strict input, or should delete/3 be able to backtrack over the elements of the input list as in Cugini's library?

If X does not occur in L, is that a good input, or should it fail?

There is a definition in the Quintus library. It's basically this:

/* Historic background, NOT for inclusion */
:- pred delete(list(T), T, list(T)).
:- delete([], _, _) when true.
:- delete([H|T], X, R) when ground(H), ground(X).
:- delete(L, X, R) is semidet
   when ground(L), ground(X).
   
%  Sample implementation

delete([], _, []).
delete([H|T], X, R) :- H == X, !,
    delete(T, X, R).
delete([H|T], X, [H|R]) :-
    delete(T, X, R).

I don't want to take delete/3 away from the people who use it, if there are any, but it clearly doesn't belong in even an informal standard. The problem is that there are so many choices that none of them is going to be the right one for the majority of uses (if uses there be); none of them deserves the exclusive right to the name delete/3. The history of Scheme may be of interest here. There used to be three delete-from-list functions in Scheme, but there aren't any these days.

One of the questions about deleting from a list is how many elements to delete. I note that when I want to delete a known element from a list, I almost always want to delete just one copy (usually because I know there is only one copy). For that purpose, selectchk(X, L, R) works very well. The result is that despite writing delete/3, I cannot recall ever using it except to test it.

If delete/3 is not standard, what can programmers do instead? One easy answer is to use findall/3:

findall_member(Member, List, Test, Result) :-
    findall(Member, (member(Member, List), \+ \+call(Test)), Result).

... findall_member(Y, L, \+ Y = X, R)
... findall_member(Y, L, Y \== X,  R)
... findall_member(Y, L, Y =\= X,  R)

Using findall_member/4 instead of delete/3 has some important advantages:

It is intention-revealing. When you are reading such code, you don't have to remember what kind of equality test is used for deciding which elements go, you just have to look. The test is there before your eyes.
A single predicate does the work of three deletion predicates. In fact, it does the work of an infinite number of deletion predicates, because
It is open-ended. The criterion for what to retain can be anything you want. While I haven't used delete/3 in earnest, I would have used this.

The only bad thing about this predicate is efficiency. But that's why I introduced findall_member/4 in the first place, instead of calling findall/3 directly. It's comparatively simple for a compiler to recognise calls to findall_member/4 and generate a three-clause loop (essentially the structure of delete/3 shown above), using a cache to ensure that the same code is not generated repeatedly. Or for a compiler such as BinProlog, where findall/3 is fast, the compiler could just convert a findall_member/4 call to the corresponding findall/3 code.

The findall/4 predicate, which combines the effect of findall/3 with the effect of append/3, is unaccountably missing from ISO Prolog. When you know about it, you find nearly as many uses for findall/4 as for findall/3. This suggests to me that the right "base" predicate is not findall_member/4 but findall_member_append/5.

:- pred findall_member_append(T, list(T), void, list(T), list(T)).
:- findall_member(M, L, P, R) is semidet,
   when list_skel(L), nonvar(P).

%  Sample implementation.

findall_member_append(_, [], _, R, R).
findall_member_append(M, [X|Xs], P, R0, R) :-
    \+ (X = M, call(P)),
    !,
    findall_member_append(M, Xs, P, R0, R).
findall_member_append(M, [X|Xs], P, [X|R0], R) :-
    findall_member_append(M, Xs, P, R0, R).

Derived predicates

findall_member(M, Xs, P, R) :-
    findall_member(M, Xs, P, R, []).

delete(L, X, R) :-
    findall_member(M, L, X \== M, R).

intersection(S1, S2, S) :-
    findall_member(M, S1, member(M, S2), S).

difference(S1, S2, S) :-
    findall_member(M, S1, nonmember(M, S2), S).

union(S1, S2, S) :-
    findall_member(M, S1, nonmember(M, S2), S, S2).

symdiff(S1, S2, S) :-
    findall_member(M, S1, nonmember(M, S2), S, S3),
    difference(S1, S2, S3).

Recommendation

I am not actually recommending any of the predicates in this section. In particular, I am not recommending any of the traditional quadratic-time set-as-unordered-list predicates for inclusion in the "community standard library". What I do recommend is that either no such predicates (letting people call findall/3 themselves, or write and document their own deletion predicates), or else one predicate should be included in the library, and that predicate should be findall_member_append/5.

The Quintus library contains delete(L, X, N, R) which deletes the first N copies of X in L, or fewer if there aren't that many. To be honest, it is difficult to imagine any serious use of this. N=0 is just R=L. N=1 is just selectchk/3. N larger than the length of L is just delete/3. I never did find any other uses for it.

Undone

The following predicates from the DEC-10 and/or Quintus libraries have not been slotted into families yet.

subseq(AIB, A, B)
subseq0(AIB, A).
subseq1(AIB, A).
sumlist(Ns, Sum).
transpose(Xss, Yss).
stuff from library(more_lists) for use in DCGs, where a different argument order is appropriate.
The higher order predicates from library(maplist).
The ordering-based predicates, such as {min, max}_{member([P, ]X, L), select([P, ]X, L, R)}. In particular, it is time that msort/2 was adopted.

Higher order list predicates

What self-respecting list-processing language would be caught dead without a basic kit of higher order list processing operations? Only Prolog, I guess.

WARNING: this is a draft. The use of N is not consistent in this section. I'm sending it out now so I can go home.

The following predicate families come from DEC-10 Prolog and Quintus Prolog:

maplist/N+1

The equivalent of 'map' and 'forall'. The functional programming community seem to have settled on 'map' as the name for this; if the name is changed at all, it should be to map/N.

maplist(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :-
    P(X11, ..., Xm1),
    ...
    P(Xm1, ..., Xmn).

scanlist/N+3

The name of this family was a mistake. I meant to copy the name from APL, which uses 'reduce' for this operation. OOPS. The functional community seem to have settled on the name 'foldl'. DEC-10 Prolog and Quintus Prolog had no equivalent of 'foldr'.

scanlist(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, Vn) :-
    P(X11, ..., Xm1, V0, V1),
    ...
    P(Xm1, ..., Xmn, V', Vn).

cumlist/N+3

The APL name for this is 'scan' and the functional programming community have settled on 'scanl' for it.

cumlist(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, [V0,V1,...,Vn]) :-
    P(X11, ..., Xmn, V0, V1),
    ...
    P(Xm1, ..., Xmn, V', Vn).

some/N+1

This serves the role of an existential quantifier. For lists, we don't actually need it.

some(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :-
    P(X11, ..., Xm1)
  ; ...
  ; P(Xm1, ..., Xmn).

Why don't we need it? Suppose we generalise member, to member/2N:

member(X1, [X1|_], ..., Xn, [Xn|_]).
member(X1, [_|Xs1], ..., Xn, [_|Xsn]) :-
    member(X1, Xs1, ..., Xn, Xsn).

Then instead of

    some(P, Xs1, ..., Xsn)

we can write

    member(X1, Xs1, ..., Xn, Xsn),
    P(X1, ..., Xn)

and we not only avoid all overheads of passing P and get to call it directly, if the query is try, we get to find out which values of X1...Xn make it true.

If an analogue of some/N is implementable, so is an analogue of member/2N.

somechk/N

This is to some/N as memberchk/2 is to member/2. For lists, we don't actually need it.

somechk(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :-
    P(X11, ..., Xm1) -> true
  ; ...
  ; P(Xm1, ..., Xmn) -> true.

Similarly, instead of

    somechk(P, Xs1, ..., Xsn)

we can write

    (member(X1, Xs1, ..., Xn, Xsn), P(X1, ..., Xn) -> true)

and get to find out why P is true as well that it is true.

include/N

The functional community calls this 'filter'.

include(P, [], ..., [], []).
include(P, [X1|Xs1], ..., [Xn|Xsn], Included) :-
    (   call(P, X1, ..., Xn) ->
        Included = [X1|Included1]
    ;   Included = Included1
    ),
    include(P, Xs1, ..., Xsn, Included1).

Note that this really requires P to be determinate, but it does allow variables in P to be bound. For a ground P, we could manage without this, writing

    findall(X1,
        ( member(X1, Xs1, .., Xn, Xsn), P(X1, ..., Xsn) ),
        Included)

would do the job. Some Prolog systems (BinProlog) may implement findall/3 so very efficiently to make this an attractive alternative.

exclude/N

This is include/N with the sense of the test reversed.

exclude(P, [], ..., [], []).
exclude(P, [X1|Xs1], ..., [Xn|Xsn], Included) :-
    (   call(P, X1, ..., Xn) ->
        Included = Included1
    ;   Included = [X1|Included1]
    ),
    exclude(P, Xs1, ..., Xsn, Included1).

There were and are others, but this will do to start with.

The obvious missing predicates are

foldr(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, Vn) :-
    P(Xm1, ..., Xmn, V', Vn),
    ...
    P(X11, ..., Xm1, V0, V1).

scanr(P, [X11,...,X1n], ..., [Xm1,...,Xmn], Vn, [V0,V1,...,Vn]) :-
    P(Xm1, ..., Xmn, V', Vn),
    ...,
    P(X11, ..., Xm1, V0, V1).

Recommendations

Do include maplist/N+1, either under that name for backwards compatibility, or under the name map/N+1 for cultural compatibility with the functional programming community.
Do include scanlist/N+3, but under the name foldl/N+3.
Do include cumlist/N+3, but under the name scanl/N+3.
Do include foldr/N+3 and scanr/N+3.
Do include include/N+2 and exclude/N+2, under those names; if we called one of them 'filter', what would we call the other?
Do include member/2N under some name for a useful range of N.
Do not include some/N+1 or somechk/N+1, because it is easy to manage without them. I do not say that nobody should provide them, only that we should not require them of everyone.
The range of N should be at least 1..4.

This is very much a minimal set of higher-order list operations.

4. Sets

Many set representations can be used in Prolog: unordered lists, ordered lists, delta-coded ordered lists, bitstrings, binary search trees of various types, ternary search trees.

It is important that the commonly available implementation of sets should be efficient enough to use without hesitation. It is also important that the operation names should not conflict with widely used operation names for other set representations.

This means that either names such as union/3 must be avoided, or set predicates must not be used without a module prefix to identify the representation.

Without a change to ISO Prolog, we cannot enforce a requirement that module prefixes be used, so we have to avoid commonly used names.

The representation most widely used in Prolog textbooks is the unordered list representation. This is also the representation used in Common Lisp.

The ordered list representation has its flaws (adding, removing, or checking for the presence of a single element is O(|Set|) rather than O(lg|Set|)), but the unordered list representation has no virtues.

One of the flaws of any term comparison-based set representation in ISO Prolog is going to be that it allows variables to be set elements, so that two terms can be in a set because when they were entered they were not identical, but subsequent variable bindings can make them identical. The answer to this is to use a mode system which requires that set elements are sufficiently instantiated, perhaps via a "safe" version of compare/3.

I recommend the ordered set representation. It is space efficient, time efficient for bulk operations, and is already used in Prolog (the sort/2 and setof/3 predicates deliver results of this form, which we might as well exploit).

In order to preserve backwards compatibility and to avoid name clashes with the unordered list representation, all predicate names have the form ord_something.

`ord_op/5`

We can base all the ordered set operations on a single predicate. It uses a bit mask 2'byxBYX where X (x) says whether elements occurring only in Xs should be included in S1 (S2), Y (y) says whether elements occurring only in Ys should be included in S1 (S2), and B (b) says whether elements occurring in both sets should be included in S1 (S2).

:- type ordset(T) = list(T).

:- pred ord_op(integer, ordset(T), ordset(T), ordset(T), ordset(T)).
:- ord_op(M, Xs, Ys, S1, S2)
   when M, Xs, Ys.
:- ord_op(M, Xs, Ys, S1, S2) is semidet
   when ground(M), ground(Xs), ground(Ys).

ord_op(M, [], Ys, S1, S2) :-
    ( M /\ 2'000010 =:= 0 -> S1 = [] | S1 = Ys ),
    ( M /\ 2'010000 =:= 0 -> S2 = [] | S2 = Ys ).
ord_op(M, [X|Xs], Ys, S1, S2) :-
    $ord_op1(M, Xs, Ys, S1, S2, X).

$ord_op1(M, Xs, [], S1, S2, X) :-
    ( M /\ 2'000001 =:= 0 -> S1 = [] | S1 = [X|Xs] ),
    ( M /\ 2'001000 =:= 0 -> S2 = [] | S2 = [X|Xs] ).
$ord_op1(M, Xs, [Y|Ys], S1, S2, X) :-
    compare(R, X, Y),
    $ord_op12(R, Xs, Ys, S1, S2, X, Y, M).

$ord_op2(M, [], Ys, S1, S2, Y) :-
    ( M /\ 2'000001 =:= 0 -> S1 = [] | S1 = [Y|Ys] ),
    ( M /\ 2'001000 =:= 0 -> S2 = [] | S2 = [Y|Ys] ).
$ord_op2(M, [X|Xs], Ys, S1, S2, Y) :-
    compare(R, X, Y),
    $ord_op12(R, Xs, Ys, S1, S2, X, Y, M).

$ord_op12(<, Xs, Ys, S1, S2, X, Y, M) :-
    ( M /\ 2'000001 =:= 0 -> S1 = T1 | S1 = [X|T1] ),
    ( M /\ 2'001000 =:= 0 -> S2 = T2 | S2 = [X|T2] ),
    $ord_op2(M, Xs, Ys, T1, T2, Y).
$ord_op12(>, Xs, Ys, S1, S2, X, Y, M) :-
    ( M /\ 2'000010 =:= 0 -> S1 = T1 | S1 = [Y|T1] ),
    ( M /\ 2'010000 =:= 0 -> S2 = T2 | S2 = [Y|T2] ),
    $ord_op1(M, Xs, Ys, T1, T2, X).
$ord_op12(=, Xs, Ys, S1, S2, X, Y, M) :-
    ( M /\ 2'000100 =:= 0 -> S1 = T1 | S1 = [X|T1] ),
    ( M /\ 2'100000 =:= 0 -> S2 = T2 | S2 = [Y|T2] ),
    ord_op(M, Xs, Ys, T1, T2).

Looking at this predicate, we see that there is a clear notion of inputs (M, Xs, Ys) and outputs (S1, S2).

There are two ways to define is_ordset/1. One enforces the groundness of an ordered set. That is appropriate for systems which can delay calls to (some analogue of) compare/3. The other is useful for hacking, and allows set elements to be non-ground. At this stage I do not wish to make any recommendation, so I merely note the two definitions:

is_ordset(S) :-   % enforces groundness
    ground(S),
    sort(S, S).

is_ordset(S) :-   % allows sets of variables
    nonvar(S),
    $is_ordset(S).

$is_ordset([]).
$is_ordset([X|Xs]) :-
    nonvar(Xs),
    $is_ordset(Xs, X).

$is_ordset([], _).
$is_ordset([Y|Ys], X) :-
    nonvar(Ys),
    X @< Y,
    $is_ordset(Ys, Y).

Derived predicates

The list processing predicates, especially length/2, member/2, and select/3 can be used with ordered sets. The memberchk/2 and selectchk/3 predicates are rather more dubious, and need their own implementations.

list_to_ordset(List, Set) :-
    sort(List, Set).

ord_op(M, Xs, Ys, S1) :-
    ord_op(M, Xs, Ys, S1, []).

ord_compare(R, S1, S2) :-
    ord_op(2'010001, S1, S2, D12, D21),
    $ord_compare(R, D12, D21).

$ord_compare(<, [], [_|_]).
$ord_compare(>, [_|_], []).
$ord_compare(=, [], []).

ord_disjoint(S1, S2) :-
    ord_op(2'100, S1, S2, []).

ord_disjoint_union(S1, S2, S) :-
    ord_op(2'100011, S1, S2, S, []).

ord_intersect(S1, S2) :-
    ord_op(2'100, S1, S2, [_|_]).

ord_intersection(S1, S2, S) :-
    ord_op(2'100, S1, S2, S).

ord_intersection([], []).
ord_intersection([S1|Ss], S) :-
    ord_intersection(Ss, S2),
    ord_intersection(S1, S2, S).

ord_memberchk(X, S) :-
    ord_intersect([X], S).

ord_nonmember(X, S) :-
    ord_disjoint([X], S).

ord_proper_subset(S1, S2) :-
    ord_op(2'010001, S1, S2, [], [_|_]).

ord_proper_superset(S1, S2) :-
    ord_op(2'001010, S1, S2, [], [_|_]).

ord_selectchk(X, S1, S2) :-
    ord_op(2'010, [X], S1, S2).

ord_subset(S1, S2) :-
    ord_op(2'001, S1, S2, []).

ord_subtract(S1, S2, S) :-
    ord_op(2'001, S1, S2, S).

ord_superset(S1, S2) :-
    ord_op(2'010, S1, S2, []).

ord_symdiff(S1, S2, S) :-
    ord_op(2'011, S1, S2, S).

ord_union([], []).
ord_union([S1|Ss], S) :-
    ord_union(Ss, S2),
    ord_union(S1, S2, S).

ord_union(S1, S2, S) :-
    ord_op(2'111, S1, S2, S).

ord_union(S1, S2, U, D) :-
    ord_op(2'001111, S1, S2, U, D).

There are two names I am not happy with here. I would prefer it if ord_subtract/3 had a less "imperative" name, such as ord_difference/3. It would also be clearer if ord_union/4 were named ord_union_difference/4. However, that would break backwards compatibility, and that's bad manners

There are four predicates missing from the old library.

ord_seteq/2 has been generalised to ord_compare/3.
ord_add_element(X, S0, S) is replaced by ord_union([X], S0, S).
ord_del_element(X, S0, S) is renamed to ord_selectchk(X, S0, S) to be compatible with the list operations.
ord_setproduct/3 preserves set ordering but does not depend on it. Its analogue probably belongs among the list operations.

Of course, nothing would prevent an implementation providing those predicates for backwards compatibility.

Higher-order operations on sets

The higher order list operations can be applied to ordered sets. Results will not necessarily be ordered sets, but can be converted to ordered sets by sorting. It is useful to note that [include,exclude]/N+2 do not alter the order of their 2nd argument and do not introduce duplicates, so filtering an ordered set yields an ordered set.

5. Queues

Should be based on q(Unary,L0,L) representation. Each basic operation is O(1).

A queue is a sequence, rather like a list. Adding items or lists of items at either end is rather like appending, and so is removing an item or list of items from the head end. (That is to say, what I'm calling a "queue" here is really an "output-restricted deque").

There are many ways to implement queues. A single list makes some operations cheap and other operations very expensive. A pair of back-to-back lists works nicely in a strict functional language, but since there is a slow rebalancing operation which may be executed repeatedly on backtracking, they are not efficient in Prolog. The representation I propose here was invented by someone at SRI. Fernando Pereira told me about it, and I am sorry to say that I have forgotten the inventor's name. This representation has the nice property that adding an element at either end or removing an element from the head end requires just a single unification. This makes it well suited to Prolog.

There is one flaw in this representation: it uses list differences so that appending an item at the tail end must be regarded as destroying the original queue. Queue uses should be single threaded. Mercury can enforce this, but does not support the partially instantiated terms the technique relies on. Prolog does support partially instantiated terms, but cannot enforce single threading.

We found a similar problem with sets; a representation which Prolog is good at has technical flaws which mean that it has to be used carefully, but the efficiency of the representation is such that we are unwilling to abandon it.

:- type queue(T)
   ---> q(unary, list(T), list(T)).
:- type unary
   ---> z
      ; s(unary).

is_queue(q(U,L0,L)) :-
    nonvar(U), nonvar(L0),
    $is_queue(U, L0, L).

$is_queue(z, L0, L) :-
    L0 == L.
$is_queue(s(U), [_|L1], L) :-
    nonvar(U), nonvar(L1),
    $is_queue(U, L1, L).

item_queue_append(X, q(U,L0,L), q(s(U),[X|L0],L)).

list_queue_append([], Q, Q).
list_queue_append([X|Xs], Q, Q0) :-
    item_queue_append(X, Q1, Q0),
    list_queue_append(Xs, Q, Q1).

queue_item_append(q(U,L0,[X|L]), X, q(s(U),L0,L)).

queue_list_append(Q, [], Q).
queue_list_append(Q0, [X|Xs], Q) :-
    queue_item_append(Q0, X, Q1),
    queue_list_append(Q1, Xs, Q).

queue_list(q(U,L0,L), Xs) :-
    $queue_list(U, L0, L, Xs).

$queue_list(z, L, L, []).
$queue_list(s(U), [X|L1], L, [X|Xs]) :-
    $queue_list(U, L1, L, Xs).

cons_queue(X, Q0, Q1) :-
    item_queue_append(X, Q0, Q1).

head_queue(X, Q1) :-
    item_queue_append(X, _, Q1).

tail_queue(Q0, Q1) :-
    item_queue_append(_, Q0, Q1).

member_queue_offset(X, q(s(U),L0,_), N) :-
    $member(X, U, L0, 0, N).

$member(X, _, [X|_], N, N).
$member(X, s(U), [_|L1], N0, N) :-
    N1 is N0 + 1,
    $member(X, U, L1, N1, N).

member_queue_index(X, q(s(U),L0,_), N) :-
    $member(X, U, L0, 1, N).

member_queue(X, Q) :-
    member_queue_index(X, Q, _).

memberchk_queue(X, Q) :-
    member_queue(X, Q),
    !.

queue_list_length(Q, Xs, N) :-
    Q and Xs have the same elements, N of them.

queue_empty(Q) :-
    queue_list(Q, []).

queue_length(q(U,L0,L), N) :-
    (   integer(N) ->
        N >= 0,
        $queue_length(U, L0, L, N)
    ;   var(N) ->
        $queue_length(U, L0, L, 0, N)
    ;   $error        
    ).

$queue_length(z, L, L, N, N).
$queue_length(s(U), [_|L1], L, N0, N) :-
    N1 is N0 + 1,
    $queue_length(U, L1, L, N1, N).

$queue_length(s(U), [_|L1], L, N0) :-
    N0 > 0,
    !,
    N1 is N0 - 1,
    $queue_length(U, L1, L, N1).
$queue_length(z, L, L, 0).

portray_queue(Q) :-
    (   write('[|'),
        member_queue_offset(X, Q, N),
        ( N =:= 0 -> true ; write(', ') ),
        write(X),
        fail
    ;   write('|]')
    ).

I have omitted the following predicates from the Quintus library:

list_queue(Xs, Q) is queue_list(Q, Xs).
queue_last(Q, X) has the wrong argument order and is not a fast operation.
queue_last(Q0, X, Q) has the wrong argument order and is not a fast operation.

Since the Quintus module is called 'newqueues', this should not be a problem.

Higher order operations on queues

I think I may already have too many operations providing access to the elements of a queue other than by popping them. It doesn't feel right to propose any higher-order operations on them. The idea of having 2^N map*/N+1 predicates,

    map(P, L1, L2)
    map_list_queue(P, L1, Q2)
    map_queue_list(P, Q1, L2)
    map_queue_queue(P, Q1, Q2)

doesn't appeal either. Once the higher order list operations are there, anyone who wants higher order queue operations can easily program them him/herself.

6. Numbers.

There are three predicates which are so basic that the ISO Prolog standard itself uses them, while at the same time refusing to bless Prolog programmers with assured access to them.

succ/2 provides the successor function on natural numbers only. It is essential to the purposes for which succ/2 was invented that it should not accept or produce negative integers.

:- pred succ(integer, integer).
:- succ(X, Y) when X ; Y.
:- succ(X, Y) is semidet when X ; Y.

%  Sample implementation

succ(X, Y) :-
    (   integer(X) -> X >= 0, Y is X + 1
    ;   integer(Y) -> Y > 0, X is Y - 1
    ;   var(X), var(Y) -> instantiation fault
    ;   $error
    ).

plus/3 provides reversible arithmetic on exact numbers. It was intended to work with rational numbers as well as integers; since ISO Prolog omits rational numbers, it is here specified for integers only. It is essential to the purposes for which plus/3 was invented that it should be a proper logical predicate; the nature of floating point arithmetic means that it cannot behave as intended if it allows floating-point arguments.

:- pred plus(integer, integer, integer).
:- plus(X, Y, Z) when X, Y ; Y, Z ; Z, X.
:- plus(X, Y, S) is semidet when X, Y ; Y, Z ; Z, X.

%  Sample implementation

plus(X, Y, Z) :-
    (   integer(X), integer(Y) -> Z is X + Y
    ;   integer(Y), integer(Z) -> X is Z - Y
    ;   integer(Z), integer(X) -> Y is Z - X
    ;   $error
    ).

between/3 is astonishingly useful, not least because of its frequent use in bounding the search space of other predicates. It is difficult to take seriously any Prolog "standard" which omits it.

:- pred between(integer, integer, integer).
:- between(L, U, X) when L, U.
:- between(L, U, X) is semidet
   when ground(L), ground(U), ground(X).
:- between(L, U, X) is bounded
   when ground(L), ground(U).

%  Sample implementation.

between(L, U, X) :-
    (   integer(L), integer(U) ->
        (   integer(X) ->
            L =< X, X =< U
        ;   var(X) ->
            L =< U,
            $between(L, U, X)
	;   $error            
	)
    ;   integer(L), integer(X), L > X -> fail
    ;   integer(U), integer(X), X > U -> fail
    ;   $error
    ).

$between(L, L, L) :- !.
$between(L, _, L).
$between(L, U, X) :-
    M is L + 1,
    $between(M, U, X).

The order in which the sample implementation enumerates solutions for X is part of the specification.

There is of course a better alternative to between/3, and that is to extend the syntax of comparison.

Comparison form between/3 equivalent
E =< X =< F L is E, U is F, between(L, U, X)
E =< X < F L is E, U is F-1, between(L, U, X)
E < X < F L is E+1, U is F-1, between(L, U, X)
E < X =< F L is E+1, U is F, between(L, U, X)
E >= X >= F U is -F, L is -E, between(L, U, Z), X is -Z
E >= X > F U is 1-F, L is -E, between(L, U, Z), X is -Z
E > X > F U is 1-F, L is -1-E, between(L, U, Z), X is -Z
E > X >= F U is -F, L is -1-E, between(L, U, Z), X is -Z

This requires "distfix" operators, which are easy enough to add to Prolog syntax and have been so added in the past. These operators resemble the existing arithmetic comparison operators in evaluating their "outside" arguments; unlike the existing arithmetic comparison operators they do not evaluate their "inside" argument and only accept integers. This approach generalises neatly to multiple generated arguments, e.g., 1 =< X < Y =< N. However, between/3 is the simplest thing that could work and has been proven over many years, so let's stick with it.

Division

The problem with integer division is that there are several different versions, and the one you get is seldom the one you need. We should consider seriously whether Common Lisp analogues such as

floor(N, D, Q, R)
ceiling(N, D, Q, R)
round(N, D, Q, R)
truncate(N, D, Q, R)

predicates with

floor(N, D), floor_remainder(N, D)
ceiling(N, D), ceiling_remainder(N, D)
round(N, D), round_remainder(N, D)
truncate(N, D), truncate_remainder(N, D)

as arithmetic functions, might not be a good idea. Obviously, one pair of these arithmetic functions would duplicate the effect of the existing // and mod, but can you instantly recall, without looking it up, which?

I brought this up in 1984 and with the BSI and with the ISO committee, to a deafening silence. Surely I can't be the only person who is never quite sure what // is going to do with negative numbers?

7. Streams.

8. Not really the end.

This is a draft. More operations will be added, and community discussion may result in adding, changing, or removing some of the operations listed here.

There is nothing here which was not obvious at least two years before the Prolog standard came out. In particular, at a conference in 1985, I warned explicitly that a "minimal" standard was not at all a good idea, and by 1986 most of the predicates listed above could have been designed, had a thorough overhaul of the library been desired instead of minor revisions.

Comparison form	`between/3` equivalent
E =< X =< F	L is E,	U is F,	between(L, U, X)
E =< X < F	L is E,	U is F-1,	between(L, U, X)
E < X < F	L is E+1,	U is F-1,	between(L, U, X)
E < X =< F	L is E+1,	U is F,	between(L, U, X)
E >= X >= F	U is -F,	L is -E,	between(L, U, Z), X is -Z
E >= X > F	U is 1-F,	L is -E,	between(L, U, Z), X is -Z
E > X > F	U is 1-F,	L is -1-E,	between(L, U, Z), X is -Z
E > X >= F	U is -F,	L is -1-E,	between(L, U, Z), X is -Z

An Elementary Prolog Library

DRAFT 3

Section 0. Introduction

0.1. Notation.

1. Characters.

1.1. Face reality, adopt Unicode.

1.2. char_conversion must go!

1.3 Normalisation

1.4. Character classification.

is_unicode/2 and is_digit/2

Derived predicates

Locales

1.5. Case conversion

1.6. Text Element Boundaries

Section 2. Higher-order operations.

2.1. call/N

2.2. meta_predicate

2.3. Lambda

2.4. Recommendations

Section 3. Lists.

append_append_length/7

Derived predicates

reverse_append_length/4

Derived predicates

permutation_length

Derived predicates

pairs_keys_values/3

Derived predicates

Deletion

Derived predicates

Recommendation

Undone

Higher order list predicates

Recommendations

4. Sets

ord_op/5

Derived predicates

Higher-order operations on sets

5. Queues

Higher order operations on queues

6. Numbers.

Division

7. Streams.

8. Not really the end.

1.2. `char_conversion` must go!

2.1. `call/N`

2.2. `meta_predicate`

`append_append_length/7`

`permutation_length`

`pairs_keys_values/3`

`ord_op/5`