The ISO Prolog standard does not adequately support the needs of
Prolog users. This may seem unbelievable, and it is certainly inexcusable,
but the standard does not even define append/3, length/2, or member/2
(although an example on page 33 shows clauses for append/3 and an
example on page 85 shows clauses for member/2).
Many areas are crying out for standardisation. This is an attempt to deal with one of them: basic data structures. Perhaps more important than the specific data structures and operations dealt with is the stress placed on consistent naming and argument order, and the use of projection as a means of ensuring that argument order is consistent.
I provide three kinds of declaration for the "base" predicates: pred (type) declarations, when-declarations, and mode declarations. When-declarations are the most intuitive way to specify when a predicate should make progress. (If a goal is suspended, when should it be resumed? When its when-declaration is satisfied.) Even in a non-coroutining Prolog (like ISO Prolog) the information is still useful: it describes the modes in which it makes sense to call the predicate.
Next come mode declarations. The "mode" det would mean that such a query has a single solution and cannot fail; semidet means that there is at most one solution. I intend something stronger here: when a call has the described form, the goal should either fail in a finite amount of time or should succeed in a finite amount of time with the Prolog system "aware" that no choices remain to be explored. The bounded "mode", not found in Mercury, means that such a goal has a finite number of solutions (possibly zero), must be able to report all solutions in a finite amount of time, and must then fail in a finite amount of time without infinite backtracking. Finally, the argument instantiation test list_skel(Var) is satisfied when Var is a proper list; the elements of the list are not constrained.
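As a concrete illustration, here is how the three kinds of declaration might read for member/2 (my sketch of the intent, not text from any standard or existing system):

:- pred member(T, list(T)).
:- member(X, L) when L.                          % resume once L is instantiated
:- member(X, L) is bounded when list_skel(L).    % finitely many solutions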
The sample code uses a convention taken from DEC-10 Prolog via C Prolog: the dollar sign is treated as a lower case letter, and predicates whose names contain a dollar sign are "implementation detail" not visible to the user. Such predicates are only for specification purposes, and need not conform to quality requirements for visible predicates such as steadfastness.
Perhaps the most important notational issue is that I give no indication in the running text of the module a type or predicate is defined in. It is clear that the four "basic" list processing predicates append/3, length/2, member/2, and memberchk/2 are so useful and so used that they had better be treated as part of the language. It would be quite appropriate if the others were all in a module named 'lists'. See plface.pl for one distribution.
I assume the following predefined types:
:- type list(T) ---> [] | [T|list(T)].
:- type integer.
:- type float.
:- type number.
:- type atom.
:- type char = atom.            % such that atom_codes(It, [_])
:- type chars = list(char).
:- type code = integer.         % such that between(0, 16'10FFFF, It)
:- type codes = list(code).
Other types will be introduced where they are used.
With Prolog being used in e-business and stuff like that, we have to deal with HTML 4 (defined in terms of Unicode) and XML (defined in terms of Unicode). If we can't portably deal with Unicode, we can't portably deal with the Web. We can no longer afford to put up with Prolog systems that don't handle Unicode cleanly. Quintus Prolog handled 16-bit character sets via JIS and EUC nearly two decades ago. It was obvious then that the byte-at-a-time way that QP did so was a makeshift with a use-by date in the near future (now the rather less near past).
The ISO Prolog standard was published in 1995. In 1983, when I started work on an informal Prolog standard, it was obvious that 8-bit character sets were dead and Prolog would have to support 16-bit character sets. By 1993, it was obvious that there was a single "wide" character set to use in future language standards: ISO 10646/Unicode. Unicode 1.0.0 came out in October 1991. The reconciliation between ISO 10646 and Unicode resulted in Unicode 1.0.1, which came out in June 1992. This left the ISO Prolog committee with 3 years to reflect on the fact that there was now one ISO character set with round-trip conversion between the main national character set standards and itself.
This was certainly obvious to others. For example, the X/Open technical study E401, "Universal Multiple-Octet Coded Character Set coexistence and Migration" had been published the previous year. Of even greater relevance: a great deal of work had gone into updating Ada for what became Ada 95. By 1993, I already had a pile of Ada 95 background documentation and proposals in my office about 50cm high, and Unicode was clearly going to be the character basis of Ada 95 then.
The main reason why certain people at Quintus were not as hostile to the introduction of the single-character-atom design botch as they should have been was concern for EBCDIC on IBM mainframes. However, I have studied the current z/Architecture Principles of Operation, and note that current IBM mainframes have hardware support for Unicode.
With this background, we can see that the ISO Prolog committee made some questionable decisions:

- Not adopting Unicode as the character basis of Prolog, depriving me of the ability to write my father's name Æneas or to include my sister-in-law's name "Chéri Ford" in a string literal, in marked contrast to Ada, C99, and C++.
- Breaking atom_chars/2 and number_chars/2, in the interests of solving a problem which the adoption of Unicode as the character basis of Prolog would have completely exorcised without the need for such wholesale breakage of existing working code.
- Adopting char_conversion, which must go! "This is not negotiable", for the reason that no sane programmer who is acquainted with the facts would wish to retain char_conversion.
That's strong language. Let's summarise what char_conversion/2 and current_char_conversion/2 do.
char_conversion('X', 'Y') replaces ConvC with µ(ConvC, x, y), that is, ConvC updated so that x now maps to y, where x is the character code of X and y is the character code of Y. current_char_conversion(A, B) is true when A and B are single-character atoms and b = ConvC(a), where a and b are the character codes of the first characters of the names of A and B respectively, provided (and this is the most confusing bit) a and b are different.
Concerning the excessive strength of char_conversion/2, consider the goal char_conversion('.', ','). If you give that as a query to the interactive top level, you haven't shot yourself in the foot, but in the head. There is no way to recover from this: the obvious antidote, char_conversion('.', '.'), can never be read, because the end dot that should terminate it is itself converted to a comma.
In fact it's worse than that. Section 5.5.3 reads "A processor may
support some other initial value of ConvC ...", which
means that there is no portable Prolog syntax at all.
This implementation permission should have required all
Prolog processors to start with ConvC mapping the
characters used in the Core Prolog syntax to themselves. Indeed, it
might have been advisable to decree that char_conversion
should not be allowed to change the identity mappings for the core
syntax characters.
The :- char_conversion(From, To)
directive is unusable
in portable programs, even programs intended to be used within a single
locale using the same coded character set. The reason is that it is
explicitly not specified whether char_conversion
directives
in a source file are limited in effect (the way it was desired by many
that op
directives should be) or unlimited in effect (the
way op
directives were in DEC-10 Prolog and C Prolog).
More precisely, a program using this feature can be portable only if
every source file spells out the same set of (typically
190) mappings: the fact that it is not specified whether the effect is
limited or not means that a program cannot simply consult a single
mapping file into each source file.
Concerning the weakness of char_conversion/2, we have to start by looking at the rationale. I take the rationale to refer to the "full width" (zenkaku) and "half width" (hankaku) copies of ASCII present in some Japanese character sets (and therefore also present in Unicode).
The char_conversion/2 machinery does not satisfy the explicit requirement in the rationale that it should apply to Prolog data. For example, it does not apply to atom_chars/2, and there is no analogous built-in predicate which does character conversion. "Words" read from a file are only converted to a normal form by ConvC mapping if they are read in Prolog syntax using one of the read family.
There should surely have been
converted_atom_chars(Atom, Chars) :-
   (   atom(Atom) ->
       atom_chars(Atom, Name),
       $convert_chars(Name, Chars)
   ;   var(Atom) ->
       $convert_chars(Chars, Name),
       atom_chars(Atom, Name)
   ;   $error
   ).

$convert_chars([], []).
$convert_chars([Char|Chars], [Conv|Convs]) :-
   (   current_char_conversion(Char, Conv) -> true
   ;   Conv = Char
   ),
   $convert_chars(Chars, Convs).
An even worse way in which this misfeature fails to satisfy its explicit rationale is that character conversion does not apply to number_chars/2, and there is no analogous built-in predicate which does character conversion. Numbers read from a file are only converted to a normal form by ConvC mapping if they are read in Prolog syntax using one of the read family.
The failure to deal with run-time number conversion is a particular nuisance for people using Arabic or Indic scripts. As a preliminary sketch, it would be useful if something like the following predicates were adopted:
:- pred integer_codes(integer, codes, integer, code).

integer_codes(Integer, Codes, Base, Zero) :-
   Delta is Zero - 0'0,
   (   integer(Integer) ->
       integer_codes(Integer, ASCII, Base),
       $map(ASCII, Codes, Delta, 0'., 0'E)
   ;   var(Integer) ->
       $map(ASCII, Codes, Delta, 0'., 0'E),
       integer_codes(Integer, ASCII, Base)
   ;   $error
   ).

integer_codes(Integer, Codes, Base) :-
   between(2, 36, Base),
   like number_codes/2 but only for integers and
   using the first Base digits from "0..9A..Z".

integer_codes(Integer, Codes) :-
   integer_codes(Integer, Codes, 10).

:- type float_format
   --->  e(integer) | e
      |  f(integer) | f
      |  g(integer) | g
      |  k(integer) | k
      |  p.

:- pred float_codes(float, codes, float_format, code, code, code).

float_codes(Float, Codes, Format, Zero, Decimal, Exponent) :-
   Delta is Zero - 0'0,
   (   float(Float) ->
       float_codes(Float, ASCII, Format),
       $map(ASCII, Codes, Delta, Decimal, Exponent)
   ;   var(Float) ->
       $map(ASCII, Codes, Delta, Decimal, Exponent),
       float_codes(Float, ASCII, Format)
   ;   $error
   ).

float_codes(Float, Codes, Format, Zero, Decimal) :-
   float_codes(Float, Codes, Format, Zero, Decimal, 0'e).

float_codes(Float, Codes, Format, Zero) :-
   float_codes(Float, Codes, Format, Zero, 0'., 0'e).

float_codes(Float, Codes, Format) :-
   like number_codes/2 but only for floats,
   (   Format = e(P), integer(P), P >= 0 ->
       generate or recognise %.Pe format
   ;   Format = e ->
       generate: use default P; recognise: accept any %e
   ;   Format = f(P), integer(P), P >= 0 ->
       generate or recognise %.Pf format
   ;   Format = f ->
       generate: use default P; recognise: accept any %f
   ;   Format = g(P), integer(P), P >= 0 ->
       generate or recognise %.Pg format
   ;   Format = g ->
       generate: use default P; recognise: accept any float
   ;   Format = k(P), integer(P), P >= 0 ->
       like e(P), but with 1..3 digits before the decimal point
       so that the exponent is a multiple of 3;
       a very useful format when you have it
   ;   Format = k ->
       generate: use default P; recognise: accept any P but
       still allow only results of k(_) formatting
   ;   Format = p ->        % precise
       same as e(P) with P large enough that full precision is
       generated: enough digits for round trip conversion to be
       one-to-one
   ;   $error
   ).

$map([], [], _, _, _).
$map([0'+|Xs], [0'+|Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'-|Xs], [0'-|Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'.|Xs], [ D |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'e|Xs], [ E |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([0'E|Xs], [ E |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E).
$map([ N |Xs], [ M |Ys], Z, D, E) :-
   plus(Z, N, M),
   between(0'0, 0'9, N),
   $map(Xs, Ys, Z, D, E).

With such predicates, we then define
:- pred number_codes(number, codes, code).

number_codes(Number, Codes, Zero) :-
   (   integer(Number) ->
       integer_codes(Number, Codes, 10, Zero)
   ;   float(Number) ->
       float_codes(Number, Codes, p, Zero)
   ;   var(Number) ->
       (   integer_codes(Number, Codes, 10, Zero) -> true
       ;   float_codes(Number, Codes, p, Zero)
       )
   ;   $error
   ).

number_codes(Number, Codes) :-
   number_codes(Number, Codes, 0'0).
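As a worked example (assuming the sketch above): the Arabic-Indic digits occupy U+0660..U+0669, so passing U+0660 as the zero gives

?- number_codes(42, Codes, 16'0660).
Codes = [16'0664, 16'0662]     % ARABIC-INDIC DIGIT FOUR, DIGIT TWO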
At last, 10 years late, this would give people using Arabic or Indic scripts (amongst others) the ability to convert numbers using their own characters, and it would, 10 years late, give Europeans, who make a rather ambiguous use of commas, the ability to convert floating-point numbers using their preferred decimal point character.
Is there any other way char_conversion/2 fails?
Yes, there is. I shall take personal examples. The only languages
that really matter to me personally (other than mattering because I
think other people should be able to use their own language and script)
are English, Latin, Greek, Hebrew, Serbo-Croatian, and Maori.
Writing in English, using ISO Latin 1, I would like to be able to use my father's name, Æneas. The letter ash is not one of the normal Prolog characters, so I would have to map it to two characters: A, e. But I can't do that. I cannot have the atom hæme mapped to haeme either, for the same reason. Since Latin 1 includes the characters "¼", "½", and "¾", which are part of the number syntax of my native locale, I would like to use them in numbers, so that ¼=0.25, 2½=2.5, and 10¾=10.75. This could be done if I could map "¼" to ".25" and so on. But with char_conversion/2 so limited, I can't. If char_conversion/2 can't even handle ISO Latin 1, what good is it?
I suppose I should have said "Croatian" rather than "Serbo-Croatian",
but there was a generation of New Zealanders who knew that their grandparents
were from Yugoslavia who had to look the natal village up on a map to figure
out which side they would have been on when Yugoslavia broke up. As it
turned out, my relatives were all on the side labelled "refugee", and we
have lost touch; the natal village is a heap of rubble. The NZ branch of
the family figured out which to call themselves by looking at letters my
grandfather's brother had written: they used the Latin script, not the
Cyrillic script. Except that Croatian doesn't quite use the
Latin script. There are four letters (DZ, DZ with caron, LJ, and NJ)
which look like two English letters but are regarded as single letters in
the script. It would probably be a good idea to map DZ to D,Z; Dz to D,z; and dz to d,z. But I can't do that with char_conversion/2.
New Zealand has two official languages, English and Maori. For historical reasons, there are three spelling conventions for Maori: vowel length may be left unmarked, marked by doubling the vowel, or marked with a macron (with a diaeresis sometimes substituting for the macron when fonts lacked it).
Accordingly, in processing Maaori text in which vowel length is marked, the best scheme would be to map U+0100 (capital A with macron) and U+00C4 (capital A with diaeresis) to A,A, and so on for the other 9 upper and lower case vowels. But I can't do that with char_conversion.
Do I need to mention the Unicode compatibility mappings which map one character to several? Maybe I had better. Take the character U+00A8 "DIAERESIS", for example. Its compatibility mapping is "<compat> 0020 0308". A quick tally of the number of replacement characters in the Unicode 3.1.1 database yielded
| Replacement length | Number of occurrences |
|---|---|
| 1 | 3005 |
| 2 | 1585 |
| 3 | 360 |
| 4 | 62 |
| 5 | 15 |
| 6 | 2 |
| 8 | 1 |
| 18 | 1 |
This is of course a lower bound; the replacement rules in that file have to be applied recursively, so the average replacement length is somewhat longer.
In conclusion, the entire char_conversion/2
framework
is too narrowly conceived, too little applied, and too dangerous in
effect.
In order to deal effectively with Unicode (and it was the plain responsibility of the ISO Prolog committee to address this), some other means entirely will have to be found.
Unicode Standard Annex #15 defines four normalisation forms. Unicode Technical Note #5, "Canonical Equivalence in Applications", defines two more. We also need a "verbatim" normalisation form which acts as the identity function.
| Name | Code | Source | What it is |
|---|---|---|---|
| Verbatim | v | Here | No change |
| NFD | d | TR15 | Canonical decomposition |
| NFC | c | TR15 | Canonical decomposition then canonical composition |
| FCD | fcd | TN5 | "Fast C or D" decomposition |
| FCC | fcc | TN5 | FCC form |
| NFKD | kd | TR15 | Compatibility decomposition |
| NFKC | kc | TR15 | Compatibility decomposition then canonical composition |
Canonical decomposition and (re)composition apply, in principle, to any character set containing both precomposed accented letters and floating diacriticals. Unicode was by no means the first such character set.
FCC and FCD forms are intermediate between NFC and NFD; they make sense whenever NFC and NFD make sense.
Compatibility decomposition applies, in principle, to any character
set containing multiple representations of the "same" character.
In particular, the Japanese zenkaku and hankaku are just such characters.
In fact, compatibility decomposition is pretty much exactly what
char_conversion/2
was supposed to be all about, except that
it is by no means true that all compatibility mappings in all such
character sets have single-character replacements.
Basically, to really do the job that char_conversion/2 was supposed to do but doesn't, we need Normalisation Form KD or, even better, Normalisation Form KC.
We need two modules, one for each representation of characters. (Unicode-hacking life would be so much easier if we only had to deal with integers in the range 0..16'10FFFF. For example, while '\xD800\xDC00' is, according to Unicode rules, the encoding of one character, it is not a 'char' according to ISO Prolog rules.)
The following type is shared by codes and chars:
:- type normalisation_form ---> v | c | d | fcc | fcd | kc | kd.
It is a matter for debate which normalisation forms should be supported by all Prolog systems. It is a practical fact that some normalisation forms should be supported by most Prolog systems. Whatever set of normalisation forms is eventually agreed on, implementors ought to be allowed to extend the set.
:- module(chars, [...]).

:- pred normalise(normalisation_form, chars, chars, chars).
:- normalise(NF, Xs, Ys0, Ys) when NF, Xs.
:- normalise(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs).

normalise(NF, Xs, Ys0, Ys) :-
   Ys0\Ys is Xs in normalisation form NF.

normalise(NF, Xs, Ys) :-
   normalise(NF, Xs, Ys, []).

normalise(Xs, Ys) :-
   normalise(kd, Xs, Ys).

normalised_compare(NF, R, Xs, Ys) :-
   normalise(NF, Xs, Xn),
   normalise(NF, Ys, Yn),
   compare(R, Xn, Yn).

normalised_compare(R, Xs, Ys) :-
   normalised_compare(kd, R, Xs, Ys).

normalised(NF, Xs) :-
   normalise(NF, Xs, Xs).

normalised(Xs) :-
   normalise(Xs, Xs).

...
Please remember that these are specifications, not
implementations. It is possible to implement
normalised_compare/[3,4]
so that they work in a single
pass over their arguments without building any intermediate list.
Similarly, it is possible to implement
normalised/[1,2]
so that they work in a single pass
over their argument without building any intermediate list.
:- module(codes, [...]).

:- pred normalise(normalisation_form, codes, codes, codes).
:- normalise(NF, Xs, Ys0, Ys) when NF, Xs.
:- normalise(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs).

normalise(NF, Xs, Ys0, Ys) :-
   Ys0\Ys is Xs in normalisation form NF.

normalise(NF, Xs, Ys) :-
   normalise(NF, Xs, Ys, []).

normalise(Xs, Ys) :-
   normalise(kd, Xs, Ys).

normalised_compare(NF, R, Xs, Ys) :-
   normalise(NF, Xs, Xn),
   normalise(NF, Ys, Yn),
   compare(R, Xn, Yn).

normalised_compare(R, Xs, Ys) :-
   normalised_compare(kd, R, Xs, Ys).

normalised(NF, Xs) :-
   normalise(NF, Xs, Xs).

normalised(Xs) :-
   normalise(Xs, Xs).

...
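To illustrate the expected behaviour of the specification above (not the output of any existing system): the precomposed spelling of "café" (ending in U+00E9) and the decomposed spelling (e followed by U+0301 COMBINING ACUTE ACCENT) should compare equal under form d, because U+00E9 decomposes canonically to U+0065 U+0301.

?- normalised_compare(d, R, [0'c,0'a,0'f,16'00E9],
                            [0'c,0'a,0'f,0'e,16'0301]).
R = (=)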
There needs to be a stream property normalisation_form(NF) which can be used to ensure that characters delivered to a Prolog application are already normalised, and that characters generated by a Prolog application are appropriately normalised before being delivered to their destination.
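For instance, a system supporting this property might allow it to be requested when a stream is opened (the option syntax here is an assumption for illustration, not part of any standard):

?- open('input.txt', read, Stream, [normalisation_form(kd)]).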
In the good old days, it was sufficient to have a library(ctypes) module. With ISO's thrice-accursed single-character-atom representation for characters, we need two modules, as noted in the previous section. So we might as well put the character classification predicates in 'chars' and 'codes'. The classification predicates in each module will have the same name and basically the same meaning, but different argument types.
The ISO Prolog standard uses character groups which correspond to the classification predicates is_space/1, is_solo/1, is_quote/2, is_csym/1, is_digit/[1,2], is_csymf/1, is_alpha/1, is_lower/1, is_upper/1, and is_symbol/1.
The term "graphic character" is very poorly chosen; one expects it
either to have something to do with box drawing or to be strongly related
to isgraph()
in the C standards. Instead it refers to what
was previously known as "symbol characters" or "operator characters".
That's a hint. We want to be able to do what C does, plus we want to be able to cover the categories used in Prolog itself.
Fortunately, there's a freely downloadable table classifying many tens of thousands of characters, actively maintained by someone else. Even more fortunately, the categories it uses can be mapped onto the categories we want quite easily. The table is part of the Unicode data base.
I propose a single base predicate, from which the others are easily derived.
:- type unicode_category
   --->  lu | ll | lt | lm | lo
      |  mn | mc | me
      |  nd | nl | no
      |  pc | pd | ps | pe | pi | pf | po
      |  sm | sc | sk | so
      |  zs | zl | zp
      |  cc | cf | cs | co | cn.
These codes are simply the Unicode category names converted to lower case for convenience in Prolog.
:- pred is_unicode(code, unicode_category).
:- is_unicode(C, K) when C ; K.
:- is_unicode(C, K) is semidet when ground(C).
:- is_unicode(C, K) is bounded when true.

is_unicode(C, K) :-
   between(0, 16'10FFFF, C),
   UnicodeData.txt says C has category K,
   K \== cn.            % not assigned

:- pred is_digit(code, integer).
:- is_digit(C, V) when C ; V.
:- is_digit(C, V) is semidet when ground(C).
:- is_digit(C, V) is bounded when true.

is_digit(C, V) :-
   is_unicode(C, nd),
   UnicodeData.txt gives V as the decimal value of C.
The Unicode standard and associated reports explain implementation techniques for this. Even with just one byte per character, a simple implementation would take 1.1 megabytes. Using multilevel tables can compress this dramatically, not least because only 3 of the 17 planes currently have any characters at all.
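To sketch the multilevel idea concretely (the $page/2 table and its block representation are invented for this illustration): split each code point into a page number and an offset within the page, and let identical 256-entry blocks be shared.

% One possible shape for a table-driven is_unicode/2.
% $page(Page, Block): Block is a 256-argument term of category
% codes for one 256-character page; equal blocks are shared.
is_unicode(C, K) :-
   between(0, 16'10FFFF, C),
   Page is C >> 8,
   Off is (C /\ 16'FF) + 1,     % arg/3 counts arguments from 1
   $page(Page, Block),
   arg(Off, Block, K),
   K \== cn.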
is_unicode(C) :-
   is_unicode(C, _).

is_latin1(C, K) :-
   between(0, 16'FF, C),
   is_unicode(C, K).

is_latin1(C) :-
   is_latin1(C, _).

is_ascii(C, K) :-
   between(0, 16'7F, C),
   is_unicode(C, K).

is_ascii(C) :-
   is_ascii(C, _).

is_alnum(C) :-
   is_unicode(C, K),
   memberchk(K, [nd,lu,ll,lt,lm,lo]).

is_alpha(C) :-
   is_unicode(C, K),
   memberchk(K, [lu,ll,lt,lm,lo]).

is_blank(C) :-          % known as is_white/1 in Quintus Prolog
   is_unicode(C, zs).

is_bracket(L, R) :-
   is_unicode(L, ps),
   is_unicode(R, pe),
   R is the closing bracket corresponding to L.

is_cntrl(C) :-          % C0 and C1 controls + formats
   is_unicode(C, K),
   memberchk(K, [cc,cf]).

is_csym(C) :-
   is_unicode(C, K),
   memberchk(K, [pc,nd,lu,ll,lt,lm,lo]).

is_csymf(C) :-
   is_unicode(C, K),
   memberchk(K, [pc,lu,ll,lt,lm,lo]).

is_digit(C) :-
   is_unicode(C, nd).

is_endfile(-1).

is_endline(C) :-
   is_unicode(C, K),
   memberchk(K, [zl,zp]).

is_graph(C) :-
   is_unicode(C, K),
   nonmember(K, [zs,zl,zp,cc,cf]).

is_layout(C) :-         % draft: spaces, line/paragraph separators,
   is_unicode(C, K),    % and controls
   memberchk(K, [zs,zl,zp,cc]).

is_lower(C) :-
   is_unicode(C, ll).

is_print(C) :-
   is_unicode(C, K),
   nonmember(K, [zl,zp,cc,cf]).

is_punct(C) :-
   is_unicode(C, K),
   memberchk(K, [pc,pd,ps,pe,pi,pf,po,sm,sc,sk,so]).

is_title(C) :-
   is_unicode(C, lt).

is_quote(0''', 0''').
is_quote(0'", 0'").
is_quote(0'`, 0'`).
is_quote(L, R) :-
   is_unicode(L, pi),
   is_unicode(R, pf),
   R is the final quote corresponding to L.

is_space(C) :-
   is_unicode(C, K),
   memberchk(K, [zs,zl,zp]).

is_upper(C) :-
   is_unicode(C, lu).

is_xdigit(Base, C, V) :-
   between(2, 36, Base),
   Base1 is Base - 1,
   between(0, Base1, V),
   (   V >= 10 ->
       (   C is 0'A - 10 + V
       ;   C is 0'a - 10 + V
       )
   ;   is_digit(C, V)
   ).

is_xdigit(C, V) :-
   is_xdigit(16, C, V).

is_xdigit(C) :-
   is_xdigit(16, C, _).
There are two debatable decisions here. In this draft I have decided
that the "Other, Format" characters of Unicode should be grouped with the
"Other, Control" characters for is_cntrl/1
, on the grounds
that these things fairly obviously are control characters and
that if you just want the C0 and C1 controls it is easy enough to test
for 'cc' yourself. Indeed, is_ascii(C, cc)
will check
for the ASCII (C0) controls. I have also decided that the "csymf"
characters should include all the "Punctuation, Connector" characters,
not just the LOW LINE "_".
I have read through rather more material about wide character classification in C than was enjoyable, and found stunningly little guidance about how non-ASCII characters should be classified, so I must emphasise that the definitions above are a draft.
The is_bracket/2
and is_quote/2
predicates
do not correspond to anything in C, but I found them very useful in
Quintus Prolog and equivalents very useful in a text editor.
Unicode 3.1.1 adds only single quotes, double quotes, single guillemets,
double guillemets, and single and double high reversed-9 quotes to the
Pi category, so is_quote/2
isn't much of a burden. For the
brackets we'd need a somewhat larger table, but there are plenty of
nice looking brackets in Unicode it would be nice to recognise.
The "solo" characters in ISO Prolog are "!,;|%()[]{}". What is the appropriate generalisation for Unicode?
In ASCII, "()[]{}" is the full set of brackets. So Unicode solo characters should include all Unicode brackets. The other characters are all in the Po category, but there are many characters in the Po category which should be symbol characters, not solo characters. The simplest rule seems to be that a non-bracket character should count as a solo character if and only if it is in the Po category and is a script or compatibility variant of one of the existing non-bracket solo characters. In Unicode 3.1.1, that's just the 24 characters enumerated in the code below.
It is possible to recognise this set of characters by pattern matching (carefully!) on the names, but since I do not propose access to the names, this has to be expressed by means of a table.
is_solo(C) :-
   is_unicode(C, K),
   memberchk(K, [po,ps,pe]),
   (   K = po ->
       memberchk(C, [16'0021,16'0025,16'002C,16'003B,16'055C,16'055D,
                     16'060C,16'061B,16'066A,16'1363,16'1364,16'1802,
                     16'1808,16'3001,16'FE50,16'FE51,16'FE54,16'FE57,
                     16'FE6A,16'FF01,16'FF05,16'FF0C,16'FF1B,16'FF64])
   ;   true
   ).
The so-called "meta" characters in ISO Prolog are the ASCII umlaut, acute accent, grave accent, and reverse solidus. This is a rather puzzling grouping. Three of them are used for quoting string-like tokens, one of them isn't. Unlike Lisp, the reverse solidus has no "meta" semantics in Prolog except as part of string-like tokens; when it occurs unquoted it is just a plain symbol character.
Since we already have is_quote/2, which covers the characters that begin string-like tokens, I do not think that an is_meta/1 character classification predicate is appropriate.
The confusingly named "graphic" characters of ISO Prolog are "#$&*+-./:<=>?@^~". This is especially confusing, because what we want to know is not "which characters stand for themselves in quoted tokens" (the answer is: anything except the corresponding closing quote, the reverse solidus, or a Zl or Zp character), but "which characters can be used in unquoted literals that look vaguely like mathematical operators". That is, we need a character class that includes these characters and also includes the reverse solidus, for the sake of ISO Prolog atoms such as \=, \==, =\=, \+, /\, \/, and \ . This is in fact the set "graphic token char" of section 6.4.2.
If we look these characters up in the Unicode data base, we find that they fall into four groups: Pd (dash punctuation), Po (other punctuation), Sc (currency symbols), and Sm (mathematical symbols).
These groupings actually fit our intuitions about which characters should be allowed in unquoted atoms very well; when extended to Latin 1 they yield exactly the set of characters my Prolog-in-C parser has accepted since Latin 1 came out. The only problem is that some Po characters are not symbol characters, but we already have a recogniser for those, so:
is_symbol(C) :-
   is_unicode(C, K),
   memberchk(K, [pd,po,sc,sm]),
   \+ is_solo(C).
If Prolog uses Unicode internally, then we don't have to worry much about locales for character classification, so I'll leave this section empty for a while.
In ASCII, every letter was either a lower case letter with a unique context-independent upper case letter or an upper case letter with a unique context-independent lower case letter, and this was the only variation between letter forms.
In the ISO 8859 family of 8-bit character sets, there could be letters which were neither upper nor lower case, and there could be letters which did belong to a case but did not have an opposite case equivalent (there are two of those in Latin-1), and there could be letters with positional variants (there are five of those in 8859-8). Worse still, there could be letters whose opposite case form was not the same length (ß).
Of course it's well known that Greek has two lower case forms for the letter "s" both corresponding to the same upper case form.
Interestingly enough, the Unicode 3.1.1 database defines case mappings, and its case mappings are one-to-one. The letter "ß" is handled by not mapping it to anything else at all, and the Greek letters are handled by mapping the capital "S" to the medial (not the final) letter "s", regardless of the context.
The Unicode standard points out that case mapping is locale-sensitive, and gives the example of Turkish.
Accordingly, we do need some kind of locale objects. This is a very preliminary draft here!
:- type character_locale.               % not specified

:- pred character_locale(atom, character_locale).
:- character_locale(Name, Locale) when Name.
:- character_locale(Name, Locale) is semidet when ground(Name).

character_locale(Name, Locale) :-
   Name is the name of a locale and Locale is
   a private representation of its character facet.
One special locale object is 'unicode', which represents the Unicode case conversion tables. We might also need 'posix' and/or 'C' locales for this. A locale name like 'en_NZ' might or might not be acceptable as a locale object at the implementor's choice.
We get two sets of case conversion predicates. None of them takes a single character as argument or delivers a single character as result. We have, in general, no right to expect that lc(x++y)=lc(x)++lc(y).
:- pred locale_to_lower(character_locale, codes, codes, codes).
:- locale_to_lower(L, Xs, Ys0, Ys) when L, Xs.
:- locale_to_lower(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs).

locale_to_lower(Locale, Xs, Ys0, Ys) :-
   Locale is a character locale and Xs is a string,
   Ys0\Ys is Xs converted to lower case according to Locale.

locale_to_lower(Locale, Xs, Ys) :-
   locale_to_lower(Locale, Xs, Ys, []).

to_lower(Xs, Ys0, Ys) :-
   locale_to_lower(unicode, Xs, Ys0, Ys).

to_lower(Xs, Ys) :-
   to_lower(Xs, Ys, []).

:- pred locale_to_title(character_locale, codes, codes, codes).
:- locale_to_title(L, Xs, Ys0, Ys) when L, Xs.
:- locale_to_title(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs).

locale_to_title(Locale, Xs, Ys0, Ys) :-
   Locale is a character locale and Xs is a string,
   Ys0\Ys is Xs converted to title case according to Locale.

locale_to_title(Locale, Xs, Ys) :-
   locale_to_title(Locale, Xs, Ys, []).

to_title(Xs, Ys0, Ys) :-
   locale_to_title(unicode, Xs, Ys0, Ys).

to_title(Xs, Ys) :-
   to_title(Xs, Ys, []).

:- pred locale_to_upper(character_locale, codes, codes, codes).
:- locale_to_upper(L, Xs, Ys0, Ys) when L, Xs.
:- locale_to_upper(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs).

locale_to_upper(Locale, Xs, Ys0, Ys) :-
   Locale is a character locale and Xs is a string,
   Ys0\Ys is Xs converted to upper case according to Locale.

locale_to_upper(Locale, Xs, Ys) :-
   locale_to_upper(Locale, Xs, Ys, []).

to_upper(Xs, Ys0, Ys) :-
   locale_to_upper(unicode, Xs, Ys0, Ys).

to_upper(Xs, Ys) :-
   to_upper(Xs, Ys, []).
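As an illustration of why the locale argument matters (the locale name tr is invented for this example): in Turkish, lower case i capitalises to İ (U+0130), not to I, so one would expect

?- character_locale(tr, Locale),
   locale_to_upper(Locale, "istanbul", Codes).
% Codes should begin with 16'0130, not 0'I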
The title of this section refers to the corresponding section of the Unicode Standard, where rules are given for finding the boundaries between characters, words, sentences, and lines.
In Latin 1, there is a one to one correspondence between the thing you might call a 'char' in a program and the thing a normal person looking at a printed text might call a 'character'. The imaginary rock band "Death-Töngue" (from "Bloom County") clearly has 12 whatsits in its name on either construal. If we step through the Prolog list (of Latin 1 codes) "Death-Töngue" one element at a time, we shall visit one written character at a time.

This neat equivalence did not hold in ASCII, although that is not widely known. In ASCII, it was legitimate to encode <ö> as <o,BS,"> or <",BS,o>. This accounts for some of the compromise character shapes: " has to serve for opening double quote, closing double quote, and diaeresis; ' has to serve for apostrophe, right quote, and acute accent; , has to serve for comma and cedilla; and so on. The result is that except for US English, the relationship between a sequence of characters and a sequence of chars was neither one to one nor unique.
Unicode brings back the complexities of ASCII, doubled and redoubled. There is in principle no upper bound to the number of diacritical characters which may be attached to a base character, hence no upper bound on the number of Unicode code elements which may be required to represent a single scripteme.
I propose the following predicates for inclusion in both the 'chars' module and the 'codes' module. I illustrate only the 'codes' versions, but the 'chars' versions are obvious.
:- pred next_character(normalisation_form, codes, codes, codes).
:- next_character(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_character(NF, Xs, Ys0, Ys) is semidet
      when ground(NF), ground(Xs), ground(Ys0).
:- next_character(NF, Xs, Ys0, Ys) is semidet
      when NF==v, ground(Ys0).
:- next_character(NF, Xs, Ys0, Ys) is bounded when ground(Ys0).

next_character(NF, Xs, Ys0, Ys) :-
   append(Zs, Ys, Ys0),
   Zs \== [],
   normalised_compare(NF, =, Xs, Zs),
   Zs is the longest prefix of Ys0 not containing a character boundary.

next_character(Xs, Ys0, Ys) :-
   next_character(v, Xs, Ys0, Ys).

next_character(Xs, Ys0) :-
   next_character(Xs, Ys0, _).

:- pred next_word(normalisation_form, codes, codes, codes).
:- next_word(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_word(NF, Xs, Ys0, Ys) is semidet
      when ground(NF), ground(Xs), ground(Ys0).
:- next_word(NF, Xs, Ys0, Ys) is semidet
      when NF==v, ground(Ys0).
:- next_word(NF, Xs, Ys0, Ys) is bounded when ground(Ys0).

next_word(NF, Xs, Ys0, Ys) :-
   append(Zs, Ys, Ys0),
   Zs \== [],
   normalised_compare(NF, =, Xs, Zs),
   Zs is the longest prefix of Ys0 not containing a word boundary.

next_word(Xs, Ys0, Ys) :-
   next_word(v, Xs, Ys0, Ys).

next_word(Xs, Ys0) :-
   next_word(Xs, Ys0, _).

:- pred next_sentence(normalisation_form, codes, codes, codes).
:- next_sentence(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_sentence(NF, Xs, Ys0, Ys) is semidet
      when ground(NF), ground(Xs), ground(Ys0).
:- next_sentence(NF, Xs, Ys0, Ys) is semidet
      when NF==v, ground(Ys0).
:- next_sentence(NF, Xs, Ys0, Ys) is bounded when ground(Ys0).

next_sentence(NF, Xs, Ys0, Ys) :-
   append(Zs, Ys, Ys0),
   Zs \== [],
   normalised_compare(NF, =, Xs, Zs),
   Zs is the longest prefix of Ys0 not containing a sentence boundary.

next_sentence(Xs, Ys0, Ys) :-
   next_sentence(v, Xs, Ys0, Ys).

next_sentence(Xs, Ys0) :-
   next_sentence(Xs, Ys0, _).

:- pred next_line(normalisation_form, codes, codes, codes).
:- next_line(NF, Xs, Ys0, Ys) when NF, Ys0.
:- next_line(NF, Xs, Ys0, Ys) is semidet
      when ground(NF), ground(Xs), ground(Ys0).
:- next_line(NF, Xs, Ys0, Ys) is semidet
      when NF==v, ground(Ys0).
:- next_line(NF, Xs, Ys0, Ys) is bounded when ground(Ys0).

next_line(NF, Xs, Ys0, Ys) :-
   append(Zs, Ys, Ys0),
   Zs \== [],
   normalised_compare(NF, =, Xs, Zs),
   Zs is the longest prefix of Ys0 not containing a line boundary.

next_line(Xs, Ys0, Ys) :-
   next_line(v, Xs, Ys0, Ys).

next_line(Xs, Ys0) :-
   next_line(Xs, Ys0, _).
call/N

The ISO Prolog standard defines call/1 in section 7.8.3.
Looking at the definition is like a slap in the face; the committee
rejected my denotational semantics for Prolog on the grounds that it
was written in Pascal, then when I pointed out that on the contrary,
it was written in a pure functional language that wasn't anything
like Pascal, it was rejected on the grounds that it was too
"implementation-oriented". If that doesn't describe the definition
of call/1
, I don't know what does.
If call/1
was not in the language, we could define it
thus:
:- dynamic $call/1.

call(Goal) :-
   asserta(($call(Goal) :- retract(($call(_) :- _)), !, Goal)),
   $call(Goal).
In the Prolog type checking paper that Alan Mycroft and I wrote, we proposed the call/N family, in order to permit type-checking Prolog higher-order code. Since then, some Prologs have implemented this feature in the library with an upper bound on N (notably DEC-10 Prolog, C Prolog, and Quintus Prolog), and some Prologs have built it into their language (SWI Prolog and, I believe, NU Prolog).
Roughly speaking, one expects
call(p(X1,X2,X3)) :- p(X1, X2, X3).
call(p(X1,X2), X3) :- p(X1, X2, X3).
call(p(X1), X2, X3) :- p(X1, X2, X3).
call(p, X1, X2, X3) :- p(X1, X2, X3).
Indeed, one way to implement the call/N family is to start with a skeleton
call(P, ...) :- \+ callable(P), !, error(uncallable term).
[special purpose code for control structures if N <= 2]
[gap]
call(P, ...) :- error(undefined predicate).
and then whenever a predicate p/N is defined,
automatically add clauses to call/1, ..., call/N+1
in the appropriate gaps. In a WAM-based system, this can be
done by exploiting switch-on-term without any choice-points
being created, so call/N in the normal case is just
a hash table look-up, a bit of argument shuffling, and a jump.
The space requirement is O(P·A²), where P is the number of predicates and A is the arity. Since arities are usually fairly low, this is in effect linear in the size of the program. I have tried this, and found it to be a pleasantly efficient (as well as obvious) way to implement call/N.
There are other ways to implement call/N which are nearly as efficient, and have only O(P) space overhead: the hash table needed to map from a predicate's functor to its address.
This family of predicates has proven its usefulness. It's time they were in the community standard.
:- pred call(void(T1), T1).
:- pred call(void(T1,T2), T1, T2).
:- pred call(void(T1,T2,T3), T1, T2, T3).
:- pred call(void(T1,T2,T3,T4), T1, T2, T3, T4).
:- pred call(void(T1,T2,T3,T4,T5), T1, T2, T3, T4, T5).
:- pred call(void(T1,T2,T3,T4,T5,T6), T1, T2, T3, T4, T5, T6).

% Sample implementation.
% Beware: this is highly inefficient.  Don't do it this way.
call(P, Y1) :-
   P =.. L0, append(L0, [Y1], L1), Q =.. L1, call(Q).
call(P, Y1, Y2) :-
   P =.. L0, append(L0, [Y1,Y2], L2), Q =.. L2, call(Q).
call(P, Y1, Y2, Y3) :-
   P =.. L0, append(L0, [Y1,Y2,Y3], L3), Q =.. L3, call(Q).
call(P, Y1, Y2, Y3, Y4) :-
   P =.. L0, append(L0, [Y1,Y2,Y3,Y4], L4), Q =.. L4, call(Q).
call(P, Y1, Y2, Y3, Y4, Y5) :-
   P =.. L0, append(L0, [Y1,Y2,Y3,Y4,Y5], L5), Q =.. L5, call(Q).
call(P, Y1, Y2, Y3, Y4, Y5, Y6) :-
   P =.. L0, append(L0, [Y1,Y2,Y3,Y4,Y5,Y6], L6), Q =.. L6, call(Q).
meta_predicate
As soon as you have call/1, you have a problem: cross-referencers don't work. You can fix this for the specific cases of call/1, findall/3, bagof/3, setof/3, and a few others, but as soon as a Prolog programmer writes
gcc(Goal) :- \+ \+ Goal. % garbage-collecting call
the cross-referencer stops working. The C call graph tool on my workstation is similarly limited; if you have a function which is passed to another function and not called directly, it thinks that function is never called. That's not what I call a usable call graph tool; it is worse than having no call graph at all.
Characteristically, the ISO committee missed the point.
They thought that meta_predicate
has something to do with modules. It does, but only accidentally.
The real function of meta_predicate
declarations is to
provide just enough type information for a cross-referencer to work.
The idea of meta_predicate
declarations was to start with
existing mode declarations (so the arguments would be +, -, or ?) and
change some of the arguments to something else to say "this argument is
called".
Had call/1
been the only game in town, something like
'call' would have been adequate, so we'd have had
:- meta_predicate
       findall(?, call, -),     % not final version
       gcc(call).               % not final version
But the call/N family came into Prolog about the time the DEC-10 cross-referencer was written, and replaced prior code that did things like
all(P, [], []).
all(P, [X|Xs], [Y|Ys]) :-
   apply(P, [X,Y]),
   all(P, Xs, Ys).
Very often, when an argument is passed to a meta-predicate, it is a closure, not a goal; the callee will supply more arguments (at the right) before calling it. This is information that a cross-referencer needs.
So, quite independently of any module system, we need meta-predicate declarations of this form:
meta_predicate_directive -->
   [(:-), (meta_predicate)],
   meta_predicate_declarator,
   rest_meta_predicate_declarators.

rest_meta_predicate_declarators --> ['.'].
rest_meta_predicate_declarators -->
   [','],
   meta_predicate_declarator,
   rest_meta_predicate_declarators.

meta_predicate_declarator --> atom(_).
meta_predicate_declarator -->
   atom(_), ['('],
   meta_predicate_argument,
   rest_meta_predicate_arguments.

rest_meta_predicate_arguments --> [')'].
rest_meta_predicate_arguments -->
   [','],
   meta_predicate_argument,
   rest_meta_predicate_arguments.

meta_predicate_argument --> [+].
meta_predicate_argument --> [-].
meta_predicate_argument --> [?].
meta_predicate_argument --> integer(N), {N >= 0}.
The mode alternatives, + - ?, are ignored by the system, but may be processed by a mode inference program like Chris Mellish's, and indicate the intended use of the corresponding arguments. As a mode, the meta alternative N has the same force as +. But N tells the cross-referencer that the corresponding argument is to be a goal that is missing its rightmost N arguments.
With the aid of this declaration, we can write
:- meta_predicate
       findall(?, 0, ?),
       gcc(0),
       all(2, ?, ?).

all(_, [], []).
all(P, [X|Xs], [Y|Ys]) :-
   call(P, X, Y),
   all(P, Xs, Ys).
The ISO committee not only got hold of the wrong end of the stick, they broke it, in these ways:

- the meta-argument annotation says only that an argument is called, not how many arguments the goal is missing;
- the directive is spelled metapredicate, which is downright ugly.

The result is that the ISO metapredicate directives are useless for cross-referencers. They are best regarded as obsolete.
But what if you don't care about cross-referencing? I guess you've
never used Masterscope for Lisp, or cscope(1) for C, or the Smalltalk
IDE, or ... if you can ask such a question. Come to think of it, if
you haven't used a reasonably reliable cross-referencer for Prolog,
you don't realise what a useful debugging tool it can be. (I leave you
to make your own inferences about the ISO Prolog committee.) But let's
suppose you don't care about cross-referencing, and you
don't care about the added documentation value from stating
the number of missing arguments. Is there any reason why you
should care about meta_predicate
?
Yes. From its inception (which I can speak with authority about,
because I invented it), the intention was that the
meta_predicate
directive would licence a nonstandard
translation of meta-calls. Take all/3
for example:
all(append(Front), Xs, Ys)
can be translated as
all(<{address of append/3}, Front>, Xs, Ys)
where <Address,Arg1,...> is a special kind of
term that looks like a compound term, but has an address where the
functor should be. This permits an
implementation of call/3
which spreads the arguments
and jumps straight to the given address. This eliminates the space
overhead for meta-calls (there is no longer a need for a table of
the addresses of all predicates; the compiler has arranged for the
addresses of just the meta-called predicates to be available directly).
All right, all right; so that would also make cross-module meta-calls as efficient as any other meta-calls.
There is another thing that meta_predicate
does,
and it is related to the possibility of unusual implementations
of meta-calls, but it has a rather more profound theoretical reason.
If T1 and T2 are two terms representing goals, then T1 = T2 is,
strictly speaking, a second-order unification (or worse). But it
isn't implemented that way. In order to avoid the obvious soundness
and completeness problems, we want to check at compile time that
two meta-terms are unified if and only if at least one of them is
a variable. This means that a meta-argument of a predicate
should only be a variable in the head, and that it should only be
unified with another variable, and that it should only be passed to
a meta-argument of some predicate. This is a type checking problem,
and it is again a very useful debugging tool. But you can't even
get started on it without meta_predicate
; the ISO
metapredicate
abortion is quite useless for the job.
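An invented pair of examples may make the desired check concrete; a checker of the kind described would accept the first predicate and reject the second at compile time:

:- meta_predicate twice(0), smuggle(0).

twice(G) :-             % fine: the meta-argument is only
   call(G),             % passed to another meta-argument
   call(G).

smuggle(G) :-           % wrong: a meta-argument is unified
   G = write(hello),    % with a non-variable term
   call(G).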
So even if we don't include modules in the community standard,
and even if we don't include call/
N, we
still need either meta_predicate
or a richer
type declaration/checking system.
We don't strictly speaking need lambda-expressions in Prolog; ordinary closures suffice quite often, and when they don't we can always write an extra predicate. But then, lambda-lifting shows that you don't strictly speaking need lambda-expressions in functional languages either; you can always move the thing out to top level, give it a name, and partially apply it at the original place of use.
Suppose we had a form something like
all((\(X, Y) :- Body), Xs, Ys)
A compiler could implement this by finding the intersection of (free variables of Body \ free variables of \(X,Y)) with the free variables of the rest of the clause that contains it, and generating
p(FreeVars, X, Y) :- Body.
...
all(p(FreeVars), Xs, Ys)
The interesting point is that it isn't possible to do this at run time. If we simply implemented
call((\(X1,X2) :- Body), X1, X2) :- call(Body).
then the first call to the lambda-expression would "use it up".
We couldn't use this in a call to all/3
, unless all of the
elements of Xs were equal and so were all of the elements of Ys.
If on the other hand we implemented
call((\(X1,X2) :- Body), Y1, Y2) :-
   copy_term(foo(X1,X2,Body), foo(Y1,Y2,Goal)),
   call(Goal).
then

common_prefix(Front, Xs, Ys) :-
   all((\(X, Y) :- append(Front, X, Y)), Xs, Ys).

wouldn't work, because Front would be copied, and it shouldn't be.
With compiler assistance (as in Mercury), lambda-expressions are fine. Without compiler assistance, lambda-expressions either don't work or require more, and more error-prone, annotation than I think the average programmer would be happy with. I do not want to recommend anything which requires compiler support or even that there be a compiler.
To summarise my recommendations:

- Adopt meta_predicate directives with 0 (and more generally N) as a meta-argument annotation distinct from ':'. This is of great practical use. A non-module compiler can simply ignore such directives. A module compiler that cannot exploit the extra information can simply treat them as alternative syntax for metapredicate.
- Adopt the call/N family; they are trivial to implement (I do not say trivial to implement efficiently, but even that is not particularly hard) and very useful.
We can derive many of the operations in the Quintus Prolog library (which is a revision and extension of the DEC-10 Prolog library) by projection from a small number of predicates. These "base" predicates are specified in some detail; the specifications of the "derived" predicates are then obvious consequences. I do not claim that the "base" predicates are "basic" in any sense, only that they are "universal relations" from which a range of more useful "views" can be projected.
"Base" predicates have been chosen solely for convenience in
specification. I do not say that they are "basic" in any
way. They are generally harder to understand than some of their
derived predicates. They are not, as a rule, a good implementation
basis. (For example, it would be very silly to implement
append/3
the way it is specified here.) What makes a
predicate a "base" predicate is that it makes a good base for
specification, so that details of types and so on have to be
repeated as little as possible.
The notion of "projection" is closely related to the notion of "bootstrapping" (did Mrs Malaprop choose that word?) in the ISO Prolog standard. Many (but by no means all) of the "bootstrapped" predicates in the ISO Prolog standard are projections. Predicate P is a projection of predicate Q if and only if P is not only a special case of Q, but has its arguments in the same order as Q. The idea is that the strong semantic link between P and Q should make it easy for a programmer to memorise the argument order of Q and thereafter be able to correctly predict the argument order of P without any additional memory burden.
append_append_length/7
If any predicates have universal acceptance in Prolog, they must be append/3, member/2, and length/2. They are all projections of a single predicate.
:- pred append_append_length(list(X), list(X), list(X),
                             list(Y), list(Y), list(Y), integer).
:- append_append_length(A, B, AB, C, D, CD, N)
      when A ; AB ; C ; CD ; N.
:- append_append_length(A, B, AB, C, D, CD, N) is semidet
      when list_skel(A) ; list_skel(C) ; nonvar(N).
:- append_append_length(A, B, AB, C, D, CD, N) is bounded
      when list_skel(AB) ; list_skel(CD).

% Sample implementation.
append_append_length(A, B, AB, C, D, CD, N) :-
   (   var(N) ->
       $append_append_length(A, B, AB, C, D, CD, 0, N)
   ;   integer(N) ->
       N >= 0,
       $append_append_length(A, B, AB, C, D, CD, N)
   ;   $error
   ).

$append_append_length([], B, B, [], D, D, N, N).
$append_append_length([X|A], B, [X|AB], [Y|C], D, [Y|CD], N0, N) :-
   N1 is 1 + N0,
   $append_append_length(A, B, AB, C, D, CD, N1, N).

$append_append_length(A, B, AB, C, D, CD, N0) :-
   (   N0 =:= 0 ->
       A = [], B = AB, C = [], D = CD
   ;   N1 is N0 - 1,
       A = [X|A1], AB = [X|AB1],
       C = [Y|C1], CD = [Y|CD1],
       $append_append_length(A1, B, AB1, C1, D, CD1, N1)
   ).
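A worked query may make the relation clearer (the unique solution shown is what the specification requires):

?- append_append_length(A, B, [1,2,3], C, D, [a,b,c], 2).
A = [1,2], B = [3],
C = [a,b], D = [c]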
Semantics: append_append_length(A, B, AB, C, D, CD, N) is true when length(A, N) and length(C, N) and append(A, B, AB) and append(C, D, CD). In Mercury, where the compiler reorders code according to data flow, we could write
append_append_length(A, B, AB, C, D, CD, N) :-
   length(A, N),
   length(C, N),
   append(A, B, AB),
   append(C, D, CD).
In ISO Prolog, however, this implementation is quite unsatisfactory.
?- append_append_length(A, "yz", "xyz", [X], R, "abc", N).
will find the (unique) solution, but if that is rejected, will then backtrack forever, trying ever longer bindings for A.
There is, in fact, no fixed ordering of these goals which is satisfactory for all queries, where "satisfactory" is defined as finding all solutions and terminating in a finite amount of time, for a query having a finite number of solutions.
append_append(A, B, AB, C, D, CD) :-
   append_append_length(A, B, AB, C, D, CD, _).

append_length(A, B, AB, D, AD, N) :-
   append_append_length(A, B, AB, A, D, AD, N).

append(A, B, AB, D, AD) :-
   append_length(A, B, AB, D, AD, _).

append_length(A, B, AB, N) :-
   append_append_length(A, B, AB, A, B, AB, N).

append(A, B, AB) :-
   append_length(A, B, AB, _).

same_length(A, C, N) :-
   append_append_length(A, [], A, C, [], C, N).

same_length(A, C) :-
   same_length(A, C, _).

one_longer(A, C) :-
   same_length(A, [_|C]).

shorter_list(A, C) :-
   append_append([_|A], [], [_|A], _, _, C).

cons(H, T, L) :-
   append([H], T, L).
The following predicates come from library(list_parts)
in the Quintus library. However, in order that the argument order of
these predicates should be consistent with the other list processing
predicates, it is necessary to switch the argument order from
Whole,Part to Part,Whole, because that's the way append/3
is.
head(H, L) :- cons(H, _, L).
tail(T, L) :- cons(_, T, L).
prefix(P, L) :- append(P, _, L).
proper_prefix(P, L) :- append(P, [_|_], L).
suffix(S, L) :- append(_, S, L).
proper_suffix(S, L) :- append([_|_], S, L).
The next group of predicates are already fully consistent with
append_append_length/7
.
length(L, N) :-
   append_length(L, [], L, N).

member(X, L) :-
   append(_, [X|_], L).

memberchk(X, L) :-
   member(X, L), !.

:- nonmember(X, L) when ground(X), ground(L).
:- nonmember(X, L) is semidet when ground(X), ground(L).

nonmember(X, L) :-
   \+ member(X, L).

select(X, L, R) :-
   append(_, [X|T], L, T, R).

selectchk(X, L, R) :-
   select(X, L, R), !.

select(X, Xs, Y, Ys) :-
   append(_, [X|T], Xs, [Y|T], Ys).

selectchk(X, L, Y, R) :-
   select(X, L, Y, R), !.

last(ButLast, X, L) :-
   append(ButLast, [X], L).

last(X, L) :-
   last(_, X, L).

nextto(X, Y, L) :-
   append(_, [X,Y|_], L).
This predicate corresponds to the old correspond/4
predicate, but has a consistent argument order:
member_member(X, Xs, Y, Ys) :-
   append_append(_, [X|_], Xs, _, [Y|_], Ys).
One operation which is commonly asked for is finding the
Nth element of a list. Here, unfortunately, we have
two clashing archetypes: arg/3
and the
append_append_length/7
family. The Quintus library
assimilated "Nth element" to arg/3
,
with predicates
nth0(N, List, Member)
nth0(N, List, Member, Residue)
nth1(N, List, Member)
nth1(N, List, Member, Residue)
The clash with member(Member, List)
and
select(Member, List, Residue)
is unfortunate.
The fact that 0 and 1 have other meanings in the Quintus library
(and other meanings in connection with lists) is also unfortunate.
I am driven to the conclusion that the analogy with
arg(N, Term, Arg)
is bogus. In particular,
arg/3
only works one way around: it does not
solve for N
(although at the price of notably worse
generated code, it could) and it cannot solve for
Term
. This means that its first two arguments are,
on any understanding of the term, "inputs", and by the
inputs-before-outputs principle, should come first.
The "Nth element" predicates do not share
arg/3
's input/output restrictions. Indeed, they
are more likely to be used to find N for a known
Member than the other way around; it is tolerable
for linear search to take linear time, but not for indexing.
I use the name component "_offset" to refer to a number of elements to skip. You can think of this as 0-origin indexing if you like, but it is actually more fruitful to think of it quite literally as the length of some list segment. Indeed, "length" could be used to be consistent with the other "_length" predicates, except that in this context it might be confusing.
I use the name component "_index" to refer to a position in a list. Like argument positions in a term, member positions in a list always start with 1.
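To illustrate the two conventions with the predicates defined below (the expected answers are shown as comments):

?- member_offset(b, [a,b,c], Offset).   % Offset = 1
?- member_index(b, [a,b,c], Index).     % Index = 2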
The Nth element predicates are
member_member_offset(X, Xs, Y, Ys, Offset) :-
   append_append_length(_, [X|_], Xs, _, [Y|_], Ys, Offset).

member_member_index(X, Xs, Y, Ys, Index) :-
   member_member_offset(X, [X|Xs], Y, [Y|Ys], Index),
   Index =\= 0.

member_offset(Member, List, Offset) :-
   append_length(_, [Member|_], List, Offset).

member_index(Member, List, Index) :-
   append_length(_, [Member|_], [Member|List], Index),
   Index =\= 0.

select_offset(X, Xs, Y, Ys, Offset) :-
   append_length(_, [X|T], Xs, [Y|T], Ys, Offset).

select_index(X, Xs, Y, Ys, Index) :-
   select_offset(X, [X|Xs], Y, [Y|Ys], Index),
   Index =\= 0.

select_offset(Member, List, Residue, Offset) :-
   append_length(_, [Member|Tail], List, Tail, Residue, Offset).

select_index(Member, List, Residue, Index) :-
   append_length(_, [Member|Tail], [Member|List],
                 Tail, [Member|Residue], Index),
   Index =\= 0.
Prolog has traditionally had reverse/2. Common Lisp also provides (REVAPPEND - -), and such a predicate is commonly part of the implementation of reverse/2.
:- pred reverse_append_length(list(T), list(T), list(T), integer).
:- reverse_append_length(L, B, RB, N) when L ; RB ; N.
:- reverse_append_length(L, B, RB, N) is semidet
      when list_skel(L) ; integer(N).
:- reverse_append_length(L, B, RB, N) is bounded
      when list_skel(RB).

% Sample implementation.
reverse_append_length(L, B, RB, N) :-
   append_append_length(L, [], L, _, B, RB, N),
   $reverse(L, B, RB).

$reverse([], B, B).
$reverse([X|L], B, RB) :-
   $reverse(L, [X|B], RB).
Semantics: reverse_append_length(L, B, RB, N) is true when RB = reverse(L)++B and length(L) = N.
Notation: There is no new notation here, but I feel that I must repeat the warning that the "sample implementations" of "base" predicates are crafted for exposition and specification. This is not the way that I would implement this predicate for practical use. What implementor would use a two-pass algorithm when a one-pass algorithm could be used?
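For contrast, a practical implementation for the common mode, where L is a proper list, can reverse and count in a single pass (a sketch only; it does not support the other modes the declarations promise):

reverse_append_length(L, B, RB, N) :-
   $rev(L, B, RB, 0, N).

$rev([], RB, RB, N, N).
$rev([X|L], B0, RB, N0, N) :-
   N1 is N0 + 1,
   $rev(L, [X|B0], RB, N1, N).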
Except that L
is to be reversed, the argument
order of this predicate is identical to the argument order of
append_length/4
, as the name suggests.
reverse_append(L, B, RB) :- reverse_append_length(L, B, RB, _).
reverse_length(L, R, N) :- reverse_append_length(L, [], R, N).
reverse(L, R) :- reverse_length(L, R, _).
permutation_length
:- pred permutation_length(list(T), list(T), integer).
:- permutation_length(A, B, N) is bounded
      when list_skel(A) ; list_skel(B) ; nonvar(N).

% Sample implementation
permutation_length(A, B, N) :-
   same_length(A, B, N),
   $perm(A, B).

$perm([], []).
$perm([X|Xs], Ys1) :-
   $perm(Xs, Ys),
   append(_, R, Ys, [X|R], Ys1).
permutation(A, B) :-
    permutation_length(A, B, _).
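A hypothetical session, assuming same_length/3 and append/5 behave as described earlier in this document:

?- permutation_length([a,b,c], P, N).
%  P = [a,b,c], N = 3 ;
%  P = [b,a,c], N = 3 ;
%  ... all six permutations, then failure.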
pairs_keys_values/3
Given the great practical utility of keysort/2
,
it is useful to be able to build and dismantle the lists it
requires and produces.
I am not satisfied with the name that was used in the Quintus
library (keys_and_values/3), because it does not make clear
which argument is which. Nor is there any way to appeal to 'inputs
before outputs', because the predicate is used both ways around. I am
tentatively calling it pairs_keys_values/3.
Whatever it's called, it is a surprisingly useful predicate.
:- pred pairs_keys_values(list(pair(K,V)), list(K), list(V)).
:- pairs_keys_values(Pairs, Keys, Values) when Pairs ; Keys ; Values.
:- pairs_keys_values(Pairs, Keys, Values) is semidet when list_skel(Pairs) ; list_skel(Keys) ; list_skel(Values).
% Sample implementation
pairs_keys_values([], [], []).
pairs_keys_values([K-V|Pairs], [K|Keys], [V|Values]) :-
    pairs_keys_values(Pairs, Keys, Values).
Note that the sample implementation doesn't actually satisfy the requirements. In a traditional first-argument-indexing Prolog, if the first argument is a variable but one of the others is a list skeleton, the determinism of the query will not be detected.
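One way an implementation might recover the declared determinism is to dispatch on whichever argument is already instantiated. The following sketch is illustrative only; the helper names are mine, following the dollar-sign convention:

pairs_keys_values(Pairs, Keys, Values) :-
    (   nonvar(Pairs) -> $pkv(Pairs, Keys, Values)
    ;   $kvp(Keys, Values, Pairs)
    ).

$pkv([], [], []).
$pkv([K-V|Pairs], [K|Keys], [V|Values]) :-
    $pkv(Pairs, Keys, Values).

$kvp([], [], []).
$kvp([K|Keys], [V|Values], [K-V|Pairs]) :-
    $kvp(Keys, Values, Pairs).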
I'm not sure it's worth bothering with these.
pairs_keys(Pairs, Keys) :-
    pairs_keys_values(Pairs, Keys, _).

pairs_values(Pairs, Values) :-
    pairs_keys_values(Pairs, _, Values).
The definition of the "delete" predicate is something I
struggled with for years. Part of the problem is that if list
R does not contain X, there are infinitely many lists L such
that deleting X from L yields R, so the predicate really only
works as a function. That tells us that L and X should precede
R in the argument order. Should it be
delete(Member, List, Residue)
to be consistent
with select/3
? Or, because delete/3
is dangerously different from select/3
, is it
actually a good thing if they are different?
Prolog has three versions of equality: (=)/2, (==)/2, and (=:=)/2. Which of them should be used? Or should we have three predicates, the way Scheme used to have DELQ!, DELV!, and DELETE!?
Should the item to be deleted be a strict input, or should
delete/3
be able to backtrack over the elements of
the input list as in Cugini's library?
If X does not occur in L, should the call simply succeed with R = L, or should it fail?
There is a definition in the Quintus library. It's basically this:
/* Historic background, NOT for inclusion */
:- pred delete(list(T), T, list(T)).
:- delete([], _, _) when true.
:- delete([H|T], X, R) when ground(H), ground(X).
:- delete(L, X, R) is semidet when ground(L), ground(X).
% Sample implementation
delete([], _, []).
delete([H|T], X, R) :-
    H == X,
    !,
    delete(T, X, R).
delete([H|T], X, [H|R]) :-
    delete(T, X, R).
I don't want to take delete/3
away from the
people who use it, if there are any, but it clearly doesn't
belong in even an informal standard. The problem is that there
are so many choices that none of them is going to be the right
one for the majority of uses (if uses there be); none of them
deserves the exclusive right to the name delete/3
.
The history of Scheme may be of interest here. There used to be
three delete-from-list functions in Scheme, but there aren't
any these days.
One of the questions about deleting from a list is how many
elements to delete. I note that when I want to delete a
known element from a list, I almost always want to delete just
one copy (usually because I know there is only one copy). For
that purpose, selectchk(X, L, R)
works very well.
The result is that despite writing delete/3
,
I cannot recall ever using it except to test it.
If delete/3
is not standard, what can programmers
do instead? One easy answer is to use findall/3
:
findall_member(Member, List, Test, Result) :-
    findall(Member, (member(Member, List), \+ \+ call(Test)), Result).

... findall_member(Y, L, \+ Y = X, R) ...
... findall_member(Y, L, Y \== X, R) ...
... findall_member(Y, L, Y =\= X, R) ...
Using findall_member/4 instead of delete/3 has important advantages: above all, the equality test is named explicitly in the call, instead of being wired into the predicate. Had I ever needed delete/3 in earnest, this is what I would have used.
The only bad thing about this predicate is efficiency.
But that's why I introduced findall_member/4
in
the first place, instead of calling findall/3
directly. It's comparatively simple for a compiler to recognise
calls to findall_member/4
and generate a three-clause
loop (essentially the structure of delete/3
shown
above), using a cache to ensure that the same code is not generated
repeatedly. Or for a compiler such as BinProlog, where
findall/3
is fast, the compiler could just convert
a findall_member/4
call to the corresponding
findall/3
code.
The findall/4
predicate, which combines the effect
of findall/3
with the effect of append/3
,
is unaccountably missing from ISO Prolog. When you know about it,
you find nearly as many uses for findall/4
as for
findall/3
. This suggests to me that the right "base"
predicate is not findall_member/4
but
findall_member_append/5
.
:- pred findall_member_append(T, list(T), void, list(T), list(T)).
:- findall_member_append(M, L, P, R0, R) is semidet when list_skel(L), nonvar(P).
% Sample implementation.
findall_member_append(_, [], _, R, R).
findall_member_append(M, [X|Xs], P, R0, R) :-
    \+ (X = M, call(P)),
    !,
    findall_member_append(M, Xs, P, R0, R).
findall_member_append(M, [X|Xs], P, [X|R0], R) :-
    findall_member_append(M, Xs, P, R0, R).
findall_member(M, Xs, P, R) :-
    findall_member_append(M, Xs, P, R, []).

delete(L, X, R) :-
    findall_member(M, L, X \== M, R).

intersection(S1, S2, S) :-
    findall_member(M, S1, member(M, S2), S).

difference(S1, S2, S) :-
    findall_member(M, S1, nonmember(M, S2), S).

union(S1, S2, S) :-
    findall_member_append(M, S1, nonmember(M, S2), S, S2).

symdiff(S1, S2, S) :-
    findall_member_append(M, S1, nonmember(M, S2), S, S3),
    difference(S2, S1, S3).
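A hypothetical session, assuming nonmember/2 as used above:

?- delete([1,2,1,3], 1, R).          % R = [2,3]
?- intersection([1,2], [2,3], S).    % S = [2]
?- union([1,2], [2,3], S).           % S = [1,2,3]
?- symdiff([1,2], [2,3], S).         % S = [1,3]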
I am not actually recommending any of the predicates in this
section. In particular, I am not recommending any of the
traditional quadratic-time set-as-unordered-list predicates for
inclusion in the "community standard library". What I do
recommend is that either no such predicates (letting people call
findall/3
themselves, or write and document their
own deletion predicates), or else one predicate should be
included in the library, and that predicate should be
findall_member_append/5
.
The Quintus library contains delete(L, X, N, R),
which deletes the first N
copies of X
in L
, or fewer if there aren't that many. To be
honest, it is difficult to imagine any serious use of this.
N=0
is just R=L
. N=1
is just selectchk/3
. N
larger than
the length of L
is just delete/3
. I
never did find any other uses for it.
The following predicates from the DEC-10 and/or Quintus libraries have not been slotted into families yet.
They include {min,max}_{member([P,]X, L), select([P,]X, L, R)}: that is, min_member, max_member, min_select, and max_select, each with an optional comparison argument P. In particular, it is time that msort/2 was adopted.
What self-respecting list-processing language would be caught dead without a basic kit of higher order list processing operations? Only Prolog, I guess.
WARNING: this is a draft. The use of N is not consistent in this section. I'm sending it out now so I can go home.
The following predicate families come from DEC-10 Prolog and Quintus Prolog:
map/
N.
maplist(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :-
    P(X11, ..., Xm1),
    ...
    P(X1n, ..., Xmn).
scanlist(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, Vn) :-
    P(X11, ..., Xm1, V0, V1),
    ...
    P(X1n, ..., Xmn, V', Vn).
cumlist(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, [V0,V1,...,Vn]) :-
    P(X11, ..., Xm1, V0, V1),
    ...
    P(X1n, ..., Xmn, V', Vn).
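Since the P(X11, ..., Xm1) notation above is schematic rather than real Prolog, it may help to see the one- and two-list instances of maplist, and the one-list instance of scanlist, written out concretely using call/N. This is merely an illustration of how an implementation might define them:

maplist(_, []).
maplist(P, [X|Xs]) :-
    call(P, X),
    maplist(P, Xs).

maplist(_, [], []).
maplist(P, [X|Xs], [Y|Ys]) :-
    call(P, X, Y),
    maplist(P, Xs, Ys).

scanlist(_, [], V0, V0).
scanlist(P, [X|Xs], V0, V) :-
    call(P, X, V0, V1),
    scanlist(P, Xs, V1, V).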
some(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :-
    (   P(X11, ..., Xm1)
    ;   ...
    ;   P(X1n, ..., Xmn)
    ).
Why don't we need some/N? Suppose we generalise member/2 to member/2N:
member(X1, [X1|_], ..., Xn, [Xn|_]).
member(X1, [_|Xs1], ..., Xn, [_|Xsn]) :-
    member(X1, Xs1, ..., Xn, Xsn).
Then instead of
some(P, Xs1, ..., Xsn)
we can write
member(X1, Xs1, ..., Xn, Xsn), P(X1, ..., Xn)
and we not only avoid all the overheads of passing P and get to call it directly; if the query is true, we also get to find out which values of X1...Xn make it true.
If an analogue of some
/N is implementable, so is an
analogue of member
/2N.
somechk(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :-
    (   P(X11, ..., Xm1) -> true
    ;   ...
    ;   P(X1n, ..., Xmn) -> true
    ).
Similarly, instead of
somechk(P, Xs1, ..., Xsn)
we can write
(member(X1, Xs1, ..., Xn, Xsn), P(X1, ..., Xn) -> true)
and get to find out why P is true as well as that it is true.
include(P, [], ..., [], []).
include(P, [X1|Xs1], ..., [Xn|Xsn], Included) :-
    (   call(P, X1, ..., Xn) -> Included = [X1|Included1]
    ;   Included = Included1
    ),
    include(P, Xs1, ..., Xsn, Included1).
Note that this really requires P to be determinate, but it does allow variables in P to be bound. For a ground P, we could manage without this:

findall(X1, ( member(X1, Xs1, ..., Xn, Xsn), P(X1, ..., Xn) ), Included)

would do the job. Some Prolog systems (BinProlog, for example) may implement findall/3 efficiently enough to make this an attractive alternative.
exclude(P, [], ..., [], []).
exclude(P, [X1|Xs1], ..., [Xn|Xsn], Excluded) :-
    (   call(P, X1, ..., Xn) -> Excluded = Excluded1
    ;   Excluded = [X1|Excluded1]
    ),
    exclude(P, Xs1, ..., Xsn, Excluded1).
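As a concrete illustration of the one-list case, using between/3 (specified later in this document) as the test; a hypothetical session:

?- include(between(1,3), [0,2,5,3], L).    % L = [2,3]
?- exclude(between(1,3), [0,2,5,3], L).    % L = [0,5]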
There were and are others, but this will do to start with.
The obvious missing predicates are
foldr(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, Vn) :-
    P(X1n, ..., Xmn, V0, V1),
    ...
    P(X11, ..., Xm1, V', Vn).

scanr(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, [Vn,...,V1,V0]) :-
    P(X1n, ..., Xmn, V0, V1),
    ...
    P(X11, ..., Xm1, V', Vn).
To summarise, I recommend:
maplist/N+1, either under that name for backwards compatibility, or under the name map/N+1 for cultural compatibility with the functional programming community;
scanlist/N+3, but under the name foldl/N+3;
cumlist/N+3, but under the name scanl/N+3;
foldr/N+3 and scanr/N+3;
include/N+2 and exclude/N+2, under those names (if we called one of them 'filter', what would we call the other?);
member/2N, under some name, for a useful range of N.
I do not recommend some/N+1 or somechk/N+1, because it is easy to manage without them. I do not say that nobody should provide them, only that we should not require them of everyone.
This is very much a minimal set of higher-order list operations.
Many set representations can be used in Prolog: unordered lists, ordered lists, delta-coded ordered lists, bitstrings, binary search trees of various types, ternary search trees.
It is important that the commonly available implementation of sets should be efficient enough to use without hesitation. It is also important that the operation names should not conflict with widely used operation names for other set representations.
This means that either names such as union/3
must
be avoided, or set predicates must not be used without a module
prefix to identify the representation.
Without a change to ISO Prolog, we cannot enforce a requirement that module prefixes be used, so we have to avoid commonly used names.
The representation most widely used in Prolog textbooks is the unordered list representation. This is also the representation used in Common Lisp.
The ordered list representation has its flaws (adding, removing, or checking for the presence of a single element is O(|Set|) rather than O(lg|Set|)), but the unordered list representation has no virtues.
One of the flaws of any term comparison-based set representation
in ISO Prolog is going to be that it allows variables to be set
elements, so that two terms can be in a set because when they were
entered they were not identical, but subsequent variable bindings
can make them identical. The answer to this is to use a mode system
which requires that set elements are sufficiently instantiated,
perhaps via a "safe" version of compare/3
.
I recommend the ordered set representation. It is space efficient,
time efficient for bulk operations, and is already used in Prolog
(the sort/2
and setof/3
predicates deliver
results of this form, which we might as well exploit).
In order to preserve backwards compatibility and to avoid name
clashes with the unordered list representation, all predicate names
have the form ord_
something.
ord_op/5
:- type ordset(T) = list(T).
:- pred ord_op(integer, ordset(T), ordset(T), ordset(T), ordset(T)).
:- ord_op(M, Xs, Ys, S1, S2) when M, Xs, Ys.
:- ord_op(M, Xs, Ys, S1, S2) is semidet when ground(M), ground(Xs), ground(Ys).

ord_op(M, [], Ys, S1, S2) :-
    ( M /\ 2'000010 =:= 0 -> S1 = [] ; S1 = Ys ),
    ( M /\ 2'010000 =:= 0 -> S2 = [] ; S2 = Ys ).
ord_op(M, [X|Xs], Ys, S1, S2) :-
    $ord_op1(M, Xs, Ys, S1, S2, X).

$ord_op1(M, Xs, [], S1, S2, X) :-
    ( M /\ 2'000001 =:= 0 -> S1 = [] ; S1 = [X|Xs] ),
    ( M /\ 2'001000 =:= 0 -> S2 = [] ; S2 = [X|Xs] ).
$ord_op1(M, Xs, [Y|Ys], S1, S2, X) :-
    compare(R, X, Y),
    $ord_op12(R, Xs, Ys, S1, S2, X, Y, M).

$ord_op2(M, [], Ys, S1, S2, Y) :-
    ( M /\ 2'000010 =:= 0 -> S1 = [] ; S1 = [Y|Ys] ),
    ( M /\ 2'010000 =:= 0 -> S2 = [] ; S2 = [Y|Ys] ).
$ord_op2(M, [X|Xs], Ys, S1, S2, Y) :-
    compare(R, X, Y),
    $ord_op12(R, Xs, Ys, S1, S2, X, Y, M).

$ord_op12(<, Xs, Ys, S1, S2, X, Y, M) :-
    ( M /\ 2'000001 =:= 0 -> S1 = T1 ; S1 = [X|T1] ),
    ( M /\ 2'001000 =:= 0 -> S2 = T2 ; S2 = [X|T2] ),
    $ord_op2(M, Xs, Ys, T1, T2, Y).
$ord_op12(>, Xs, Ys, S1, S2, X, Y, M) :-
    ( M /\ 2'000010 =:= 0 -> S1 = T1 ; S1 = [Y|T1] ),
    ( M /\ 2'010000 =:= 0 -> S2 = T2 ; S2 = [Y|T2] ),
    $ord_op1(M, Xs, Ys, T1, T2, X).
$ord_op12(=, Xs, Ys, S1, S2, X, Y, M) :-
    ( M /\ 2'000100 =:= 0 -> S1 = T1 ; S1 = [X|T1] ),
    ( M /\ 2'100000 =:= 0 -> S2 = T2 ; S2 = [Y|T2] ),
    ord_op(M, Xs, Ys, T1, T2).
Looking at this predicate, we see that there is a clear notion of inputs (M, Xs, Ys) and outputs (S1, S2).
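For reference, my decoding of the six mask bits, read off the sample code above:

%   2'000001   S1 takes elements that occur in Xs only
%   2'000010   S1 takes elements that occur in Ys only
%   2'000100   S1 takes elements common to Xs and Ys
%   2'001000   S2 takes elements that occur in Xs only
%   2'010000   S2 takes elements that occur in Ys only
%   2'100000   S2 takes elements common to Xs and Ys

So, for example, ord_op(2'111, Xs, Ys, S, []) delivers the union in S, and ord_op(2'001, Xs, Ys, S, []) delivers the difference Xs\Ys.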
There are two ways to define is_ordset/1
. One
enforces the groundness of an ordered set. That is appropriate
for systems which can delay calls to (some analogue of)
compare/3
. The other is useful for hacking, and
allows set elements to be non-ground. At this stage I do not
wish to make any recommendation, so I merely note the two definitions:
is_ordset(S) :-          % enforces groundness
    ground(S),
    sort(S, S).

is_ordset(S) :-          % allows sets of variables
    nonvar(S),
    $is_ordset(S).

$is_ordset([]).
$is_ordset([X|Xs]) :-
    nonvar(Xs),
    $is_ordset(Xs, X).

$is_ordset([], _).
$is_ordset([Y|Ys], X) :-
    nonvar(Ys),
    X @< Y,
    $is_ordset(Ys, Y).
The list processing predicates, especially
length/2
, member/2
, and select/3
, can be used with ordered sets. The memberchk/2
and
selectchk/3
predicates are rather more dubious, and
need their own implementations.
list_to_ordset(List, Set) :-
    sort(List, Set).

ord_op(M, Xs, Ys, S1) :-
    ord_op(M, Xs, Ys, S1, []).

ord_compare(R, S1, S2) :-
    ord_op(2'010001, S1, S2, D12, D21),
    $ord_compare(R, D12, D21).

$ord_compare(<, [], [_|_]).
$ord_compare(>, [_|_], []).
$ord_compare(=, [], []).

ord_disjoint(S1, S2) :-
    ord_op(2'100, S1, S2, []).

ord_disjoint_union(S1, S2, S) :-
    ord_op(2'100011, S1, S2, S, []).

ord_intersect(S1, S2) :-
    ord_op(2'100, S1, S2, [_|_]).

ord_intersection(S1, S2, S) :-
    ord_op(2'100, S1, S2, S).

ord_intersection([S], S).
ord_intersection([S1,S2|Ss], S) :-
    ord_intersection([S2|Ss], S0),
    ord_intersection(S1, S0, S).

ord_memberchk(X, S) :-
    ord_intersect([X], S).

ord_nonmember(X, S) :-
    ord_disjoint([X], S).

ord_proper_subset(S1, S2) :-
    ord_op(2'010001, S1, S2, [], [_|_]).

ord_proper_superset(S1, S2) :-
    ord_op(2'001010, S1, S2, [], [_|_]).

ord_selectchk(X, S1, S2) :-
    ord_op(2'100010, [X], S1, S2, [X]).

ord_subset(S1, S2) :-
    ord_op(2'001, S1, S2, []).

ord_subtract(S1, S2, S) :-
    ord_op(2'001, S1, S2, S).

ord_superset(S1, S2) :-
    ord_op(2'010, S1, S2, []).

ord_symdiff(S1, S2, S) :-
    ord_op(2'011, S1, S2, S).

ord_union([], []).
ord_union([S1|Ss], S) :-
    ord_union(Ss, S2),
    ord_union(S1, S2, S).

ord_union(S1, S2, S) :-
    ord_op(2'111, S1, S2, S).

ord_union(S1, S2, U, D) :-
    ord_op(2'001111, S1, S2, U, D).
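A hypothetical session:

?- ord_union([1,3,5], [2,3], S).          % S = [1,2,3,5]
?- ord_intersection([1,3,5], [2,3], S).   % S = [3]
?- ord_subtract([1,3,5], [2,3], S).       % S = [1,5]
?- ord_symdiff([1,3,5], [2,3], S).        % S = [1,2,5]
?- ord_subset([3,5], [1,3,5]).            % succeeds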
There are two names I am not happy with here.
I would prefer it if ord_subtract/3
had a less "imperative"
name, such as ord_difference/3
. It would also be clearer
if ord_union/4
were named ord_union_difference/4
.
However, that would break backwards compatibility, and that's bad manners.
There are four predicates from the old library that do not appear here under their old names.
ord_seteq/2
has been generalised to
ord_compare/3
.
ord_add_element(X, S0, S)
is replaced by
ord_union([X], S0, S)
.
ord_del_element(X, S0, S)
is renamed to
ord_selectchk(X, S0, S)
to be compatible with
the list operations.
ord_setproduct/3
preserves set ordering but does not
depend on it. Its analogue probably belongs among the list operations.
Of course, nothing would prevent an implementation providing those predicates for backwards compatibility.
The higher order list operations can be applied to ordered sets.
Results will not necessarily be ordered sets, but can be converted
to ordered sets by sorting. It is useful to note that
[include,exclude]
/N+2 do not alter the order
of their 2nd argument and do not introduce duplicates, so filtering
an ordered set yields an ordered set.
Queues should be based on the q(Unary,L0,L) representation; each basic operation is then O(1).
A queue is a sequence, rather like a list. Adding items or lists of items at either end is rather like appending, and so is removing an item or list of items from the head end. (That is to say, what I'm calling a "queue" here is really an "output-restricted deque").
There are many ways to implement queues. A single list makes some operations cheap and other operations very expensive. A pair of back-to-back lists works nicely in a strict functional language, but since there is a slow rebalancing operation which may be executed repeatedly on backtracking, they are not efficient in Prolog. The representation I propose here was invented by someone at SRI. Fernando Pereira told me about it, and I am sorry to say that I have forgotten the inventor's name. This representation has the nice property that adding an element at either end or removing an element from the head end requires just a single unification. This makes it well suited to Prolog.
There is one flaw in this representation: it uses list differences so that appending an item at the tail end must be regarded as destroying the original queue. Queue uses should be single threaded. Mercury can enforce this, but does not support the partially instantiated terms the technique relies on. Prolog does support partially instantiated terms, but cannot enforce single threading.
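Concretely, an illustrative session using the predicates defined below: once an item has been appended at the tail of Q0, the hole at Q0's tail has been filled, so Q0 cannot be extended differently:

?- queue_list(Q0, [a,b]),
   queue_item_append(Q0, c, _Q1),
   queue_item_append(Q0, d, _Q2).
% fails: the tail hole of Q0 already holds c, and c and d do not unify.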
We found a similar problem with sets; a representation which Prolog is good at has technical flaws which mean that it has to be used carefully, but the efficiency of the representation is such that we are unwilling to abandon it.
:- type queue(T) ---> q(unary, list(T), list(T)).
:- type unary ---> z | s(unary).

is_queue(q(U,L0,L)) :-
    nonvar(U),
    nonvar(L0),
    $is_queue(U, L0, L).

$is_queue(z, L0, L) :-
    L0 == L.
$is_queue(s(U), [_|L1], L) :-
    nonvar(U),
    nonvar(L1),
    $is_queue(U, L1, L).

item_queue_append(X, q(U,L0,L), q(s(U),[X|L0],L)).

list_queue_append([], Q, Q).
list_queue_append([X|Xs], Q, Q0) :-
    item_queue_append(X, Q1, Q0),
    list_queue_append(Xs, Q, Q1).

queue_item_append(q(U,L0,[X|L]), X, q(s(U),L0,L)).

queue_list_append(Q, [], Q).
queue_list_append(Q0, [X|Xs], Q) :-
    queue_item_append(Q0, X, Q1),
    queue_list_append(Q1, Xs, Q).

queue_list(q(U,L0,L), Xs) :-
    $queue_list(U, L0, L, Xs).

$queue_list(z, L, L, []).
$queue_list(s(U), [X|L1], L, [X|Xs]) :-
    $queue_list(U, L1, L, Xs).

cons_queue(X, Q0, Q1) :-
    item_queue_append(X, Q0, Q1).

head_queue(X, Q1) :-
    item_queue_append(X, _, Q1).

tail_queue(Q0, Q1) :-
    item_queue_append(_, Q0, Q1).

member_queue_offset(X, q(s(U),L0,_), N) :-
    $member(X, U, L0, 0, N).

$member(X, _, [X|_], N, N).
$member(X, s(U), [_|L1], N0, N) :-
    N1 is N0 + 1,
    $member(X, U, L1, N1, N).

member_queue_index(X, q(s(U),L0,_), N) :-
    $member(X, U, L0, 1, N).

member_queue(X, Q) :-
    member_queue_index(X, Q, _).

memberchk_queue(X, Q) :-
    member_queue(X, Q),
    !.

% queue_list_length(Q, Xs, N): Q and Xs have the same elements, N of them.
% (Sample definition; any implementation with this meaning will do.)
queue_list_length(Q, Xs, N) :-
    queue_list(Q, Xs),
    queue_length(Q, N).

queue_empty(Q) :-
    queue_list(Q, []).

queue_length(q(U,L0,L), N) :-
    (   integer(N) -> N >= 0, $queue_length(U, L0, L, N)
    ;   var(N)     -> $queue_length(U, L0, L, 0, N)
    ;   $error
    ).

$queue_length(z, L, L, N, N).
$queue_length(s(U), [_|L1], L, N0, N) :-
    N1 is N0 + 1,
    $queue_length(U, L1, L, N1, N).

$queue_length(s(U), [_|L1], L, N0) :-
    N0 > 0,
    !,
    N1 is N0 - 1,
    $queue_length(U, L1, L, N1).
$queue_length(z, L, L, 0).

portray_queue(Q) :-
    (   write('[|'),
        member_queue_offset(X, Q, N),
        ( N =:= 0 -> true ; write(', ') ),
        write(X),
        fail
    ;   write('|]')
    ).
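A hypothetical session showing the intended use:

?- queue_list(Q0, [a,b]),          % build a queue holding a, b
   queue_item_append(Q0, c, Q1),   % add c at the tail end
   tail_queue(Q2, Q1),             % Q2 is Q1 minus its head (a)
   queue_list(Q2, Xs).
% Xs = [b,c]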
I have omitted the following predicates from the Quintus library:
list_queue(Xs, Q)
is queue_list(Q, Xs)
.
queue_last(Q, X)
has the wrong argument order and
is not a fast operation.
queue_last(Q0, X, Q)
has the wrong argument order
and is not a fast operation.
Since the Quintus module is called 'newqueues', this should not be a problem.
I think I may already have too many operations providing access to the elements of a queue other than by popping them. It doesn't feel right to propose any higher-order operations on them. The idea of having 2^N map*/N+1 predicates,
map(P, L1, L2)
map_list_queue(P, L1, Q2)
map_queue_list(P, Q1, L2)
map_queue_queue(P, Q1, Q2)
doesn't appeal either. Once the higher order list operations are there, anyone who wants higher order queue operations can easily program them him/herself.
There are three predicates which are so basic that the ISO Prolog standard itself uses them, while at the same time refusing to bless Prolog programmers with assured access to them.
succ/2
provides the successor function on
natural numbers only. It is essential to the purposes
for which succ/2
was invented that it should not accept or
produce negative integers.
:- pred succ(integer, integer).
:- succ(X, Y) when X ; Y.
:- succ(X, Y) is semidet when X ; Y.
% Sample implementation
succ(X, Y) :-
    (   integer(X) -> X >= 0, Y is X + 1
    ;   integer(Y) -> Y > 0, X is Y - 1
    ;   var(X), var(Y) -> $instantiation_fault   % report an instantiation error
    ;   $error
    ).
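A hypothetical session:

?- succ(3, Y).     % Y = 4
?- succ(X, 7).     % X = 6
?- succ(X, 0).     % fails: 0 has no natural-number predecessor
?- succ(-1, Y).    % fails: -1 is not a natural number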
plus/3
provides reversible arithmetic on exact numbers.
It was intended to work with rational numbers as well as integers; since
ISO Prolog omits rational numbers, it is here specified for integers only.
It is essential to the purposes for which plus/3
was
invented that it should be a proper logical predicate; the nature of
floating point arithmetic means that it cannot behave as intended
if it allows floating-point arguments.
:- pred plus(integer, integer, integer).
:- plus(X, Y, Z) when X, Y ; Y, Z ; Z, X.
:- plus(X, Y, Z) is semidet when X, Y ; Y, Z ; Z, X.
% Sample implementation
plus(X, Y, Z) :-
    (   integer(X), integer(Y) -> Z is X + Y
    ;   integer(Y), integer(Z) -> X is Z - Y
    ;   integer(Z), integer(X) -> Y is Z - X
    ;   $error
    ).
between/3
is astonishingly useful, not least because of
its frequent use in bounding the search space of other predicates. It
is difficult to take seriously any Prolog "standard" which omits it.
:- pred between(integer, integer, integer).
:- between(L, U, X) when L, U.
:- between(L, U, X) is semidet when ground(L), ground(U), ground(X).
:- between(L, U, X) is bounded when ground(L), ground(U).
% Sample implementation.
between(L, U, X) :-
    (   integer(L), integer(U) ->
        (   integer(X) -> L =< X, X =< U
        ;   var(X)     -> L =< U, $between(L, U, X)
        ;   $error
        )
    ;   integer(L), integer(X), L > X -> fail
    ;   integer(U), integer(X), X > U -> fail
    ;   $error
    ).

$between(L, L, L) :- !.
$between(L, _, L).
$between(L, U, X) :-
    M is L + 1,
    $between(M, U, X).
The order in which the sample implementation enumerates solutions for X is part of the specification.
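That is, solutions are enumerated in ascending order; a hypothetical session:

?- between(1, 3, X).
%  X = 1 ;  X = 2 ;  X = 3 ;  then failure.
?- between(1, 3, 2).    % succeeds, deterministically
?- between(3, 1, X).    % fails: empty range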
There is of course a better alternative to between/3
,
and that is to extend the syntax of comparison.
Comparison form    between/3 equivalent
E =< X =< F        L is E,    U is F,     between(L, U, X)
E =< X <  F        L is E,    U is F-1,   between(L, U, X)
E <  X <  F        L is E+1,  U is F-1,   between(L, U, X)
E <  X =< F        L is E+1,  U is F,     between(L, U, X)
E >= X >= F        L is -E,   U is -F,    between(L, U, Z), X is -Z
E >= X >  F        L is -E,   U is -1-F,  between(L, U, Z), X is -Z
E >  X >  F        L is 1-E,  U is -1-F,  between(L, U, Z), X is -Z
E >  X >= F        L is 1-E,  U is -F,    between(L, U, Z), X is -Z
This requires "distfix" operators, which are easy enough to add to
Prolog syntax and have been so added in the past. These operators
resemble the existing arithmetic comparison operators in evaluating
their "outside" arguments; unlike the existing arithmetic comparison
operators they do not evaluate their "inside" argument and only accept
integers. This approach generalises neatly to multiple generated
arguments, e.g.,
1 =< X < Y =< N.
However, between/3
is the simplest thing that could work
and has been proven over many years, so let's stick with it.
The problem with integer division is that there are several different versions, and the one you get is seldom the one you need. We should consider seriously whether Common Lisp analogues such as
floor(N, D, Q, R)
ceiling(N, D, Q, R)
round(N, D, Q, R)
truncate(N, D, Q, R)
as predicates, with
floor(N, D),    floor_remainder(N, D)
ceiling(N, D),  ceiling_remainder(N, D)
round(N, D),    round_remainder(N, D)
truncate(N, D), truncate_remainder(N, D)
as arithmetic functions, might not be a good idea. Obviously, one pair of these arithmetic functions would duplicate the effect of the existing // and mod, but can you instantly recall, without looking it up, which?
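For the record, here is how the four divisions differ on a negative operand; the worked numbers below are my own illustration:

% N = -7, D = 2 (exact quotient -3.5):
%   floor:    Q = -4, R =  1    rounds toward negative infinity
%   ceiling:  Q = -3, R = -1    rounds toward positive infinity
%   truncate: Q = -3, R = -1    rounds toward zero
%   round:    Q = -4, R =  1    rounds to nearest, ties to even (as in Common Lisp)
% In each case N =:= Q*D + R.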
I brought this up in 1984, and again with the BSI and with the ISO committee, to a deafening silence. Surely I can't be the only person who is never quite sure what // is going to do with negative numbers?
This is a draft. More operations will be added, and community discussion may result in adding, changing, or removing some of the operations listed here.
There is nothing here which was not obvious at least two years before the Prolog standard came out. In particular, at a conference in 1985, I warned explicitly that a "minimal" standard was not at all a good idea, and by 1986 most of the predicates listed above could have been designed, had a thorough overhaul of the library been desired instead of minor revisions.