The ISO Prolog standard does not adequately support the needs of
Prolog users. This may seem unbelievable, and it is certainly inexcusable,
but the standard does not even define
append/3
, length/2
, or member/2
(although an example on page 33 shows clauses for
append/3
and an example on page 85 shows clauses for
member/2
).
Many areas are crying out for standardisation, including
This is an attempt to deal with basic data structures. Perhaps more important than the specific data structures and operations dealt with is the stress placed on consistent naming and argument order, and the use of projection as a means of ensuring that argument order is consistent.
I provide three kinds of declaration for the "base" predicates:
when
-declarations the most intuitive way to specify when
a predicate should make progress. (If a goal is suspended, when should
it be resumed? When the when
declaration is satisfied.)
Even in a non-coroutining Prolog
(like ISO Prolog) the information is still useful: it describes the
modes in which it makes sense to call the predicate.
mode
declarations. The "mode" det
would mean that such a
query has a single solution and cannot fail; semidet
means that there is at most one solution. I intend something
stronger here: when a call has the described form, the goal should
either fail in a finite amount of time or should terminate in a finite
amount of time with the Prolog system "aware" that no choices remain to
be explored. The bounded
"mode", not found in Mercury,
means that such a goal has a finite number of solutions (possibly zero)
and that such a goal must be able to report all solutions in a finite
amount of time and then fail in a finite amount of time without
infinite backtracking. The argument instantiation test
list_skel(Var)
is satisfied when Var
is a proper list; the elements of the list are not constrained.
The sample code uses a convention taken from DEC-10 Prolog via C Prolog: the dollar sign is treated as a lower case letter, and predicates whose names contain a dollar sign are "implementation detail" not visible to the user. Such predicates are only for specification purposes, and need not conform to quality requirements for visible predicates such as steadfastness.
Perhaps the most important notational issue is that
I give no indication in the running text of the module a type or
predicate is defined in. It is clear that the four "basic" list
processing predicates append/3
, length/2
,
member/2
, and memberchk/2
are so useful and so
used that they had better be treated as part of the language. It would
be quite appropriate if the others were all in a module named 'lists'.
See plface.pl for one distribution.
I assume the following predefined types:
:- type list(T) ---> [] | [T|list(T)]. :- type order ---> (<) | (=) | (>). :- type pair(K,T) ---> K-T. :- type integer. :- type float. :- type number. :- type atom. :- type char = atom. % such that atom_codes(It, [_]) :- type chars = list(char). :- type code = integer. % such that between(0, 16'10FFFF, It). :- type codes = list(code).
Other types will be introduced where they are used.
With Prolog being used in e-business and stuff like that, we have to deal with HTML 4 (defined in terms of Unicode) and XML (defined in terms of Unicode). If we can't portably deal with Unicode, we can't portably deal with the Web. We can no longer afford to put up with Prolog systems that don't handle Unicode cleanly. Quintus Prolog handled 16-bit character sets via JIS and EUC nearly two decades ago. It was obvious then that the byte-at-a-time way that QP did so was a makeshift with a use-by date in the near future (now the rather less near past).
The ISO Prolog standard was published in 1995. In 1983, when I started work on an informal Prolog standard, it was obvious that 8-bit character sets were dead and Prolog would have to support 16-bit character sets. By 1993, it was obvious that there was a single "wide" character set to use in future language standards: ISO 10646/Unicode. Unicode 1.0.0 came out in October 1991. The reconciliation between ISO 10646 and Unicode resulted in Unicode 1.0.1, which came out in June 1992. This left the ISO Prolog committee with 3 years to reflect on the fact that there was now one ISO character set with round-trip conversion between the main national character set standards and itself.
This was certainly obvious to others. For example, the X/Open technical study E401, "Universal Multiple-Octet Coded Character Set coexistence and Migration" had been published the previous year. Of even greater relevance: a great deal of work had gone into updating Ada for what became Ada 95. By 1993, I already had a pile of Ada 95 background documentation and proposals in my office about 50cm high, and Unicode was clearly going to be the character basis of Ada 95 then.
The main reason why certain people at Quintus were not as hostile to the introduction of the single-character-atom design botch as they should have been was concern for EBCDIC on IBM mainframes. However, I have studied the current z/Architecture Principles of Operation, and note that current IBM mainframes have hardware support for Unicode.
With this background, we can see that the ISO Prolog committee made some questionable decisions:
Æneas
or of including my sister-in-law's
name "Chéri Ford"
in a string literal, in marked
contrast to Ada, C99, and C++.
atom_chars/2
and
number_chars/2
, in the interests of solving a problem
which the adoption of Unicode as the character basis of Prolog would
have completely exorcised without the need for such wholesale breakage
of existing working code.
char_conversion
must go!"This is not negotiable", for the reason that no sane programmer who
is acquainted with the facts would wish to retain
char_conversion
.
That's strong language. Let's summarise what
char_conversion/2
and current_char_conversion/2
do.
char_conversion('X', 'Y')
replaces ConvC with
µ(ConvC,x,y),
where x is the character code of X and
y is the character code of Y.
current_char_conversion(A, B)
is true when A and B are single-character atoms
and b=ConvC(a) where
a and b are the character codes of the first
characters of the names of A and B respectively,
provided (and this is the most confusing bit)
a and b are different.
Concerning the excessive strength of char_conversion/2
,
consider the goal char_conversion('.', ',')
. If you give
that as a query to the interactive top level, you haven't shot yourself
in the foot, but in the head. There is no way to recover from this.
In fact it's worse than that. Section 5.5.3 reads "A processor may
support some other initial value of ConvC ...", which
means that there is no portable Prolog syntax at all.
This implementation permission should have required all
Prolog processors to start with ConvC mapping the
characters used in the Core Prolog syntax to themselves. Indeed, it
might have been advisable to decree that char_conversion
should not be allowed to change the identity mappings for the core
syntax characters.
The :- char_conversion(From, To)
directive is unusable
in portable programs, even programs intended to be used within a single
locale using the same coded character set. The reason is that it is
explicitly not specified whether char_conversion
directives
in a source file are limited in effect (the way it was desired by many
that op
directives should be) or unlimited in effect (the
way op
directives were in DEC-10 Prolog and C Prolog).
More precisely, a program using this feature can be portable only if
every source file spells out the same set of (typically
190) mappings: the fact that it is not specified whether the effect is
limited or not means that a program cannot simply consult a single
mapping file into each source file.
Concerning the weakness of char_conversion/2
, we have
to start by looking at the rationale:
I take this to refer to the "full width" (zenkaku) and "half width" (hankaku) copies of ASCII present in some Japanese character sets (and therefore also present in Unicode).
The char_conversion/2
machinery does not satisfy the
explicit requirement in the rationale that it should apply to Prolog
data. For example, it does not apply to
atom_chars/2
, and there is no analogous built-in predicate
which does character conversion. "Words" read from a file are only
converted to a normal form by ConvC mapping if they
are read in Prolog syntax using one of the read
family.
There should surely have been
converted_atom_chars(Atom, Chars) :- ( atom(Atom) -> atom_chars(Atom, Name), $convert_chars(Name, Chars) ; var(Atom) -> $convert_chars(Chars, Name), atom_chars(Atom, Name) ; $error ). $convert_chars([], []). $convert_chars([Char|Chars], [Conv|Convs]) :- ( current_char_conversion(Char, Conv) -> true ; Conv = Char ), $convert_chars(Chars, Convs).
An even worse way in which this misfeature fails to satisfy its
explicit rationale is that character conversion does not apply to
number_chars/2
, and there is no analogous built-in
predicate which does character conversion. Numbers read from a file
are only converted to a normal form by ConvC mapping
if they are read in Prolog syntax using one of the read
family.
The failure to deal with run-time number conversion is a particular nuisance for people using Arabic or Indic scripts. As a preliminary sketch, it would be useful if something like the following predicates were adopted:
:- pred integer_codes(integer, codes, integer, code). integer_codes(Integer, Codes, Base, Zero) :- Delta is Zero - 0'0, ( integer(Integer) -> integer_codes(Integer, ASCII, Base), $map(ASCII, Codes, Delta, 0'., 0'E) ; var(Integer) -> $map(ASCII, Codes, Delta, 0'., 0'E), integer_codes(Integer, ASCII, Base) ; $error ). integer_codes(Integer, Codes, Base) :- between(Base, 2, 36), like number_codes/2 but only for integers and using the first Base digits from "0..9A..Z". integer_codes(Integer, Codes) :- integer_codes(Integer, Codes, 10). :- type float_format ---> e(integer) | e | f(integer) | f | g(integer) | g | k(integer) | k | p. :- pred float_codes(float, codes, float_format, code, code, code). float_codes(Float, Codes, Format, Zero, Decimal, Exponent) :- Delta is Zero - 0'0, ( float(Float) -> float_codes(Float, ASCII, Format), $map(ASCII, Codes, Delta, Decimal, Exponent) ; var(Float) -> $map(ASCII, Codes, Delta, Decimal, Exponent), float_codes(Float, ASCII, Format) ; $error float_codes(Float, Codes, Format, Zero, Decimal) :- float_codes(Float, Codes, Format, Zero, Decimal, 0'e). float_codes(Float, Codes, Format, Zero) :- float_codes(Float, Codes, Format, Zero, 0'., 0'e). float_codes(Float, Codes, Format) :- like number_codes/2 but only for floats, ( Format = e(P), integer(P), P >= 0 -> generate or recognise %.Pe format ; Format = e -> generate: use default P, recognise: accept any %e ; Format = f(P), integer(P), P >= 0 -> generate or recognise %.Pf format ; Format = f -> generate: use default P, recognise: accept any %f ; Format = g(P), integer(P), P >= 0 -> generate or recognise %.Pg format ; Format = g -> generate: use default P, recognise: accept any float ; Format = k(P), integer(P, P >= 0 -> like e(P), but with 1..3 digits before the decimal point so that the exponent is a multiple of 3; a very useful format when you have it ; Format = k -> generate: use default P, recognise: accept any P but still allow only results of k(_) formatting ; Format = p -> % precise generate: same as e(P) with P large enough that full precision is generated, enough digits for round trip conversion to be one-to-one ; $error ). $map([0'+|Xs], [0'+|Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E). $map([0'-|Xs], [0'-|Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E). $map([0'.|Xs], [ D |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E). $map([0'e|Xs], [ E |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E). $map([0'E|Xs], [ E |Ys], Z, D, E) :- !, $map(Xs, Ys, Z, D, E). $map([ N |Xs], [ M |Ys], Z, D, E) :- !, plus(Z, N, M), between(0'0, 0'9, N), $map(Xs, Ys, Z, D, E).With such predicates, we then define
:- pred number_codes(number, codes, code). number_codes(Number, Codes, Zero) :- ( integer(Number) -> integer_codes(Number, Codes, Zero) ; float(Number) -> float_codes(Number, Codes, p, Zero) ; var(Number) > ( integer_codes(Number, Codes, Zero) -> true ; float_codes(Number, Codes, p, Zero) ) ; $error ). number_codes(Number, Codes) :- number_codes(Number, Codes, 0'0).
At last, 10 years late, this would give people using Arabic or Indic scripts (amongst others) the ability to convert numbers using their own characters, and it would, 10 years late, give Europeans who like a rather ambiguous use of commas to convert floating-point numbers using their preferred decimal point character.
Is there any other way char_conversion/2
fails?
Yes, there is. I shall take personal examples. The only languages
that really matter to me personally (other than mattering because I
think other people should be able to use their own language and script)
are English, Latin, Greek, Hebrew, Serbo-Croatian, and Maori.
Writing in English, using ISO Latin 1, I would like to be able to
use my father's name, Æneas. The letter ash is not one of the
normal Prolog characters, so I would have to map it to two
characters: A, e. But I can't do that. I cannot have the atom
hæme
mapped to haeme
either, for the
same reason. Since Latin1 includes the characters
"¼", "½", and "¾", which are part of the number
syntax of my native locale, I would like to use them in numbers, so
that ¼=0.25, 2½=2.5, and 10¾=10.75. This could
be done if I could map "¼" to ".25" and so on. But with
char_conversion/2
so limited, I can't. If
char_conversion/2
can't even handle ISO Latin 1, what
good is it?
I suppose I should have said "Croatian" rather than "Serbo-Croatian",
but there was a generation of New Zealanders who knew that their grandparents
were from Yugoslavia who had to look the natal village up on a map to figure
out which side they would have been on when Yugoslavia broke up. As it
turned out, my relatives were all on the side labelled "refugee", and we
have lost touch; the natal village is a heap of rubble. The NZ branch of
the family figured out which to call themselves by looking at letters my
grandfather's brother had written: they used the Latin script, not the
Cyrillic script. Except that Croatian doesn't quite use the
Latin script. There are four letters (DZ, DZ with caron, LJ, and NJ)
which look like two English letters but are regarded as single letters in
the script. It would probably be a good idea to map DZ to D,Z;
Dz to D,z; and dz to d,z. But I can't do that with
char_conversion/2
.
New Zealand has two official languages, English and Maori. For historical reasons, there are three spelling conventions for Maori:
Accordingly, in processing Maaori text in which vowel length is
marked, the best scheme would be to map U+0100 (capital A with macron)
and U+00C4 (capital A with daieresis) to A,A, and so on for the other
9 upper and lower case vowels. But I can't do that with
char_conversion
.
Do I need to mention the Unicode compatibility mappings which map one character to several? Maybe I had better. Take the character U+00A0 "DIAERESIS", for example. Its compatibility mapping is "<compat> 0020 0308". A quick tally of the number of replacement characters in the Unicode 3.1.1 database yielded
Replacement | Number of |
---|---|
length | occurrences |
1 | 3005 |
2 | 1585 |
3 | 360 |
4 | 62 |
5 | 15 |
6 | 2 |
8 | 1 |
18 | 1 |
This is of course a lower bound; the replacement rules in that file have to be applied recursively, so the average replacement length is somewhat longer.
In conclusion, the entire char_conversion/2
framework
is too narrowly conceived, too little applied, and too dangerous in
effect.
In order to deal effectively with Unicode (and it was the plain responsibility of the ISO Prolog committee to address this), some other means entirely will have to be found.
Unicode Standard Annex #15 defines four normalisation forms. Canonical Equivalence in Applications defines two more. We also need a "verbatim" normalisation form which acts as the identity function.
Name | Code | Source | What it is |
---|---|---|---|
v | Here | No change | |
NFD | d | TR15 | Canonical decomposition |
NFC | c | TR15 | Canonical decomposition then canonical composition |
FCD | fcd | TN5 | "Fast C or D" decomposition |
FCC | fcd | TN5 | FCC form |
NFKD | kd | TR15 | Compatibility decomposition |
NFKC | kc | TR15 | Compatibility decomposition then canonical composition |
Canonical decomposition and (re)composition apply, in principle, to any character set containing both precomposed accented letters and floating diacriticals. Unicode was by no means the first such character set.
FCC and FCD forms are intermediate between NFC and NFD; they make sense whenever NFC and NFD make sense.
Compatibility decomposition applies, in principle, to any character
set containing multiple representations of the "same" character.
In particular, the Japanese zenkaku and hankaku are just such characters.
In fact, compatibility decomposition is pretty much exactly what
char_conversion/2
was supposed to be all about, except that
it is by no means true that all compatibility mappings in all such
character sets have single-character replacements.
Basically, to really do the job that code_conversion/2
was supposed to do but doesn't, we need Normalisation Form KD or
even better, Normalisation Form KC.
We need two modules, one for each representation of characters. (Unicode-hacking life would be so much easier if we only had to deal with integers in the range 0..16'10FFFF. For example, while '\xD800\xDC00' is, according to Unicode rules, the encoding of one character, it is not a 'char' according to ISO Prolog rules.)
The following type is shared by codes
and chars
:
:- type normalisation_form ---> v | c | d | fcc | fcd | kc | kd.
It is a matter for debate which normalisation forms should be supported by all Prolog systems. It is a practical fact that some normalisation forms should be supported by most Prolog systems. Whatever set of normalisation forms is eventually agreed on, implementors ought to be allowed to extend the set.
:- module(chars, [...]). :- pred normalise(normalisation_form, chars, chars, chars). :- normalise(NF, Xs, Ys0, Ys) when NF, Xs. :- normalise(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs). normalise(NF, Xs, Ys0, Ys) :- Ys0\Ys is Xs in normalisation form NF. normalise(NF, Xs, Ys) :- normalise(NF, Xs, Ys, []). normalise(Xs, Ys) :- normalise(kd, Xs, Ys). normalised_compare(NF, R, Xs, Ys) :- normalise(NF, Xs, Xn), normalise(NF, Ys, Yn), compare(R, Xn, Yn). normalised_compare(R, Xs, Ys) :- normalised_compare(kd, R, Xs, Ys). normalised(NF, Xs) :- normalise(NF, Xs, Xs). normalised(Xs) :- normalise(Xs, Xs). ...
Please remember that these are specifications, not
implementations. It is possible to implement
normalised_compare/[3,4]
so that they work in a single
pass over their arguments without building any intermediate list.
Similarly, it is possible to implement
normalised/[1,2]
so that they work in a single pass
over their argument without building any intermediate list.
As specified above, normalised_compare/3
is defined
only on character lists. It could be extended to all ground terms.
I do not recommend this.
:- module(codes, [...]). :- pred normalise(normalisation_form, codes, codes, codes). :- normalise(NF, Xs, Ys0, Ys) when NF, Xs. :- normalise(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs). normalise(NF, Xs, Ys0, Ys) :- Ys0\Ys is Xs in normalisation form NF. normalise(NF, Xs, Ys) :- normalise(NF, Xs, Ys, []). normalise(Xs, Ys) :- normalise(kd, Xs, Ys). normalised_compare(NF, R, Xs, Ys) :- normalise(NF, Xs, Xn), normalise(NF, Ys, Yn), compare(R, Xn, Yn). normalised_compare(R, Xs, Ys) :- normalised_compare(kd, R, Xs, Ys). normalised(NF, Xs) :- normalise(NF, Xs, Xs). normalised(Xs) :- normalise(Xs, Xs). ...
As specified above, normalised_compare/3
is defined
only on character code lists. It could not be extended to all ground
terms, because there is no general way of telling which lists of positive
integers are supposed to be Unicode strings and which are not.
There needs to be a stream property
normalisation_form
(NF) which
can be used to ensure that characters delivered to a Prolog application
are already normalised, and that characters generated by a Prolog
application are appropriately normalised before being delivered to their
destination.
Several Prolog implementations (Xerox Quintus Prolog and SWI Prolog
amongst them) have a "string" data type. Such implementations should
have a string
module analogous to codes
and
chars
.
In the good old days, it was sufficient to have a library(ctypes) module. With ISO's thrice-accursed single-character-atom representation for characters, we need two modules, as noted in the previous section. So we might as well put the character classification predicates in 'chars' and 'codes'. The classification predicates in each module will have the same name and basically the same meaning, but different argument types.
The ISO Prolog standard uses the following character groups:
is_space/1
;
is_solo/1
;
is_quote/2
;
is_csym/1
;
is_digit/[1,2]
;
is_csymf/1
;
is_alpha/1
;
is_lower/1
;
is_upper/1
;
is_symbol/1
.
The term "graphic character" is very poorly chosen; one expects it
either to have something to do with box drawing or to be strongly related
to isgraph()
in the C standards. Instead it refers to what
was previously known as "symbol characters" or "operator characters".
That's a hint. We want to be able to do what C does, plus we want to be able to cover the categories used in Prolog itself.
Fortunately, there's a freely downloadable table classifying many tens of thousands of characters, actively maintained by someone else. Even more fortunately, the categories it uses can be mapped onto the categories we want quite easily. The table is part of the Unicode data base.
I propose a single base predicate, from which the others are easily derived.
:- type unicode_category ---> lu | ll | lt | lm | lo | mn | mc | me | nd | nl | no | pc | pd | ps | pe | pi | pf | po | sm | sc | sk | so | zs | zl | zp | cc | cf | cs | co | cn.
These codes are simply the Unicode category names converted to lower case for convenience in Prolog.
:- pred is_unicode(code, unicode_category). :- is_unicode(C, K) when C ; K. :- is_unicode(C, K) is semidet when ground(C). :- is_unicode(C, K) is bounded when true. is_unicode(C, K) :- between(0, 16'10FFFF, C), UnicodeData.txt says C has category K, K \== cn. % not assign :- pred is_digit(code, integer). :- is_digit(C, V) when C ; V. :- is_digit(C, V) is semidet when ground(C). :- is_digit(C, V) is bounded when true. is_digit(C, K) :- is_unicode(C, nd), UnicodeData.txt gives V as the decimal value of C.
The Unicode standard and associated reports explain implementation techniques for this. Even with just one byte per character, a simple implementation would take 1.1 megabytes. Using multilevel tables can compress this dramatically, not least because only 3 of the 17 planes currently have any characters at all.
is_unicode(C) :- is_unicode(C, _). is_latin1(C, K) :- between(0, 16'FF, C), is_unicode(C, K). is_latin1(C, K) :- is_latin1(C, _). is_ascii(C, K) :- between(0, 16'7F, C), is_unicode(C, K). is_ascii(C) :- is_ascii(C, _). is_alnum(C) :- is_unicode(C, K), memberchk(K, [nd,lu,ll,lt,lm,lo]). is_alpha(C) :- is_unicode(C, K), memberchk(K, [lu,ll,lt,lm,lo]). is_blank(C) :- % known as is_white/1 in Quintus Prolog is_unicode(C, zs). is_bracket(L, R) :- is_unicode(L, ps), is_unicode(R, pe), R is the closing bracket corresponding to L. is_cntrl(C) :- % C0 and C1 controls + formats is_unicode(C, K), memberchk(C, [cc,cf]). is_csym(C) :- is_unicode(C, K), memberchk(C, [pc,nd,lu,ll,lt,lm,lo]). is_csymf(C) :- is_unicode(C, K), memberchk(C, [pc,lu,ll,lt,lm,lo]). is_digit(C) :- is_unicode(C, nd). is_endfile(-1). is_endline(C) :- is_unicode(C, K), memberchk(K, [zl,zp]). is_graph(C) :- is_unicode(C, K), nonmember(K, [zs,zl,zp,cc,cf]). is_layout(C) :- is_lower(C) :- is_unicode(C, ll). is_print(C) :- is_unicode(C, K), nonmember(K, [zl,zp,cc,cf]). is_punct(C) :- is_unicode(C, K), memberchk(K, [pc,pd,ps,pe,pi,pf,po,sm,sc,sk,so]). is_title(C) :- is_unicode(C, lt). is_quote(0'', 0''). is_quote(0'", 0'"). is_quote(0'`, 0'`). is_quote(L, R) :- is_unicode(L, pi), is_unicode(R, pf), R is the final quote corresponding to L. is_space(C) :- is_unicode(C, K), memberchk(K, [zs,zl,zp]). is_upper(C) :- is_unicode(C, lu). is_xdigit(Base, C, V) :- between(2, 36, Base), Base1 is Base - 1, between(0, Base1, V), ( V >= 10 -> ( C is "A" - 10 + V ; C is "a" - 10 + V ) ; is_digit(C, V) ). is_xdigit(C, V) :- is_xdigit(16, C, V). is_xdigit(C) :- is_xdigit(16, C, _).
There are two debatable decisions here. In this draft I have decided
that the "Other, Format" characters of Unicode should be grouped with the
"Other, Control" characters for is_cntrl/1
, on the grounds
that these things fairly obviously are control characters and
that if you just want the C0 and C1 controls it is easy enough to test
for 'cc' yourself. Indeed, is_ascii(C, cc)
will check
for the ASCII (C0) controls. I have also decided that the "csymf"
characters should include all the "Punctuation, Connector" characters,
not just the LOW LINE "_".
i have read through rather more material about wide character classification in C than was enjoyable, and found stunningly little guidance about how non-ASCII characters should be classified, so I must emphasise that the definitions above are a draft.
The is_bracket/2
and is_quote/2
predicates
do not correspond to anything in C, but I found them very useful in
Quintus Prolog and equivalents very useful in a text editor.
Unicode 3.1.1 adds only single quotes, double quotes, single guillemets,
double guillemets, and single and double high reversed-9 quotes to the
Pi category, so is_quote/2
isn't much of a burden. For the
brackets we'd need a somewhat larger table, but there are plenty of
nice looking brackets in Unicode it would be nice to recognise.
The "solo" characters in ISO Prolog are "!,;|%()[]{}". What is the appropriate generalisation for Unicode?
In ASCII, "()[]{}" is the full set of brackets. So Unicode solo characters should include all Unicode brackets. The other characters are all in the Po category, but there are many characters in the Po category which should be symbol characters, not solo characters. The simplest rule seems to be that a non-bracket character should count as a solo character if and only if it is in the Po category and is a script or compatibility variant of one of the existing non-bracket solo characters. In Unicode 3.1.1, that's just the following 24 characters:
It is possible to recognise this set of characters by pattern matching (carefully!) on the names, but since I do not propose access to the names, this has to be expressed by means of a table.
is_solo(C) :- is_unicode(C, K), memberchk(K, [po,ps,pe]), ( K = po -> memberchk(C, [16'0021,16'0025,16'002C,16'003B,16'055C,16'055D, 16'060C,16'061B,16'066A,16'1363,16'1364,16'1802, 16'1808,16'3001,16'FE50,16'FE51,16'FE54,16'FE57, 16'FE6A,16'FF01,16'FF05,16'FF0C,16'FF1B,16'FF64]) ; true ).
The so-called "meta" characters in ISO Prolog are the ASCII umlaut, acute accent, grave accent, and reverse solidus. This is a rather puzzling grouping. Three of them are used for quoting string-like tokens, one of them isn't. Unlike Lisp, the reverse solidus has no "meta" semantics in Prolog except as part of string-like tokens; when it occurs unquoted it is just a plain symbol character.
Since we already have is_quote/2
, which covers the
characters that begin string-like tokens, I do not think that an
*is_meta/1
character classification predicate is appropriate.
The confusingly named "graphic" characters of ISO Prolog are "#$&*+-./:<=>?@^~". This is especially confusing, because what we want to know is not "which characters stand for themselves in quoted tokens" (the answer is: anything except the corresponding closing quote, the reverse solidus, or a Zl or Zp character), but "which characters can be used in unquoted quoted literals that look vaguely like mathematical operators. That is, we need a character class that includes these characters and also includes the reverse solidus, for the sake of ISO Prolog atoms such as \=, \==, =\=, \+, /\, \/, and \ . This is in fact the set "graphic token char" of section 6.4.2.
If we look these characters up in the Unicode data base, we find that they fall into four groups:
These groupings actually fit our intuitions about which characters should be allowed in unquoted atoms very well; when extended to Latin 1 they yield exactly the set of characters my Prolog-in-C parser has accepted since Latin 1 came out. The only problem is that some Po characters are not symbol characters, but we already have a recogniser for those, so:
is_symbol(C) :- is_unicode(C, K), memberchk(K, [pd,po,sc,sm]), \+ is_solo(C).
If Prolog uses Unicode internally, then we don't have to worry much about locales for character classification, so I'll leave this section empty for a while.
In ASCII, every letter was either a lower case letter with a unique context-independent upper case letter or an upper case letter with a unique context-independent lower case letter, and this was the only variation between letter forms.
In the ISO 8859 family of 8-bit character sets, there could be letters which were neither upper nor lower case, and there could be letters which did belong to a case but did not have an opposite case equivalent (there are two of those in Latin-1), and there could be letters with positional variants (there are five of those in 8859-8). Worse still, there could be letters whose opposite case form was not the same length (ß).
Of course it's well known that Greek has two lower case forms for the letter "s" both corresponding to the same upper case form.
Interestingly enough, the Unicode 3.1.1 database defines case mappings, and its case mappings are one-to-one. the letter "ß" is handled by not mapping it to anything else at all, and the Greek letters are handled by mapping the capital "S" to the medial (not the final) letter "s", regardless of the context.
The Unicode standard points out that case mapping is locale-sensitive, and gives the example of Turkish.
Accordingly, we do need some kind of locale objects. This is a very preliminary draft here!
:- type character_locale. % not specified :- pred character_locale(atom, character_locale). :- character_locale(Name, Locale) when Name. :- character_locale(Name, Locale) is semidet when ground(Name). character_locale(Name, Locale) :- Name is the name of a locale and Locale is a private representation of its character facet.
One special locale object is 'unicode', which represents the Unicode case conversion tables. We might also need 'posix' and/or 'C' locales for this. A locale name like 'en_NZ' might or might not be acceptable as a locale object at the implementor's choice.
We get two sets of case conversion predicates. None of them takes a single character as argument or delivers a single character as result. We have, in general, no right to expect that lc(x++y)=lc(x)++lc(y).
:- pred to_lower(character_locale, codes, codes, codes). :- to_lower(L, Xs, Ys0, Ys) when L, Xs. :- to_lower(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs). locale_to_lower(Locale, Xs, Ys0, Ys) :- Locale is a character locale and Xs is a string, Ys0\Ys is Xs converted to lower case according to Locale. locale_to_lower(Locale, Xs, Ys) :- to_lower(Locale, Xs, Ys, []). to_lower(Xs, Ys0, Ys) :- locale_to_lower(unicode, Xs, Ys0, Ys). to_lower(Xs, Ys) :- to_lower(Xs, Ys, []). :- pred to_title(character_locale, codes, codes, codes). :- to_title(L, Xs, Ys0, Ys) when L, Xs. :- to_title(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs). locale_to_title(Locale, Xs, Ys0, Ys) :- Locale is a character locale and Xs is a string, Ys0\Ys is Xs converted to title case according to Locale. locale_to_title(Locale, Xs, Ys) :- locale_to_title(Locale, Xs, Ys, []). to_title(Xs, Ys0, Ys) :- locale_to_title(unicode, Xs, Ys0, Ys). to_title(Xs, Ys) :- to_title(Xs, Ys, []). :- pred to_upper(character_locale, codes, codes, codes). :- to_upper(L, Xs, Ys0, Ys) when L, Xs. :- to_upper(L, Xs, Ys0, Ys) is semidet when ground(L), ground(Xs). locale_to_upper(Locale, Xs, Ys0, Ys) :- Locale is a character locale and Xs is a string, Ys0\Ys is Xs converted to upper case according to Locale. locale_to_upper(Locale, Xs, Ys) :- locale_to_upper(Locale, Xs, Ys, []). to_upper(Xs, Ys0, Ys) :- locale_to_upper(unicode, Xs, Ys0, Ys). to_upper(Xs, Ys) :- to_upper(Xs, Ys, []).
The title of this section refers to the corresponding section of the Unicode Standard, where rules are given for finding
In Latin1, there is a one to one correspondence between the thing
you might call a `char' in a program and the thing a normal person
looking at a printed text might call a `character'. The imaginary
rock band ``Death-Töngue'' (from ``Bloom County'') clearly has
12 whatsits in its name on either construal. If we step through the
Prolog list (of Latin1 codes) "Death-Töngue"
one
element at a time, we shall visit one written character at a time.
This neat equivalence did not hold in ASCII, although that is not
widely known. In ASCII, it was legitimate to encode
<ö
> as <o,BS,"
> or
<",BS,O
>. This accounts for some of the compromise
character shapes: " has to serve for opening double quote, closing
double quote, and diaresis, ' has to serve for apostrophe, right quote,
and acute accent, , has to serve for comma and cedilla, and so on.
The result is that except for US English, the relationship between a
sequence of characters and a sequence of chars was neither one to one
nor unique.
Unicode brings back the complexities of ASCII, doubled and redoubled. There is in principle no upper bound to the number of diacritical characters which may be attached to a base character, hence no upper bound on the number of Unicode code elements which may be required to represent a single scripteme.
I propose the following predicates for inclusion in both the 'chars' module and the 'codes' module. I illustrate only the 'codes' versions, but the 'chars' versions are obvious.
:- pred next_character(normalisation_form, codes, codes, codes). :- next_character(NF, Xs, Ys0, Ys) when NF, Ys0. :- next_character(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs), ground(Ys0). :- next_character(NF, Xs, Ys0, Ys) is semidet when NF==v, ground(Ys0). :- next_character(NF, Xs, Ys0, Ys) is bounded when ground(Ys0). next_character(NF, Xs, Ys0, Ys) :- append(Zs, Ys, Ys0), Zs \== [], normalised_compare(NF, =, Xs, Zs), Zs is the longest prefix of Ys0 not containing a character boundary. next_character(Xs, Ys0, Ys) :- next_character(v, Xs, Ys0, Ys). next_character(Xs, Ys0) :- next_character(Xs, Ys0, Ys). :- pred next_word(normalisation_form, codes, codes, codes). :- next_word(NF, Xs, Ys0, Ys) when NF, Ys0. :- next_word(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs), ground(Ys0). :- next_word(NF, Xs, Ys0, Ys) is semidet when NF==v, ground(Ys0). :- next_word(NF, Xs, Ys0, Ys) is bounded when ground(Ys0). next_word(NF, Xs, Ys0, Ys) :- append(Zs, Ys, Ys0), Zs \== [], normalised_compare(NF, =, Xs, Zs), Zs is the longest prefix of Ys0 not containing a word boundary. next_word(Xs, Ys0, Ys) :- next_word(v, Xs, Ys0, Ys). next_word(Xs, Ys0) :- next_word(Xs, Ys0, Ys). :- pred next_sentence(normalisation_form, codes, codes, codes). :- next_sentence(NF, Xs, Ys0, Ys) when NF, Ys0. :- next_sentence(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs), ground(Ys0). :- next_sentence(NF, Xs, Ys0, Ys) is semidet when NF==v, ground(Ys0). :- next_sentence(NF, Xs, Ys0, Ys) is bounded when ground(Ys0). next_sentence(NF, Xs, Ys0, Ys) :- append(Zs, Ys, Ys0), Zs \== [], normalised_compare(NF, =, Xs, Zs), Zs is the longest prefix of Ys0 not containing a sentence boundary. next_sentence(Xs, Ys0, Ys) :- next_sentence(v, Xs, Ys0, Ys). next_sentence(Xs, Ys0) :- next_sentence(Xs, Ys0, Ys). :- pred next_line(normalisation_form, codes, codes, codes). :- next_line(NF, Xs, Ys0, Ys) when NF, Ys0. :- next_line(NF, Xs, Ys0, Ys) is semidet when ground(NF), ground(Xs), ground(Ys0). :- next_line(NF, Xs, Ys0, Ys) is semidet when NF==v, ground(Ys0). :- next_line(NF, Xs, Ys0, Ys) is bounded when ground(Ys0). next_line(NF, Xs, Ys0, Ys) :- append(Zs, Ys, Ys0), Zs \== [], normalised_compare(NF, =, Xs, Zs), Zs is the longest prefix of Ys0 not containing a line boundary. next_line(Xs, Ys0, Ys) :- next_line(v, Xs, Ys0, Ys). next_line(Xs, Ys0) :- next_line(Xs, Ys0, Ys).
call/
NThe ISO Prolog standard defines call/1
in section 7.8.3.
Looking at the definition is like a slap in the face; the committee
rejected my denotational semantics for Prolog on the grounds that it
was written in Pascal, then when I pointed out that on the contrary,
it was written in a pure functional language that wasn't anything
like Pascal, it was rejected on the grounds that it was too
"implementation-oriented". If that doesn't describe the definition
of call/1
, I don't know what does.
If call/1
was not in the language, we could define it
thus:
:- dynamic $call/1. call(Goal) :- asserta(( $call(Goal) :- retract(($call(_) :- _)), !, Goal )), $call(Goal).
In the Prolog type checking paper that Alan Mycroft and I wrote,
we proposed the call/
N family, in order to
permit type-checking Prolog higher-order code. Since then, some
Prologs have implemented this feature in the library (notably DEC-10
Prolog, C Prolog, Quintus Prolog) with an upper bound on N,
and some Prologs have built it into their language (SWI Prolog and,
I believe, NU Prolog).
Roughly speaking, one expects
call(p(X1,X2,X3)) :- p(X1, X2, X3). call(p(X1,X2), X3) :- p(X1, X2, X3). call(p(X1), X2, X3) :- p(X1, X2, X3). call(p, X1, X2, X3) :- p(X1, X2, X3).
Indeed, one way to implement the call/
N
family is to start with a skeleton
call(P, ...) :- \+ callable(P), !, error(uncallable term). [special purpose code for control structures if N <= 2] [gap] call(P, ...) :- error(undefined predicate).
and then whenever a predicate p/N is defined,
automatically add clauses to call/1, ..., call/N+1
in the appropriate gaps. In a WAM-based system, this can be
done by exploiting switch-on-term without any choice-points
being created, so call/N in the normal case is just
a hash table look-up, a bit of argument shuffling, and a jump.
The space requirement is O(P.A2), where P is the
number of predicates and A is the arity. Since arities are
usually fairly low, this is in effect linear in the size of the
program. I have tried this, and found it to be a pleasantly
efficient (as well as obvious) way to implement call/
N.
There are other ways to implement call/
N which are
nearly as efficient, and have only O(P) space overhead, the hash table needed
to map from a predicate's functor to its address.
This family of predicates has proven its usefulness. It's time they were in the community standard.
:- pred call(void(T1), T1). :- pred call(void(T1,T2), T1, T2). :- pred call(void(T1,T2,T3), T1, T2, T3). :- pred call(void(T1,T2,T3,T4), T1, T2, T3, T4). :- pred call(void(T1,T2,T3,T4,T5), T1, T2, T3, T4, T5). :- pred call(void(T1,T2,T3,T4,T5,T6), T1, T2, T3, T4, T5, T6). % Sample implementation. % Beware: this is highly inefficient. Don't do it this way. call(P, Y1) :- P =.. L0, append(L0, [Y1], L1), Q =.. L1, call(Q). call(P, Y1, Y2) :- P =.. L0, append(L0, [Y1,Y2], L2), Q =.. L2, call(Q). call(P, Y1, Y2, Y3) :- P =.. L0, append(L0, [Y1,Y2,Y3], L3), Q =.. L3, call(Q). call(P, Y1, Y2, Y3, Y4) :- P =.. L0, append(L0, [Y1,Y2,Y3,Y4], L4), Q =.. L4, call(Q). call(P, Y1, Y2, Y3, Y4, Y5) :- P =.. L0, append(L0, [Y1,Y2,Y3,Y4,Y5], L5), Q =.. L5, call(Q). call(P, Y1, Y2, Y3, Y4, Y5, Y6) :- P =.. L0, append(L0, [Y1,Y2,Y3,Y4,Y5,Y6], L6), Q =.. L6, call(Q).
meta_predicate
As soon as you have call/1
, you have a problem.
Cross-referencers don't work. You can fix this for the specific
cases of call/1
, findall/3
,
bagof/3
, setof/3
, and a few others,
but as soon as a Prolog programmer writes
gcc(Goal) :- \+ \+ Goal. % garbage-collecting call
the cross-referencer stops working. The C call graph tool on my workstation is similarly limited; if you have a function which is passed to another function and not called directly, it thinks that function is never called. That's not what I call a usable call graph tool; it is worse than having no call graph at all.
Characteristically, the ISO committee missed the point.
They thought that meta_predicate
has something to do with modules. It does, but only accidentally.
The real function of meta_predicate
declarations is to
provide just enough type information for a cross-referencer to work.
The idea of meta_predicate
declarations was to start with
existing mode declarations (so the arguments would be +, -, or ?) and
change some of the arguments to something else to say "this argument is
called".
Had call/1
been the only game in town, something like
'call' would have been adequate, so we'd have had
:- meta_predicate findall(?, call, -), % not final version gcc(call). % not final version
But the call/
N family came into Prolog about
the time the DEC-10 cross-referencer was written, and replaced prior
code that did things like
all(P, [], []). all(P, [X|Xs], [Y|Ys]) :- apply(P, [X,Y]), all(P, Xs, Ys).
Very often, when an argument is passed to a meta-predicate, it is a closure, not a goal; the callee will supply more arguments (at the right) before calling it. This is information that a cross-referencer needs.
So, quite independently of any module system, we need meta-predicate declarations of this form:
meta_predicate_directive --> [(:-),(meta_predicate)], meta_predicate_declarator, rest_meta_predicate_declarators. rest_meta_predicate_declarators --> ['.']. rest_meta_predicate_declarators --> [','], meta_predicate_declarator, rest_meta_predicate_declarators. meta_predicate_declarator --> atom(_). meta_predicate_declarator --> atom(_), ['('], meta_predicate_argument, rest_meta_predicate_arguments. rest_meta_predicate_arguments --> [')']. rest_meta_predicate_arguments --> [','], meta_predicate_argument, rest_meta_predicate_arguments. meta_predicate_argument --> [+]. meta_predicate_argument --> [-]. meta_predicate_argument --> [?]. meta_predicate_argument --> [:]. % for modules. meta_predicate_argument --> integer(N), {N >= 0}.
The meta_predicate arguments (+), (-), and (?) are ignored by the system, but may be processed by a mode inference program like Chris Mellish's, and indicate the intended use of the corresponding arguments.
The meta_predicate argument (:) is only useful when there is a module system, otherwise it is just like (+). It says that the corresponding argument will not be treated as a goal or closure, but should be subject to module name expansion anyway.
As a mode, the meta_predicate argument N has the same force as (+). But N tells the cross-referencer that the corresponding argument is to be a goal that is missing its rightmost N arguments.
With the aid of this declaration, we can write
:- meta_predicate findall(?, 0, ?), gcc(0), all(2, ?, ?). all(_, [], []). all(P, [X|Xs], [Y|Ys]) :- call(P, X, Y), all(P, Xs, Ys).
The ISO committee not only got hold of the wrong end of the stick, they broke it, in these ways:
metapredicate
,
which is downright ugly.
The result is that the ISO metapredicate
directives
are useless for cross-referencers. They are best regarded as obsolete.
But what if you don't care about cross-referencing? I guess you've
never used Masterscope for Lisp, or cscope(1) for C, or the Smalltalk
IDE, or ... if you can ask such a question. Come to think of it, if
you haven't used a reasonably reliable cross-referencer for Prolog,
you don't realise what a useful debugging tool it can be. (I leave you
to make your own inferences about the ISO Prolog committtee.) But let's
suppose you don't care about cross-referencing, and you
don't care about the added documentation value from stating
the number of missing arguments. Is there any reason why you
should care about meta_predicate
?
Yes. From its inception (which I can speak with authority about,
because I invented it), the intention was that the
meta_predicate
directive would licence a nonstandard
translation of meta-calls. Take all/3
for example:
all(append(Front), Xs, Ys)
can be translated as
all(<{address of append/3}, Front>, Xs, Ys)
where <Address,Arg1,...> is a special kind of
term that looks like a compound term, but has an address where the
functor should be. This permits an
implementation of call/3
which spreads the arguments
and jumps straight to the given address. This eliminates the space
overhead for meta-calls (there is no longer a need for a table of
the addresses of all predicates; the compiler has arranged for the
addresses of just the meta-called predicates to be available directly).
All right, all right; so that would also make cross-module meta-calls as efficient as any other meta-calls.
There is another thing that meta_predicate
does,
and it is related to the possibility of unusual implementations
of meta-calls, but it has a rather more profound theoretical reason.
If T1 and T2 are two terms representing goals, then T1 = T2 is,
strictly speaking, a second-order unification (or worse). But it
isn't implemented that way. In order to avoid the obvious soundness
and completeness problems, we want to check at compile time that
two meta-terms are unified only if at least one of them is
an unbound variable. This means that a meta-argument of a predicate
should only be a variable in the head, and that it should only be
unified with another variable, and that it should only be passed to
a meta-argument of some predicate. This is a type checking problem,
and it is again a very useful debugging tool. But you can't even
get started on it without meta_predicate
; the ISO
metapredicate
abortion is quite useless for the job.
So even if we don't include modules in the community standard,
and even if we don't include call/
N, we
still need either meta_predicate
or a richer
type declaration/checking system.
We don't strictly speaking need lambda-expressions in Prolog; ordinary closures suffice quite often, and when they don't we can always write an extra predicate. But then, lambda-lifting shows that you don't strictly speaking need lambda-expressions in functional languages either; you can always move the thing out to top level, give it a name, and partially apply it at the original place of use.
Suppose we had a form something like
all((\(X, Y) :- Body), Xs, Ys)
A compiler could implement this by finding the intersection of (free variables of Body \ free variables of \(X,Y)) with the free variables of the rest of the clause that contains it, and generating
p(FreeVars, X, Y) :- Body. ... all(p(FreeVars), Xs, Ys)
The interesting point is that it isn't possible to do this at run time. If we simply implemented
call((\(X1,X2) :- Body), X1, X2) :- call(Body).
then the first call to the lambda-expression would "use it up".
We couldn't use this in a call to all/3
, unless all of the
elements of Xs were equal and so were all of the elements of Ys.
If on the other hand we implemented
call((\(X1,X2):-Body), Y1, Y2) :- copy_term(foo(X1,X2,Body), foo(Y1,Y2,Goal)), call(Goal).
then
common_prefix(Front, Xs, Ys) :- all((\(X, Y) :- append(Front, Xs, Ys))).
wouldn't work, because Front would be copied, and it shouldn't be.
With compiler assistance (as in Mercury), lambda-expressions are fine. Without compiler assistance, lambda-expressions either don't work or require more, and more error-prone, annotation than I think the average programmer would be happy with. I do not want to recommend anything which requires compiler support or even that there be a compiler.
meta_predicate
directives with 0
as a meta-argument annotation distinct from ':'. This is of great
practical use. A non-module compiler can simply ignore such directives.
A module compiler that cannot exploit the extra information can simply
treat them as alternative syntax for metapredicate
.
call/
N
family; they are trivial to implement (I do not say trivial to implement
efficiently, but even that is not particularly hard) and very
useful.
We can derive many of the operations in the Quintus Prolog library (which is a revision and extension of the DEC-10 Prolog library) by projection from a small number of predicates. These "base" predicates are specified in some detail, the specifications of the "derived" predicates are then obvious consequences. I do not claim that the "base" predicates are "basic" in any sense, only that they are "universal relations" from which a range of more useful "views" can be projected.
"Base" predicates have been chosen solely for convenience in
specification. I do not say that they are "basic" in any
way. They are generally harder to understand than some of their
derived predicates. They are not, as a rule, a good implementation
basis. (For example, it would be very silly to implement
append/3
the way it is specified here.) What makes a
predicate a "base" predicate is that it makes a good base for
specification, so that details of types and so on have to be
repeated as little as possible.
The notion of "projection" is closely related to the notion of "bootstrapping" (did Mrs Malaprop choose that word?) in the ISO Prolog standard. Many (but by no means all) of the "bootstrapped" predicates in the ISO Prolog standard are projections. Predicate P is a projection of predicate Q if and only if P is not only a special case of Q, but has its arguments in the same order as Q. The idea is that the strong semantic link between P and Q should make it easy for a programmer to memorise the argument order of Q and thereafter be able to correctly predict the argument order of P without any additional memory burden.
append_append_length/7
If any predicates have universal acceptance in Prolog, they must
be append/3
, member/2
, and length/2
.
They are all projections of a single predicate.
:- pred append_append_length( list(X), list(X), list(X), list(Y), list(Y), list(Y), integer). :- append_append_length(A, B, AB, C, D, CD, N) when A ; AB ; C ; CD ; N. :- append_append_length(A, B, AB, C, D, CD, N) is semidet when list_skel(A) ; list_skel(C) ; nonvar(N). :- append_append_length(A, B, AB, C, D, CD, N) is bounded when list_skel(AB) ; list_skel(CD). % Sample implementation. append_append_length(A, B, AB, C, D, CD, N) :- ( var(N) -> $append_append_length(A, B, AB, C, D, CD, 0, N) ; integer(N) -> N >= 0, $append_append_length(A, B, AB, C, D, CD, N) ; $error ). $append_append_length([], B, B, [], D, D, N, N). $append_append_length([X|A], B, [X|AB], [Y|C], D, [Y|CD], N0, N) :- N1 is 1 + N0, $append_append_length(A, B, AB, C, D, CD, N1, N). $append_append_length(A, B, AB, C, D, CD, N0) :- ( N0 =:= 0 -> A = [], B = AB, C = [], D = CD ; N1 is N0 - 1, A = [X|A1], AB = [X|AB1], C = [Y|C1], CD = [Y|CD1], $append_append_length(A1, B, AB1, C1, D, CD1, N1) ).
Semantics: append_append_length(A, B, AB, C, D, CD, N)
is true when length(A, N)
and length(C, N)
and
append(A, B, AB)
and append(C, D, CD)
. In
Mercury, where the compiler reorders code according to data flow, we could
write
append_append_length(A, B, AB, C, D, CD, N) :- length(A, N), length(C, N), append(A, B, AB), append(C, D, CD).
In ISO Prolog, however, this implementation is quite unsatisfactory.
?- append_append_length(A, "yz", "xyz", [X], R, "abc", N).
will find the (unique) solution, but if that is rejected, will then
backtrack forever, trying ever longer bindings for A
.
There is, in fact, no fixed ordering of these goals which is satisfactory for all queries, where "satisfactory" is defined as finding all solutions and terminating in a finite amount of time, for a query having a finite number of solutions.
append_append(A, B, AB, C, D, CD) :- append_append_length(A, B, AB, C, D, CD, _). append_length(A, B, AB, D, AD, N) :- append_append_length(A, B, AB, A, D, AD, N). append(A, B, AB, D, AD) :- append_length(A, B, AB, D, AD, _). append_length(A, B, AB, N) :- append_length(A, B, AB, A, B, AB, N). append(A, B, AB) :- append_length(A, B, AB, _). same_length(A, C, N) :- append(A, [], A, C, [], C, N). same_length(A, C) :- same_length(A, C, _). one_longer(A, C) :- same_length(A, [_|C]). shorter_list(A, C) :- append([_|A], [], [_|A], _, _, C) cons(H, T, L) :- append([H], T, L).
The following predicates come from library(list_parts)
in the Quintus library. However, in order that the argument order of
these predicates should be consistent with the other list processing
predicates, it is necessary to switch the argument order from
Whole,Part to Part,Whole, because that's the way append/3
is.
head(H, L) :- cons(H, _, L). tail(T, L) :- cons(_, T, L). prefix(P, L) :- append(P, _, L). proper_prefix(P, L) :- append(P, [_|_], L). suffix(S, L) :- append(_, S, L). proper_suffix(S, L) :- append([_|_], S, L).
The next group of predicates are already fully consistent with
append_append_length/7
.
length(X, N) :- append_length(A, [], A, N). member(X, L) :- append(_, [X|_], L). memberchk(X, L) :- member(X, L), !. :- nonmember(X, L) when ground(X), ground(L). :- nonmember(X, L) is semidet when ground(X), ground(L). nonmember(X, L) :- \+ member(X, L). select(X, L, R) :- append(_, [X|T], L, T, R). selectchk(X, L, R) :- select(X, L, R), !. select(X, Xs, Y, Ys) :- append(_, [X|Xs], L, [Y|Ys], R). selectchk(X, L, Y, R) :- select(X, L, Y, R), !. last(ButLast, X, L) :- append(ButLast, [X], L). last(X, L) :- last(_, X, L). nextto(X, Y, L) :- append(_, [X,Y|_], L).
This predicate corresponds to the old correspond/4
predicate, but has a consistent argument order:
member_member(X, Xs, Y, Ys) :- append(_, [X|_], Xs, _, [Y|_], Ys).
One operation which is commonly asked for is finding the
Nth element of a list. Here, unfortunately, we have
two clashing archetypes: arg/3
and the
append_append_length/7
family. The Quintus library
assimilated "Nth element" to arg/3
,
with predicates
nth0(N, List, Member) nth0(N, List, Member, Residue) nth1(N, List, Member) nth1(N, List, Member, Residue)
The clash with member(Member, List)
and
select(Member, List, Residue)
is unfortunate.
The fact that 0 and 1 have other meanings in the Quintus library
(and other meanings in connection with lists) is also unfortunate.
I am driven to the conclusion that the analogy with
arg(N, Term, Arg)
is bogus. In particular,
arg/3
only works one way around: it does not
solve for N
(although at the price of notably worse
generated code, it could) and it cannot solve for
Term
. This means that its first two arguments are,
on any understanding of the term, "inputs", and by the
inputs-before-outputs principle, should come first.
The "Nth element" predicates do not share
arg/3
's input/output restrictions. Indeed, they
are more likely to be used to find N for a known
Member than the other way around; it is tolerable
for linear search to take linear time, but not for indexing.
I use the name component "_offset" to refer to a number of elements to skip. You can think of this as 0-origin indexing if you like, but it is actually more fruitful to think of it quite literally as the length of some list segment. Indeed, "length" could be used to be consistent with the other "_length" predicates, except that in this context it might be confusing.
I use the name component "_index" to refer to a position in a list. Like argument positions in a term, member positions in a list always start with 1.
The Nth element predicates are
member_member_offset(X, Xs, Y, Ys, Offset) :- append_length(_, [X|_], Xs, _, [Y|_], Offset). member_member_index(X, Xs, Y, Ys, Index) :- member_member_index(X, [X|Xs], Y, [Y|Ys], Index), Index =\= 0. member_offset(Member, List, Offset) :- append_length(_, [Member|_], List, Offset). member_index(Member, List, Index) :- append_length(_, [Member|_], [Member|List], Index), Index =\= 0. select_offset(X, Xs, Y, Ys, Offset) :- append_length(_, [X|T], Xs, [Y|T], Ys, Offset). select_index(X, Xs, Y, Ys, Index) :- select_offset(X, [X|Xs], Y, [Y|Ys], Index), Index =\= 0. select_offset(Member, List, Residue, Offset) :- append_length(_, [Member|Tail], List, Tail, Residue, Offset). select_index(Member, List, Residue, Index) :- append_length(_, [Member|Tail], [Member|List], Tail, [Member|Residue], Index), Index =\= 0.
Prolog has traditionally had reverse/2
.
Common Lisp also provides (REVAPPEND - -)
,
and such a predicate is commonly part of the implementation
of reverse/2
.
:- type reverse_append_length(list(T), list(T), list(T), integer). :- reverse_append_length(L, B, RB, N) when L ; RB ; N. :- reverse_append_length(L, B, RB, N) is semidet when list_skel(L) ; integer(N). :- reverse_append_length(L, B, RB, N) is bounded when list_skel(RB). % Sample implementation. reverse_append_length(L, B, RB, N) :- append_length(L, [], L, _, B, RB, N), $reverse(L, B, RB). $reverse([], B, B). $reverse([X|L], B, RB) :- $reverse(L, [X|B], RB).
Semantics: reverse_append(L, B, RB, N)
is true when RB = reverse(L)++B
and
length(L) = N
.
Notation: There is no new notation here, but I feel that I must repeat the warning that the "sample implementations" of "base" predicates are crafted for exposition and specification. This is not the way that I would implement this predicate for practical use. What implementor would use a two-pass algorithm when a one-pass algorithm could be used?
Except that L
is to be reversed, the argument
order of this predicate is identical to the argument order of
append_length/4
, as the name suggests.
reverse_append(L, B, RB) :- reverse_append_length(L, B, RB, _). reverse_length(L, R, N) :- reverse_append_length(L, [], R, N). reverse(L, R) :- reverse_length(L, R, _).
permutation_length
:- pred permutation_length(list(T), list(T), integer). :- permutation_length(A, B, N) is bounded when list_skel(A) ; list_skel(B) ; nonvar(N). % Sample implementation permutation_length(A, B, N) :- same_length(A, B, N), $perm(A, B). $perm([], []). $perm([X|Xs], Ys1) :- $perm(Xs, Ys), append(_, R, Ys, [X|R], Ys1).
permutation(A, B) :- permutation_length(A, B, _).
pairs_keys_values/3
Given the great practical utility of keysort/2
,
it is useful to be able to build and dismantle the lists it
requires and produces.
I am not satisfied with the name that was used in the Quintus
library (keys_and_values/3
, because it does not make clear
which argument is which. Nor is there any way to appeal to 'inputs
before outputs' because the predicate is used both ways around. I am
tentatively calling it pairs_keys_values/3
.
Whatever it's called, it is a surprisingly useful predicate.
:- pred pairs_keys_values(list(pair(K,V)), list(K), list(V)). :- pairs_keys_values(Pairs, Keys, Values) when Pairs ; Keys ; Values. :- pairs_keys_values(Pairs, Keys, Values) is semidet when list_skel(Pairs) ; list_skel(Keys) ; list_skel(Values). % Sample implementation pairs_keys_values([], [], []). pairs_keys_values([K-V|Pairs], [K|Keys], [K|Values]) :- pairs_keys_values(Pairs, Keys, Values).
Note that the sample implementation doesn't actually satisfy the requirements. In a traditional first-argument-indexing Prolog, if the first argument is a variable but one of the others is a list skeleton, the determinism of the query will not be detected.
I'm not sure it's worth bothering with these.
pairs_keys(Pairs, Keys) :- pairs_keys_values(Pairs, Keys, _). pairs_values(Pairs, Values) :- pairs_keys_values(Pairs, _, Values).
The definition of the "delete" predicate is something I
struggled with for years. Part of the problem is that if list
R does not contain X, there are infinitely many lists L such
that deleting X from L yields R, so the predicate really only
works as a function. That tells us that L and X should precede
R in the argument order. Should it be
delete(Member, List, Residue)
to be consistent
with select/3
? Or, because delete/3
is dangerously different from select/3
, is it
actually a good thing if they are different?
Prolog has three versions of equality: (=)/2, (==)/2, and (=:=)/2. Which of them should be used? Or should we have three predicates, the way Scheme used to have DELQ!, DELV!, and DELETE!?
Should the item to be deleted be a strict input, or should
delete/3
be able to backtrack over the elements of
the input list as in Cugini's library?
If X does not occur in L, is that a good input, or should it fail?
There is a definition in the Quintus library. It's basically this:
/* Historic background, NOT for inclusion */ :- pred delete(list(T), T, list(T)). :- delete([], _, _) when true. :- delete([H|T], X, R) when ground(H), ground(X). :- delete(L, X, R) is semidet when ground(L), ground(X). % Sample implementation delete([], _, []). delete([H|T], X, R) :- H == X, !, delete(T, X, R). delete([H|T], X, [H|R]) :- delete(T, X, R).
I don't want to take delete/3
away from the
people who use it, if there are any, but it clearly doesn't
belong in even an informal standard. The problem is that there
are so many choices that none of them is going to be the right
one for the majority of uses (if uses there be); none of them
deserves the exclusive right to the name delete/3
.
The history of Scheme may be of interest here. There used to be
three delete-from-list functions in Scheme, but there aren't
any these days.
One of the questions about deleting from a list is how many
elements to delete. I note that when I want to delete a
known element from a list, I almost always want to delete just
one copy (usually because I know there is only one copy). For
that purpose, selectchk(X, L, R)
works very well.
The result is that despite writing delete/3
,
I cannot recall ever using it except to test it.
If delete/3
is not standard, what can programmers
do instead? One easy answer is to use findall/3
:
findall_member(Member, List, Test, Result) :- findall(Member, (member(Member, List), \+ \+call(Test)), Result). ... findall_member(Y, L, \+ Y = X, R) ... findall_member(Y, L, Y \== X, R) ... findall_member(Y, L, Y =\= X, R)
Using findall_member/4
instead of
delete/3
has some important advantages:
delete/3
in earnest, I would have used this.
The only bad thing about this predicate is efficiency.
But that's why I introduced findall_member/4
in
the first place, instead of calling findall/3
directly. It's comparatively simple for a compiler to recognise
calls to findall_member/4
and generate a three-clause
loop (essentially the structure of delete/3
shown
above), using a cache to ensure that the same code is not generated
repeatedly. Or for a compiler such as BinProlog, where
findall/3
is fast, the compiler could just convert
a findall_member/4
call to the corresponding
findall/3
code.
The findall/4
predicate, which combines the effect
of findall/3
with the effect of append/3
,
is unaccountably missing from ISO Prolog. When you know about it,
you find nearly as many uses for findall/4
as for
findall/3
. This suggests to me that the right "base"
predicate is not findall_member/4
but
findall_member/5
.
:- pred findall_member(T, list(T), void, list(T), list(T)). :- findall_member(M, L, P, R) is semidet when list_skel(L), nonvar(P). % Sample implementation. findall_member_append(_, [], _, R, R). findall_member_append(M, [X|Xs], P, R0, R) :- \+ (X = M, call(P)), !, findall_member_append(M, Xs, P, R0, R). findall_member_append(M, [X|Xs], P, [X|R0], R) :- findall_member_append(M, Xs, P, R0, R).
findall_member(M, Xs, P, R) :- findall_member(M, Xs, P, R, []). delete(L, X, R) :- findall_member(M, L, X \== M, R). disjoint(S1, S2) :- findall_member(M, S1, memberchk(M, S2), []). intersection(S1, S2, S) :- findall_member(M, S1, memberchk(M, S2), S). difference(S1, S2, S) :- findall_member(M, S1, nonmember(M, S2), S). union(S1, S2, S) :- findall_member(M, S1, nonmember(M, S2), S, S2). symdiff(S1, S2, S) :- findall_member(M, S1, nonmember(M, S2), S, S3), difference(S1, S2, S3).
I am not actually recommending any of the predicates in this
section. In particular, I am not recommending any of the
traditional quadratic-time set-as-unordered-list predicates for
inclusion in the "community standard library". What I do
recommend is that either no such predicates (letting people call
findall/3
themselves, or write and document their
own deletion predicates), or else one predicate should be
included in the library, and that predicate should be
findall_member/5
.
The Quintus library contains delete(L, X, N, R)
which deletes the first N
copies of X
in L
, or fewer if there aren't that many. To be
honest, it is difficult to imagine any serious use of this.
N=0
is just R=L
. N=1
is just selectchk/3
. N
larger than
the length of L
is just delete/3
. I
never did find any other uses for it.
sort/2
and keysort/2
may not be the most
useful predicates in Prolog, but they are somewhere near the top. In
a language without mutable data structures, but with a total order on
the data structures it does have, sorting is often indispensable for
getting code that is usefully fast.
The notion of a sorted list is clearly spelled out in the Prolog
standard. Unbelievably, there are no means whatsoever of sorting a
general list yourself; the only use of the "sorted list" notion is in
setof/3
. What's more, the Prolog standard doesn't even
guarantee that the result of setof/3
will remain
sorted, or even be sorted when it is first provided: the
ordering on variables is only required to be consistent during the
construction of a sorted list. Does this mean that the Prolog standard
allows an implementation where
length(L, N), N > 0, setof(X, member(X, L), S1), setof(X, member(X, L), S2), S1 == S2
may fail? Yes, it does.
In the same way that setof/3
needs sort/2
(yet the Prolog standard does not provide it), so bagof/3
needs keysort/2
, and the Prolog standard does not provide
it. While this makes it distressingly plain that the Prolog standardisers
did not do a very good job, is it anything more than a nuisance? After
all, it is quite easy to build a good sorting predicate on top of the
compare/3
primitive.
Oddly enough, while ISO Prolog provides
(@=<)/2
, (@<)/2
,
(@>=)/2
, (@>)/2
,
(==)/2
, and (\==)/2
,
it does not provide compare/3
.
If one wants a minimal standard, this is back to front.
Given compare/3
, the other six comparisons
can be implemented at low cost. (Although == and \== can
be done even faster.) But given the other six comparisons,
the best you can do for compare/3
is
compare(R, X, Y) :- ( X @> Y -> R = (>) ; X == Y -> R = (=) ; R = (<) ).
which would nearly double the cost of some sorting algorithms.
Using the six comparison operators instead of compare/3
also forces you to use cuts or if-then-else in many cases where
pure logic with no cuts of any kind could be used. This too is often
an efficiency problem.
There is an additional problem. In the presence of variables,
the ISO Prolog term ordering is not required to be stable.
It's required to be stable during a call to a built in predicate
such as (@=<)/2
or setof/3
, but it is
not required to be consistent from call to call. This means
that
X @< Y, X @< Y X @< Y, setof(Z, (Z=X;Z=Y), [X,Y]) sort(X, p(X), S1), sort(X, p(X), S2), S1 == S2
sort/2
or keysort/2
for themselves
in ISO Prolog.
Both the efficiency and the instability problems suggest that some of the ISO Prolog committee didn't have a clue about what Prolog term ordering is for.
The built-in predicate compare/3
must
be available with the same status as any other built-in predicate. For example,
it should be in the same module as is/2
or
clause/2
.
The term ordering must be stable in the sense that only the disappearance of a variable (when it becomes bound to something else) can alter the order of two terms.
It is expected that compare(R,T1,T2)
is linear in the
size of its arguments (in fact, of the smaller of the two), and allocates
no memory. It is possible to implement Prolog atoms with a symbol table
so structured that atom comparisons take O(1) amortised time, but the
requirement is actually "linear in the number of characters".
keycompare(R, K1-_, K2-_) :- compare(R, K1, K2).
This is useful
for specifying keysort/2
and a number of other predicates.
:- pred sort(+void(?order,T,T), +list(T), ?list(T)). :- sort(P, L, S) is semidet. :- pred msort(+void(?order,T,T), +list(T), ?list(T)). :- msort(P, L, S) is semidet.
The sort/3
predicate sorts its second argument,
using the first argument to determine the order, and unifies its
third argument with the result. If P(X1,X2) fails for any X1,X2
in L, sort(P, L, S) may fail. If P is not a partial order, the
result is not defined. P may be called up to O(N.lgN) times, where
length(L, N). In addition, only O(N.lgN) other work may be done.
If two or more elements of L are equal according to
P, only the first (leftmost) element will be retained.
The msort/3
predicate is like sort/3
but does not remove duplicates. It must be stable: if
L = [...,X1,...,X2,...] and P(=,X1,X2), then S = [...,X1,...,X2,...].
sort(L, S) :- sort(compare, L, S). msort(L, S) :- msort(compare, L, S). keysort(L, S) :- msort(keycompare, L, S). keysort(P, L, S) :- msort($keycompare(P), L, S). $keycompare(P, R, K1-_, K2-_) :- P(R, K1, K2).
The sort/2
and keysort/2
predicates
are traditional. The need for msort/2
has often been
felt; it is present in SWI Prolog, for example. The need for generalised
versions has also been felt; LPA Prolog had a rather clumsy version
which the higher order sorting predicates can simulate. With
sort/3
we can do
sort(codes:normalised_compare(kc), L, R)
without requiring a special-purpose predicate.
The derived comparison and sorting predicates should be in the
same module as compare/3
.
SWI Prolog includes merge/3
, but not the obvious (and
sometimes useful) keymerge/3
; they are currently not
included in this proposal. It also includes a merge_set/3
predicate, which appears to be an alias for ord_union/3
.
Not only is it possible to have a stable sort with guaranteed worst case O(N.lgN), it is possible to have a stable sort which has linear cost when the input list is the concatenation of a small number of sorted lists (sorted by @=<, not just @<). For example,
merge(L1, L2, L) :- append(L1, L2, L12), msort(L12, L).
could be an O(|L|) implementation of merging. It is tempting to require that such an algorithm be used; this makes converting an already sorted list to ordered set form cheap, for example.
Term ordering in Prolog is based on the underlying coded
character set. It is not intended to conform to the collating order
of any natural language in any culture. Amongst other things,
compare/3
must never report two terms as equal unless
they are completely indistinguishable by Prolog. However, natural
language collating orders may do things like ignoring alphabetic case
so that 'bin' and 'BIN' sort together. Term ordering is also supposed
to be fast; locale-sensitive ordering can be quite costly.
When producing output for people, it can be useful to compare and
sort text in a way that people find reasonable. Just as ISO C has
both strcmp()
and strcoll()
, so ISO Prolog
needs both compare/3
and locale_compare/[3,4]
.
:- pred locale_compare(atom, order, T, T). locale_compare(LocaleName, Order, T1, T2) is semidet when ground(LocaleName). :- pred locale_compare(order, T, T). locale_compare(Order, T1, T2) is semidet. locale_compare(R, T1, T2) :- "L is the user's default locale", locale_compare(L, R, T1, T2).
These are defined in terms of locale names, not locale terms. Presumably they would be implemented by caching, so that sorting a list of a million terms would determine the locale's collation rules at most once.
locale_sort(R, S) :- sort(locale_compare, R, S). locale_msort(R, S) :- msort(locale_compare, R, S). locale_keysort(R, S) :- msort($locale_keycompare, R, S). $locale_keycompare(R, K1-_, K2-_) :- locale_compare(R, K1, K2). locale_sort(L, R, S) :- sort(locale_compare(L), R, S). locale_msort(L, R, S) :- msort(locale_compare(L), R, S). locale_keysort(L, R, S) :- msort($locale_keycompare(L), R, S). $locale_keycompare(L, R, K1-_, K2-_) :- locale_compare(L, R, K1, K2).
People sometimes want a way to generate a subset of a given set.
The Quintus Prolog predicates subseq
/3,
subseq0
/2, and subseq1
/2 address that.
subseq
(AB, A, B) is true when AB is an interleaving
of A and B. If AB represents a set, A will be a subset of AB, and
B will be AB\A. It is regarded as a list operation, rather than a
set operation, because it preserves the order of the elements.
subseq0(AB, A) :- subseq(AB, A, _). subseq1(AB, A) :- subseq(AB, A, _), A \== AB.
The problem with this is that there are 2|AB| solutions. That makes these predicate useless except for rather small lists. In most potential applications for this predicate, there is some kind of
Do not include these predicates. In fact, think 50 times before including any predicates with exponential cost.
The following predicates from the DEC-10 and/or Quintus libraries have not been slotted into families yet.
min
,
max
}_{member([P, ]X, L)
,
select([P, ]X, L, R)
}. In particular, it is time that
msort/2
was adopted.
What self-respecting list-processing language would be caught dead without a basic kit of higher order list processing operations? Only Prolog, I guess.
The notation p/N+k is used for a family of predicates iterating over N lists, having k other arguments.
The following predicate families come from DEC-10 Prolog and Quintus Prolog:
map/
N+1.
maplist(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :- P(X11, ..., Xm1), ... P(Xm1, ..., Xmn).
scanlist(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, Vn) :- P(X11, ..., Xm1, V0, V1), ... P(Xm1, ..., Xmn, V', Vn).
cumlist(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, [V0,V1,...,Vn]) :- P(X11, ..., Xmn, V0, V1), ... P(Xm1, ..., Xmn, V', Vn).
some(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :- ( P(X11, ..., Xm1) ; ... ; P(Xm1, ..., Xmn) ).
Why don't we need it? Suppose we generalise member,
to member
/2N:
member(X1, [X1|_], ..., Xn, [Xn|_]). member(X1, [_|Xs1], ..., Xn, [_|Xsn]) :- member(X1, Xs1, ..., Xn, Xsn).
Then instead of
some(P, Xs1, ..., Xsn)
we can write
member(X1, Xs1, ..., Xn, Xsn), P(X1, ..., Xn)
and we not only avoid all overheads of passing P and get to call it directly, if the query is try, we get to find out which values of X1...Xn make it true.
If an analogue of some
/N+1 is implementable, so is an
analogue of member
/2N.
somechk(P, [X11,...,X1n], ..., [Xm1,...,Xmn]) :- ( P(X11, ..., Xm1) -> true ; ... ; P(Xm1, ..., Xmn) -> true ).
Similarly, instead of
somechk(P, Xs1, ..., Xsn)
we can write
(member(X1, Xs1, ..., Xn, Xsn), P(X1, ..., Xn) -> true)
and get to find out why P is true as well that it is true.
include(P, [], ..., [], []). include(P, [X1|Xs1], ..., [Xn|Xsn], Included) :- ( call(P, X1, ..., Xn) -> Included = [X1|Included1] ; Included = Included1 ), include(P, Xs1, ..., Xsn, Included1).
Note that this really requires P to be determinate, but it does allow variables in P to be bound. For a ground P, we could manage without this, writing
findall(X1, ( member(X1, Xs1, .., Xn, Xsn), P(X1, ..., Xsn) ), Included)
would do the job. Some Prolog systems (BinProlog) may implement
findall
/3 so very efficiently to make this an attractive
alternative.
exclude(P, [], ..., [], []). exclude(P, [X1|Xs1], ..., [Xn|Xsn], Included) :- ( call(P, X1, ..., Xn) -> Included = Included1 ; Included = [X1|Included1] ), exclude(P, Xs1, ..., Xsn, Included1).
There were and are others, but this will do to start with.
The obvious missing predicates are
foldr(P, [X11,...,X1n], ..., [Xm1,...,Xmn], V0, Vn) :- P(Xm1, ..., Xmn, V', Vn), ... P(X11, ..., Xm1, V0, V1). scanr(P, [X11,...,X1n], ..., [Xm1,...,Xmn], Vn, [V0,V1,...,Vn]) :- P(Xm1, ..., Xmn, V', Vn), ..., P(X11, ..., Xm1, V0, V1).
maplist
/N+1,
either under that name for backwards compatibility,
or under the name map
/N+1
for cultural compatibility with the functional programming
community.
scanlist
/N+3,
but under the name foldl
/N+3.
cumlist
/N+3,
but under the name scanl
/N+3.
foldr
/N+3 and
scanr
/N+3.
include
/N+2 and
exclude
/N+2, under those names;
if we called one of them 'filter', what would we call the
other?
member
/2N under some name
for a useful range of N.
some
/N+1
or somechk
/N+1, because it is easy to
manage without them. I do not say that nobody should
provide them, only that we should not require them of
everyone.
This is very much a minimal set of higher-order list operations.
Many set representations can be used in Prolog: unordered lists, ordered lists, delta-coded ordered lists, bitstrings, binary search trees of various types, ternary search trees.
It is important that the commonly available implementation of sets should be efficient enough to use without hesitation. It is also important that the operation names should not conflict with widely used operation names for other set representations.
This means that either names such as union/3
must
be avoided, or set predicates must not be used without a module
prefix to identify the representation.
Without a change to ISO Prolog, we cannot enforce a requirement that module prefixes be used, so we have to avoid commonly used names.
The representation most widely used in Prolog textbooks is the unordered list representation. This is also the representation used in Common Lisp.
The ordered list representation has its flaws (adding, removing, or checking for the presence of a single element is O(|Set|) rather than O(lg|Set|)), but the unordered list representation has no virtues.
One of the flaws of any term comparison-based set representation
in ISO Prolog is going to be that it allows variables to be set
elements, so that two terms can be in a set because when they were
entered they were not identical, but subsequent variable bindings
can make them identical. The answer to this is to use a mode system
which requires that set elements are sufficiently instantiated,
perhaps via a "safe" version of compare/3
.
I recommend the ordered set representation. It is space efficient,
time efficient for bulk operations, and is already used in Prolog
(the sort/2
and setof/3
predicates deliver
results of this form, which we might as well exploit).
In order to preserve backwards compatibility and to avoid name
clashes with the unordered list representation, all predicate names
have the form ord_
something.
The ordset representation cannot be implemented in Prolog without
the use of Prolog's term ordering.
As noted in the section on Sorting,
compare/3
is not provided, even though term
ordering is defined. That is really bad.
The compare/3
predicate must be available to Prolog programmers.
ord_op/5
:- type ordset(T) = list(T). :- pred ord_op(integer, ordset(T), ordset(T), ordset(T), ordset(T)). :- ord_op(M, Xs, Ys, S1, S2) when M, Xs, Ys. :- ord_op(M, Xs, Ys, S1, S2) is semidet when ground(M), ground(Xs), ground(Ys). ord_op(M, [], Ys, S1, S2) :- ( M /\ 2'000010 =:= 0 -> S1 = [] | S1 = Ys ), ( M /\ 2'010000 =:= 0 -> S2 = [] | S2 = Ys ). ord_op(M, [X|Xs], Ys, S1, S2) :- $ord_op1(M, Xs, Ys, S1, S2, X). $ord_op1(M, Xs, [], S1, S2, X) :- ( M /\ 2'000001 =:= 0 -> S1 = [] | S1 = [X|Xs] ), ( M /\ 2'001000 =:= 0 -> S2 = [] | S2 = [X|Xs] ). $ord_op1(M, Xs, [Y|Ys], S1, S2, X) :- compare(R, X, Y), $ord_op12(R, Xs, Ys, S1, S2, X, Y, M). $ord_op2(M, [], Ys, S1, S2, Y) :- ( M /\ 2'000001 =:= 0 -> S1 = [] | S1 = [Y|Ys] ), ( M /\ 2'001000 =:= 0 -> S2 = [] | S2 = [Y|Ys] ). $ord_op2(M, [X|Xs], Ys, S1, S2, Y) :- compare(R, X, Y), $ord_op12(R, Xs, Ys, S1, S2, X, Y, M). $ord_op12(<, Xs, Ys, S1, S2, X, Y, M) :- ( M /\ 2'000001 =:= 0 -> S1 = T1 | S1 = [X|T1] ), ( M /\ 2'001000 =:= 0 -> S2 = T2 | S2 = [X|T2] ), $ord_op2(M, Xs, Ys, T1, T2, Y). $ord_op12(>, Xs, Ys, S1, S2, X, Y, M) :- ( M /\ 2'000010 =:= 0 -> S1 = T1 | S1 = [Y|T1] ), ( M /\ 2'010000 =:= 0 -> S2 = T2 | S2 = [Y|T2] ), $ord_op1(M, Xs, Ys, T1, T2, X). $ord_op12(=, Xs, Ys, S1, S2, X, Y, M) :- ( M /\ 2'000100 =:= 0 -> S1 = T1 | S1 = [X|T1] ), ( M /\ 2'100000 =:= 0 -> S2 = T2 | S2 = [Y|T2] ), ord_op(M, Xs, Ys, T1, T2).
Looking at this predicate, we see that there is a clear notion of inputs (M, Xs, Ys) and outputs (S1, S2).
There are two ways to define is_ordset/1
. One
enforces the groundness of an ordered set. That is appropriate
for systems which can delay calls to (some analogue of)
compare/3
. The other is useful for hacking, and
allows set elements to be non-ground. At this stage I do not
wish to make any recommendation, so I merely note the two definitions:
is_ordset(S) :- % enforces groundness ground(S), sort(S, S). is_ordset(S) :- % allows sets of variables nonvar(S), $is_ordset(S). $is_ordset([]). $is_ordset([X|Xs]) :- nonvar(Xs), $is_ordset(Xs, X). $is_ordset([], _). $is_ordset([Y|Ys], X) :- nonvar(Ys), X @< Y, $is_ordset(Ys, Y).
The list processing predicates, especially
length/2
, member/2
, and select/3
can be used with ordered sets. The memberchk/2
and
selectchk/3
predicates are rather more dubious, and
need their own implementations.
list_to_ordset(List, Set) :- sort(List, Set). ord_op(M, Xs, Ys, S1) :- ord_op(M, Xs, Ys, S1, []). ord_compare(R, S1, S2) :- ord_op(2'010001, S1, S2, D12, D21), $ord_compare(R, D12, D21). $ord_compare(<, [], [_|_]). $ord_compare(>, [_|_], []). $ord_compare(=, [], []). ord_disjoint(S1, S2) :- ord_op(2'100, S1, S2, []). ord_disjoint_union(S1, S2, S) :- ord_op(2'100011, S1, S2, S, []). ord_intersect(S1, S2) :- ord_op(2'100, S1, S2, [_|_]). ord_intersection(S1, S2, S) :- ord_op(2'100, S1, S2, S). ord_intersection([], []). ord_intersection([S1|Ss], S) :- ord_intersection(Ss, S2), ord_intersection(S1, S2, S). ord_memberchk(X, S) :- ord_intersect([X], S). ord_nonmember(X, S) :- ord_disjoint([X], S). ord_proper_subset(S1, S2) :- ord_op(2'010001, S1, S2, [], [_|_]). ord_proper_superset(S1, S2) :- ord_op(2'001010, S1, S2, [], [_|_]). ord_selectchk(X, S1, S2) :- ord_op(2'010, [X], S1, S2). ord_subset(S1, S2) :- ord_op(2'001, S1, S2, []). ord_subtract(S1, S2, S) :- ord_op(2'001, S1, S2, S). ord_superset(S1, S2) :- ord_op(2'010, S1, S2, []). ord_symdiff(S1, S2, S) :- ord_op(2'011, S1, S2, S). ord_union([], []). ord_union([S1|Ss], S) :- ord_union(Ss, S2), ord_union(S1, S2, S). ord_union(S1, S2, S) :- ord_op(2'111, S1, S2, S). ord_union(S1, S2, U, D) :- ord_op(2'001111, S1, S2, U, D).
There are two names I am not happy with here.
I would prefer it if ord_subtract/3
had a less "imperative"
name, such as ord_difference/3
. It would also be clearer
if ord_union/4
were named ord_union_difference/4
.
However, that would break backwards compatibility, and that's bad manners
There are four predicates missing from the old library.
ord_seteq/2
has been generalised to
ord_compare/3
.
ord_add_element(X, S0, S)
is replaced by
ord_union([X], S0, S)
.
ord_del_element(X, S0, S)
is renamed to
ord_selectchk(X, S0, S)
to be compatible with
the list operations.
ord_setproduct/3
preserves set ordering but does not
depend on it. Its analogue probably belongs among the list operations.
Of course, nothing would prevent an implementation providing those predicates for backwards compatibility.
The higher order list operations can be applied to ordered sets.
Results will not necessarily be ordered sets, but can be converted
to ordered sets by sorting. (Except, of course, in the
thrice-accursed ISO "Prolog", where there is no "standard"
sort/2
.) It is useful that
[include,exclude]
/N+2 do not alter the order
of their 2nd argument and do not introduce duplicates, so filtering
an ordered set yields an ordered set.
Should be based on q(Unary,L0,L) representation. Each basic operation is O(1).
A queue is a sequence, rather like a list. Adding items or lists of items at either end is rather like appending, and so is removing an item or list of items from the head end. (That is to say, what I'm calling a "queue" here is really an "output-restricted deque").
There are many ways to implement queues. A single list makes some operations cheap and other operations very expensive. A pair of back-to-back lists works nicely in a strict functional language, but since there is a slow rebalancing operation which may be executed repeatedly on backtracking, they are not efficient in Prolog. The representation I propose here was invented by someone at SRI. Fernando Pereira told me about it, and I am sorry to say that I have forgotten the inventor's name. This representation has the nice property that adding an element at either end or removing an element from the head end requires just a single unification. This makes it well suited to Prolog.
There is one flaw in this representation: it uses list differences so that appending an item at the tail end must be regarded as destroying the original queue. Queue uses should be single threaded. Mercury can enforce this, but does not support the partially instantiated terms the technique relies on. Prolog does support partially instantiated terms, but cannot enforce single threading.
We found a similar problem with sets; a representation which Prolog is good at has technical flaws which mean that it has to be used carefully, but the efficiency of the representation is such that we are unwilling to abandon it.
:- type queue(T) ---> q(unary, list(T), list(T)). :- type unary ---> z ; s(unary). is_queue(q(U,L0,L)) :- nonvar(U), nonvar(L0), $is_queue(U, L0, L). $is_queue(z, L0, L) :- L0 == L. $is_queue(s(U), [_|L1], L) :- nonvar(U), nonvar(L1), $is_queue(U, L1, L). item_queue_append(X, q(U,L0,L), q(s(U),[X|L0],L)). list_queue_append([], Q, Q). list_queue_append([X|Xs], Q, Q0) :- item_queue_append(X, Q1, Q0), list_queue_append(Xs, Q, Q1). queue_item_append(q(U,L0,[X|L]), X, q(s(U),L0,L)). queue_list_append(Q, [], Q). queue_list_append(Q0, [X|Xs], Q) :- queue_item_append(Q0, X, Q1), queue_list_append(Q1, Xs, Q). queue_list(q(U,L0,L), Xs) :- $queue_list(U, L0, L, Xs). $queue_list(z, L, L, []). $queue_list(s(U), [X|L1], L, [X|Xs]) :- $queue_list(U, L1, L, Xs). cons_queue(X, Q0, Q1) :- item_queue_append(X, Q0, Q1). head_queue(X, Q1) :- item_queue_append(X, _, Q1). tail_queue(Q0, Q1) :- item_queue_append(_, Q0, Q1). member_queue_offset(X, q(s(U),L0,_), N) :- $member(X, U, L0, 0, N). $member(X, _, [X|_], N, N). $member(X, s(U), [_|L1], N0, N) :- N1 is N0 + 1, $member(X, U, L1, N1, N). member_queue_index(X, q(s(U),L0,_), N) :- $member(X, U, L0, 1, N). member_queue(X, Q) :- member_queue_index(X, Q, _). memberchk_queue(X, Q) :- member_queue(X, Q), !. queue_list_length(Q, Xs, N) :- Q and Xs have the same elements, N of them. queue_empty(Q) :- queue_list(Q, []). queue_length(q(U,L0,L), N) :- ( integer(N) -> N >= 0, $queue_length(U, L0, L, N) ; var(N) -> $queue_length(U, L0, L, 0, N) ; $error ). $queue_length(z, L, L, N, N). $queue_length(s(U), [_|L1], L, N0, N) :- N1 is N0 + 1, $queue_length(U, L1, L, N1, N). $queue_length(s(U), [_|L1], L, N0) :- N0 > 0, !, N1 is N0 - 1, $queue_length(U, L1, L, N1). $queue_length(z, L, L, 0). portray_queue(Q) :- ( write('[|'), member_queue_offset(X, Q, N), ( N =:= 0 -> true ; write(', ') ), write(X), fail ; write('|]') ).
I have omitted the following predicates from the Quintus library:
list_queue(Xs, Q)
is queue_list(Q, Xs)
.
queue_last(Q, X)
has the wrong argument order and
is not a fast operation.
queue_last(Q0, X, Q)
has the wrong argument order
and is not a fast operation.
Since the Quintus module is called 'newqueues', this should not be a problem.
I think I may already have too many operations providing access to the elements of a queue other than by popping them. It doesn't feel right to propose any higher-order operations on them. The idea of having 2N map*/N+1 predicates,
map(P, L1, L2) map_list_queue(P, L1, Q2) map_queue_list(P, Q1, L2) map_queue_queue(P, Q1, Q2)
doesn't appeal either. Once the higher order list operations are there, anyone who wants higher order queue operations can easily program them him/herself.
There are three predicates which are so basic that the ISO Prolog standard itself uses them, while at the same time refusing to bless Prolog programmers with assured access to them.
succ/2
provides the successor function on
natural numbers only. It is essential to the purposes
for which succ/2
was invented that it should not accept or
produce negative integers.
:- pred succ(integer, integer). :- succ(X, Y) when X ; Y. :- succ(X, Y) is semidet when X ; Y. % Sample implementation succ(X, Y) :- ( integer(X) -> X >= 0, Y is X + 1 ; integer(Y) -> Y > 0, X is Y - 1 ; var(X), var(Y) -> instantiation fault ; $error ).
plus/3
provides reversible arithmetic on exact numbers.
It was intended to work with rational numbers as well as integers; since
ISO Prolog omits rational numbers, it is here specified for integers only.
It is essential to the purposes for which plus/3
was
invented that it should be a proper logical predicate; the nature of
floating point arithmetic means that it cannot behave as intended
if it allows floating-point arguments.
:- pred plus(integer, integer, integer). :- plus(X, Y, Z) when X, Y ; Y, Z ; Z, X. :- plus(X, Y, S) is semidet when X, Y ; Y, Z ; Z, X. % Sample implementation plus(X, Y, Z) :- ( integer(X), integer(Y) -> Z is X + Y ; integer(Y), integer(Z) -> X is Z - Y ; integer(Z), integer(X) -> Y is Z - X ; $error ).
between/3
is astonishingly useful, not least because of
its frequent use in bounding the search space of other predicates. It
is difficult to take seriously any Prolog "standard" which omits it.
:- pred between(integer, integer, integer). :- between(L, U, X) when L, U. :- between(L, U, X) is semidet when ground(L), ground(U), ground(X). :- between(L, U, X) is bounded when ground(L), ground(U). % Sample implementation. between(L, U, X) :- ( integer(L), integer(U) -> ( integer(X) -> L =< X, X =< U ; var(X) -> L =< U, $between(L, U, X) ; $error ) ; integer(L), integer(X), L > X -> fail ; integer(U), integer(X), X > U -> fail ; $error ). $between(L, L, L) :- !. $between(L, _, L). $between(L, U, X) :- M is L + 1, $between(M, U, X).
The order in which the sample implementation enumerates solutions for X is part of the specification.
There is of course a better alternative to between/3
,
and that is to extend the syntax of comparison.
Comparison form | between/3 equivalent
| ||
E =< X =< F | L is E, | U is F, | between(L, U, X) |
E =< X < F | L is E, | U is F-1, | between(L, U, X) |
E < X < F | L is E+1, | U is F-1, | between(L, U, X) |
E < X =< F | L is E+1, | U is F, | between(L, U, X) |
E >= X >= F | U is -F, | L is -E, | between(L, U, Z), X is -Z |
E >= X > F | U is 1-F, | L is -E, | between(L, U, Z), X is -Z |
E > X > F | U is 1-F, | L is -1-E, | between(L, U, Z), X is -Z |
E > X >= F | U is -F, | L is -1-E, | between(L, U, Z), X is -Z |
This requires "distfix" operators, which are easy enough to add to
Prolog syntax and have been so added in the past. These operators
resemble the existing arithmetic comparison operators in evaluating
their "outside" arguments; unlike the existing arithmetic comparison
operators they do not evaluate their "inside" argument and only accept
integers. This approach generalises neatly to multiple generated
arguments, e.g.,
1 =< X < Y =< N.
However, between/3
is the simplest thing that could work
and has been proven over many years, so let's stick with it.
The problem with integer division is that there are several different versions, and the one you get is seldom the one you need. We should consider seriously whether Common Lisp analogues such as
floor(N, D, Q, R) ceiling(N, D, Q, R) round(N, D, Q, R) truncate(N, D, Q, R)
predicates with
floor(N, D), floor_remainder(N, D) ceiling(N, D), ceiling_remainder(N, D) round(N, D), round_remainder(N, D) truncate(N, D), truncate_remainder(N, D)
as arithmetic functions, might not be a good idea. Obviously, one pair of these arithmetic functions would duplicate the effect of the existing // and mod, but can you instantly recall, without looking it up, which?
I brought this up in 1984 and with the BSI and with the ISO committee, to a deafening silence. Surely I can't be the only person who is never quite sure what // is going to do with negative numbers?
Chapter 9 of the ISO Prolog standard lists the following evaluable functors:
Functor | Note |
---|---|
X + Y | but not +X |
X - Y | |
- Y | |
X * Y | |
X / Y>td>It always delivers a floating-point result, even when X is divisble by Y. An implementation which has rational numbers is completely forbidden to return a rational answer to 1/2, can you believe it? | |
X // Y | Yields an implementation-defined answer. |
X rem Y | yields the remainder corresponding to X // Y. |
X mod Y | yields X - floor(X/Y). Knuth's strong and carefully reasoned advice that X mod 0 should be allowed and should equal X has been ignored, sadly. |
??? | There isn't any division corresponding to mod. Of course, this broke all the old Prolog code that relied on // and mod going together, but when did the ISO Prolog committee worry about that? |
abs(X) | |
sign(X) | Despite the overwhelming predominance of IEEE floating-point arithmetic on popular machines, there is no equivalent of IEEE's copysign(), nor is it clear that copysign() can be emulated. |
float_integer_part(X) | long-winded |
float_fractional_part(X) | |
float(I) | |
floor(X) | |
truncate(X) | |
round(X) | |
ceiling(X) | |
X ** Y | 2**2 is 4.0, not 4. |
sin(X) | |
cos(X) | |
atan(X) | The one version of atan() no sane programmer uses! |
exp(X) | |
log(X) | A pity this wasn't spelled ln(X). |
sqrt(X) | |
Bits << Shift | |
Bits >> Shift | Utterly useless definition. |
X /\ Y | Crippled definition |
X \/ Y | Crippled definition |
\(X) | Inconsistent definition |
Let's start at the bottom and work up.
How, for example, could we do bitwise operations on sign-and-magnitude values as if they were twos-complement values? \(X) is simple; it is just -1-X. For other operations, the sign-and-magnitude bit pattern for the value X should be converted to the twos-complement bit pattern for the same value, operated on, and the result converted back.
sign_and_magnitude_to_twos_complement x = case compare x 0 of GT -> x EQ -> 0 LT -> x `xor` MAXINT twos_complement_to_sign_and_magnitude x = if x >= 0 then x else if x == MININT then error "overflow" else x `xor` MAXINT
Considering the other overheads in typical Prolog implementations, the conversion cost is tolerable; considering the rarity of bitwise operations compared with other operations (like procedure calls) the conversion cost isn't important; and considering the rarity of non-twos-complement machines the costs are overwhelmed by the improved portability of code.
I hasten to add that I like sign-and-magnitude and wish that it had prevailed over twos-complement. But it didn't.
char
). Java programmers will remember that the Java
designers found it advisable to have two right shift operators. With
the prior art of Common Lisp before them, it was quite inexcusable for
the ISO Prolog committee to insert a C tumour into poor old Prolog.
There really isn't any way of setting up a "logical" right shift
without exposing the width of integers. Assume for the moment that
twos-complement is used. What is the value of (-1)>>1
,
when that's an unsigned shift? It's INTMAX. Which means that it is
different for different Prolog systems, even on the same
hardware. For example, on 32-bit hardware, Prolog immediate integers
might be 24 bits, or 25 bits, or 28 bits, or 29 bits, or 30 bits,
or the Prolog system might support 32-bit integers, or even 64 bit
integers, or might have unbounded integers. The number of bits supported
might even change from one release to the next: one Prolog I know went
from 29-bit integers to 32-bit integers.
In other words, while a C programmer writing for a particular machine may have a use for unsigned shifts on signed numbers, a Prolog programmer writing for a particular machine cannot, because there isn't the slightest reason to expect two Prolog implementations on the same hardware to have the same integer width. The only kind of right shift that could possibly make sense for Prolog is signed right shift.
Surprise surprise: that's exactly the conclusion the Common Lisp designers came to. Ecrasez l'infame! This tumour must be excised from the community standard. Just as Common Lisp does signed shifts as if on twos-complement representations, so should Prolog.
atan
is provided, but tan
is not. I can't for the life of me think of any Prolog implementor whose
life would be simplified by leaving tan
out. The tangent
function is a standard part of Fortran, C, and Lisp.
mod
is a nuisance. It means that you have to write
div(X, Y, Z) :- Z is (X - X mod Y) // Y.
The Prolog implementor could have done it for you, much more
efficiently. In fact, using div
for integer division
is older art in Prolog than using //.
There are some obvious things missing. If you are using trig functions, sooner or later you need pi. We might as well have pi/0 and e/0 as evaluable functors.
Maximum and minimum are extremely useful. They've been in Fortran for years. They're in Lisp. They are easy to implement.
div
infix operator.
tan
(X), atan2
(X,Y),
acos
(X), and asin
(X).
max
(X,Y) and min
(X,Y); when the arguments
are of different types use the usual floating-point contagion rule.
pi
and e
constants.
Recently my attention was drawn to the following facts:
:- include(TextUnit).
% 7.4.2.7
This is a directive, not a built-in predicate.
It may only be used statically, and in this form. That is,
it is not possible in an ISO Prolog program to have
code which computes TextUnit and then calls include/1. To
repeat, there is no include/1
built-in
predicate, only a compile-time :-include(_)
directive.
:- ensure_loaded(TextUnit).
% 7.4.2.8
This is a directive, not a built-in predicate.
It may only be used statically, and in this form. That is,
it is not possible in an ISO Prolog program to have
code which computes TextUnit and then calls ensure_loaded/1.
To repeat, there is no ensure_loaded/1
built-in predicate, only a compile-time
:-ensure_loaded(_)
directive.
There are additional problems with those directives.
open/4
predicate. In fact there are excellent
reasons to expect that it won't: given 'foobar'
it is useful for a Prolog system to try
"foobar", "foobar.qof", "foobar.pl" in that order, or some
such thing.
:-ensure_loaded('foobar.pl')
will work at all.
:- ensure_loaded/1
were available as a
built-in predicate, the whole point of it is to load a file
once and only once. That makes it useless for reloading a file
after it has been changed.
compile_predicate(Name/Arity)
built-in predicate was proposed to the committee (by me) with a
clear rationale. Here's a trivial implementation:
compile_predicate(Name/Arity) :- scratch_file_name(Foobar), telling(Old), tell(Foobar), listing(Name/Arity), tell(Old), close(Foobar), consult(Foobar), % 'compile' if you have it. delete_file(Foobar).
That's not industrial-strength code, but it shows that all the Prolog systems listed above could easily support this operation.
open/4
for opening a file, but
there is no standard way to call that predicate. The
format of the file name argument is implementation defined and
there is no way for a program to discover what that format is.
Contrary to the impression given by the examples in the standard,
there is no requirement that any atoms ever be acceptable as file
name arguments, nor, if accepted, that they be interepreted in
accordance with the host file system.
open/4
should be defined as accepting an atom,
a non-empty list of atoms, or a non-empty list of character
codes as a file name, plus additional implementation-
defined ground terms. An atom provides a sequence of characters
(its name). A non-empty list of atoms provides a sequence of
characters determined by concatenating the names of its elements.
A non-empty list of character codes provides a sequence of
characters corresponding to the codes. In these three cases,
the character sequence is taken verbatim and passed to the
host file system for interpretation.
:- include/1
should be defined as accepting an
atom, a non-empty list of atoms, or a non-empty list of
character codes as a text unit name, plus additional
implementation-defined ground terms. A text unit name of these
forms should always be interpreted as referring to the contents
of a host file system file. If and only if there is no host file
system file with exactly the name given, an implementation may try
implementation-defined transformations of the file name. Typical
transformations include adding a ".pl" or ".pro" suffix if there
is no suffix, removing all "_" characters, changing " " to "-"...
:- ensure_loaded/1
should be defined as accepting
the same kinds of argument as :- include/1
and as
interpreting them in the same way, except that it might try a
different series of transformations, such as trying ".qof" before
trying ".pl".
ensure_loaded/1
should also be available as a
built-in predicate.
consult/1
built-in predicate.
This predicate should accept the same arguments as the
:- include/1
directive and should use the same
sequence of transformations, if any. From the host file system
file name which finally works, a canonical form is computed,
traditionally this has been the "absolute file name", with
conversion to lower case for host file systems that ignore alphabetic
case. Call this canonical form F, and the module that encloses
the call M, unless the file is a module-file, in which case the
module name in the :- module/2
directive is used for M.
Each predicate property and clause loaded from the file is in
effect tagged with the pair (M,F). This should work in three phases:
The same (M,F) tagging should be done by :-include and :-ensure_loaded and by any other means of loading Prolog text units. For text units that are not files, F need not be a file name; it is simply a hidden canonical label.
See the previous section. There are at least two dimensions of portability:
In section 7.1 I complained that there is no way of specifying a
file name to open/4
which is portable in the second sense.
A Prolog implementor I shan't name has taken me to task over this,
saying that it is quite reasonable for there to be no "portable" file
name terms because only the first kind of portability matters, and
the file names would not be portable anyway.
He is wrong. For example, I have Quintus Prolog, SICStus Prolog, XSB Prolog, SWI Prolog, Ciao, ECLiPsE, and a couple of others (including a version of Poplog I can't actually get at directly, as part of the SPARK/Ada toolkit), all running on the same machine. If you are trying to write code for publication, you want to test it under several Prolog systems. If you are trying to write library code for general use, you want to test it under several Prolog systems. Encouraging portability between different language implementations is one of the major purposes of a language standard. The really annoying thing here is that those implementations do share a common file name term (atom), so why can't the standard say so?
He is also wrong about the portability of file names. It is true that different operating systems have different file name syntax, but words matching the regular expression /[A-Z][A-Z0-9]{0,5}/ are portable between MVS, CMS, VMS, TOPS-10, TENEX, TOPS-20, RT-11, RSTS/E, all versions of UNIX from at least version 6 on, all versions of DOS, all versions of Windows, all versions of MacOS, the Burroughs (now Unisys) MCP, PR1MOS, ITS, and several other operating systems I haven't used. This is hardly surprising: that's the syntax for Fortran 66 identifiers, and Fortran programmers expected to be able to use Fortran identifiers as file names. The fact that some file names are not portable across a wide range of operating systems is no excuse for failing to provide standard access to file names that are portable across a wide range of operating systems.
In fact, there is a hidden assumption in the ISO Prolog standard which
restricts ISO Prolog input/output to DOS, Windows, and POSIX-compatible
systems such as Solaris and Macos-X. Since DOS and Windows accept forward
slashes in file names as well as reverse slashes, then one can expect
relative file names formed from the POSIX portable filename character set
and using forward slash as the directory separator with names restricted to
"8+3" form to be usable across all the file systems where ISO Prolog can be
expected to work. What's more, while file names containing slashes are
not legal OpenVMS (VMS, TOPS-10, TENEX, TOPS-20) file names, they can easily
be mapped to file names which are legal. It is quite reasonable for
open/4
to take as host file names those character sequences
which are possible host file names and to use some kind of POSIX-to-host
mapping for character sequences which are not.
Section 3.94 of the ISO standard lists the input/output modes
read
, write
, and append
.
Section 7.10.1.1 repeats this.
It is not clear whether an implementation may
support additional input/output modes, such as rewrite
.
The relevant wording in section 7.10.1.1 is
append -- Output. The
source/sink is a sink. If the sink already exists then output shall start
at the end of that sink, else an empty sink shall be created.
|
The problem is that "the end of the sink" is not a well defined term.
Stream optionbs (section 7.10.2.11) include type(text)
or
type(binary)
. Now, in UNIX, Windows, and MacOS, the contents
of a file are a sequence of bytes. (In many other operating systems, they
are a sequence of records.) The end of a sequence of bytes is well defined.
But a text file is a sequence of characters encoded as a sequence of bytes.
For example, in TOPS-10 the last block of a file was typically padded with
ASCII NUL characters. Where is the end? Is it at the end of the block,
or is it at just after the last non-NUL character? Or consider DOS. It
was based on CP/M, where a file was a sequence of 128-byte records. When
you closed a character stream, if there was any space left in the last
record, you'd get a Ctrl-Z followed by as many NULs as there were room
for. Although Windows no longer uses this convention for output, it still
obeys it for input. So if a 2 MB file contains a Ctrl-Z as its first
character, a C program reading it as a text stream will think it is empty.
Where is "the end of that sink"? Is it at the logical file end as reported
by the file system, or is it at the first Ctrl-Z? This is actually serious
problem, because if you start writing after the last byte in a file, the
characters you write may be completely invisible when you read back:
?- open('HAIRY', write, Bin, [type(binary)]), put_byte(26), close(Bin), open('HAIRY', append, Txt, [type(text)]), put_code(0'*), nl, close(Txt), open('HAIRY', read, Src, [type(text)]), get_code(X), close(Src).
What is the value of X? Is it 0'*, or is it the end of file code?
To get text mode appending right, it is necessary to open the file for reading and writing, read and decode the current data, and then decide where the end of the text contents is found.
The matter of encoding is even more complicated than that. There is not a one-to-one correspondence between characters and bytes. There are over 98,000 characters in Unicode. To convert between a stream of characters and a stream of bytes it is necessary to use some kind of encoding. This encoding may be stateless (like UTF-8), or it may be stateful (like Unicode Technical Report 6), or for that matter like dynamic compression schemes such as dynamic Huffman encoding and Lempel-Ziv encoding. Stateless encodings are not a problem. Some stateful encodings can be restarted, by which I mean that you can put the encoder back into its initial state and keep or resume writing, and still recover the intended character stream without error or loss of information. Lempel-Ziv compression and Unicode Technical Report 6 are like that. If you create a file, write CS1, close it, open for append, write CS2, close it, and then read it back you will get CS1++CS2, even though you will not get the same byte stream that you would have got from writing CS1++CS2 in one go. But some stateful encodings cannot be restarted, because the original designer never thought of providing a code that says "forget what you know".
There is no way in ISO Prolog to get the effect of the O_APPEND flag in POSIX, which is something of a pity.
The behaviour of append
mode for text files needs to be
clarified.
When encodings are supported, the interaction between append
and encodings needs to be clarified.
An early draft of the ISO standard said that a stream position was an integer. In the days of ASCII/Latin-1 on UNIX and MacRoman on MacOS, the mapping between position in character sequence and position in byte sequence was the identity mapping. MS-DOS, with its CR-LF sequences, spoiled that, and meant that there was no general way to go from character position to byte position without reading all the previous bytes in the file. That was actually our good luck, because other encodings make it even harder.
However, the ISO standard makes stream positions harder to use than it has to. A normal use of stream positions is to remember them as you generate a file, and same them out in some kind of index structure. Then to look in the file, you load the index structure and use that to tell you where to go.
That usage is not allowed in ISO Prolog. The ISO Prolog standard says, in section 7.10.2.8, that stream position terms are stable while the stream is open. A stream position could, for example, include the stream identifier, making it unusable with any other stream for the same file.
It should be said that a stream-position term uniquely identifies a particular position in a source/sink that is connected to a persistent file as long as the contents of that file preceding the position are not modified.
encoding
(EncName)normalisation_form
(NF)Drat, I ran out of time again.
This is a draft. More operations will be added, and community discussion may result in adding, changing, or removing some of the operations listed here.
There is nothing here which was not obvious at least two years before the Prolog standard came out. In particular, at a conference in 1985, I warned explicitly that a "minimal" standard was not at all a good idea, and by 1986 most of the predicates listed above could have been designed, had a thorough overhaul of the library been desired instead of minor revisions.