Programmiersprachen – Programming Languages
LVA 185.208, VU, 3 ECTS, 2017 S
The computation shall use two sets, a set E of entities to be manipulated (or several sets if there are different kinds of entities), and a set R of rules specifying how entities in E shall be manipulated. Most of the rules are determined by a program, some rules can be fixed in the language definition. The computation model is simple: Select entities from E and a rule r from R where r is applicable to the selected entities, remove the selected entities from E, and (optionally) add further entities to E as specified by r. These steps are repeated again and again (sequentially or in parallel) until no rule in R is applicable to any subset of E.
Please develop an interpreter of a language operating in this way and show that your language is useful by giving simple programming examples in your language. There are no further constraints. Please feel free to find your own application area and design your own language. Be careful to keep the syntax simple and avoid the use of expensive compiler technologies (no lexer, parser, AST, etc. necessary).
All data in the example language are possibly open character sequences (unterminated string, stream, ...) consisting of tokens that are separated from each other by white space. This is the syntax in EBNF:
<program> ::= { <rule> }
<rule> ::= <head> <body> '.'
<head> ::= <atom>
<body> ::= { ':' <goal> }
<goal> ::= <atom>
| <pattern> '=' <pattern>
| '$' <pattern> <pattern> '-' <pattern> <pattern>
<atom> ::= <name> { <pattern> } '-' { <pattern> }
<name> ::= <token>
<pattern> ::= '(' { [ '+' ] <token> } [ '*' <token> ] ')'
<token> is a sequence of characters not containing white space.
Special characters used in the syntax (':', '.', '=', '$', '-', '(', '+', '*', ')') as well as '\' occurring in a token must be preceded by '\', e.g. x\$\+ stands for the character sequence x$+.
Within a <pattern>,
<token> (without preceding '+' or '*') stands for a token consisting just of the characters of <token> (as a literal);
'+' <token> is a variable supposed to stand for a single token;
'*' <token> is a variable supposed to stand for a possibly open character sequence.
x or +y) separated by arbitrary white space.
With *z at the end, the character sequence is open, and *z stands for the rest of the sequence.
The special empty pattern () stands for an empty (no longer open) character sequence.
The scope of a variable (e.g., +y or *z) is the single <rule> where the variable occurs within.
Each variable can get a value only once – single assignment.
The variable can be initialized or uninitialized.
When a pattern contains no uninitialized variable, we call this pattern full.
A full pattern specifies a character sequence possibly containing known tokens at the beginning (if specified) and possibly being open (when ending with e.g. *z).
This is, for full patterns it is not necessary that the corresponding character sequence is already closed.
A pattern containing neither + nor * is always full.
Let a1,...,am and c1,...,cn be literal tokens or variables standing for tokens (m,n ≥ 0), and let b and d be variables standing for character sequences. A not necessarily full pattern (a1 ... am) or (a1 ... am b), where b is unitialized, matches a full pattern (c1 ... cn) or (c1 ... cn d), thereby initializing uninitialized variables in a1,...,am,b, in the following case:
For an execution we need a set G of goals according to <goal>, a set R of rules according to <rule>, and a map V associating values to variables.
We repeatedly select an appropriate goal g from G and (if necessary) a corresponding rule r from R, remove g from G, add corresponding new goals g1,...,gk to G, and map variables to values in V when variables are initialized, so that
<atom>, where
$p1 p2 - p3 p4 (causes the character sequence specified by p1 to be executed in a separate shell using p2 as standard input stream and p3 as standard output stream, returning the result of the execution in p4) where p1 and p2 are full, p1 specifies a closed character sequence, k is 2, g1 is p3 = out (out being the standard output stream of the shell) and g2 is p4 = res (res being an internal resource to be replaced by the result of the execution after termination).
matchesin this specification implies that variables become initialized. We have to be careful to use the same value for each occurrence of the same variable and to distinguish between different variables. Especially, we must distinguish between variables belonging to different applications of the same rule. For that reason we replace all variables occurring in a rule with fresh variables when applying a rule. The reason of using goals of the form
$p1 p2 - p3 p4 is simplification:
Since we can use all resources of the operating system, it is not necessary to define elementary operations (numbers, arithmetic and logic operations, etc.).
When writing programs we need not make use of goals of the form q = p;
these goals are used mainly in the execution to postpone the return of results (to the right of -) to a later point of time when these results are known.
To start the computation, we take the program as rule set R, an empty mapping V, and a set G usually containing only a single goal (to invoke the rule that starts the computation).
Here is an example of a program in this language that opens all pdf files in the current directory (executing ls *.pdf to list all such files and xpdf ... for each listed file in arbitrary order or in parallel):
main-:findFiles-(*files):showFiles(*files)-.
showFiles(+x*x)-:showFile(+x)-:showFiles(*x)-.
showFiles()-.
findFiles-(*f):$(ls \*\.pdf)()-(*f)(*x).
showFile(+f)-:$(xpdf+f)()-(*a)(+b).
To execute the program, we assume G initially to be the set { main- }.
Note that +x and *x in line 2 are different variables, and variables of the same name in different lines are different because of different scopes.