By Rick Sutcliffe
Over the last two months the Spy introduced the motivation behind the new language that he and telecom engineer Benjamin Kowarsch have been developing (or, more precisely, a fully modern dialect of an existing notation) to address serious software engineering issues of safety, security, reliability, and extensibility, then offered something a little more than a “Hello World” as an initial example. In this piece, he briefly describes the baked-in building blocks of Modula-2 R10, that is, the reserved symbols, reserved words, and predefined identifiers.
Modula-2 has always been smaller than, but at least as expressive as, any other mainstream general-purpose language (including that thousand-ton gorilla of notations known as Ada). Throughout the R10 design process, every proposed addition was scrutinized for necessity (use cases required) and potential for bloat. To mitigate the latter, in most cases an addition required a deletion, so that the overall size of the language remains approximately what it was in Wirth’s initial early-80s release.
What follows is a brief sketch of the built-ins for the new notation.
Reserved Words:
1. For Modules, and importing from same:
MODULE : All compilation units are modules and are tagged with this word. A program module has no modifiers to this word on the header. R10 does not have local modules.
DEFINITION: A definition module file holds the interface to a library module. It is compiled to a symbol file.
IMPLEMENTATION : The implementation module file is the code or declaration part of a library module pair. In the presence of the already compiled symbol file for the definition it implements, it compiles to an object code file ready for the linker.
BLUEPRINT: Tags a module as a blueprint for all derived modules, including narrowing blueprint modules. It specifies what bindings have to be included for all types dependent on it.
GENLIB : A macro specifying the generation of a refined module from a template for a type and/or a generic operation.
FROM modulename IMPORT : Syntax used by a client module for unqualified import of functionality from a library module.
IMPORT modulename: Syntax used by a client module for qualified import of all functionality available from a library module. Entities imported in this manner have their names qualified by the module name.
REFERENTIAL : Used in a Blueprint module to specify module type requirements or impediments.
2. Used in definitions, declarations, and parameter lists:
ALIAS : Defines an alias name for a type. This is necessary in R10 because a line like TYPE aType = bType; defines a new and incompatible type with the same members, not an alias as in previous dialects.
ARGLIST : Tags a parameter list as containing an indeterminate number of items of the same type. The arglist may then be iterated over inside the procedure.
ARRAY : Tags an array in a type declaration. Associative arrays are available in a library.
BY : Used in declaring structured array constants to indicate a repetition of values.
CONST : In a declaration or parameter tag declares the following name(s) as immutable in their scope.
INDETERMINATE : Optionally tags the final field of a record as having dynamic size binding. This differs from the variant records of previous dialects.
LITERAL : The enumeration type of the literals; this tag can be used with the macro TLITERAL to determine compatibility with a specific literal type.
NONE : Used for omitted arguments.
OF : Used to specify the supertype of a range or of set elements.
OPAQUE : Specifies that a record field defined in a definition module is available only to the implementation, not to client modules.
POINTER : Part of a type definition for pointer variables.
PROCEDURE : Opening word in a procedure header. Function procedures also have a return type specified at the end of the header. Note that this is the opposite convention to that of the C-like languages.
RECORD : The name of a cross product type. Note that R10 dispenses with the WITH unqualifier.
SET : As in mathematics, an unordered collection. Counted sets are available in the library.
TO : In a pointer declaration is followed by the type of the entities pointed to. In R10, not used in the FOR loop.
TYPE: Header for one or more type declarations.
VAR : Header for one or more variable declarations, tag for a reference parameter in a procedure header (behaviour that is never automatic).
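For readers who know other notations, ARGLIST plays roughly the role of a variadic parameter. The Python sketch below is only an analogy, not R10 syntax: the function name total is hypothetical, and the run-time type check is an assumption standing in for the rule that every item in the list has the same type.

```python
def total(*values: int) -> int:
    """Sum an indeterminate number of whole-number arguments."""
    result = 0
    for value in values:  # the argument list is iterated like any collection
        if not isinstance(value, int):
            # Assumption: the same-type rule is modelled as a run-time check.
            raise TypeError("every item must be a whole number")
        result += value
    return result
```

A call such as total(1, 2, 3) passes three items, while total() passes none; the body is the same in both cases.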
3. Operators:
AND, DIV, MOD, NOT, OR : DIV and MOD operate on whole number operands, and there are of course symbolic operators such as +, -, *, and / whose applicability to specific types is controlled by blueprints for that class of types.
4. Block Control:
BEGIN, DO, END, RETURN : Note that the END of a named block must include the name, and that a RETURN in a function procedure must be followed by the designator of an entity of the correct return type, whereas if used in a regular procedure, it must not be. Unlike the C-like languages, scopes cannot be opened just anywhere.
5. Selection:
CASE : R10 changes the syntax of this to require a vertical bar to precede each case, rather than as a case separator. This is not used in records.
IF..THEN..ELSIF..ELSE..END : The standard syntax of Modula-2’s versatile and easy-to-read if statement. There can be any number of ELSIF clauses.
6. Repetition:
FOR..IN..DO..END : New syntax for the iterative loop. This can be tagged for ascending or descending, provided the loop control variable is scalar. The FOR..TO..BY style loop has been discarded. FOR can be bound so a data type can define its own iteration control.
REPEAT..UNTIL : Standard bottom-of-loop exit tested loop.
WHILE..DO..END : Standard top-of-loop exit tested loop.
LOOP..END : An indefinite loop. Exit can be tested anywhere.
EXIT : Execution of this statement at any point in any of the above four kinds of loops causes a transfer of control to the first statement following the loop. Note that this is a considerable extension of functionality from previous dialects, which restricted this to use in a LOOP construct.
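Two ideas from this list, a data type that supplies its own iteration and an early exit that is allowed in every kind of loop, have close analogues in other notations. The following Python sketch is an analogy under stated assumptions rather than R10 code: Countdown, first_multiple, and halvings_until_odd are hypothetical names, __iter__ stands in for a FOR binding, and break stands in for EXIT.

```python
class Countdown:
    """A type that defines its own iteration order: start, start-1, ..., 1."""

    def __init__(self, start: int):
        self.start = start

    def __iter__(self):
        # Analogue of binding FOR: the type decides how it is traversed.
        return iter(range(self.start, 0, -1))


def first_multiple(values, divisor: int):
    """Return the first value divisible by divisor, or None if there is none."""
    found = None
    for value in values:
        if value % divisor == 0:
            found = value
            break  # early exit from an iterative loop
    return found


def halvings_until_odd(n: int) -> int:
    """Count how many times n can be halved before it becomes odd."""
    steps = 0
    while n > 0:
        if n % 2 == 1:
            break  # the same early exit, here inside a top-tested loop
        n //= 2
        steps += 1
    return steps
```

In both functions control passes to the first statement after the loop, which is the behaviour described for EXIT above.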
7. For dynamic variables:
NEW : A Wirthian macro that calls ALLOCATE (which must be made visible with an import) to obtain dynamic memory and set the pointer variable used in the call to point to same.
RETAIN : Retains one reference to a reference counted variable.
RELEASE : Releases one reference to a dynamic variable. If the last reference is released, DEALLOCATE is called to return the dynamic memory.
COPY : A Wirthian macro that allows binding to the assignment operator := for statically defined types.
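The interplay of NEW, RETAIN, and RELEASE can be modelled in a few lines. This Python sketch is a toy model, not R10 code and not the R10 run-time system: the class Counted and its deallocate callback are hypothetical, and it assumes that a freshly created variable starts with exactly one reference.

```python
class Counted:
    """Toy model of a reference-counted dynamic variable."""

    def __init__(self, payload, deallocate):
        self.payload = payload
        self._deallocate = deallocate  # stands in for DEALLOCATE
        self.refs = 1  # assumption: creation hands back one reference

    def retain(self):
        """Add one reference (the role played by RETAIN)."""
        if self.refs == 0:
            raise RuntimeError("variable has already been deallocated")
        self.refs += 1

    def release(self):
        """Drop one reference; the last release returns the memory."""
        if self.refs == 0:
            raise RuntimeError("variable has already been deallocated")
        self.refs -= 1
        if self.refs == 0:
            self._deallocate(self.payload)
```

With two holders of the same variable, the first release only decrements the count; the storage is handed back when the second holder releases it.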
Those familiar with either classical or ISO Modula-2 will note a number of changes, simple but significant in the case of records, the FOR loop, and the allowed use of EXIT in all loops, but more profound in the handling of dynamic data, where careless program-crashing errors are most common.
What Wirth called “standard identifiers” (as such designators named built-in entities but were not reserved and could be redefined), ISO termed “pervasive identifiers”, as they were deemed to be imported into every module, even if they were redefined in an outer scope. In R10 we took another approach. Such “predefined identifiers” in the base language are now reserved. They cannot be redefined, as there is little point in doing so except to construct pathological examples. Here, we have made even more changes from previous dialects, but followed the same principle. The language remains small, but becomes more powerful.
Predefined identifiers:
1. Constants:
TRUE, FALSE : Boolean constants. These are not numbers, and BOOLEAN is not an enumeration as it was in previous dialects.
NIL : Identifies the special constant for pointers, which are always initialized to this value. It is an error to dereference a NIL pointer. Since this name is reserved, the dithering back and forth about whether this is a reserved word or a standard identifier becomes moot.
2. Numeric Types:
OCTET, CARDINAL, LONGCARD, INTEGER, LONGINT, REAL, LONGREAL: The bit widths of the ordinary vs long types are implementation dependent. OCTET is a special type, doing double duty as the eight-bit positive whole type (CARD8) and the universally compatible minimum storage unit type (probably the same as the ISO SYSTEM.LOC and likely also equivalent to its SYSTEM.BYTE). A procedure parameter that is an ARRAY of OCTET can have any array assigned to it. Note that since there is in practice little difference between built-in types and library types, because operators can be bound and these bindings accessed through Wirthian macros, the built-ins can be kept to a minimum. Therefore COMPLEX, BCD, and their operators, for instance, are in separate libraries.
3. Other Types:
BOOLEAN, CHAR, UNICHAR : Note that the character types are not enumerations in R10.
4. Predefined Procedure Names: (These are all Wirthian Macros that can be bound to in modules.)
CONCAT : For binding concatenation operators to data types, where applicable.
INSERT, REMOVE : For binding insertion and deletion operators to data types, where applicable.
READ, READNEW, WRITE, WRITEF : I/O macros. READNEW is for dynamic types, and WRITEF is for formatted output and takes an ARGLIST parameter so it can write multiple items with the same specified format. An I/O channel or file parameter is optional, and defaults to the standard I/O channels stdIn and stdOut. Additionally, READ can take an optional codec parameter such as CSV.
STORE : For binding to [] to store in collections (not limited to arrays or sets.)
5. Predefined Function Names:
ABS : Absolute value.
CHR : To construct a character value from a whole number.
COUNT : Returns the number of items in a collection.
LENGTH : Returns the length of an array of characters.
NEG : The negative of a value, where applicable.
ODD : Returns TRUE if the entity is odd, otherwise FALSE.
ORD : Returns the ordinal value of the specified item in an enumeration. The numbering always starts at zero.
PRED, SUCC : Returns a predecessor or successor (moderated by a second parameter) to a value in an ordinal type.
PTR : Returns a pointer to the first operand. The return type is that of the second operand.
RETRIEVE : Counterpart to STORE, for binding to [] to retrieve from collections (not limited to arrays or sets.)
SUBSET : For binding to the <= operator for defining set types.
TMIN, TMAX : Return the minimum and maximum values of a scalar type.
TSIZE : Returns in a LONGCARD the number of octets required to store an entity of the given type.
TLIMIT : Returns in a LONGCARD the capacity limit of a set, array, string, or collection type.
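The idea of binding library types to STORE, RETRIEVE, and COUNT, so that they behave like built-in collections, resembles operator overloading hooks in other notations. The Python sketch below is an analogy only, not R10 binding syntax: SparseVector is a hypothetical library type, and __setitem__, __getitem__, and __len__ stand in for the STORE, RETRIEVE, and COUNT bindings respectively.

```python
class SparseVector:
    """A collection that stores only its non-zero entries."""

    def __init__(self):
        self._items = {}

    def __setitem__(self, index: int, value: int) -> None:
        # Analogue of a STORE binding: handles  v[index] = value
        if value == 0:
            self._items.pop(index, None)
        else:
            self._items[index] = value

    def __getitem__(self, index: int) -> int:
        # Analogue of a RETRIEVE binding: handles  v[index]
        return self._items.get(index, 0)

    def __len__(self) -> int:
        # Analogue of a COUNT binding: number of items actually stored
        return len(self._items)
```

Client code uses the ordinary bracket notation and never sees the dictionary behind it, which is the point of binding: a library type reads like a built-in one.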
Besides declining to add CMPLX, IM, and RE as did the ISO dialect, the most obvious omissions are the ones for type conversions, which variously took such forms as:
anInteger := INT (aCardinal);
anInteger := VAL (INTEGER, aCardinal);
aReal := FLOAT (aCardinal);
aCardinal := TRUNC (aReal);
All such (and there were quite a few in some dialects) are replaced by a new conversion operator :: which is bound to underlying conversion routines either in the compiler (for built-in types) or specified in a library module. To a programmer, it looks like this:
anInteger := aCardinal :: INTEGER;
aReal := aCardinal :: REAL;
aCardinal := aReal :: CARDINAL;
Note that R10 has tighter rules on compatibility than previous dialects. In R10, CARDINAL and INTEGER are not compatible in the overlapping part of their ranges. One must intentionally do potentially dangerous things.
Where it makes sense, bindings are available to extend this functionality to conversions of other types.
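The effect of such strict compatibility can be modelled outside Modula-2. In this Python sketch, which is a model and not R10 code, the classes Cardinal and Integer and the helper can_add are hypothetical, the 32-bit width is assumed (the widths are implementation dependent), and the range check performed on conversion is an assumption of the sketch rather than a statement about how :: behaves.

```python
from dataclasses import dataclass

BITS = 32  # assumed width for this sketch only


@dataclass(frozen=True)
class Integer:
    value: int

    def __post_init__(self):
        if not -(2 ** (BITS - 1)) <= self.value < 2 ** (BITS - 1):
            raise ValueError("out of INTEGER range")

    def __add__(self, other):
        if not isinstance(other, Integer):
            return NotImplemented  # no silent mixing with other types
        return Integer(self.value + other.value)


@dataclass(frozen=True)
class Cardinal:
    value: int

    def __post_init__(self):
        if not 0 <= self.value < 2 ** BITS:
            raise ValueError("out of CARDINAL range")

    def __add__(self, other):
        if not isinstance(other, Cardinal):
            return NotImplemented  # no silent mixing with other types
        return Cardinal(self.value + other.value)

    def to_integer(self) -> Integer:
        # Explicit conversion; rejecting unrepresentable values is assumed.
        return Integer(self.value)


def can_add(a, b) -> bool:
    """Report whether a + b is accepted by the two types involved."""
    try:
        a + b
    except TypeError:
        return False
    return True
```

Here Cardinal(2) + Integer(3) is rejected even though both values lie in the overlapping range, whereas Cardinal(2).to_integer() + Integer(3) is accepted because the programmer asked for the conversion.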
NOTE: In some languages type changing conversions are called “casts”, which is incorrect terminology. Casting is coercion, or forced reinterpretation of the bit pattern of one variable’s contents as those of another of a different type, whether this makes sense or not, and is a very dangerous operation indeed. In R10 Modula-2, such coercions are done by importing and calling CAST from the module UNSAFE, which is our new name for what previous dialects called SYSTEM. CAST may also be used as a parameter list tag in some contexts, but must be imported.
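The difference between a conversion and a cast is easy to see with actual bit patterns. The Python sketch below is an illustration, not R10 code: both function names are hypothetical, and it assumes a 32-bit IEEE 754 single-precision real, a 32-bit unsigned whole number, and truncation toward zero on conversion.

```python
import struct


def convert_real_to_cardinal(x: float) -> int:
    """Conversion: keep the value's meaning, dropping the fraction."""
    if x < 0:
        raise ValueError("negative values have no unsigned equivalent")
    return int(x)  # truncates toward zero


def cast_real_to_cardinal(x: float) -> int:
    """Cast: reinterpret the 32-bit pattern of x as an unsigned whole number."""
    raw = struct.pack("<f", x)          # the four octets of the real
    return struct.unpack("<I", raw)[0]  # the same octets read as unsigned
```

Converting 1.0 yields 1, while casting 1.0 yields 1065353216 (hexadecimal 3F800000), a number that has nothing to do with the original value. That is why the coercion belongs in a module named UNSAFE.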
New and altered use of symbols:
The Spy will not here delineate all the uses of reserved symbols, most of which are typical of many programming notations (+ - * /), or at least of all dialects of Modula-2 (<= and cognates, and the above operators applicable to sets, := for assignment, ^ for pointer dereferencing, (* and *) for comment delimiters, <* and *> for pragma delimiters). However, some are new or have been changed.
++ and -- : Post increment and post decrement.
== : Identity test.
.. : Besides a subrange constructor, also a slice range specifier for slicing out or inserting, say, substrings from an array of char.
# : Is the only symbol for not equal.
{} : As in ISO, used for any structured value constructor, but no longer needs to be prefixed by the type name.
\ : Escape character in a literal string. May be followed by n to generate a newline, t to generate a tab, or \ to generate a backslash in the output string.
Still coming are additional examples to illustrate the safe use of libraries, templates, blueprints, and extension modules. Beyond this gentle introduction, consult the report and syntax diagrams on the web site, or wait for the book to be published.
–The Northern Spy
Opinions expressed here are entirely the author’s own, and no endorsement is implied by any community or organization to which he may be attached. Rick Sutcliffe (a.k.a. The Northern Spy) is professor of Computing Science and Mathematics at Canada’s Trinity Western University. He has been involved as a member or consultant with the boards of several community organizations, and participated in developing industry standards at the national and international level. He is a co-author of the Modula-2 programming language R10 dialect. He is a long-time technology author and has written two textbooks and nine alternate history SF novels, one named best ePublished SF novel for 2003. His columns have appeared in numerous magazines and newspapers (paper and online), and he’s a regular speaker at churches, schools, academic meetings, and conferences. He and his wife Joyce have lived in the Aldergrove/Bradner area of BC since 1972.
URLs for Rick Sutcliffe’s Arjay Enterprises:
The Northern Spy Home Page: http://www.TheNorthernSpy.com
opundo : http://opundo.com
Sheaves Christian Resources : http://sheaves.org
WebNameHost : http://www.WebNameHost.net
WebNameSource : http://www.WebNameSource.net
nameman : http://nameman.net
General URLs for Rick Sutcliffe’s Books:
Author Site: http://www.arjay.ca
Publisher’s Site: http://www.writers-exchange.com/Richard-Sutcliffe.html
The Fourth Civilization: Ethics, Society, and Technology (4th ed., 2003): http://www.arjay.bc.ca/EthTech/Text/index.html
Sites for Modula-2 resources
Modula-2 FAQ and ISO-based introductory text: http://www.modula-2.com
R10 Repository and source code: https://bitbucket.org/trijezdci/m2r10/src
More links, Wiki: http://www.modula-2.net
p1 ISO Modula-2 for the Mac: http://modula2.awiedemann.de/