5 Building a Language

7.0.0.6

Goals

— constructing a hash-lang language with s-exp

— interposition points: #%module-begin, #%app, etc.

5.1 From Macros to Languages

Recall that we are trying to implement this language:

Definition = (define-function (Variable Variable1 ...) Expression)

Expression = (function-application Variable Expression ...)
           | (if Expression Expression Expression)
           | (+ Expression Expression)
           | Variable
           | Number
           | String

We succeeded in writing define-function and function-application, but only by adding those to racket. We’d like a language where a program is

Program = Definition-or-Expression ...

Definition-or-Expression = Definition
                         | Expression

and nothing else: no cond, no booleans, no extra primitive operations, and so on. Racket will let us enforce those constraints for programs that start with #lang algebra instead of #lang racket, so let’s work toward defining #lang algebra.

5.2 Modules and #lang

The #lang that starts a Racket-program file determines what the rest of the file means. Specifically, the identifier immediately after #lang selects an interpretation for the rest of the file. The only constraint on that interpretation is that it defines a Racket module that can be referenced using the file’s path.

Somehow, the characters of a module file get converted to a pile of machine code with certain addresses designated as entry points to implement functions defined by the module. Clearly, we don’t want to think about the long road from characters to machine code every time we write a Racket program or even a Racket language, not to mention the further realization of machine code into transistors or superpositions of quantum-mechanical waves. To manage all of that complexity and divide up the work among Racketeers, Racket compiler writers, operating-system vendors, microprocessor designers, and physicists, they define a sequence of layers and an abstraction to go between each pair of adjacent layers. A user of algebra will see only #lang algebra and its grammar, while the implementer of algebra sees another layer down.

From our perspective as the implementer of algebra, the next layer down is a syntax-object representation of a module. The program

"example.rkt"
#lang algebra
(define-function (f x) (+ x 1))
(function-application f 2)

will translate to roughly

(module example ....
  (#%module-begin
    (define-syntax (f stx) ....)
    ((lambda (x) (+ x 1)) 2)))

Before explaining more about that module form, note a difference in intent between the two chunks of text above. In the first, the parentheses are actual parenthesis characters that reside in a file. In the second, the parentheses are just a way to write a textual representation of the actual value, which is a syntax object that contains lists of syntax objects that contain symbols, and so on. Someone has to parse the parentheses in the first block of code.
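The step from characters to a syntax object can be observed directly. The sketch below is not part of the algebra implementation; it only uses Racket's read-syntax on the text of the first definition. The source label 'example and the variable names text and stx are arbitrary choices made for illustration.

```racket
#lang racket

;; The characters of a program fragment, as they would sit in a file.
(define text "(define-function (f x) (+ x 1))")

;; read-syntax does the parenthesis parsing: characters in, syntax object out.
;; 'example is only a label recorded as the source of the syntax object.
(define stx (read-syntax 'example (open-input-string text)))

(syntax? stx)                    ; the result is a syntax object, not a string
(andmap syntax? (syntax-e stx))  ; its elements are syntax objects too
(syntax->datum stx)              ; stripped down to plain lists and symbols
```

The last expression shows the same parenthesized text again, but this time it is a printed representation of a value rather than characters in a file.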

For now, we don’t want to be the one parsing parentheses, so we’ll actually write

"example.rkt"
#lang s-exp "algebra.rkt"
(define-function (f x) (+ x 1))
(function-application f 2)

The s-exp language doesn’t do anything but parse parentheses into syntax objects. It directly generates the syntax object

(module example "algebra.rkt"
  (#%module-begin
    (define-function (f x) (+ x 1))
    (function-application f 2)))

which is halfway to where we want to be: the define-function and function-application syntax objects are still here to be expanded by macros, but we no longer have to worry about parsing characters. (The change from algebra to "algebra.rkt" just lets us work with relative paths, for now, instead of installing an algebra collection.)

To see the parsed form yourself without creating an "algebra.rkt" file, copy the #lang s-exp "algebra.rkt" example into DrRacket and click the Macro Stepper button. The stepper will immediately report an error, since there’s no "algebra.rkt" module, but it will still show you the parsed form.

5.3 The Core module Form

The core module grammar is

Module = (module name initial-import-module
           (#%module-begin
             form ...))
       | (module name initial-import-module
           form ...)

The second variant is a shorthand for the first, and it is automatically converted to the first variant by adding #%module-begin.

For a module that comes from a file, the name turns out to be ignored, because the file path acts as the actual module name. The key part is initial-import-module. The module named by initial-import-module gives meaning to some set of identifiers that can be used in the module body. There are absolutely no pre-defined identifiers for the body of a module. Even things like lambda or #%module-begin must be exported by initial-import-module if they are going to be used in the module body’s forms.

If require is provided by initial-import-module, then it can be used to pull in additional names for use by forms. If there’s no way to get at require, define, or other binding forms from the exports of initial-import-module, then nothing but the exports of initial-import-module will ever be available to the forms.

Since every module form has an explicit or implicit #%module-begin, initial-import-module had better provide #%module-begin. If a language should allow the same sort of definition-or-expression sequence as racket, then it can just re-export #%module-begin from racket. As we will see, there are some other implicit forms, all of which start with #%, and initial-import-module must provide those forms if they’re going to be triggered.

Here is the simplest possible Racket language module:

"simple.rkt"
#lang racket
(provide #%module-begin)

Since "simple.rkt" provides #%module-begin, it’s a valid initial import. You can use it in the empty program

"use-simple.rkt"
#lang s-exp "simple.rkt"

as long as "use-simple.rkt" is saved in the same directory as "simple.rkt" (so that the relative path works). You can add comments after the #lang line, since comments are stripped away by the parser. Nothing else in the body is going to work, though. Actually, (#%module-begin) will work, since #%module-begin is bound and since s-exp relies on the implicit introduction of #%module-begin instead of adding it explicitly. That’s a flaw in s-exp.

For historical reasons, it turns out that you can put

(module whatever "simple.rkt"
  (#%module-begin))

in a file, because Racket falls back to allowing an S-expression-based module form in the absence of #lang. Direct use of this intermediate form is discouraged, though.
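For quick experiments, a language and a program that uses it can also sit in one file as submodules, since a nested module form takes a module path as its initial import just as a file-level one does. The following is a sketch under that assumption, not the file layout used in the rest of this section. The names tiny, client, and three are made up for illustration, and (submod ".." tiny) is the path from one submodule to its sibling.

```racket
#lang racket

;; A language that re-exports only six names from racket.
(module tiny racket
  (provide #%module-begin #%datum #%app define provide add1))

;; A program whose initial import is `tiny`. Nothing beyond the six
;; names exported above is available in this body.
(module client (submod ".." tiny)
  (provide three)
  (define three (add1 2)))

;; Back in ordinary racket: import the program and look at its export.
(require 'client)
three
```

Replacing add1 with a name that tiny does not export, such as sub1, should make the file fail to compile, because the body of client starts with no bindings other than those of its initial import.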

5.4 Implicit Forms

Besides #%module-begin, there are four other implicit forms that will be relevant to #lang algebra: #%datum, #%app, #%top, and #%top-interaction.

5.4.1 #%datum

Try changing "use-simple.rkt" like this:

#lang s-exp "simple.rkt"
0

The complaint is “literal data is not allowed; no #%datum syntax transformer is bound.”

The #%datum form is implicitly wrapped around a literal value like 0, #true, or "apple" when it appears in a place where an expression is expected.

As a step toward algebra, let’s define "arith.rkt" to allow literal numbers, but not other kinds of literals. Let’s also provide + while we’re at it:

"arith.rkt"
#lang racket
(require (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [number-datum #%datum])
         +)

(define-syntax (number-datum stx)
  (syntax-parse stx
    [(_ . v:number) #'(#%datum . v)]
    [(_ . other) (raise-syntax-error #f "not allowed" #'other)]))

and then this program will work:

#lang s-exp "arith.rkt"
0
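The two clauses of number-datum can also be tried at run time, outside any language definition, because syntax-parse works on syntax objects in ordinary code as well. In this sketch the helper name classify-literal and the result symbols are invented for illustration; only the two patterns are taken from "arith.rkt".

```racket
#lang racket
(require syntax/parse)   ; used at run time here, so no for-syntax

;; Mirrors the clauses of number-datum, but returns a symbol
;; instead of producing syntax or raising a syntax error.
(define (classify-literal stx)
  (syntax-parse stx
    [(_ . v:number) 'allowed]
    [(_ . other) 'rejected]))

;; The macro receives a pair of the form (#%datum . literal).
(classify-literal #'(#%datum . 0))        ; => 'allowed
(classify-literal #'(#%datum . "apple"))  ; => 'rejected
(classify-literal #'(#%datum . #true))    ; => 'rejected
```

Note the dotted patterns: the literal is the tail of the pair, not a list element, which is why both clauses are written as (_ . pattern).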

5.4.2 #%app

This program still won’t work:

#lang s-exp "arith.rkt"
(+ 1 2)

The error is “function application is not allowed; no #%app syntax transformer is bound.”

The #%app form is implicitly added to a parenthesized expression that appears in a place where an expression is expected and where the first item in the parentheses is not an identifier that is defined as a macro.
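To see that this implicit #%app really is an interposition point, here is a sketch of a language that behaves like racket except that every application written in a program first records its operator. It is an illustration only and not a step toward "arith.rkt". The names noting, noting-app, note!, noted-operators, prog, and result are all hypothetical, and the language and program are placed in one file as sibling submodules for convenience.

```racket
#lang racket

;; All of racket, except that #%app is replaced by a macro that notes
;; the operator of each application before carrying it out.
(module noting racket
  (provide (except-out (all-from-out racket) #%app)
           (rename-out [noting-app #%app])
           noted-operators)

  (define noted '())
  (define (note! name) (set! noted (cons name noted)))
  (define (noted-operators) (reverse noted))

  ;; Inside this module, #%app still refers to racket's #%app.
  (define-syntax-rule (noting-app f arg ...)
    (begin (note! 'f)
           (#%app f arg ...))))

;; A program in the `noting` language; #%app is never written explicitly.
(module prog (submod ".." noting)
  (provide result)
  (define result (+ 1 (* 2 3))))

(require 'prog (only-in 'noting noted-operators))
result             ; => 7
(noted-operators)  ; => '(+ *)
```

Both applications in prog were routed through noting-app even though the program text never mentions #%app. Forms such as define are macros, so they are not wrapped.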

We could provide racket’s #%app from "arith.rkt", but that’s not quite what we want. Currently, this program does run:

#lang s-exp "arith.rkt"
+

That is, we’ve provided +, which is a Racket function that doesn’t have to be immediately applied. We can disallow this—and also avoid having to worry about forms like (1 2) that attempt to apply a number—by providing a different + that is a macro:

"arith.rkt"
#lang racket
(require (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [number-datum #%datum]
                     [plus +]))

(define-syntax (number-datum stx)
  (syntax-parse stx
    [(_ . v:number) #'(#%datum . v)]
    [(_ . other) (raise-syntax-error #f "not allowed" #'other)]))

(define-syntax (plus stx)
  (syntax-parse stx
   [(_ n1 n2) #'(+ n1 n2)]))

Now, addition is allowed,

#lang s-exp "arith.rkt"
(+ 1 2)

and a misuse of + is not allowed:

#lang s-exp "arith.rkt"
+

Although #%app may not be the best way to make + work right, "arith.rkt" might usefully export an #%app that complains in a better way about misuses of parentheses. For example,

#lang s-exp "arith.rkt"
(1 2)

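complains with “function application is not allowed; no #%app syntax transformer is bound,” which is likely nonsense to target users. The following version of "arith.rkt" exports a complain-app macro as #%app to report such misuses in plainer terms: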

"arith.rkt"
#lang racket
(require (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [number-datum #%datum]
                     [plus +]
                     [complain-app #%app]))

(define-syntax (number-datum stx)
  (syntax-parse stx
    [(_ . v:number) #'(#%datum . v)]
    [(_ . other) (raise-syntax-error #f "not allowed" #'other)]))

(define-syntax (plus stx)
  (syntax-parse stx
   [(_ n1 n2) #'(+ n1 n2)]))

(define-syntax (complain-app stx)
  (define (complain msg src-stx)
    (raise-syntax-error 'parentheses msg src-stx))
  (define without-app-stx
    (syntax-parse stx [(_ e ...) (syntax/loc stx (e ...))]))
  (syntax-parse stx
   [(_)
    (complain "empty parentheses are not allowed" without-app-stx)]
   [(_ n:number)
    (complain "extra parentheses are not allowed around numbers" #'n)]
   [(_ x:id _ ...)
    (complain "unknown operator" #'x)]
   [_
    (complain "something is wrong here" without-app-stx)]))

5.4.3 #%top

If you use an identifier that isn’t provided by "arith.rkt",

#lang s-exp "arith.rkt"
oops

then you’ll get a message that mentions #%top. The #%top form is wrapped around an identifier that has no binding.
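The implicit wrappers can be made visible by asking Racket for the fully expanded form of an expression. The sketch below does this in a fresh top-level namespace for racket/base rather than in a module written in "arith.rkt", so it shows what the expander inserts, not the error message above. The helper name expanded is made up for illustration.

```racket
#lang racket

;; A fresh top-level namespace with racket/base's bindings.
(define ns (make-base-namespace))

;; Fully expand a quoted form in that namespace and return it as plain data.
(define (expanded form)
  (parameterize ([current-namespace ns])
    (syntax->datum (expand form))))

(expanded 'oops)      ; an unbound identifier comes back as (#%top . oops)
(expanded '(add1 0))  ; an application comes back as a form that begins with #%app
```

At the top level, racket/base's #%top lets an unbound identifier through until run time, which is why expansion succeeds here. A language that provides no #%top at all, like "arith.rkt", has nothing to expand the wrapper with and so reports an error instead.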

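We could improve the error for users so that it doesn’t mention the implicit name #%top, by defining a macro like the following in "arith.rkt" and exporting it as #%top (for example, by adding [complain-top #%top] to the rename-out clause):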

(define-syntax (complain-top stx)
  (syntax-parse stx
    [(_ . x:id)
     (raise-syntax-error 'variable "unknown" #'x)]))

5.4.4 #%top-interaction

Finally, you may have noticed that when you run any of the working programs with "arith.rkt", DrRacket reports “Interactions disabled: language does not support a REPL (no #%top-interaction).”

The #%top-interaction form is wrapped around any expression entered into the interactions window, and DrRacket notices that it will never work in our arithmetic language, so it doesn’t provide a prompt. We could enable the interactions window to have the same kinds of forms as a program by providing a #%top-interaction that just removes itself:

#lang racket
(require (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [number-datum #%datum]
                     [plus +]
                     [unwrap #%top-interaction]))

....

(define-syntax (unwrap stx)
  (syntax-parse stx
   [(_ . e) #'e]))
