7 Completing the Language

Goals

— ... with a config module and readers

Our #lang s-exp "arith.rkt" is a fantastic language with a terrible name. Even though we’d like to keep using the S-expression parser, we’d prefer to write just #lang arith to use the language.

7.1 Getting Rid of the Quotes

To refer to a module without quotes and ".rkt", the module’s enclosing directory must be registered as a collection. A collection houses a set of modules as files, and a file named "name.rkt" in a registered collection directory "collection" can be referenced without quotes as collection/name. As a convenience, collection by itself is recognized as a shorthand for collection/main.
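To see where a collection-based module path points, you can ask Racket directly. The sketch below uses collection-file-path from racket/base on the built-in racket collection, which is always installed. It shows that racket/list names "list.rkt" in the "racket" collection and that racket by itself names "main.rkt" in that collection. The printed paths depend on your installation.

```racket
#lang racket/base
;; collection-file-path finds a file inside a registered collection.

;; `racket/list` names the file "list.rkt" in the "racket" collection.
(define list-file (collection-file-path "list.rkt" "racket"))

;; `racket` alone is shorthand for `racket/main`, i.e. "main.rkt"
;; in the "racket" collection.
(define main-file (collection-file-path "main.rkt" "racket"))

(printf "racket/list => ~a\n" list-file)
(printf "racket      => ~a\n" main-file)
```

Once the package from the steps below is installed, (collection-file-path "main.rkt" "arith") should similarly point into your "arith" directory.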

Although it’s possible to register an individual collection, collections are more commonly defined within packages; installing the package registers its collections. At the same time, it’s common to make a single-collection package where the package and collection are the same directory and have the same name, and that’s what we’ll do now:

Create a directory named "arith".
Copy your "arith.rkt" file into "arith/main.rkt".
Select Install Package... in DrRacket’s File menu.
Click Browse and then Directory (in the dialog that appears), and then select the "arith" directory that you just created.
Click Install.

After those steps, you can drop the quotes and ".rkt", but must keep s-exp:

#lang s-exp arith
(+ 1 2)

The s-exp parser converts that to

(module name arith
  (+ 1 2))

which works because arith as the initial import can be mapped to the "arith/main.rkt" file. Similarly, (require arith) would work in various contexts to import "arith/main.rkt".

7.2 Getting Rid of s-exp

The next step is to make

#lang arith
....

parse as

(module name arith
  ....)

The allowed form of a name after #lang is more restrictive than the grammar of require or the initial import of module, but it overlaps with the registered-collection syntax, as in arith. A language name after #lang is similarly interpreted as a module path, so arith refers to "arith/main.rkt", but that module isn’t used directly. Instead, #lang looks for a submodule named reader inside "arith/main.rkt".

A submodule is just a module form that is nested inside another module. In a whole-file module form, the declared name is ignored in favor of the file’s name, but the name in a submodule’s module form matters: it becomes the submodule’s name.
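Here is a minimal, self-contained illustration of a submodule; the names inner and greeting are made up for the example. Inside the enclosing module, (require 'inner) refers to the submodule declared with the name inner. From another file, the same submodule would be written (submod "file.rkt" inner).

```racket
#lang racket
;; A `module` form nested in this file's module: the name `inner`
;; becomes the submodule's name.
(module inner racket/base
  (provide greeting)
  (define greeting "hello from inner"))

;; Within the enclosing module, 'inner refers to the submodule above.
(require 'inner)

(displayln greeting) ; prints: hello from inner
```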

Try this variant of the "arith/main.rkt" module:

"arith/main.rkt"
#lang racket
(require (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [number-datum #%datum]
                     [plus +]))

(define-syntax (number-datum stx)
  (syntax-parse stx
    [(_ . v:number) #'(#%datum . v)]
    [(_ . other) (raise-syntax-error #f "not allowed" #'other)]))

(define-syntax (plus stx)
  (syntax-parse stx
   [(_ n1 n2) #'(+ n1 n2)]))

(module reader racket
  (provide read-syntax)

  (define (ignore in)
    (unless (eof-object? (read-char in))
      (ignore in)))

  (define (read-syntax src in)
    (ignore in)
    #`(module whatever arith
        7)))

and try this program:

#lang arith
(+ 1 2)

which prints... 7! In fact, you can put whatever text you want after the #lang arith line; it will be ignored, and the result is always 7.

A reader submodule is obliged to export a read-syntax function that takes two arguments and returns a syntax object. The syntax object must be a module form, and the usual expansion of modules (based on macros) takes over from there. In the module above, the reader submodule from "arith/main.rkt" provides a read-syntax function that calls (ignore in) to read and discard all characters from in, and it always returns the same module whose body is 7.
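Because read-syntax is an ordinary function, the protocol is easy to check outside of #lang. The sketch below copies ignore and read-syntax out of the reader submodule above and calls read-syntax on a string port; the source name 'demo and the input text are arbitrary.

```racket
#lang racket
;; Copied from the reader submodule: consume every character of `in`.
(define (ignore in)
  (unless (eof-object? (read-char in))
    (ignore in)))

;; Shadows racket's read-syntax within this module, as in the submodule.
(define (read-syntax src in)
  (ignore in)
  #`(module whatever arith
      7))

(define in (open-input-string "(+ 1 2)\nanything at all"))
(define result (read-syntax 'demo in))

(syntax->datum result) ; '(module whatever arith 7)
```

The input is discarded entirely, so the port is at end-of-file afterward.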

To actually parse the body of a #lang arith module, we want to call a function that parses the body “as usual,” like s-exp. As it happens, the racket language provides a read-syntax function—and it’s almost, but not quite, what we want. The read-syntax function from racket reads a single form and doesn’t wrap it as a module.
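A quick check of that one-form-at-a-time behavior, using racket’s read-syntax on a string port (the source name 'demo is arbitrary): each call returns the next form as a syntax object, and an end-of-file object once the input is exhausted.

```racket
#lang racket
(define in (open-input-string "(+ 1 2) (* 3 4)"))
(port-count-lines! in) ; track lines and columns for source locations

(define first-form (read-syntax 'demo in))
(define second-form (read-syntax 'demo in))
(define at-end (read-syntax 'demo in))

(syntax->datum first-form)  ; '(+ 1 2)
(syntax->datum second-form) ; '(* 3 4)
(eof-object? at-end)        ; #t
```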

We could write a loop to drive the read-syntax function, and then call it in a read-module-syntax function to be exported as read-syntax:

(module reader racket
  (provide (rename-out [read-module-syntax read-syntax]))

  (define (read-syntax-all src in)
    (define e (read-syntax src in))
    (if (eof-object? e)
        '()
        (cons e (read-syntax-all src in))))

  (define (read-module-syntax src in)
    (define lang (datum->syntax #f 'arith))
    #`(module whatever #,lang
        #,@(read-syntax-all src in))))

There’s some subtlety here, in that we need an arith initial import that doesn’t look like it comes from any module. The result of #'arith would look like it comes from the reader submodule, but (datum->syntax #f 'arith) produces an identifier (a syntax object wrapping the symbol arith) like #'arith but without any claimed lexical context. There are also some minor parts of the protocol that this submodule doesn’t satisfy, such as the fact that a reader submodule is supposed to provide a read function alongside read-syntax.
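As with the always-7 reader, the looping reader can be exercised directly. This sketch copies read-syntax-all and read-module-syntax out of the submodule and feeds them a two-form string; the result wraps both forms in a module whose initial import is arith.

```racket
#lang racket
;; Read forms with racket's read-syntax until end-of-file.
(define (read-syntax-all src in)
  (define e (read-syntax src in))
  (if (eof-object? e)
      '()
      (cons e (read-syntax-all src in))))

(define (read-module-syntax src in)
  ;; An `arith` identifier that claims no lexical context.
  (define lang (datum->syntax #f 'arith))
  #`(module whatever #,lang
      #,@(read-syntax-all src in)))

(define mod
  (read-module-syntax 'demo (open-input-string "(+ 1 2)\n(+ 3 4)")))

(syntax->datum mod) ; '(module whatever arith (+ 1 2) (+ 3 4))
```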

Fortunately, we don’t have to implement the reader submodule in raw racket. It’s a Racket module, after all, so we can implement it in any language and with any libraries that we prefer. The syntax/module-reader language creates a module that does all of the standard reader things for S-expressions, and you just have to tell it to use arith as the initial import for modules:

(module reader syntax/module-reader
  arith)

We’ll see other convenient libraries for implementing readers on Thursday.

For now, the punchline is that you can just add (module reader syntax/module-reader enclosing-module-name) to a module to turn it from something that works with #lang s-exp to something that works with just #lang.

A language’s reader configuration is in a submodule for two reasons:

The language does not necessarily want to expose a read-syntax binding from the language itself, so read-syntax should not be exported alongside other bindings of the language module.
The reader for a language is useful independent of the rest of the language implementation. When a module is compiled, any of its submodules can be loaded and run without loading or running the enclosing module, so the reader submodule can be used independently.

7.3 Runtime Configuration

As we think about creating languages this week, we’ll spend most of our time thinking about the syntax of the language and the semantics through that syntax—including how it elaborates into some other language via macros. But when we build a program with multiple modules that are written in multiple languages, there may be interesting interactions of data and control across module boundaries at run time; that’s one more thing for a programmer to potentially select and a language implementation to potentially provide.

One particularly common language-sensitive facet of run-time behavior is the way that values are printed. If you run the program

"apple.rkt"
#lang racket
'apple

then the result value will print just the way it appears in the program: as 'apple. But if you run

"banana.ss"
#lang scheme
'banana

then the result prints as banana, without the quote, because that’s the traditional way for Scheme. The underlying data representation is the same, and you can see that by writing a program that uses both modules:

"fruit.rkt"
#lang racket
(require "apple.rkt")
(require "banana.ss")

Running "fruit.rkt" prints 'apple and then 'banana. If you change "fruit.rkt" to #lang scheme, then the output is apple and then banana.

The racket and scheme languages are able to adjust the format of printed values by injecting a submodule into the modules that they generate (as opposed to including another submodule in the language implementation). When you ask Racket or DrRacket to run a program starting from a particular module, Racket or DrRacket first looks for a configure-runtime submodule in that module and runs it before the module itself. (Recall that a submodule can be run independently of its enclosing module, so it’s possible to run the configure-runtime submodule first.) The configure-runtime submodule is used only for that “main” module of the program, and not for any other modules that it imports.
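The text doesn’t say what scheme’s configure-runtime submodule changes. As an assumption, one plausible ingredient is Racket’s print-as-expression parameter, which controls whether print shows a value as an expression (with a leading quote for symbols) or the way write does. The sketch below only toggles that parameter; show is a helper name made up for the example.

```racket
#lang racket
;; Capture what `print` writes for a value.
(define (show v)
  (with-output-to-string (lambda () (print v))))

;; Default: print-as-expression is #t, so symbols print with a quote.
(show 'apple) ; "'apple"

;; With print-as-expression set to #f, print behaves like write.
(parameterize ([print-as-expression #f])
  (show 'banana)) ; "banana"
```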

How do racket and scheme inject a submodule? Remember that they get to define the #%module-begin that wraps the body of a module written in the language. We can play the same game in "arith/main.rkt" by exporting our own #%module-begin that chains to #%module-begin from racket but adds a customized configure-runtime submodule:

(provide (rename-out [hedging-module-begin #%module-begin]
                     ....))

(define-syntax (hedging-module-begin stx)
  (syntax-parse stx
    [(_ form ...)
     #'(#%module-begin
        form ...
        (module configure-runtime racket/base
          (current-print
           (lambda (v)
             (unless (void? v)
               (printf "about ~s\n" v))))))]))

Wait... If racket’s #%module-begin adds its own configure-runtime submodule, why don’t the two submodules collide? It turns out that racket’s #%module-begin adds a configure-runtime submodule only if there isn’t one already, and hedging-module-begin has already added one.
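The printer installed by the hedging configure-runtime submodule can be tried out without starting a REPL. In this sketch, hedging-print is our own name for the lambda above, and we call current-print by hand the way the REPL would for each result.

```racket
#lang racket
;; The lambda from the configure-runtime submodule, given a name.
(define (hedging-print v)
  (unless (void? v)
    (printf "about ~s\n" v)))

;; current-print is the parameter the REPL uses to show each result.
(define out
  (with-output-to-string
    (lambda ()
      (parameterize ([current-print hedging-print])
        ((current-print) 'apple)
        ((current-print) (void)) ; void results print nothing
        ((current-print) 3)))))

(display out) ; prints "about apple" and "about 3" on separate lines
```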

7.4 The Little Syntaxer

What’s the value of 1?

That’s the number 1.

Can we type 1 in DrRacket’s interactions window and see the value?

Sure, assuming that the definitions window starts #lang racket.
> 1
1

What’s the value of #'1?

That’s a syntax tree, a.k.a. syntax object, that has a 1 inside it. A macro would return that to expand to just the number 1.
> (define-syntax (one stx)
    #'1)
> (one)
1

Can we type #'1 in DrRacket’s interactions window and see the value?

Let’s try it...
> #'1
#<syntax:eval:5:0 1>
That works! Why?

The full reason is that #'1 is shorthand for (syntax 1), and the syntax form is provided by the racket language for both run-time and compile-time expressions.
> #'1
#<syntax:eval:6:0 1>
> (syntax 1)
#<syntax:eval:7:0 1>

I notice that the printed form of a syntax object seems to start with a source location, and the line number gets bigger each time we enter a new expression.
> #'1
#<syntax:eval:8:0 1>
> #'1
#<syntax:eval:9:0 1>

Yes it does. Try syntax-quoting some things other than just 1.

> #'2
#<syntax:eval:10:0 2>
> #'(+ 3 4)
#<syntax:eval:11:0 (+ 3 4)>
> #'my-function
#<syntax:eval:12:0 my-function>
> #'(define (f x)
    (+ x 1))
#<syntax:eval:13:0 (define (f x) (+ x 1))>
> #'#t
#<syntax:eval:14:0 #t>

Is there a way to get the 1 out of #'1?

I could print #'1 and then try to parse the 1 back out. That’s probably not the right way.

Try syntax-e.

> (syntax-e #'1)
1
> (syntax-e #'2)
2
> (syntax-e #'(+ 3 4))
'(#<syntax:eval:17:0 +> #<syntax:eval:17:0 3> #<syntax:eval:17:0 4>)
So, if I want to inspect a big syntax tree, I could use syntax-e plus list-manipulation functions to look at subtrees.

You could, but syntax-parse is better at that.
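For the record, here is roughly what the syntax-e approach looks like: a partial re-implementation of syntax->datum that peels one layer at a time with syntax-e and rebuilds pairs. The name my-syntax->datum is hypothetical, and this version ignores vectors, boxes, and other compound data that the real syntax->datum handles.

```racket
#lang racket
;; Recursively strip syntax wrappers using only syntax-e and pair operations.
(define (my-syntax->datum v)
  ;; syntax-e exposes one layer: an atom, or a list/pair whose
  ;; elements may themselves be syntax objects.
  (define e (if (syntax? v) (syntax-e v) v))
  (if (pair? e)
      (cons (my-syntax->datum (car e)) (my-syntax->datum (cdr e)))
      e))

(define stx #'(define (f x) (+ x 1)))
(my-syntax->datum stx) ; '(define (f x) (+ x 1))
```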

Speaking of syntax-parse, can you use it directly in the interactions window?

> (syntax-parse #'(f 1)
    [(_ n:number) #'"that's a number"])
eval:18:0: _: wildcard not allowed as an expression
  in: (_ n:number)
No. (Weird error.) But using only #lang racket, we also didn’t get syntax-parse within define-syntax:
> (define-syntax (f stx)
    (syntax-parse stx
      [(_ n:number) #'"that's a number"]))
eval:19:0: _: wildcard not allowed as an expression
  in: (_ n:number)

What a terrible error message! And it’s a shame that syntax-parse doesn’t just work with #lang racket.

True, but we know how to import it:
> (require (for-syntax syntax/parse))
> (define-syntax (f stx)
    (syntax-parse stx
      [(_ n:number) #'"that's a number"]))
> (f 1)
"that's a number"
> (f something-else)
eval:23:0: f: expected number
  at: something-else
  in: (f something-else)

So, now can you also use syntax-parse directly in the interactions window?

> (syntax-parse #'(f 1)
    [(_ n:number) #'"that's a number"])
eval:24:0: _: wildcard not allowed as an expression
  in: (_ n:number)
Apparently not. What went wrong?

Your (require (for-syntax syntax/parse)) imports the syntax/parse library “for syntax,” which means “for compile-time expressions.” It doesn’t make syntax-parse available for run-time expressions.
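Putting both imports side by side: a plain require gives run-time code access to syntax-parse, while (for-syntax syntax/parse) gives access to it inside define-syntax. The sketch below uses one of each in the same module; describe is a name made up for the example.

```racket
#lang racket
(require syntax/parse              ; syntax-parse for run-time code
         (for-syntax syntax/parse)) ; syntax-parse for macro bodies

;; Run-time use: an ordinary function that inspects a syntax object.
(define (describe stx)
  (syntax-parse stx
    [(_ n:number) "that's a number"]
    [_ "something else"]))

;; Compile-time use: the same kind of pattern inside a macro.
(define-syntax (f stx)
  (syntax-parse stx
    [(_ n:number) #'"that's a number"]))

(describe #'(f 1))         ; "that's a number"
(describe #'(f something)) ; "something else"
(f 1)                      ; "that's a number"
```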

Aha!
> (require syntax/parse)
> (syntax-parse #'(f 1)
    [(_ n:number) #'"that's a number"])
#<syntax:eval:26:0 "that's a number">
But why would I ever want to do that?

Well, you might want to manipulate syntax objects in a helper function. For example, if somehow you lose faith in Racket’s compiler, you might want to manually optimize arithmetic operations on literal numbers.

Something like this?
> (require syntax/parse)
> (define (opt stx)
    (syntax-parse stx
      #:literals (+ -)
      [(+ a:number b:number) (+ (syntax-e #'a) (syntax-e #'b))]
      [(- a:number b:number) (- (syntax-e #'a) (syntax-e #'b))]
      [_ stx]))
> (opt #'(+ 1 2))
3
> (opt #'(- 7 3))
4

I bet your opt doesn’t work as well as you intended for #'(+ 1 (+ 3 4)).

> (opt #'(+ 1 (+ 3 4)))
#<syntax:eval:31:0 (+ 1 (+ 3 4))>
You’re right. But I can recur.
> (require syntax/parse)
> (define (deep-opt stx)
    (syntax-parse stx
       [(o a b) (opt #`(o #,(opt #'a) #,(opt #'b)))]
       [_ stx]))
> (deep-opt #'(+ 1 (+ 3 4)))
8
> (deep-opt #'(* 1 (+ 3 4)))
#<syntax:eval:33:0 (* 1 7)>
I’m only handling nested addition and subtraction, but let’s just trust Racket to optimize multiplication and division.

Fine. But sometimes you’re getting back a plain number, and sometimes you’re getting back a syntax object.

Yes, when opt optimizes, it returns a number. I guess it’s more consistent to always return a syntax object. And I guess I can start a syntax-object result with #` and then immediately escape with #, to coerce to a syntax object.
> #`#,1
#<syntax 1>
> (define (opt stx)
    (syntax-parse stx
      #:literals (+ -)
      [(+ a:number b:number) #`#,(+ (syntax-e #'a) (syntax-e #'b))]
      [(- a:number b:number) #`#,(- (syntax-e #'a) (syntax-e #'b))]
      [_ stx]))
> (deep-opt #'(+ 1 (+ 3 4)))
#<syntax 8>

Neat trick. Can you write a macro now that uses opt to optimize some subexpression?

That sounds easy.
> (define-syntax (let-fast stx)
    (syntax-parse stx
      [(_ ([id rhs]) body)
       #`(let ([id #,(deep-opt #'rhs)]) body)]))
> (let-fast ([x (+ 1 2)]) x)
deep-opt: undefined;
 cannot reference an identifier before its definition
  in module: top-level
  internal name: deep-opt
Huh. I really expected
(let-fast ([x (+ 1 2)]) x)
to get optimized to
(let ([x 3]) x)
And I did define deep-opt!

But you defined deep-opt as a run-time function...

... and a macro needs compile-time helper functions. Got it. Let’s use define-for-syntax.
> (define-for-syntax (opt stx)
    (syntax-parse stx
      #:literals (+ -)
      [(+ a:number b:number) #`#,(+ (syntax-e #'a) (syntax-e #'b))]
      [(- a:number b:number) #`#,(- (syntax-e #'a) (syntax-e #'b))]
      [_ stx]))
> (define-for-syntax (deep-opt stx)
    (syntax-parse stx
       [(o a b) (opt #`(o #,(opt #'a) #,(opt #'b)))]
       [_ stx]))
> (define-syntax (let-fast stx)
    (syntax-parse stx
      [(_ ([id rhs]) body)
       #`(let ([id #,(deep-opt #'rhs)]) body)]))
> (let-fast ([x (+ 1 2)]) x)
3
I can’t tell whether let-fast makes it go faster, but it gives the answer I wanted.

Try this:
> (syntax->datum
   (expand
    #'(let-fast ([x (+ 1 2)]) x)))
'(let-values (((x) '3)) x)

It looks like expand is a neanderthal stepper that jumps right to the end, syntax->datum is a kind of recursive syntax-e, and forms like let sometimes expand to a more general form. Anyway, I see the 3 that I expected.

There’s just one last thing I don’t like about let-fast. Try
(let-fast ([x (/ (+ 1 2) 0)]) x)

Well, (+ 1 2) will get replaced with 3, but no matter, because there will be a divide-by-zero error.
> (let-fast ([x (/ (+ 1 2) 0)]) x)
/: division by zero
Hmm... You don’t see it here, but the pink error highlight isn’t around the (/ (+ 1 2) 0). It’s around
(o #,(opt #'a) #,(opt #'b))
in deep-opt.

That’s the part I don’t like.

Well, that’s where the failing run-time expression (/ 3 0) is constructed after optimization of (+ 1 2) to 3. But, I agree, we’d rather DrRacket highlight the original expression. How can we make it do that?

There’s a variant of syntax called syntax/loc. It lets you provide an existing syntax object whose source location is used for the new one.
> (define three-stx #'3)
> (syntax 4)
#<syntax:eval:48:0 4>
> (syntax/loc three-stx 4)
#<syntax:eval:47:0 4>
Note that the line number went down at that last one, because we used the line number for #'3.
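Source locations can also be checked programmatically instead of by reading printed line numbers. In this sketch, three-stx is built with datum->syntax and an explicit location (source 'demo, line 10, column 4, position 200, span 1, all arbitrary), and syntax-line and syntax-column confirm that syntax/loc and quasisyntax/loc copy that location to the new syntax object.

```racket
#lang racket
;; Location list: (source line column position span)
(define three-stx (datum->syntax #f 3 (list 'demo 10 4 200 1)))

(define plain (syntax 4))               ; location of this template
(define moved (syntax/loc three-stx 4)) ; location taken from three-stx
(define moved-q (quasisyntax/loc three-stx (#,(+ 5 6))))

(syntax-line plain)     ; wherever this line sits in the file
(syntax-line moved)     ; 10
(syntax-column moved)   ; 4
(syntax-line moved-q)   ; 10
(syntax->datum moved-q) ; '(11)
```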

When we use #`, that’s shorthand for quasisyntax, and there’s also a quasisyntax/loc.
> (quasisyntax/loc three-stx
    (#,(+ 5 6)))
#<syntax:eval:47:0 (11)>

Let’s use that to copy the original location over to an expression when we optimize nested expressions:
> (define-for-syntax (deep-opt stx)
    (syntax-parse stx
       [(o a b) (opt (quasisyntax/loc stx
                       (o #,(opt #'a) #,(opt #'b))))]
       [_ stx]))
> (let-fast ([x (/ (+ 1 2) 0)]) x)
/: division by zero
Now the pink is in the right place.
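For reference, here are the final opt, deep-opt, and let-fast from the interactions above collected into one module, as a sketch of how they might sit in a file instead of the interactions window. The second use of let-fast is an extra example of our own: deep-opt first folds (+ 3 4) to 7, and opt then folds (- 10 7) to 3.

```racket
#lang racket
(require (for-syntax syntax/parse))

;; Fold a single + or - applied to two literal numbers.
(define-for-syntax (opt stx)
  (syntax-parse stx
    #:literals (+ -)
    [(+ a:number b:number) #`#,(+ (syntax-e #'a) (syntax-e #'b))]
    [(- a:number b:number) #`#,(- (syntax-e #'a) (syntax-e #'b))]
    [_ stx]))

;; Optimize both operands first, keeping the original expression's
;; source location so run-time errors point at the user's code.
(define-for-syntax (deep-opt stx)
  (syntax-parse stx
    [(o a b) (opt (quasisyntax/loc stx
                    (o #,(opt #'a) #,(opt #'b))))]
    [_ stx]))

(define-syntax (let-fast stx)
  (syntax-parse stx
    [(_ ([id rhs]) body)
     #`(let ([id #,(deep-opt #'rhs)]) body)]))

(let-fast ([x (+ 1 2)]) x)        ; 3
(let-fast ([y (- 10 (+ 3 4))]) y) ; 3
```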

I think you’re ready for Lab My First Real Language.
