7.0.0.6
7 Completing the Language
Goals: ... with a config module and readers
Our #lang s-exp "arith.rkt" is a fantastic
language with a terrible name. Even though we’d like to keep using the
S-expression parser, we’d prefer to write just #lang arith to use the language.
7.1 Getting Rid of the Quotes
To refer to a module without quotes and ".rkt", the module’s
enclosing directory must be registered as a collection. A
collection houses a set of modules as files, and a file named
"name.rkt" in a registered collection directory
"collection" can be referenced without quotes as
collection/name. As a convenience,
collection by itself is recognized as a shorthand for
collection/main.
Although it’s possible to register an individual collection,
collections are more commonly defined within
packages;
installing the package registers its collections. At the same time,
it’s common to make a single-collection package where the package and
collection are the same directory and have the same name, and that’s
what we’ll do now:
Create a directory named "arith".
Copy your "arith.rkt" file into "arith/main.rkt".
Select Install Package... in DrRacket’s File menu.
Click Browse and then Directory (in the
dialog that appears), and then select the "arith"
directory that you just created.
Click Install.
After those steps, you can drop the quotes and
".rkt", but you must keep
s-exp, so a program starts with #lang s-exp arith.
The
s-exp parser converts that program to a module form whose initial import is arith,
which works because
arith as the initial import can be mapped
to the
"arith/main.rkt" file. Similarly,
(require arith) would work in various contexts to import
"arith/main.rkt".
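As a sketch of that step, assuming the arith language accepts an addition such as (+ 1 2) (the body is only an illustration, and name stands for whatever module name the parser picks), a program and the module form it is read as might look like this:

```racket
#lang s-exp arith
;; Hypothetical body; any form that the arith language accepts will do.
(+ 1 2)

;; The s-exp parser reads the text above as a module form along these lines,
;; with arith as the initial import:
;;
;;   (module name arith
;;     (+ 1 2))
```

The program only runs once the "arith" directory has been installed as a package, as in the steps above.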
7.2 Getting Rid of s-exp
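The next step is to make a program that starts with just #lang arith
parse as the same module form, with arith as the initial import.
The allowed form of a name after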
#lang is more restrictive than
the grammar of
require or the initial import of
module, but it overlaps with the registered-collection syntax,
as in
arith. A language name after
#lang is similarly
interpreted as a module path, so
arith refers to
"arith/main.rkt"—
but that module isn’t used directly.
Instead
#lang looks for a
submodule in
"arith/main.rkt", and specifically a submodule named
reader.
A submodule is just a module form that is nested in a module.
Unlike a whole-file module form, the name of a
submodule matters and is used to name the submodule.
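A minimal sketch of a submodule, independent of the arith language; the names helper and greeting are made up for the example:

```racket
#lang racket

;; A submodule: a module form nested inside the enclosing module.
;; Its name, helper, is how other code refers to it.
(module helper racket
  (provide greeting)
  (define greeting "hello from the submodule"))

;; (submod "." helper) names the helper submodule of the enclosing module.
(require (submod "." helper))

greeting
```

The enclosing module imports from its own submodule here, but the same submodule could also be required from another file by naming the file in place of ".".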
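Try a variant of the "arith/main.rkt" module that adds a reader
submodule whose read-syntax function ignores its input and returns a
module with the body 7,
and then try a program that starts with #lang arith.
The program prints...
7! In fact, you can put whatever text you
want after the
#lang arith line; it will be ignored,
and the result is always
7.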
A reader submodule is obliged to export a
read-syntax function that takes two arguments and
returns a syntax object. The syntax object must be a module
form, and the usual expansion of modules (based on macros) takes
over from there. In the module above, the reader submodule from
"arith/main.rkt" provides a read-syntax
function that calls (ignore in) to read and discard all
characters from in, and it always returns the same
module whose body is 7.
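The description above can be sketched as follows. This is a reconstruction under assumptions rather than the exact module from the notes: the argument names src and in, the module name anything, and the way ignore loops are our choices.

```racket
#lang racket
;; ... the rest of "arith/main.rkt" (the arith language itself) goes here ...

(module reader racket
  (provide read-syntax)

  ;; Called with a source name and an input port that is positioned
  ;; just after the #lang arith line.
  (define (read-syntax src in)
    (ignore in)
    ;; Always the same module, whatever the program text was.
    #'(module anything racket
        7))

  ;; Read and discard every remaining character from the port.
  (define (ignore in)
    (unless (eof-object? (read-char in))
      (ignore in))))
```

The function can also be called by hand on a string port, which is a convenient way to check a reader without installing anything.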
To actually parse the body of a #lang arith module,
we want to call a function that parses the body “as usual,” like
s-exp. As it happens, the racket
language provides a read-syntax function—and it’s almost, but
not quite, what we want. The read-syntax function from
racket reads a single form and doesn’t wrap it as a
module.
We could write a loop to drive the
read-syntax function,
and then call that loop in a
read-module-syntax function to be exported as
read-syntax.
There’s some subtlety here, in that we need an
arith
initial import that doesn’t look like it comes from any module.
The result of
#'arith would look like it comes from the
reader submodule, but using
(datum->syntax #f 'arith)
generates a syntax object like
#'arith but without any
claimed context. There are also some minor parts of the protocol that this submodule
doesn’t satisfy, such as the fact that a
reader
submodule is supposed to provide a
read function
alongside
read-syntax.
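A sketch of such a hand-written reader, assuming helper names read-all-syntax and read-module-syntax and a module name anything (only read-module-syntax is named in the text above; the rest is illustrative). Like the version described, it does not provide a read function:

```racket
#lang racket
;; ... the rest of "arith/main.rkt" (the arith language itself) goes here ...

(module reader racket
  ;; Export our wrapper under the name that #lang looks for.
  (provide (rename-out [read-module-syntax read-syntax]))

  ;; Drive racket's read-syntax in a loop, collecting forms until end of input.
  (define (read-all-syntax src in)
    (define form (read-syntax src in))
    (if (eof-object? form)
        '()
        (cons form (read-all-syntax src in))))

  ;; Wrap the forms as the body of a module whose initial import is an
  ;; arith identifier that carries no lexical context.
  (define (read-module-syntax src in)
    #`(module anything #,(datum->syntax #f 'arith)
        #,@(read-all-syntax src in))))
```

Calling the exported function on a string port shows the module form it builds, without needing the arith package to be installed.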
Fortunately, we don’t have to implement the
reader
submodule in raw
racket. It’s a Racket module, after all, so
we can implement it in any language and using any libraries that we
prefer. The
syntax/module-reader language creates a
module that does all of the standard reader things for S-expressions,
and you just have to tell it to use
arith as the initial
import for modules:
(module reader syntax/module-reader
  arith)
We’ll see other convenient libraries for implementing readers on
Thursday.
For now, the punchline is that you can just add (module reader syntax/module-reader enclosing-module-name) to a module to
turn it from something that works with #lang s-exp
to something that works with just #lang.
A language’s reader configuration is in a submodule for two reasons:
The language does not necessarily want to expose a
read-syntax binding from the language itself, so
read-syntax should not be exported alongside other
bindings of the language module.
The reader for a language is useful independent of the rest of
the language implementation. When a module is compiled, any of
its submodules can be loaded and run without loading or running
the enclosing module, so the reader submodule can
be used independently.
7.3 Runtime Configuration
As we think about creating languages this week, we’ll spend most of
our time thinking about the syntax of the language and the semantics
through that syntax—including how it elaborates into some other language
via macros. But when we build a program with multiple
modules that are written in multiple languages, there may be
interesting interactions of data and control across module boundaries
at run time; that’s one more thing for a programmer to potentially
select and a language implementation to potentially provide.
One particularly common language-sensitive facet of run-time behavior
is the way that values are printed. If you run a #lang racket program
whose body is 'apple,
then the result value will print just the way it appears in the
program: as 'apple. But if you run a #lang scheme program whose body is 'banana,
then the result prints as banana, without the quote,
because that’s the traditional way for Scheme. The underlying data
representation is the same, and you can see that by writing
a #lang racket program, "fruit.rkt", that uses both modules.
The output will be
'apple and then
'banana. If you change
"fruit.rkt" to
#lang scheme, then the output is
apple and then
banana.
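A single-file variant of that experiment, as a sketch: the two fruit modules are written as submodules (named apple and banana here, which are our names) in place of separate files, and the enclosing file plays the role of "fruit.rkt".

```racket
#lang racket
;; Save as "fruit.rkt" and run it as the main program.

;; Stands in for a separate module written in #lang racket.
(module apple racket
  'apple)

;; Stands in for a separate module written in #lang scheme.
(module banana scheme
  'banana)

;; Importing the submodules runs their bodies, which print their results.
(require 'apple 'banana)
```

Both symbols should print in the style of the main module's language, whichever language each fruit module was written in. Changing the first line to #lang scheme switches both to the unquoted style.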
The racket and scheme languages are
able to adjust the format of printed values by injecting a submodule
into the modules that they generate (as opposed to including another
submodule in the language implementation). When you ask Racket or
DrRacket to start a program as a particular module, Racket or DrRacket
will first look for a configure-runtime submodule and
run that before the module. (Recall that a submodule can be run
independent of its enclosing module, so it’s possible to run the
configure-runtime submodule first.) The
configure-runtime submodule is used only for that
“main” module of the program, and not for any other modules that it
imports.
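The quoted and unquoted styles correspond to the print-as-expression parameter of racket/base. The sketch below flips that parameter directly to show one value printed both ways. Treating it as the setting that a configure-runtime submodule adjusts is our assumption about how such configuration is typically done, and print-to-string is a helper invented for the example.

```racket
#lang racket

;; Print v into a string under the given print style.
(define (print-to-string v as-expression?)
  (with-output-to-string
    (lambda ()
      (parameterize ([print-as-expression as-expression?])
        (print v)))))

;; The same symbol, printed in the two styles.
(define racket-style (print-to-string 'apple #t))
(define scheme-style (print-to-string 'apple #f))
```

Here racket-style holds the text with a leading quote and scheme-style holds the text without it, even though the value being printed is identical.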
How do
racket and
scheme inject a
submodule? Remember that they get to define
#%module-begin as
wrapped around the body of a module using the language. We can play
the same game in
"arith/main.rkt" by exporting our own
#%module-begin, defined under the name hedging-module-begin, that chains to
#%module-begin from
racket,
but adds a customized
configure-runtime submodule.
Wait... If racket’s #%module-begin adds its
own configure-runtime submodule, why don’t the two
submodules collide? It turns out that racket’s
#%module-begin adds a configure-runtime
submodule only if there isn’t one already, and
hedging-module-begin has already added one.
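A sketch of what such a wrapper could look like. Only the name hedging-module-begin comes from the text; the body of the configure-runtime submodule is a hypothetical choice (switching off the quoted print style), and the language's other exports are elided.

```racket
#lang racket
(require (for-syntax syntax/parse))

;; Export our wrapper under the name #%module-begin, next to whatever
;; else the arith language provides (those exports are elided here).
(provide (rename-out [hedging-module-begin #%module-begin]))

(define-syntax (hedging-module-begin stx)
  (syntax-parse stx
    [(_ form ...)
     ;; Chain to racket's #%module-begin, adding our own
     ;; configure-runtime submodule ahead of the module body.
     #'(#%module-begin
        (module configure-runtime racket/base
          ;; Hypothetical configuration: print values without a leading quote.
          (print-as-expression #f))
        form ...)]))
```

With this in place, every module written in the language carries the submodule, and it takes effect only when that module is the main module of the program.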
7.4 The Little Syntaxer
What’s the value of 1?

    That’s the number 1.

Can we type 1 in DrRacket’s interactions window and see the value?

    Sure, assuming that the definitions window starts #lang racket.

What’s the value of #'1?

    That’s a syntax tree, a.k.a. syntax object, that has a 1 inside it.
    A macro would return that to expand to just the number 1.

Can we type #'1 in DrRacket’s interactions window and see the value?

    Let’s try it...
    > #'1
    #<syntax:eval:5:0 1>
    That works! Why?

The full reason is that #'1 is shorthand for (syntax 1),
and the syntax form is provided by the racket language
for both run-time and compile-time expressions.
> #'1
#<syntax:eval:6:0 1>
> (syntax 1)
#<syntax:eval:7:0 1>

    I notice that the printed form of a syntax object seems to start with a source
    location, and the line number gets bigger each time we enter a new expression.
    > #'1
    #<syntax:eval:8:0 1>
    > #'1
    #<syntax:eval:9:0 1>

Yes it does. Try syntax-quoting some things other than just 1.

    > #'2
    #<syntax:eval:10:0 2>
    > #'(+ 3 4)
    #<syntax:eval:11:0 (+ 3 4)>
    > #'my-function
    #<syntax:eval:12:0 my-function>
    > #'(define (f x) (+ x 1))
    #<syntax:eval:13:0 (define (f x) (+ x 1))>
    > #'#t
    #<syntax:eval:14:0 #t>

Is there a way to get the 1 out of #'1?

    I could print #'1 and then try to parse the 1 back out.
    That’s probably not the right way.

Try syntax-e.

    So, if I want to inspect a big syntax tree, I could use syntax-e
    plus list-manipulation functions to look at subtrees.

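A small sketch of what syntax-e gives back; the variable names are ours. On an atom it returns the plain value, and on a parenthesized form it unwraps only one layer, leaving a list of syntax objects.

```racket
#lang racket

;; Unwrapping an atom gives the plain value.
(define one (syntax-e #'1))

;; Unwrapping a parenthesized form gives a list of syntax objects...
(define parts (syntax-e #'(+ 3 4)))

;; ...which ordinary list functions can traverse.
(define unwrapped-parts (map syntax-e parts))
```

Here one is the number 1, parts is a list of three syntax objects, and unwrapped-parts is a list of the symbol + and the numbers 3 and 4.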
You could, but syntax-parse is better at that. Speaking of syntax-parse, can you use it directly in the interactions window?

    Trying a syntax-parse expression with the pattern (_ n:number) gives:
    eval:18:0: _: wildcard not allowed as an expression
      in: (_ n:number)
    No. (Weird error.) But using only #lang racket, we also didn’t get
    syntax-parse within define-syntax:
    eval:19:0: _: wildcard not allowed as an expression
      in: (_ n:number)

What a terrible error message! And it’s a shame that syntax-parse doesn’t
just work with #lang racket.

    True, but we know how to import it, and then a macro f defined with syntax-parse works:
    > (require (for-syntax syntax/parse))
    > (f 1)
    "that's a number"
    > (f something-else)
    eval:23:0: f: expected number
      at: something-else
      in: (f something-else)

So, now can you also use syntax-parse directly in the interactions window?

    eval:24:0: _: wildcard not allowed as an expression
      in: (_ n:number)
    Apparently not. What went wrong?

Your (require (for-syntax syntax/parse)) imports the
syntax/parse library “for syntax,” which means
“for compile-time expressions.” It doesn’t make
syntax-parse available for run-time expressions.

    Aha!
    > (require syntax/parse)
    Now the same syntax-parse expression produces a result:
    #<syntax:eval:26:0 "that's a number">
    But why would I ever want to do that?

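The two kinds of import can be seen side by side in a module. This is a sketch: the pattern and the result string come from the session above, but the argument #'(f 1) and the names run-time-result and macro-result are assumptions.

```racket
#lang racket
(require syntax/parse               ; for run-time expressions
         (for-syntax syntax/parse)) ; for compile-time expressions

;; Run-time use of syntax-parse on a syntax object.
(define run-time-result
  (syntax-parse #'(f 1)
    [(_ n:number) #'"that's a number"]))

;; Compile-time use of syntax-parse, inside a macro.
(define-syntax (f stx)
  (syntax-parse stx
    [(_ n:number) #'"that's a number"]))

(define macro-result (f 1))
```

The run-time use yields a syntax object that wraps the string, while the macro use expands to the string itself. Dropping either half of the require breaks the corresponding use.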
Well, you might want to manipulate syntax objects in a helper
function. For example, if somehow you lose faith in Racket’s
compiler, you might want to manually optimize arithmetic
operations on literal numbers.

    Something like this, where opt adds or subtracts two literal numbers?
    > (require syntax/parse)
    > (opt #'(+ 1 2))
    3
    > (opt #'(- 7 3))
    4

I bet your opt doesn’t work as well as you intended for
#'(+ 1 (+ 3 4)).

    > (opt #'(+ 1 (+ 3 4)))
    #<syntax:eval:31:0 (+ 1 (+ 3 4))>
    You’re right. But I can recur.
    > (require syntax/parse)
    > (define (deep-opt stx)
        (syntax-parse stx
          [(o a b) (opt #`(o #,(opt #'a) #,(opt #'b)))]
          [_ stx]))
    > (deep-opt #'(+ 1 (+ 3 4)))
    8
    > (deep-opt #'(* 1 (+ 3 4)))
    #<syntax:eval:33:0 (* 1 7)>
    I’m only handling nested addition and subtraction, but let’s just
    trust Racket to optimize multiplication and division.

Fine. But sometimes you’re getting back a plain number, and sometimes you’re getting back a syntax object.

    Yes, when opt optimizes, it returns a number. I guess it’s
    more consistent to always return a syntax object. And I guess I can start
    a syntax-object result with #` and then immediately escape with
    #, to coerce to a syntax object.
    > #`#,1
    #<syntax 1>
    > (deep-opt #'(+ 1 (+ 3 4)))
    #<syntax 8>

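The definition of opt is not shown in the session, so here is one possible version that is consistent with the results above and that always returns a syntax object. Matching + and - as datum literals is our choice; deep-opt is repeated from the session so that the module is complete.

```racket
#lang racket
(require syntax/parse)

;; Fold + or - applied to two literal numbers; leave anything else alone.
;; The #`#, trick coerces the computed number back to a syntax object.
(define (opt stx)
  (syntax-parse stx
    #:datum-literals (+ -)
    [(+ a:number b:number) #`#,(+ (syntax-e #'a) (syntax-e #'b))]
    [(- a:number b:number) #`#,(- (syntax-e #'a) (syntax-e #'b))]
    [_ stx]))

;; Optimize the two operands first, then the rebuilt expression.
(define (deep-opt stx)
  (syntax-parse stx
    [(o a b) (opt #`(o #,(opt #'a) #,(opt #'b)))]
    [_ stx]))
```

Because deep-opt calls opt on its operands, it reaches one level of nesting. Calling deep-opt on the operands instead would handle deeper expressions.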
Neat trick. Can you write a macro now that uses opt to optimize some subexpression?

    That sounds easy.
    > (let-fast ([x (+ 1 2)]) x)
    deep-opt: undefined;
     cannot reference an identifier before its definition
      in module: top-level
      internal name: deep-opt
    Huh. I really expected (let-fast ([x (+ 1 2)]) x)
    to get optimized to (let ([x 3]) x). And I did define deep-opt!

But you defined deep-opt as a run-time function...

    ... and a macro needs compile-time helper functions.
    Got it. Let’s use define-for-syntax.
    > (let-fast ([x (+ 1 2)]) x)
    3
    I can’t tell whether let-fast makes it go faster, but it
    gives the answer I wanted.

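The session's definitions are not shown, so the following is a sketch of how let-fast could be put together with define-for-syntax. The body of opt and the exact shape of the let-fast pattern are assumptions; only the names opt, deep-opt, and let-fast come from the dialogue.

```racket
#lang racket
(require (for-syntax syntax/parse))

;; Compile-time helper: fold + or - applied to two literal numbers.
(define-for-syntax (opt stx)
  (syntax-parse stx
    #:datum-literals (+ -)
    [(+ a:number b:number) #`#,(+ (syntax-e #'a) (syntax-e #'b))]
    [(- a:number b:number) #`#,(- (syntax-e #'a) (syntax-e #'b))]
    [_ stx]))

;; Compile-time helper: optimize the operands, then the whole expression.
(define-for-syntax (deep-opt stx)
  (syntax-parse stx
    [(o a b) (opt #`(o #,(opt #'a) #,(opt #'b)))]
    [_ stx]))

;; let-fast behaves like let, but runs deep-opt on each right-hand side
;; while the macro expands.
(define-syntax (let-fast stx)
  (syntax-parse stx
    [(_ ([x:id rhs:expr] ...) body:expr)
     #:with (fast-rhs ...) (map deep-opt (syntax->list #'(rhs ...)))
     #'(let ([x fast-rhs] ...) body)]))

(let-fast ([x (+ 1 2)]) x)
```

The helpers live at compile time, so the macro can call them during expansion. The same functions defined with plain define would not be visible to the macro.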
Try expanding the let-fast expression with expand and viewing the result with syntax->datum:
'(let-values (((x) '3)) x)

    It looks like expand is a neanderthal stepper that
    jumps right to the end, syntax->datum is a kind of
    recursive syntax-e, and forms like let expand
    sometimes to a more general form. Anyway, I see the 3
    that I expected.

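Inside a module, expand needs a namespace to work in, which the interactions window supplies implicitly. The sketch below therefore expands a plain let in a fresh racket/base namespace, as a stand-in for the let-fast expression; the names ns and expanded are ours.

```racket
#lang racket

;; A namespace that knows the racket/base bindings, including let.
(define ns (make-base-namespace))

;; Fully expand the form in that namespace, then strip the result
;; down to a plain S-expression.
(define expanded
  (parameterize ([current-namespace ns])
    (syntax->datum (expand '(let ([x 3]) x)))))
```

The value of expanded has the same let-values shape as the result shown above, with the literal 3 wrapped in a quote.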
There’s just one last thing I don’t like about let-fast.
Try (let-fast ([x (/ (+ 1 2) 0)]) x)

    Well, (+ 1 2) will get replaced with 3, but
    no matter, because there will be a divide-by-zero error.
    > (let-fast ([x (/ (+ 1 2) 0)]) x)
    /: division by zero
    Hmm... You don’t see it here, but the pink error highlight
    isn’t around the (/ (+ 1 2) 0). It’s around
    (o #,(opt #'a) #,(opt #'b))
    in deep-opt.

That’s the part I don’t like.

    Well, that’s where the failing run-time expression (/ 3 0) is constructed after optimization of (+ 1 2) to
    3. But, I agree, we’d rather DrRacket highlight the
    original expression. How can we make it do that?

There’s a variant of syntax called syntax/loc.
It lets you provide an existing syntax object whose source location is
used for the new one. In an interaction that borrows the location of an earlier #'3,
the printed line number goes down, because the line number for #'3 is used.
When we use #`, that’s shorthand for quasisyntax,
and there’s also a quasisyntax/loc.

    Let’s use that to copy the original location over to an expression
    when we optimize nested expressions:
    > (let-fast ([x (/ (+ 1 2) 0)]) x)
    /: division by zero
    Now the pink is in the right place.

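A run-time sketch of that change, so that the copied location can be inspected directly. The opt function is the same assumed definition as before, and original is a syntax object built with a made-up source location (source name example, line 10) purely for the demonstration.

```racket
#lang racket
(require syntax/parse)

;; Fold + or - applied to two literal numbers; leave anything else alone.
(define (opt stx)
  (syntax-parse stx
    #:datum-literals (+ -)
    [(+ a:number b:number) #`#,(+ (syntax-e #'a) (syntax-e #'b))]
    [(- a:number b:number) #`#,(- (syntax-e #'a) (syntax-e #'b))]
    [_ stx]))

;; Like deep-opt before, but the rebuilt expression takes its source
;; location from stx instead of from this template.
(define (deep-opt stx)
  (syntax-parse stx
    [(o a b) (opt (quasisyntax/loc stx (o #,(opt #'a) #,(opt #'b))))]
    [_ stx]))

;; A syntax object with a made-up location: line 10, column 0.
(define original
  (datum->syntax #f '(/ (+ 1 2) 0) (list 'example 10 0 1 13)))

(define optimized (deep-opt original))

;; syntax/loc does the same for a template without escapes.
(define relocated (syntax/loc original (f x)))
```

Both optimized and relocated report the line of original, which is what lets DrRacket highlight the expression that the programmer wrote.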
I think you’re ready for Lab My First Real Language.