On this page:
1.1 Who am I? Why am I here?
1.2 One Programming Language, Many Languages
1.3 Racket and Language-Oriented Programming
1.4 Extending Languages
1.4.1 Syntax and Functions on Syntax
1.4.2 Pattern Matching Syntax
7.0.0.6

1 Racket and Language-oriented Programming

Goals

what this week is about

how we will approach it

getting started with a basic review of the syntax system

1.1 Who am I? Why am I here?

And where am I from?

If you haven’t already done so, please fill out the Racket Summer School Survey

1.2 One Programming Language, Many Languages

Almost every modern programming language comes with several distinct sub-languages that deal with
  • formatting strings

  • regexp (or regular expression) matching

  • (algebraic) term matching

  • event-handling

  • database queries

Racket is the second-best programming language in the world, and it definitely comes with all of the above.

Think Instantiate the above bullets for your favorite programming language.

Language designers accept that code communicates ideas (about problems and solutions) across time and that clear expression of ideas greatly facilitates communication. They therefore include these sub-languages because they know that these niche problem areas in programming—from preparing a string for output to querying a database—have their own ways of describing solutions. They also know that developers know these niche languages and that using them communicates solutions effectively.

An experienced language designer also knows that a language can check more properties than an API. Figure 2 and figure 3 illustrates the contrast between the two approaches. The code fragment in figure 2 shows how a programmer may use a library to describe a flow network and request a minimal-cost path from one node to another. The first two definitions create sets of nodes and edges; the third one creates the graph from those two. And the last one requests the solution.

#lang racket
 
(require network/flow)
 
(define nodes (set "a" "b" "c" "d"))
(define edges (set '("a" 10 "b") '("a" 20 "c") '("b" 30 "d") '("c" 30 "d")))
(define G (graph nodes edges))
 
(find-min-path G "a" "d" argmin)

Figure 2: Languages vs APIs

The notation is not one an applied mathematician would use to state the problem. While this expert might understand the code, it still takes a significant cognitive effort to match it to the original problem and considerably more to produce the code. Similarly, the code also suffers from the unfortunate choice of using strings to represent nodes, which blindsides at least the compiler and IDE: (1) the compiler cannot check whether a specified edge refers to existing nodes (typos can result in “dangling” or, worse, incorrect references); (2) the IDE cannot show how a reference to a node is matched by its definition. Although other developers may do better, the code quality is definitely a direct function of the users of the library.

By contrast, the creator of a domain-specific language (DSL) can force all users to deliver code that satisfies these desirable properties. Concretely, take a look at figure 3. The left-hand side shows how the code from above looks in a Racket-y DSL; the right-hand side is similar but uses “ugly” syntax in lieu of Racket’s beautiful parentheses.Implementing the right-hand side requires an additional 50 lines of code over the left-hand side, including two or three hints to enable DrRacket to show lexical information.

#lang network-flow
 
(node a (b 10) (c 20))
(node b (d 30))
(node c (d 30))
(node d)
 
(find-min-path a d argmin)

     

#lang network-flow/ugly
 
problem: maximize from a to d
 
node a -- 10 --> b
node a -- 20 --> c
node b -- 30 --> d
node c -- 30 --> d

Figure 3: Languages vs APIs

With the DSL’s notation, an applied mathematician can easily comprehend (and manipulate) the problem statement. This DSL version also treats nodes as identifiers so that the compiler can check that nodes are declared and the IDE can connect node declarations to node references with arrows (see figure), improving program comprehension. Additionally, the DSL can internally associate strings with nodes so that a path can be rendered in terms of the original source text.

1.3 Racket and Language-Oriented Programming

The preceding section works out two important insights. First, language designers create languages by composing special-purpose languages. They eliminate a lot of boiler plate and help check properties. Programs consist of components in these special-purpose languages that are fluidly interconnected. Second, this first insight does not just hold for the many special-purpose situations within the domain of programming. It applies to (almost) all problem domains.

Some of the instructors of this summer school have authored a magazine article for the Communications of the ACM on this guideline and its consequences. You may wish to read it if you would like to see more of the big picture.

Racket translates these insights into an explicit design goal:

Racket empowers developers to add (sub)languages, and the process of adding these languages to the existing eco-system is free of any friction.

We call this language-oriented programming.

To bring across this point, this summer school presents the tools for creating languages in a bottom-up, back-to-front manner:
  • Day 1 Matthias will review (for some participants, introduce) Racket’s syntax system and tools for building extensions to Racket.

  • Day 2 Matthew F. will build on the tools to build Racket-like languages.

  • Day 3 Stephen will show how to equip such embedded languages with type systems.

  • Day 4 Matthew B. will equip languages with “real” (aka “ugly”) syntax.

  • Day 5 Robby, Jay, and Matthew will give lectures on languages that they have built for production uses and sometimes just because Racketeers want to have fun.

1.4 Extending Languages

Goals

Review define-syntax and functions on syntax.

Review syntax-parse and how to design such functions effectively.

1.4.1 Syntax and Functions on Syntax

Racket’s compiler is programmable. The most basic way to change the compiler is to define a kind of compile-time function called a macro. This is done with define-syntax, which superficially looks like an ordinary function definition:

regular function

     

macro

(define (f x) 5)

     

(define-syntax (f x) #'5)

Two visual differences jump out immediately: the keyword define versus define-syntax, and the result 5 versus #'5. While a run-time function returns values, a compile-time function that implements a macro must generate code, and #' (short-hand for syntax) turns a 5 into code that is compiled to the literal constant.

So how do we use f? Like a function, except that it is called at compile time, not at run time:
> (f 10)

5

Just to make sure, try this with plain 5 as the function body:
> (define-syntax (f-bad stx) 5)
> (f-bad 10)

f-bad: received value from syntax expander was not syntax

  received: 5

The use of a macro like f can take any number of “arguments” grouped with f in parentheses:
> (f)

5

> (f 10 "hello world")

5

Weird? Let’s print the input to see what’s going on:
(define-syntax (f-display stx)
  (displayln stx)
  #'5)
) Now we can see what the compiler hands to the compile-time function:
> (f-display)

#<syntax:eval:10:0 (f-display)>

5

> (f-display 10)

#<syntax:eval:11:0 (f-display 10)>

5

> (f-display 10 "hello world")

#<syntax:eval:12:0 (f-display 10 "hello world")>

5

Insight The argument of a compile-time function f is the entire syntax tree that is written with f (or f-display here), that is, the root is the name of the function. So, the use of f can have multiple parts, but they’re grouped together as a single syntax-tree argument to the compile-time function that implements f.

Since Racket belongs to the Lisp family, there is a function that extracts the underlying list from the syntax tree. We can, for example, take it is length and translate a syntax tree into code that represents its length:
; Syntax -> Syntax
; extract the length of the given tree and generate code from this value
(define-syntax (g stx)
  (define n (length (syntax->list stx)))
  #`#,n)

What’s really new here is #`x and #,y. The former is like Racket’s quasiquote and the latter is of course unquote, meaning the former sets up a code-generating context and the latter allows an escape to Racket. And now watch:
> (g)

1

> (g a)

2

> (g a b)

3

Syntax trees come with additional properties, and a macro can access those. For example, it can retrieve the line where the tree shows up or its column:
; Syntax -> Syntax
; extract the source location information and generate code from it
(define-syntax (i stx)
  (define l (syntax-line stx))
  (define c (syntax-column stx))
  #`(list "line and column info" #,l #,c))
> (i 1)

'("line and column info" 18 0)

> (i 2)

'("line and column info" 19 0)

> (i 3)

'("line and column info" 20 0)

Of course, the real purpose of macros is to generate new syntax trees that say something interesting. Let’s use Racket’s list to generate such a syntax tree:
; Syntax -> Syntax
; (define-world name) defines name to a string
(define-syntax (define-world stx)
  (define expression (syntax->list stx))
  (define identifier (second expression))
  (define code (list 'define identifier "hello world, how are you?"))
  (datum->syntax stx code))

> (define-world hello)
> hello

"hello world, how are you?"

The function extracts the identifier from the second position and creates a list from 'define, the identifier, and a string. This looks like a list-based representation of a definition, but it is not code. Hence, the last line, which translates that list into code with datum->syntax.

Let’s take one last look at datum->syntax, because we will run into it again. Its first argument is stx above, the given syntax tree. The function copies properties from stx to the new syntax tree, in particular, it copes over scope information. We will get back to this later.

1.4.2 Pattern Matching Syntax

How do macros really take apart their syntax arguments? Pattern matching, of course!

Unsurprisingly, syntax-parse is yet another embedded language for pattern matching in Racket. In contrast to match, it is tuned to help with syntax-processing functions:
; Syntax -> Syntax
; (define-world-v2 name) is equivalent to define-world
; with a simpler definition
(define-syntax (define-world-v2 stx)
  (syntax-parse stx
    ((_ x) #'(define x "world"))))
Morally, define-world-v2 is the same function as define-world. But look how much easier it is to write it down and how much easier it is to read.

If this doesn’t work for you, you need to (require (for-syntax syntax/parse)) to your Definitions Window.

> (define-world-v2 x)
> x

"world"

> (define-world-v2 y)
> y

"world"

And even better, imagine someone doesn’t use it properly:
> (define-world-v2 1)

define: bad syntax

  at: 1

  in: (define 1 "world")

Now we get an error about define, even though the example doesn’t use this form. The “client developer” of define-world-v2 needs to know that define-world-v2 was used the wrong way.

We can add annotations to tell the pattern matcher that the second part of the syntax tree must be an identifier or id for short:
; Syntax -> Syntax
; like (define-world-v2 name) but expresses errors in terms of itself
(define-syntax (define-world-v3 stx)
  (syntax-parse stx
    ((_ x:id) #'(define x "world"))))
When a programmer uses define-world-v3 the wrong way, the error message explains the problem in terms of define-world-v3 not the broken code it generates due to bad syntax trees:
> (define-world-v3 1)

define-world-v3: expected identifier

  at: 1

  in: (define-world-v3 1)

We can even express this as a test:
> (require rackunit)
> (require syntax/macro-testing)
> (check-exn #rx"define-world-v3: expected identifier"
             (lambda ()
               (convert-syntax-error (define-world-v3 1))))
And there is no output, because the test succeeds.

Problem Supposed we want a language construct that defines more than one name to stand for this amazing string.

With syntax-parse, creating this extension is also straightforward. Here is the pattern part:
; Syntax -> Syntax
; (define-world* name ...) defines all name ... to stand for a string
(define-syntax (define-world* stx)
  (syntax-parse stx
    [(_ x:id ...) #'***]))
This is precisely when the famous dot notation comes to the rescue. In the language of syntax-parseyes, it is a sub-language just like the ones we discussed at the beginning—p ... denotes a possibly empty sequence of patterns that look like p, and the entire sequence is a pattern all by itself. So, if p is x:id, we have a pattern that matches a sequence of identifiers. By contrast, (y:id n:num) ... is a pattern that matches a sequence of pairs with an identifier in the first position and a number in the second.

A sequence pattern must be spliced into the #'*** template as a whole. Since this macro is meant to define all given names to our favorite string, we use define-values to introduce definitions for all of them, and this step is relatively obvious:
; Syntax -> Syntax
; (define-world* name ...) defines all name ... to stand for a string
(define-syntax (define-world* stx)
  (syntax-parse stx
    ((_ x:id ...)
     #'(define-values (x ...) (values (begin 'x "world") ...)))))
The right-hand side of define-values must produces as many values as there are identifiers. To create that many values, we use a trick:

(begin 'x "world") ...

If we ignore the quote for a moment, this head template creates a begin expression with the identifier as the first sub-expression and the desired string as the second. The ... that follow say instantiate this template as many times as there are xes. If we chosen plain x, however, Racket would try to find the value of x and it doesn’t have one on the right-hand side of its own definition; but, by quoting x we create a dummy value—a symbol—that begin can discard.

Let’s watch this macro in action:
> (define-world* hello good bye)
> hello

"world"

> good

"world"

> bye

"world"

As promised, all specified identifiers now stand for our favorite string.

Problem Develop the syntactic extension some. It supports the following surface syntax:

(some e0:expr e1:expr ... en:expr)

That is, there is at least one expression though there might be arbitrarily many.

The some expression produces the list of all non-#false results of the given expressions until it encounters #false. If there are no non-#false results, some produces #false.

Let’s go about this systematically, starting with some tests:
(module+ test
  (check-equal? (some #f) #f)
  (check-equal? (some 1 2 #f 3) (list 1 2))
  (check-equal? (some 1 2 #f (displayln 'hello?)) (list 1 2))
  (check-equal? (some 1 2 (begin (displayln 'hello?) 3)) (list 1 2 3)))
Rule If a you can define an extension as a regular function instead of a macro, do so. While the first two don’t need an explanation, the last two are incomplete and need a bit of elaboration. In the third test, we would not want to see any output, because a #f precedes the effectful expression. Furthermore, since none of the preceding expressions evaluate to #f, we would expect to see one line of output for the last test What this says is that we cannot define some as a plain function, which would evaluate all arguments; it must be a macro.

To develop some, we will start with the simplest case, when there is only one sub-expression. We definitely need the value of this expression but we also want to evaluate the expression only once. This suggests the generation of a let expression:
(define-syntax (some stx)
  (syntax-parse stx
    [(_ e0:expr)
     #'(let ([value-of-e0 e0]) (and value-of-e0 (list value-of-e0)))]
    ...))
Once some has the value of e0, it can check whether it is non-#false and, if so, create a list of this value.

Like every self-respecting pattern-matching construct, syntax-parse can cope with as many patterns as necessary, and it does so in order. The second pattern must cope with more than one expression, which we can say as follows:

(e0:expr e1:expr ...)

If this pattern matches, there is at least one expression (e0) plus a possibly empty sequence (e1 ...), but because this is second pattern in syntax-parse and the first one deals with the case of a single expression, we know that the sequence won’t be empty.

Recursion at compile time works like this. When some generates the code (some e1 ...), it sends it back to the compiler. In turn, the compiler inspects this new syntax tree, finds the occurrence of a some expression, and calls the some function again. Now that you know this, you also understand that a Racket programmer has the power to make the compiler diverge.

The code for this case must first evaluate e0. If it is #false, the evaluation is done; if not, it must combine the value with the value of the sequence e1 ...and, naturally, macros can recur just like run-time functions:
(define-syntax (some stx)
  (syntax-parse stx
    [(_ e0:expr)
     #'(let ([value-of-e0 e0]) (and value-of-e0 (list value-of-e0)))]
    [(_ e0:expr e1:expr ...)
     #'(let ([v e0]) (and v (combine v (some e1 ...))))]))
Once it is established that the value of the first expression is a non-#f value, the code can delegate the rest of the work to a run-time function:
; Non-#false (U #false [Listof Any]) -> (U #false [Listof Any])
(define (combine value-of-e0 value-of-e1...)
  (if value-of-e1... (cons value-of-e0 value-of-e1...) (list value-of-e0)))
As the parameters indicate, combine consumes the values of e0 and the sequence e1 ... as determine by some. The rest is then merely a question of puzzling out the possible cases.

Question Why would the following clause not work?
[(_ e0:expr e1:expr ...)
 #'(let ([e e0]) (and e (or (and (some e1 ...) (cons e (some e1 ...))) (list e))))]

off to lab (Lab Syntax, More Syntax, ...)