1 Racket and Language-oriented Programming
Goals |
— |
— |
— |
1.1 Who am I? Why am I here?
And where am I from?
If you haven’t already done so, please fill out the Racket Summer School Survey
1.2 One Programming Language, Many Languages
formatting strings
regexp (or regular expression) matching
(algebraic) term matching
event-handling
database queries
Think Instantiate the above bullets for your favorite programming language.
Language designers accept that code communicates ideas (about problems and
solutions) across time and that clear expression of ideas greatly facilitates
communication. They therefore include these sub-languages because they know that
these niche problem areas in programming—
An experienced language designer also knows that a language can check more properties than an API. Figure 2 and figure 3 illustrates the contrast between the two approaches. The code fragment in figure 2 shows how a programmer may use a library to describe a flow network and request a minimal-cost path from one node to another. The first two definitions create sets of nodes and edges; the third one creates the graph from those two. And the last one requests the solution.
#lang racket (require network/flow) (define nodes (set "a" "b" "c" "d")) (define edges (set '("a" 10 "b") '("a" 20 "c") '("b" 30 "d") '("c" 30 "d"))) (define G (graph nodes edges)) (find-min-path G "a" "d" argmin)
The notation is not one an applied mathematician would use to state the problem. While this expert might understand the code, it still takes a significant cognitive effort to match it to the original problem and considerably more to produce the code. Similarly, the code also suffers from the unfortunate choice of using strings to represent nodes, which blindsides at least the compiler and IDE: (1) the compiler cannot check whether a specified edge refers to existing nodes (typos can result in “dangling” or, worse, incorrect references); (2) the IDE cannot show how a reference to a node is matched by its definition. Although other developers may do better, the code quality is definitely a direct function of the users of the library.
By contrast, the creator of a domain-specific language (DSL) can force all users to deliver code that satisfies these desirable properties. Concretely, take a look at figure 3. The left-hand side shows how the code from above looks in a Racket-y DSL; the right-hand side is similar but uses “ugly” syntax in lieu of Racket’s beautiful parentheses.Implementing the right-hand side requires an additional 50 lines of code over the left-hand side, including two or three hints to enable DrRacket to show lexical information.
#lang network-flow (node a (b 10) (c 20)) (node b (d 30)) (node c (d 30)) (node d) (find-min-path a d argmin)
#lang network-flow/ugly problem: maximize from a to d node a -- 10 --> b node a -- 20 --> c node b -- 30 --> d node c -- 30 --> d
With the DSL’s notation, an applied mathematician can easily comprehend (and manipulate) the problem statement. This DSL version also treats nodes as identifiers so that the compiler can check that nodes are declared and the IDE can connect node declarations to node references with arrows (see figure), improving program comprehension. Additionally, the DSL can internally associate strings with nodes so that a path can be rendered in terms of the original source text.
1.3 Racket and Language-Oriented Programming
The preceding section works out two important insights. First, language designers create languages by composing special-purpose languages. They eliminate a lot of boiler plate and help check properties. Programs consist of components in these special-purpose languages that are fluidly interconnected. Second, this first insight does not just hold for the many special-purpose situations within the domain of programming. It applies to (almost) all problem domains.
Some of the instructors of this summer school have authored a magazine article for the Communications of the ACM on this guideline and its consequences. You may wish to read it if you would like to see more of the big picture.
Racket empowers developers to add (sub)languages, and the process of adding these languages to the existing eco-system is free of any friction.
Day 1 Matthias will review (for some participants, introduce) Racket’s syntax system and tools for building extensions to Racket.
Day 2 Matthew F. will build on the tools to build Racket-like languages.
Day 3 Stephen will show how to equip such embedded languages with type systems.
Day 4 Matthew B. will equip languages with “real” (aka “ugly”) syntax.
Day 5 Robby, Jay, and Matthew will give lectures on languages that they have built for production uses and sometimes just because Racketeers want to have fun.
1.4 Extending Languages
Goals |
— |
— |
1.4.1 Syntax and Functions on Syntax
Racket’s compiler is programmable. The most basic way to change the compiler is to define a kind of compile-time function called a macro. This is done with define-syntax, which superficially looks like an ordinary function definition:
regular function |
| macro |
(define (f x) 5) |
| (define-syntax (f x) #'5) |
Two visual differences jump out immediately: the keyword define versus define-syntax, and the result 5 versus #'5. While a run-time function returns values, a compile-time function that implements a macro must generate code, and #' (short-hand for syntax) turns a 5 into code that is compiled to the literal constant.
> (f 10) 5
> (define-syntax (f-bad stx) 5) > (f-bad 10)
f-bad: received value from syntax expander was not syntax
received: 5
> (f) 5
> (f 10 "hello world") 5
(define-syntax (f-display stx) (displayln stx) #'5)
> (f-display) #<syntax:eval:10:0 (f-display)>
5
> (f-display 10) #<syntax:eval:11:0 (f-display 10)>
5
> (f-display 10 "hello world") #<syntax:eval:12:0 (f-display 10 "hello world")>
5
; Syntax -> Syntax ; extract the length of the given tree and generate code from this value
(define-syntax (g stx) (define n (length (syntax->list stx))) #`#,n)
> (g) 1
> (g a) 2
> (g a b) 3
; Syntax -> Syntax ; extract the source location information and generate code from it
(define-syntax (i stx) (define l (syntax-line stx)) (define c (syntax-column stx)) #`(list "line and column info" #,l #,c))
> (i 1) '("line and column info" 18 0)
> (i 2) '("line and column info" 19 0)
> (i 3) '("line and column info" 20 0)
; Syntax -> Syntax ; (define-world name) defines name to a string
(define-syntax (define-world stx) (define expression (syntax->list stx)) (define identifier (second expression)) (define code (list 'define identifier "hello world, how are you?")) (datum->syntax stx code))
> (define-world hello) > hello "hello world, how are you?"
Let’s take one last look at datum->syntax, because we will run into it again. Its first argument is stx above, the given syntax tree. The function copies properties from stx to the new syntax tree, in particular, it copes over scope information. We will get back to this later.
1.4.2 Pattern Matching Syntax
How do macros really take apart their syntax arguments? Pattern matching, of course!
; Syntax -> Syntax ; (define-world-v2 name) is equivalent to define-world ; with a simpler definition
(define-syntax (define-world-v2 stx) (syntax-parse stx ((_ x) #'(define x "world"))))
If this doesn’t work for you, you need to (require (for-syntax syntax/parse)) to your Definitions Window.
> (define-world-v2 x) > x "world"
> (define-world-v2 y) > y "world"
> (define-world-v2 1)
define: bad syntax
at: 1
in: (define 1 "world")
; Syntax -> Syntax ; like (define-world-v2 name) but expresses errors in terms of itself
(define-syntax (define-world-v3 stx) (syntax-parse stx ((_ x:id) #'(define x "world"))))
> (define-world-v3 1)
define-world-v3: expected identifier
at: 1
in: (define-world-v3 1)
> (require rackunit) > (require syntax/macro-testing)
> (check-exn #rx"define-world-v3: expected identifier" (lambda () (convert-syntax-error (define-world-v3 1))))
Problem Supposed we want a language construct that defines more than one name to stand for this amazing string.
; Syntax -> Syntax ; (define-world* name ...) defines all name ... to stand for a string (define-syntax (define-world* stx) (syntax-parse stx [(_ x:id ...) #'***]))
; Syntax -> Syntax ; (define-world* name ...) defines all name ... to stand for a string
(define-syntax (define-world* stx) (syntax-parse stx ((_ x:id ...) #'(define-values (x ...) (values (begin 'x "world") ...)))))
> (define-world* hello good bye) > hello "world"
> good "world"
> bye "world"
Problem Develop the syntactic extension some. It supports the following surface syntax:(some e0:expr e1:expr ... en:expr)
That is, there is at least one expression though there might be arbitrarily many.The some expression produces the list of all non-#false results of the given expressions until it encounters #false. If there are no non-#false results, some produces #false.
(module+ test (check-equal? (some #f) #f) (check-equal? (some 1 2 #f 3) (list 1 2)) (check-equal? (some 1 2 #f (displayln 'hello?)) (list 1 2)) (check-equal? (some 1 2 (begin (displayln 'hello?) 3)) (list 1 2 3)))
(define-syntax (some stx) (syntax-parse stx [(_ e0:expr) #'(let ([value-of-e0 e0]) (and value-of-e0 (list value-of-e0)))] ...))
(e0:expr e1:expr ...)
Recursion at compile time works like this. When some generates the code (some e1 ...), it sends it back to the compiler. In turn, the compiler inspects this new syntax tree, finds the occurrence of a some expression, and calls the some function again. Now that you know this, you also understand that a Racket programmer has the power to make the compiler diverge.
(define-syntax (some stx) (syntax-parse stx [(_ e0:expr) #'(let ([value-of-e0 e0]) (and value-of-e0 (list value-of-e0)))] [(_ e0:expr e1:expr ...) #'(let ([v e0]) (and v (combine v (some e1 ...))))]))
; Non-#false (U #false [Listof Any]) -> (U #false [Listof Any]) (define (combine value-of-e0 value-of-e1...) (if value-of-e1... (cons value-of-e0 value-of-e1...) (list value-of-e0)))
[(_ e0:expr e1:expr ...) #'(let ([e e0]) (and e (or (and (some e1 ...) (cons e (some e1 ...))) (list e))))]
off to lab (Lab Syntax, More Syntax, ...)