2.1 Code Generators

Version: 4.2.1

Synopsis: Code generators (or code-generating procedures) are procedures that construct new syntax objects. They are called by macros to help build the macro’s result; they are also used to construct code to evaluate using eval.

Examples: Implementation of complex macros such as match and unit.

Related patterns:

Lexical Context-Passing Code Generators

Macros benefit from abstraction as much as any other code. The overall task of transformation should be broken up into smaller tasks, each implemented by a separate procedure or set of procedures. Procedures that contain syntax templates require special attention to get right. I refer to such procedures as code generators (or code-generating procedures) because their purpose is usually to produce code for the macro’s result, although technically the same considerations apply to auxiliary procedures that use syntax constants (e.g., literal identifiers) to parse the macro’s input.

Small auxiliary procedures can be written either within a macro itself as an internal definition or in the same module using define-for-syntax or begin-for-syntax. When there are many auxiliary procedures, or when they are complex, it is useful to place them in a separate module that is imported for-syntax.

As the section on Syntax Templates explains, an important issue when writing separate code-generating procedures is making sure the syntax templates inside them have the right bindings at the right phase. If you move a syntax template from within a macro to another module (for example, to factor out part of a macro into an auxiliary procedure), you must take care to preserve the relevant bindings, including the appropriate phase adjustments.

The Lexical Context-Passing Code Generators pattern provides an alternative to require for-syntax and for-template.

2.1.1 Example

Let’s explore different arrangements of code-generating procedures for the implementation of the delay macro.

2.1.1.1 Same module

Here is the macro with the code generator as an internal definition:

  (module delay scheme/base
    (require (for-syntax scheme/base))
    (provide delay)

    ; #:mutable is required so that set-promise-thunk! is defined
    (define-struct promise (thunk) #:mutable)

    (define-syntax (delay stx)
      ; gen-update-thunk-code : identifier stx -> stx
      (define (gen-update-thunk-code p-var expr)
        #`(lambda ()
            (let ([value #,expr])
              (set-promise-thunk! #,p-var (lambda () value))
              value)))
      (syntax-case stx ()
        [(delay e)
         #`(letrec ([p (make-promise
                        #,(gen-update-thunk-code #'p #'e))])
             p)])))

Let’s carefully consider the bindings in the first syntax template in this module, the one within the gen-update-thunk-code procedure. The template includes the identifiers lambda and set-promise-thunk! (among others) used as references at phase 0; that is, they are part of ordinary, “run-time” expressions. The context has a phase-0 binding for lambda from the scheme/base language; it has a phase-0 binding for set-promise-thunk! from the structure definition in the same module. All is well.
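
To make the code generator’s contribution concrete, here is a hand-written sketch of what a use such as (delay (+ 1 2)) expands to. It ignores the renaming that hygiene applies to p and value, and the argument expression (+ 1 2) is only an illustrative choice:

  ; (delay (+ 1 2)) expands, roughly, to:
  (letrec ([p (make-promise
               ; this lambda expression is built by gen-update-thunk-code
               (lambda ()
                 (let ([value (+ 1 2)])
                   (set-promise-thunk! p (lambda () value))
                   value)))])
    p)

Within the lambda expression, everything except p and the argument expression comes from the template inside gen-update-thunk-code, which is why that template’s bindings matter.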

Here is the macro with the code generator in a begin-for-syntax form. The definition remains the same.

  (module delay scheme/base
    (require (for-syntax scheme/base))
    (provide delay)

    (define-struct promise (thunk) #:mutable)

    (begin-for-syntax
      ; gen-update-thunk-code : identifier stx -> stx
      (define (gen-update-thunk-code p-var expr)
        #`(lambda ()
            (let ([value #,expr])
              (set-promise-thunk! #,p-var (lambda () value))
              value))))

    (define-syntax (delay stx)
      (syntax-case stx ()
        [(delay e)
         #`(letrec ([p (make-promise
                        #,(gen-update-thunk-code #'p #'e))])
             p)])))

The analysis of the first template in this module is exactly the same, since the template’s context is the same (aside from the irrelevant binding of stx).
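
For a single helper, define-for-syntax can stand in for begin-for-syntax. The following is a minimal sketch, assuming the two forms are interchangeable for this purpose. The #:mutable option is included on the assumption that scheme/base structure fields are immutable unless declared otherwise, so that set-promise-thunk! is defined:

  (module delay scheme/base
    (require (for-syntax scheme/base))
    (provide delay)

    ; #:mutable so that set-promise-thunk! exists
    (define-struct promise (thunk) #:mutable)

    ; gen-update-thunk-code : identifier stx -> stx
    ; Defined at phase 1, like a definition inside begin-for-syntax.
    (define-for-syntax (gen-update-thunk-code p-var expr)
      #`(lambda ()
          (let ([value #,expr])
            (set-promise-thunk! #,p-var (lambda () value))
            value)))

    (define-syntax (delay stx)
      (syntax-case stx ()
        [(delay e)
         #`(letrec ([p (make-promise
                        #,(gen-update-thunk-code #'p #'e))])
             p)])))

The template’s context is unchanged, so the binding analysis is the same as for the two modules above.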

A warning about begin-for-syntax: the auxiliary definitions must occur before any macro definition that uses them; otherwise, an unbound variable error is raised. That is because macro definitions are compiled and executed as soon as they are encountered, so they must not contain free variables that are defined later in the module.
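
The ordering problem can be seen in a small sketch. The module name bad-order, the macro answer, and the helper gen-answer-code are hypothetical, chosen only for illustration. The module is expected to be rejected, because the macro definition is processed before the helper is defined:

  (module bad-order scheme/base
    (require (for-syntax scheme/base))

    ; Processed first: gen-answer-code has no definition yet.
    (define-syntax (answer stx)
      (gen-answer-code))

    ; Too late: this form should precede the macro definition above.
    (begin-for-syntax
      ; gen-answer-code : -> stx
      (define (gen-answer-code)
        #'42)))

Moving the begin-for-syntax form above the define-syntax form avoids the error.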

2.1.1.2 Separate module

Here is the standard way of factoring an auxiliary procedure into a separate module. The procedure is defined using define, just like any “normal” procedure definition. Since the procedure is used in the implementation of the macro transformation, its module must be imported for-syntax. In turn, the new module must import the bindings for the identifiers in its templates for-template. In essence, the for-syntax and for-template balance out as far as the template’s bindings are concerned.

One unfortunate wrinkle in this arrangement is that since the code generator refers to the promise struct, the code must be split into three modules, not just two, with the third module containing just the structure definition. It cannot go in the auxiliary module, because the structure definition needs to be at phase 0 relative to the 'delay module, and there is no begin-for-template. It cannot go in the 'delay module, because that would cause an import cycle.

  (module promises scheme/base
    (provide (struct-out promise))
    (define-struct promise (thunk) #:mutable))

  (module delay-codegen scheme/base
    (require (for-template scheme/base
                           'promises))
    (provide gen-update-thunk-code)

    ; gen-update-thunk-code : identifier stx -> stx
    (define (gen-update-thunk-code p-var expr)
      #`(lambda ()
          (let ([value #,expr])
            (set-promise-thunk! #,p-var (lambda () value))
            value))))

  (module delay scheme/base
    (require 'promises
             (for-syntax scheme/base
                         'delay-codegen))
    (provide delay)

    (define-syntax (delay stx)
      (syntax-case stx ()
        [(delay e)
         #`(letrec ([p (make-promise
                        #,(gen-update-thunk-code #'p #'e))])
             p)])))

Let’s perform the binding/phase analysis of the syntax template in the 'delay-codegen module. As before, let’s just consider the two identifiers lambda and set-promise-thunk!. Looking at the context, we see that lambda has bindings at both phase 0 (from the module’s language import) and at phase -1 (from the explicit for-template import of scheme/base), and set-promise-thunk! has a binding at phase -1 from the import of 'promises. So this code is feasible as an expression at phase -1... but what does that mean?

The answer is that the template’s bindings are at phase -1 relative to the enclosing module (i.e., 'delay-codegen), but that module is imported for-syntax by the module containing the macro that ultimately uses the template. A for-syntax import has a relative phase offset of 1, so the template’s bindings are at phase -1 + 1 = 0 relative to the macro that uses the template. And all is well again.

(An aside: The fact that the template works only as a phase -1 expression and not as a phase 0 expression means that it has the wrong bindings to pass to eval. In general, you cannot use the same modules for macro-code generators and eval-code generators, at least not without an unconscionable “shotgun” approach of importing the same modules at multiple phases at once.)

2.1.1.3 Separate modules; a different approach

Here is another way of factoring out the auxiliary procedure into a separate module. Instead of adjusting phases using for-syntax and for-template imports, this arrangement preserves the phase structure of the begin-for-syntax solution. In fact, it is the begin-for-syntax solution, simply spread over multiple modules. As such, it retains the limitations of definitions within begin-for-syntax. On the other hand, it allows the code to fit into two modules instead of three.

This arrangement essentially moves the for-syntax from the require form to the provide form.

  (module promises scheme/base
    (require (for-syntax scheme/base))
    (provide (struct-out promise)
             (for-syntax gen-update-thunk-code))

    (define-struct promise (thunk) #:mutable)

    (begin-for-syntax
      ; gen-update-thunk-code : identifier stx -> stx
      (define (gen-update-thunk-code p-var expr)
        #`(lambda ()
            (let ([value #,expr])
              (set-promise-thunk! #,p-var (lambda () value))
              value)))))

  (module delay scheme/base
    (require 'promises
             (for-syntax scheme/base))
    (provide delay)

    (define-syntax (delay stx)
      (syntax-case stx ()
        [(delay e)
         #`(letrec ([p (make-promise
                        #,(gen-update-thunk-code #'p #'e))])
             p)])))

The template bindings are the same as they were in the one-module arrangements. Since 'delay imports 'promises using just require, there are no phase adjustments. Consequently, we call such code generators same-phase code generators.

This arrangement is more awkward for writing the auxiliary functions. On the other hand, it supports a novel capability: the code generators can produce references to private members of their enclosing modules. The references must be certified manually; the automatic certification done by the macro expander doesn’t apply, since the references do not belong to the module containing the macro. See Certification for Same-Phase Code Generators for details.
