V2 Compiler Passes

David Miller edited this page Jul 14, 2016 · 2 revisions
  • Pass 0: Lexical analysis
  • Pass 1: Local variable identification and macroexpansion
  • Pass 2: TBD

Pass 0: Lexical analysis

Done by the Lisp reader. 'Nuff said.

Pass 1: Local variable identification and macroexpansion

It is hard to go further without knowing exactly what code is to be compiled, and that requires macroexpansion at all levels. For a form (op arg1 arg2 ... ), there are several possibilities:

  1. op is a special form
  2. op signifies a macro
  3. op is of the form .name
  4. op is a symbol of the form ns/name where ns is not an existing Namespace or an alias for one in the CurrentNamespace, and ns names a type
  5. op is of the form name.
  6. otherwise

Result of macroexpansion is, respectively,

  1. the original form itself (no macroexpansion)
  2. the result of calling the Var identified by op with the entire form, the local variable environment, and the rest (next) of the form as arguments
  3. (. arg1 name arg2 ... ) (host expression: either a static or a virtual method call, depending on whether arg1 names a type)
  4. (. ns name arg1 arg2 ...) (static method)
  5. (new name arg1 arg2 ... ) (new expression)
  6. the original form itself (no macroexpansion)

Determining whether op is a macro:

  1. op is a Symbol and not a local variable
  2. op is a Var or a Symbol naming a Var, the Var is marked as a macro and is not marked as private

By inspection, macroexpansion and the identification of local variable scopes are intertwined. This pass must walk the form being compiled, keeping track of the local variable environment and macroexpanding along the way.
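
To make the intertwining concrete, here is a minimal sketch of such a walker, again in Python for brevity. Everything in it is an assumption for illustration: `expand_all` and the `macros` table are hypothetical, macros are plain functions of `(form, env)`, lists and vectors are not distinguished, and `let*` is the only binding form handled.

```python
def expand_all(form, macros, locals_=frozenset()):
    """Fully macroexpand `form` while tracking local variable scopes.

    macros:  hypothetical table mapping a symbol name to a function
             (form, env) -> expanded form
    locals_: names bound by enclosing let* forms
    A real pass would also handle fn*, loop*, letfn*, catch, and so on.
    """
    if not isinstance(form, list) or not form:
        return form                      # symbols and literals pass through
    op = form[0]

    if op == "quote":
        return form                      # never expand quoted data

    if op == "let*":
        # (let* [name init ...] body ...): each init sees earlier bindings
        bindings, body = form[1], form[2:]
        env, new_bindings = locals_, []
        for name, init in zip(bindings[0::2], bindings[1::2]):
            new_bindings += [name, expand_all(init, macros, env)]
            env = env | {name}
        return ["let*", new_bindings] + [expand_all(b, macros, env) for b in body]

    # A symbol bound as a local is never treated as a macro
    if isinstance(op, str) and op not in locals_ and op in macros:
        return expand_all(macros[op](form, locals_), macros, locals_)

    return [expand_all(sub, macros, locals_) for sub in form]


def when_macro(form, env):
    # (when test body ...) => (if test (do body ...))
    return ["if", form[1], ["do"] + form[2:]]
```

With `macros = {"when": when_macro}`, the form `["when", "a", "b"]` expands to `["if", "a", ["do", "b"]]`, while `["let*", ["when", "f"], ["when", "a", "b"]]` is left alone because `when` is a local inside the body.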

Output TBD: it could be a simple structure with local variable introduction nodes and expression nodes, and leave it at that. Or it could go to the gross level of analysis done by Compiler.analyze and bottom out with SymbolExpr, KeywordExpr, etc.

Pass 2: Type inference & interop call resolution

With the code expanded, types can be chased throughout the tree: from user type tags, from type info on Var'd IFns, and by flowing types through interop calls. This will likely include identifying everywhere that values of value types are known to flow. We should add a boxing node type to the AST to mark explicitly where value types get boxed.
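
One way an explicit boxing node might look is sketched below in Python. The node classes (`Const`, `Call`, `Box`), the `VALUE_TYPES` set, and `insert_boxes` are all hypothetical and not part of any existing design; the sketch assumes every node already carries an inferred `type` and every call knows its declared parameter types.

```python
from dataclasses import dataclass

VALUE_TYPES = {"Int64", "Double", "Boolean"}   # illustrative subset only

@dataclass
class Const:
    value: object
    type: str            # e.g. "Int64" or "String"

@dataclass
class Call:
    target: str          # name of the method or fn being called
    args: list
    param_types: list    # declared parameter types, one per argument
    type: str            # declared return type

@dataclass
class Box:
    expr: object         # the value-typed expression being boxed
    type: str = "Object"

def insert_boxes(node):
    """Return a copy of `node` with a Box node wherever a value-typed
    argument flows into a parameter declared as Object."""
    if not isinstance(node, Call):
        return node
    new_args = []
    for arg, expected in zip(node.args, node.param_types):
        arg = insert_boxes(arg)
        if arg.type in VALUE_TYPES and expected == "Object":
            arg = Box(arg)
        new_args.append(arg)
    return Call(node.target, new_args, node.param_types, node.type)
```

Making the box a node of its own means later passes can count, move, or eliminate boxing operations by inspecting the tree instead of rediscovering them during code generation.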

Pass 3: invocation resolution

For the remaining (non-interop) nodes (fn arg1 arg2 ...), identify the invocation type: regular, static, prim, etc. This might need to be combined with Pass 2 above.

Other passes in no particular order

  • Identification of constants to compile in (symbols, keywords, maps/lists/sets, etc.)
  • Adornment of sequence points and other IL debug information

Pass: Optimizations

These could come in two flavors: optimizations on the AST nodes, or optimizations on the (abstract, pseudo) IL. We won't know what is possible or needed here until we see where the above gets us.

Pass: IL Generation

A question to be resolved is whether there should be an intermediate IL (a la Swift IL, e.g.) that sits between the AST representation and the final IL. We definitely want an explicit IL representation tied to MSIL that allows inspection and manipulation prior to going to ILGen.
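
As a rough illustration of what "inspection and manipulation" could mean, the Python sketch below models IL as a plain list of instruction records and runs one small rewrite over it. The `Instr` type and `drop_box_unbox_pairs` are hypothetical names; only the opcode names (`ldc.i8`, `box`, `unbox.any`, `ret`) are real MSIL. The sketch assumes straight-line code with no branch targets inside the sequence.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    opcode: str                        # an MSIL opcode name, e.g. "ldc.i8"
    operand: Optional[object] = None   # a constant, type name, or method name

def drop_box_unbox_pairs(code):
    """Remove `box T` immediately followed by `unbox.any T`.

    Assumes no instruction in `code` is a branch target, so deleting
    the pair cannot invalidate a jump.
    """
    out = []
    for ins in code:
        if (ins.opcode == "unbox.any" and out
                and out[-1].opcode == "box"
                and out[-1].operand == ins.operand):
            out.pop()                  # the pair is a no-op round trip
        else:
            out.append(ins)
    return out
```

Because the instructions are ordinary data, a pass like this can be tested and debugged without emitting anything through ILGen.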

Final pass: take a nap