Problems To Solve

David Miller edited this page Aug 16, 2017 · 6 revisions

Compiler hell

The current ClojureCLR compiler was implemented to be as close to the ClojureJVM compiler as possible. This was a good choice for the first version. However, accommodating the differences between the JVM and the CLR has led to some real kludges. Over time, it has become increasingly difficult to translate significant changes in ClojureJVM to ClojureCLR.

In addition, some of the problems pointed out in the Clojure-in-Clojure document are really ClojureJVM compiler problems that also plague the ClojureCLR compiler. From that document, I would point out the following as desirable:

  • "Provide discrete analysis phase yielding Clojure data structures capturing semantics of code."
      • Currently the compiler is essentially two-pass: build an AST, then generate code. Semantic analysis is scattered across both passes. We need to restructure into more passes. We might even embrace the Nanopass framework and implement a DSL for this, as they have done.

  • "Split out type reflection into API", "needs protocols to hide actual reflection classes"
      • For reasons stated below, we need much more sophisticated type analysis in the compiler.

  • "Split out code gen into API"
      • Having an intermediate IL representation before going to System.Reflection.Emit would allow for better analysis and much better debugging. We can build on work by Ramsey Nasser in Mage.

  • "Remove warts from current design", "e.g. use of vars and binding", "explicitly pass environment"
      • Definitely.

  • "current code modifies environment to e.g. tag things as closed-over could turn into query against child tree"
      • We will move to an immutable, functional design.
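
As a rough illustration of the last item, closed-over locals can be computed as a pure query over a node's children instead of being recorded by mutating a shared environment. The sketch below is written in Java only to keep it self-contained. The mini-AST (Local, Call, Fn) and all method names are hypothetical and are not taken from either compiler.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical mini-AST, for illustration only.
final class ClosedOverQuery {

    interface Node {
        // Locals referenced by this node that are not bound inside it.
        Set<String> freeLocals();
    }

    // A reference to a local binding.
    static final class Local implements Node {
        final String name;
        Local(String name) { this.name = name; }
        public Set<String> freeLocals() {
            return Collections.singleton(name);
        }
    }

    // An invocation; its free locals are the union over its parts.
    static final class Call implements Node {
        final List<Node> parts;
        Call(Node... parts) { this.parts = Arrays.asList(parts); }
        public Set<String> freeLocals() {
            Set<String> out = new LinkedHashSet<>();
            for (Node n : parts) out.addAll(n.freeLocals());
            return out;
        }
    }

    // A function; its parameters are bound, everything else free in the body is closed over.
    static final class Fn implements Node {
        final List<String> params;
        final Node body;
        Fn(List<String> params, Node body) {
            this.params = params;
            this.body = body;
        }
        public Set<String> freeLocals() {
            Set<String> out = new LinkedHashSet<>(body.freeLocals());
            out.removeAll(params);
            return out;
        }
        // No node is ever tagged or mutated; the answer is derived from the child tree.
        Set<String> closedOver() { return freeLocals(); }
    }

    public static void main(String[] args) {
        // Roughly (fn [x] (fn [y] (x y)))
        Fn inner = new Fn(Arrays.asList("y"), new Call(new Local("x"), new Local("y")));
        Fn outer = new Fn(Arrays.asList("x"), inner);
        System.out.println(inner.closedOver()); // prints [x]
        System.out.println(outer.closedOver()); // prints []
    }
}
```

A real analyzer would cache such queries or compute them in a dedicated pass, but the nodes themselves can stay immutable.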

Working with types

The CLR type system has features that are significant extensions of what the JVM has to offer. These include value types, true generic types, and unsigned integral types. In addition, type identities are assembly based rather than classloader based.

Handling types

Type names and type lookup are significantly more complicated. We had to implement a reader extension to provide |-quoted symbols to serve as type names. However, we are still using the native CLR type naming syntax which includes nastiness such as having to provide type argument counts for realized generic types and fully-qualified typenames. There are also known problems with type lookup inefficiencies.

We need a simplified type naming syntax that ties into ns form definitions, removes the need for type argument counts when generic types are instantiated, and supports import/alias of namespaces for typenames. In addition, ns should support assembly loading. We need to fully specify lookup rules for assembly loading.
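
To make the type argument count complaint concrete: the CLR names a generic type definition with a backtick followed by its arity, and a constructed type appends a bracketed argument list. Any simplified syntax would have to produce that form on the user's behalf. A minimal sketch of such a translation follows, written in Java to keep it self-contained. The helper `clrName` is hypothetical, and assembly qualification is ignored entirely.

```java
// Hypothetical helper, for illustration only.
final class ClrTypeNames {

    // Builds a CLR-style name from a generic base name and already-resolved argument names.
    static String clrName(String genericBase, String... typeArgs) {
        if (typeArgs.length == 0) {
            return genericBase; // not generic: the name is used as-is
        }
        // The arity after the backtick is what users currently have to write by hand.
        return genericBase + "`" + typeArgs.length
                + "[" + String.join(",", typeArgs) + "]";
    }

    public static void main(String[] args) {
        System.out.println(
            clrName("System.Collections.Generic.Dictionary", "System.String", "System.Int32"));
        // prints System.Collections.Generic.Dictionary`2[System.String,System.Int32]
    }
}
```

Resolving the short names themselves (imports, aliases, assembly lookup) is the harder part and is not addressed by this sketch.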

Value types must be first-class

For full efficiency, value types need to be first-class. This requires improved type flow analysis and better handling of value-type typed parameters. (See below for primitive interfaces in relation to this.) The Arcadia project has the most experience with these requirements. The JVM compiler in many places hardcodes special cases and checks for the limited set of primitive types it supports. We should avoid doing this.

Generic types must be first-class

Because the JVM erases generics, they really don't exist from the ClojureJVM viewpoint. Not so in the CLR: generic types are significant. We need the ability to work with generic types with type parameters when defining types. The immutable, persistent data structures that are the meat of the Clojure datatypes need to be genericized. We should get rid of the O/D/L (Object, double, long) typing for primitive IFn interfaces in favor of Func<...> interfaces. Static linking should allow more specific typing. TBD
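
The erasure point is easy to see from the JVM side. The small Java sketch below (class and method names are illustrative only) shows that two differently parameterized lists share one runtime class, and that reflection exposes only the declared type variable. On the CLR, by contrast, List<string> and List<long> are distinct runtime types.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {

    // True when both lists have the same runtime class.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    // Name of the first declared type variable of ArrayList.
    static String declaredTypeVariable() {
        return ArrayList.class.getTypeParameters()[0].getName();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Long> longs = new ArrayList<>();

        // The type arguments are gone at runtime: both are plain ArrayList.
        System.out.println(sameRuntimeClass(strings, longs)); // prints true

        // Reflection sees the declaration's variable, not String or Long.
        System.out.println(declaredTypeVariable()); // prints E
    }
}
```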

Get rid of faked Java data structure implementations

They are just embarrassing, a cheap substitute for actual thought and reworking of the code.

Resolve some semantics issues

Consider what auto-promotion to long means when we have ulongs for which it does not work. Consider handling of all the unsigned integral types and decimals. Make Char consistent (integral or not). Make sure value types are properly handled any place primitive types are mentioned.
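
The ulong problem is one of range. Every 32-bit unsigned value fits in a signed 64-bit long, so widening works. The upper half of the 64-bit unsigned range does not fit, so the same bits read as a long give a different number. Java has no unsigned types, so the sketch below only imitates them using standard library helpers. The class name and `fitsInLong` are illustrative assumptions.

```java
import java.math.BigInteger;

final class UnsignedWidening {

    // 2^64 - 1, the largest 64-bit unsigned value.
    static final BigInteger ULONG_MAX =
            BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    // True when the value is representable as a signed 64-bit long.
    static boolean fitsInLong(BigInteger v) {
        return v.bitLength() <= 63;
    }

    public static void main(String[] args) {
        // Largest 32-bit unsigned value: widening to long preserves it.
        long uintMax = Integer.toUnsignedLong(-1);
        System.out.println(uintMax);                                 // prints 4294967295
        System.out.println(fitsInLong(BigInteger.valueOf(uintMax))); // prints true

        // Largest 64-bit unsigned value: no long can hold it.
        System.out.println(fitsInLong(ULONG_MAX));                   // prints false

        // Reinterpreting the bits as a signed long changes the number.
        long bits = Long.parseUnsignedLong("18446744073709551615");
        System.out.println(bits);                                    // prints -1
        System.out.println(Long.toUnsignedString(bits));             // prints 18446744073709551615
    }
}
```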

Reconsider DLR usage

The DLR serves two purposes. One is the use of its mechanism for matching arguments to parameters for method selection. The other is dynamic callsites. For method selection, we should define the rules for argument matching more carefully, then decide whether we need all the complication of the DLR machinery to accomplish it. Dynamic callsites are very nice, but we should benchmark simpler solutions to see if we need all the complexity of the DLR to handle them.

Protocols top-to-bottom?

ClojureScript did it. Do we want to rework the guts to this extent? From the powers-that-be, this opinion was given: "I'd say the general consensus there is that protocols are a big win over the approach in the JVM impl. The JVM impl would likely be a lot cleaner (but potentially a lot slower) if we reworked to be more like ClojureScript." Can we go with protocols and preserve performance?