lispy in java

Recently I re-read a blog entry from Peter Norvig about a really small and nice Lisp/Scheme interpreter written in Python, what about porting that code to Java 8 and compare.

There are several implementations of Scheme in Java, i believe that the most complete and fast is Kawa of Per Bothner and because Kawa uses a nice type inference algorithm, it's not that rare to see that Kawa is faster that JRuby or Groovy on some benchmarks.

Anyway, here is aim is to not try to implement the whole Scheme specification but just to write a small Scheme interpreter in one file. For that, i will just port the Python code of Peter Norvig. Because i'm lenient, i have tried to be as close as possible from the original code to avoid to explain how the code works given that Peter already did a good job of describing how his implementation works. So if you have still not read his blog entry, please read it !

Porting a Scheme interpreter from Python to Java

First, as Peter note, implementing a dynamically typed language (Scheme) on top of a dynamically typed language (Python) is easier than on top of a statically typed language like Java. By example, Python operator '+' already works with integer values and real values while in Java its two different operations, Python function call can call any function with any parameter types while in Java, you have to use reflection (or java.lang.invoke) which also means messing with checked exceptions.

Mapping Scheme type to Java one

In Java, i will use Object to represent any type, so this rule out the use of primitive types.
- floating point number will be represented by java.lang.Double
- integer number will be represented by java.math.BigInteger (in Python or Scheme, integer types can not overflow)
- symbol will be represented by java.lang.String
- list will be represented by java.util.List using only random access lists (no LinkedList !)
- function will be represented by java.lang.invoke.MethodHandle

MethodHandle is perhaps the lesser known of the types listed here, it represents a kind function pointer typed at runtime, compared to a java.lang.reflect.Method, a method handle is more versatile, you can apply currying (bindTo), do arguments shuffling, arguments collecting, arguments spreading, etc.

The environment

The environment contains the symbols that are accessible during the evaluation. The Python code uses a dict, the Java code uses a HashMap and the code is just a copy paste.

static class Env {
  final HashMap<String, Object> dict = new HashMap<>();
  private final Env outer;

  Env(Env outer) { this.outer = outer; }
  Env find(String var) {
    return dict.containsKey(var)? this: outer.find(var);
  }
  Env add(String var, Object value) { dict.put(var, value); return this; }
}

Note that there is a bug in the code of find(), if the Scheme code refers to an unknown symbol (a symbol not previously defined), find() will be called recursively until outer is null and the VM will thow a null pointer exception. This bug is present in the Python code, i've not tried to fix it.

Creating the global environment

The global environment contains the 20 basic operations that are available by default, while most of these operations are available as default function in Python, very few of them are available in Java, that's why the code define helper methods that will be used to implement operation like + on big integers and doubles.

static double dbl(Object o) { return ((Number)o).doubleValue(); }
static BigInteger bigint(Object o) { return ((BigInteger)o); }
static boolean isdbl(Object o ) { return o instanceof Double; }

Moreover, Python has a literal syntax to initialize a dictionary, Java has no syntax for initializing a Map, i use the builder pattern as a poor's man replacement in order to chain calls.

public interface Fun2 { Object apply(Object a, Object b); }

static Env globalEnv() {
  return new Env(null)
    .add("+", mh(Fun2.class,
      (a, b) -> (isdbl(a) || isdbl(b))?
        dbl(a) + dbl(b): bigint(a).add(bigint(b))))
    .add("-", mh(Fun2.class,
      (a, b) -> (isdbl(a) || isdbl(b))?
        dbl(a) - dbl(b): bigint(a).subtract(bigint(b))))
    .add("<", mh(Fun2.class, (a, b) -> compare(a,b) < 0))
    ...
    .add("=", mhRef(Object.class, "equals"))
    ...
}

As I said, each operation is represented by a method handle, so I need a way to create a method handle either from an existing Java method (like Object.equals) or from a lambda (to avoid to declare a method with a name i don't care). There is unfortunately no way to create a method handle from a lambda or a method reference, even if as you may know that the compiler store lambdas and method references as method handles inside the bytecode. I've proposed that feature to the lambda Expert Group but the EG have found that this use case too specific. So in order to convert a lambda to a method handle, instead of asking the compiler to provide it, you have to let the Java runtime to create a lambda as an instance of a functional interface (i.e. as something typed) and then create a method handle on the single method of that instance.
In the code, mhRef() finds the first method in the class that have the right name (this mean that this code doesn't support overloaded methods) and loads it as a method handle. mh() uses mhRef() to find the method "apply", (interfaces Fun, Fun2 and FunAll all define "apply" as their single abstract method) and curry the instance of the lambda as first parameter of the method handle.

public interface Fun { Object apply(Object a); }
public interface Fun2 { Object apply(Object a, Object b); }
public interface FunAll { Object apply(Object[] args); }

static MethodHandle mhRef(Class<?> type, String name) {
  try {
    return MethodHandles.publicLookup().unreflect(
      stream(type.getMethods()).filter(m -> m.getName().equals(name))
                               .findFirst()
                               .get());
  } catch (IllegalAccessException e) {
    throw new Error(e);
  }
}

static <F> MethodHandle mh(Class<F> type, F fun) {
  return mhRef(type, "apply").bindTo(fun);
}

Eval

The function eval() is not very different from the Python one, Java has a switch statement (on String since Java 7) which make the code cleaner but Java has no destructured assignment making the code less clean.

static Object eval(Object x, Env env) {
  if (x instanceof String) {             // variable reference
    return env.find(string(x)).dict.get(x);   
  }
  if (!(x instanceof List)) {            // constant
    return x;
  }
  List<?> l = (List<?>)x;
  String var;
  Object exp, cmd = l.get(0);
  if (cmd instanceof String) {
    switch(string(l.get(0))) {
    case "quote":                        // (quote exp)
      return l.get(1);                 
    case "if":                           // (if test conseq alt)
      return eval(((Boolean)eval(l.get(1),env))?l.get(2):l.get(3),env);            
    case "set!":                         // (set! var exp)
      var = string(l.get(1));
      env.find(var).add(var, eval(l.get(2), env));
      return null;
    case "define":                       // (define var exp)
      var = string(l.get(1));
      env.add(var, eval(l.get(2), env));
      return null;
    case "lambda":                       // (lambda (vars) exp)
      List<?> vars = list(l.get(1));
      exp = l.get(2);
      return mh(FunAll.class, 
          args -> eval(exp, new Env(env).addAll(vars, asList(args))))
        .asCollector(Object[].class, vars.size());
    case "begin":                        // (begin exp*)
      return l.stream()
              .skip(1)
              .reduce(null, (val, e) -> eval(e, env), (__1, __2) -> null);
    default:
    }
  }
  List<?> exprs = l.stream()
                   .map(expr -> eval(expr, env))
                   .collect(toList());
  MethodHandle proc = (MethodHandle)exprs.get(0);
  try {
    return proc.invokeWithArguments(exprs.subList(1, exprs.size()));
  } catch (Throwable e) {
    if (e instanceof Error) throw (Error)e;
    throw new Error(e);
  }
}  

for 'lambda', Java has no varargs at runtime (no equivalent to *args), the syntax ... is resolved at compile time not runtime, so in order to collect all the arguments inside an array, i need to call asCollector() on the method handle.

for 'begin', the code does basically a fold, the last argument of reduce is used only if the stream is parallel, so the lambda (__1, __2) -> null should never be called.

And if the first argument is a method handle, invokeWithArguments() invokes the function pointer with the other arguments as arguments. invokeWithArguments() throws a Throwable because it can call virtually any methods, it can call a method that throws checked exception, so a little dance is required to transform the checked exception to an unchecked one.

Parsing

This code is mostly a straightforward translation of the Python code with two differences. String.split(" ") generates some empty strings that need to be filtered out in tokenize() and readFrom() takes a Queue as parameter and not a List because while in Python list.pop() is O(1), ArrayList.remove(0) is O(n), so the Java code use an ArrayDeque instead of an ArrayList.

static Object parse(String s) {
  return readFrom(tokenize(s));
}

static Queue<String> tokenize(String text) {
  return stream(text.replace("(","( ")
                    .replace(")"," )")
                    .split(" "))
           .filter(s -> !s.isEmpty())
           .collect(toCollection(ArrayDeque::new));
}

static Object readFrom(Queue<String> tokens) {
  if (tokens.isEmpty())
    throw new Error("unexpected EOF while reading");
  String token = tokens.poll();
  if ("(".equals(token)) {
    ArrayList<Object> l = new ArrayList<>();
    while (!tokens.peek().equals(")")) {
      l.add(readFrom(tokens));
    }
    tokens.poll();   // pop of ")"
    return l;
  }
  if (")".equals(token)) {
    return new Error("unexpected ')'");
  }
  return atom(token);
}

static Object atom(String token) {
  try {
    return new BigInteger(token);
  } catch(NumberFormatException __) {
    try {
      return Double.parseDouble(token);
    } catch(NumberFormatException ___) {
      return token;
    }
  }
}

REPL

The code of toString() is equivalent to the Python one, the only small difference is that joining() in Java takes 3 delimiters while join() in Python takes only one. For the REPL, i use the system console because it works like raw_input()/input(). Note that if you don't want to get a null pointer exception, the program has to be started from a terminal.

static String toString(Object val) {
  return (val instanceof List)?
    list(val).stream()
             .map(Lispy::toString)
             .collect(joining(" ", "(", ")")):
    String.valueOf(val);
}

static void repl() {
  Console console = System.console();
  Env env = globalEnv();
  for(;;) {
    Object val = eval(parse(console.readLine("lispy> ")), env);
    if (val != null) {
      System.out.println(toString(val));
    }
  }
}

Conclusion

Compared to the Python code, the Java code is twice bigger (20 lines of imports included) which is not that bad given that Java is less closer than Python from Scheme.

$ grep -v "^[ ]*$" Lispy.java | wc
    164     703    6971

I think it also demonstrate that Java 8 lambda & stream and Java 7 method handle allow to write a concise yet flexible code at least for an interpreter.
The whole code is available as a gist.

cheers,
RĂ©mi