Wednesday, January 28, 2009

Groovy Compile Time Meta-Magic

I have no practical application for what I'm about to demonstrate. But the idea of rewriting the AST of Groovy classes as they compile is really freakin' cool. Check it out...

I'm a big fan of using GroovyShell to evaluate Strings as code. So simple.
assert 4 == new GroovyShell().evaluate("Math.min(4, 8)")
I knew the String got compiled behind the scenes into an object of type Script. But where did that script go? It turns out that you can write the generated class to disk by setting the target directory on the CompilerConfiguration object. An easy way to do this is to subclass GroovyClassLoader and override createCompilationUnit. Setting the target directory to the current directory causes the Script to be written to disk with the name "script[time-in-milliseconds].class".
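import java.security.CodeSource
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.CompilerConfiguration

class MyGroovyClassLoader extends GroovyClassLoader {

    protected CompilationUnit createCompilationUnit(
            CompilerConfiguration config,
            CodeSource source) {

        // write every compiled class to the current directory
        config.setTargetDirectory(new File("."))
        return super.createCompilationUnit(config, source)
    }
}

def loader = new MyGroovyClassLoader()
assert 4 == new GroovyShell(loader).evaluate("Math.min(4, 8)")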
So there's one new metaprogramming technique open to you: write a program that generates valid Groovy source and compile it into .class files. There are better ways to skin that cat, but it's good to know your options.

The part I really like is the AST transformations. The compiler uses a visitor pattern internally, and it's easy to hook into. The visitor is strongly typed, so you'll receive a callback during the compile that tells you exactly what type of Expression you're seeing, for instance a MethodCallExpression or a DeclarationExpression. From there, you can futz with the AST to your heart's content. Consider this visitor that turns all calls to Math.max into calls to Math.min, and vice versa:

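import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.Expression
import org.codehaus.groovy.ast.expr.MethodCallExpression

private class MyAstVisitor extends CodeVisitorSupport {

    void visitMethodCallExpression(MethodCallExpression methodCall) {

        Expression receiver = methodCall.getObjectExpression()
        ClassNode receiverType = receiver.getType()

        // compare by name: getTypeClass() can throw for classes that are still being compiled
        if (receiverType.getName() == "java.lang.Math") {
            // getMethodAsString() returns null when the method name is not a constant
            String originalMethod = methodCall.getMethodAsString()
            if (originalMethod == "min") {
                methodCall.setMethod(new ConstantExpression("max"))
            } else if (originalMethod == "max") {
                methodCall.setMethod(new ConstantExpression("min"))
            }
        }
        // keep walking so nested method calls are visited too
        super.visitMethodCallExpression(methodCall)
    }
}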
This visitor is invoked whenever a method call is encountered in the source. The method call has an object expression ("Math"), a method constant ("min"), and arguments: a compound expression representing "4" and "8". If you hook the visitor in early during the compile, say in the Conversion phase, then the types of the objects will be generalized as constants or just Objects. But if you hook the visitor into the Semantic Analysis phase, then the expressions will carry the full type information, and you can query for java.lang.Math as the receiver.

This example just swaps out any "min" constants for "max" for the java.lang.Math receiver. Math.min(4, 8) now returns 8! Sweet.
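To see the difference between the phases for yourself, here's a small sketch that asks for the AST of the same snippet at two phases and prints what the receiver node looks like. It assumes Groovy 1.7 or later, where AstBuilder is available, and the receiverAt closure is just a name I made up for illustration. I'd expect the early phase to show a plain unresolved reference and the later one to show a resolved class, but run it on your version rather than take my word for it.

```groovy
import org.codehaus.groovy.ast.builder.AstBuilder
import org.codehaus.groovy.control.CompilePhase

// Hypothetical helper: returns the AST node standing for "Math"
// in Math.min(4, 8), as it looks at the given compile phase.
def receiverAt = { CompilePhase phase ->
    // buildFromString returns a list whose first node is the script's statement block
    def nodes = new AstBuilder().buildFromString(phase, 'Math.min(4, 8)')
    def statement = nodes[0].statements[0]
    // the statement wraps the method call; its object expression is the receiver
    statement.expression.objectExpression
}

def early = receiverAt(CompilePhase.CONVERSION)
def late = receiverAt(CompilePhase.SEMANTIC_ANALYSIS)

println "CONVERSION:        ${early.getClass().simpleName}"
println "SEMANTIC_ANALYSIS: ${late.getClass().simpleName} (${late.type.name})"
```

Comparing the two printed node types is a quick way to decide which phase your own visitor needs.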

The big, unanswered question is, "So how do I hook into the compiler?" The answer is a little confusing, but luckily there are some examples. Last fall I wrote the ArithmeticShell, which demonstrates how to ban any code that isn't basic arithmetic. It is checked into the Groovy examples. There is also the example below.

The ArithmeticShell subclasses GroovyClassLoader, subclasses PrimaryClassNodeOperation, and subclasses CodeVisitorSupport to wire in the visitor. The drawback to this approach is that it assumes that the String to be evaluated is a single Class. Nested or multiple classes won't work.

A better solution is to use the CompilerConfiguration object directly, which allows you to generate any number of classes. Given that you have the MyAstVisitor defined already, you still need to define the PrimaryClassNodeOperation. This just tells the visitor to start walking the tree:
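import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.ModuleNode
import org.codehaus.groovy.classgen.GeneratorContext
import org.codehaus.groovy.control.CompilationUnit.PrimaryClassNodeOperation
import org.codehaus.groovy.control.SourceUnit

private class MyClassNodeOperation extends PrimaryClassNodeOperation {

    public void call(SourceUnit source, GeneratorContext context, ClassNode classNode) {

        // walk the script body with the min/max swapping visitor
        ModuleNode ast = source.getAST()
        ast.getStatementBlock().visit(new MyAstVisitor())
    }
}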
Then it's just a matter of configuring and calling the CompilerConfiguration correctly. I agree that this is totally scary code, but it works, as you can tell from the assert at the bottom. This snippet does assume a Script class, but the point is that ClassCollector.getLoadedClasses will give you a list of everything compiled.
import groovy.lang.GroovyClassLoader.ClassCollector
import groovy.lang.GroovyClassLoader.InnerLoader
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.Phases
import org.codehaus.groovy.control.SourceUnit

def evaluate(String script) {
    Properties settings = ["groovy.target.directory": new File(".")] as Properties
    String filename = "MySynthesizedClass.groovy"
    GroovyClassLoader classLoader = new GroovyClassLoader()
    CompilerConfiguration config = new CompilerConfiguration(settings)

    // wrap the script text as a code source the compiler can read
    byte[] bytes = script.getBytes(config.getSourceEncoding())
    ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes)
    GroovyCodeSource codeSource = new GroovyCodeSource(inputStream, filename, "/groovy/script")

    CompilationUnit cu = new CompilationUnit(config, codeSource.getCodeSource(), classLoader)
    // wiring into the semantic analysis phase will provide more type information
    cu.addPhaseOperation(new MyClassNodeOperation(), Phases.SEMANTIC_ANALYSIS)

    SourceUnit su = cu.addSource(codeSource.getName(), codeSource.getInputStream())
    // the collector loads each generated class and remembers it
    ClassCollector collector = new ClassCollector(new InnerLoader(classLoader), cu, su)
    cu.setClassgenCallback(collector)
    cu.compile(Phases.OUTPUT)
    // assumes the first loaded class is the Script
    Class clazz = collector.getLoadedClasses()[0]
    Script object = clazz.newInstance()
    object.run()
}

assert 8 == evaluate("Math.min(4, 8)")
This next one goes out to all you bytecode lovers... here is the javap -c output for the synthesized class:
61: putstatic #66; //Field class$java$lang$Math:Ljava/lang/Class;
64: goto 70
67: getstatic #66; //Field class$java$lang$Math:Ljava/lang/Class;
70: ldc #70; //String max
As you can see at offset 70, the .min() call was converted into a .max() in the bytecode.
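If you're on a newer Groovy, a shorter route to the same trick might be a compilation customizer. To be clear, this is a sketch and not what UpsideDownShell does: it assumes Groovy 1.8 or later, where CompilationCustomizer and CompilerConfiguration.addCompilationCustomizers exist, and the SwapMinMaxVisitor and SwapMinMaxCustomizer names are made up for illustration.

```groovy
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.expr.ClassExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.MethodCallExpression
import org.codehaus.groovy.classgen.GeneratorContext
import org.codehaus.groovy.control.CompilePhase
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.control.customizers.CompilationCustomizer

// Hypothetical visitor: swaps Math.min and Math.max, same idea as MyAstVisitor.
class SwapMinMaxVisitor extends CodeVisitorSupport {
    @Override
    void visitMethodCallExpression(MethodCallExpression call) {
        def receiver = call.objectExpression
        // after semantic analysis a class receiver is a resolved ClassExpression
        if (receiver instanceof ClassExpression && receiver.type.name == 'java.lang.Math') {
            String name = call.methodAsString
            if (name == 'min') {
                call.method = new ConstantExpression('max')
            } else if (name == 'max') {
                call.method = new ConstantExpression('min')
            }
        }
        super.visitMethodCallExpression(call)
    }
}

// Hypothetical customizer: called once per class node in the semantic analysis phase.
class SwapMinMaxCustomizer extends CompilationCustomizer {
    SwapMinMaxCustomizer() {
        super(CompilePhase.SEMANTIC_ANALYSIS)
    }

    @Override
    void call(SourceUnit source, GeneratorContext context, ClassNode classNode) {
        def visitor = new SwapMinMaxVisitor()
        // visit every method body, including the script's generated run() method
        classNode.methods.each { it.code?.visit(visitor) }
    }
}

def config = new CompilerConfiguration()
config.addCompilationCustomizers(new SwapMinMaxCustomizer())
def result = new GroovyShell(config).evaluate('Math.min(4, 8)')
assert 8 == result
```

The appeal here is that GroovyShell keeps doing the class loading, so there's no ClassCollector or InnerLoader wiring to get wrong, and because the customizer runs per class node it isn't limited to a single Script class.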

Thanks for sticking around. I hope it was worth it. The code in the complete form can be downloaded from http://svn.assembla.com/svn/SampleCode/gep/src/gep/UpsideDownShell.groovy

3 comments:

Peter Niederwieser said...

AST transformation in Groovy 1.6 offer a clean way to hook into the compiler. To get started, check out org.codehaus.groovy.transform.ImmutableASTTransformation.

Hamlet D'Arcy said...

I forgot to mention, Scott Vlamink and I are presenting at the Twin Cities Language Users Group on Metaprogramming in March. http://is.gd/hCq3
And Groovy Users of MN created the 6 min video on Metaprogramming: http://is.gd/hFDn

Alex Miller said...

Pretty cool. I could be mistaken but I believe LINQ is implemented in a morally equivalent way -> it gets access to the AST of the code and replaces it with different code at compile time.