Friday, February 19, 2010

Groovy ANTLR Plugins for Better DSLs

DSLs, ANTLR, and Groovy in one blog post? Oh yes, this should be good: a trifecta of interesting keywords.

There is a horrible scourge upon Groovy-based Domain Specific Languages, and no, I'm not talking about curly braces. That awful syntax blocking our users from natural-language-based productivity is that insidious amalgamation of about 5 pixels called "the comma". If only we could rid ourselves and our users of this terrible burden!

Some are trying: GEP-3 is the Groovy Enhancement Proposal called "Command Expression based DSL" that will allow some commas to be optional. While interesting, there has not been much public activity on it recently. One of the requirements of dropping commas from method invocations is that "the evaluation must be easily explainable". I think most people agree that we have more work to do before the proposal is easily explained.

So where does that leave us? Does Groovy force these spurious commas on unsuspecting programmers? Hardly. The compiler architecture of Groovy is fairly open, and with a little creativity you can make the commas optional in your DSL. As long as you have access to the CompilerConfiguration then you have options, whether it be an AST Transformation or an ANTLR Plugin. And remember, if you are using GroovyShell then you have access to it.

As an example, consider the easyb syntax for behaviors. What I would like to see is the comma dropped. Mmmm... it looks so much better without the commas:

given "some data" {
    println '... setting expectations'
}
when "a method is called" {
    println '... calling some method'
}
then "some condition should exist" {
    println '... making an assertion'
}

At some point in the compilation process, this source code will be represented as a text stream. What we're going to do is intercept that text stream and provide some simple rewrite rules to add the comma in where it should be. An ANTLR plugin can intercept this text, add commas into it, and then pass it on to the Groovy compiler: Groovy is none the wiser. There is some boilerplate wiring to do, but the bulk of the work is defining the rewrite rule (also known as a production). For the simple case, we can use a regular expression to add the comma in:
String addCommas(text) {
    // The group for escaped characters is non-capturing, so $4 is the text after the brace
    def pattern = ~/(.*)(given|when|then) "([^"\\]*(?:\\.[^"\\]*)*)" \{(.*)/
    def replacement = /$1$2 "$3", {$4/
    (text =~ pattern).replaceAll(replacement)
}

Does a regular expression scale well to larger problems? Not really. At some point you will need an alternative (maybe even at this point!). For instance, this regex handles escaped quotes inside the description, but the description must be a double-quoted string; Groovy's single-quoted and multiline strings are not supported. Oh well, it is just an example.
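To make those limits concrete, here is a small sketch that runs the rewrite rule against a few one-line inputs. It restates addCommas so it runs on its own, with two assumptions of mine: the group for escaped characters is non-capturing (so $4 is whatever follows the brace), and the replacement is a single-quoted string. The sample inputs are made up for illustration.

```groovy
// Standalone restatement of the rewrite rule (assumption: non-capturing inner group)
String addCommas(String text) {
    def pattern = ~/(.*)(given|when|then) "([^"\\]*(?:\\.[^"\\]*)*)" \{(.*)/
    (text =~ pattern).replaceAll('$1$2 "$3", {$4')
}

// Double-quoted description: the comma is added
assert addCommas('given "some data" {') == 'given "some data", {'

// Code after the brace on the same line is preserved
assert addCommas('then "ok" { println 1 }') == 'then "ok", { println 1 }'

// Backslash-escaped quotes inside the description are handled
assert addCommas('when "a \\"quoted\\" word" {') == 'when "a \\"quoted\\" word", {'

// Single-quoted description: no match, so the text comes back unchanged
def singleQuoted = "given 'some data' {"
assert addCommas(singleQuoted) == singleQuoted
```

The last case is the one to worry about: the script is passed through untouched, so the user gets an ordinary Groovy error about the missing comma rather than any hint from the DSL.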

The "wiring together boilerplate" consists of subclassing AntlrParserPlugin so that you can rewrite the text, and subclassing ParserPluginFactory so you can wire in your AntlrParserPlugin subclass. The ParserPluginFactory can then be passed directly to the CompilerConfiguration, which is passed to GroovyShell. That makes no sense to me even as I write it, so it is probably best to go look at the full source code listing in Groovy Web Console.

For those of you using browsers not supporting anchor tags, here is the code inline:


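import org.codehaus.groovy.antlr.AntlrParserPlugin
import org.codehaus.groovy.control.CompilationFailedException
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.ParserPlugin
import org.codehaus.groovy.control.ParserPluginFactory
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.syntax.Reduction

// Rewrites the raw source text before the stock ANTLR parser sees it
class SourceModifierParserPlugin extends AntlrParserPlugin {
    Reduction parseCST(SourceUnit sourceUnit, Reader reader) throws CompilationFailedException {
        // addCommas (shown above) must be callable from this class, e.g. declared as one of its methods
        def text = addCommas(reader.text)
        StringReader stringReader = new StringReader(text)
        super.parseCST(sourceUnit, stringReader)
    }
}

// Hands the compiler our plugin instead of the default one
def parserPluginFactory = new ParserPluginFactory() {
    ParserPlugin createParserPlugin() {
        new SourceModifierParserPlugin()
    }
}

def conf = new CompilerConfiguration(pluginFactory: parserPluginFactory)
def binding = ...
def shell = new GroovyShell(binding, conf)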

And once you have a GroovyShell you can evaluate the world! Including the pseudo-easyb script from the beginning of the post. It runs with no problems... missing commas and all.
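The binding is left out of the listing above, so here is one way it could look. This is a minimal sketch under my own assumptions, not the binding from the original script: each keyword is bound to a closure that takes a description and a block, and the hypothetical step helper and log list exist only for this illustration. It skips the parser plugin and feeds in text that already has its commas, which is what the compiler would see after the rewrite.

```groovy
// Hypothetical helper: builds a closure for one keyword that records the
// description and then runs the block
def log = []
def step = { String keyword ->
    return { String description, Closure body ->
        log << "${keyword}: ${description}".toString()
        body()
    }
}

// Closures stored in the binding can be called like methods from the script
def binding = new Binding([given: step('given'), when: step('when'), then: step('then')])

// The script as it would look after the commas have been added
def script = '''
given "some data", {
    println "... setting expectations"
}
when "a method is called", {
    println "... calling some method"
}
then "some condition should exist", {
    println "... making an assertion"
}
'''

new GroovyShell(binding).evaluate(script)

assert log == ['given: some data', 'when: a method is called', 'then: some condition should exist']
```

Passing a CompilerConfiguration as the second GroovyShell argument, as in the listing above, is what would let the comma-free version of the same script through.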

ANTLR plugins have been around Groovy for a long time, and this example is based on Guillaume Laforge's famous Groovy Web Console Script #3. There is nothing particularly hard about writing an ANTLR plugin, but there might be something difficult about maintaining it. DSLs come with a host of issues, including versioning. If you create an external DSL then you've published a language. It's fun at first but not so much later.

And there you have it. No more of those despicable commas! Now we just need to do something about those despicable DSLs.

4 comments:

Unknown said...

That's a pretty nice trick.

You could probably write your own ANTLR grammar (instead of regex) to create an entire external DSL independent of any Groovy syntax that merely produces actual groovy code that gets passed along to the rest of the pipeline.

Hamlet D'Arcy said...

hmmm... that is good advice. i guess it is time to go get an ANTLR book, which sort of doesn't sound fun.

Unknown said...

Hamlet, have you looked at Terence Parr's latest book, Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages? I'm only slooowly making my way through it, but it appears to include a gentler ease-in to ANTLR than his definitive reference, with practical ANTLR recipes for just this sort of use case. I strongly suspect you would find it valuable.

Hamlet D'Arcy said...

I just saw this book for the first time last week and bookmarked the site. I'll definitely have to go out and get it now.