Saturday, January 31, 2009

Upcoming Speaking Gigs and Appearances

I have four speaking appearances booked this year (so far) in the Twin Cities area.

If you're in the area then please come out and say hi. If you're not then feel free to come crash on my couch. And if your JUG is within a motorcycle ride and you need a speaker then by all means contact me. My range is about 700 miles a day and I work for less than peanuts.

Here is what's confirmed so far in 2009:

Groovy Users of Minnesota
February 10, 2009
Groovy + IDEA For-The-Win (part of a larger IDE shootout)

Twin Cities Language User Group:
March 12, 2009
Groovy Metaprogramming (with Scott Vlaminck)
Expect greatness, we're super excited!

CITCON North America
April 24-25, 2009
Open Spaces Conference on Continuous Integration and Testing
Registration still open... it's free and in Minneapolis

Twin Cities Java User Group
July 13 or August 10, 2009 (TBD)
Zen and the Art of Java Concurrency
This isn't a catchy name. I genuinely plan to cover Zen Buddhism.

I'd love to fill up my calendar more... so please, please, please contact me if you have some sort of opening.

Thanks everyone!

On Laziness and Programmers

As programmers, laziness is presented to us as an ideal form to which we must aspire. I have a t-shirt with Larry Wall's famous quote written out in Perl code, "The three chief virtues of a programmer are: Laziness, Impatience and Hubris." And this is the guy who wrote a hugely popular programming language. Linus Torvalds openly admits to being lazy too, "I'm basically a very lazy person who likes to get credit for things other people actually do." And he wrote an operating system as a hobby. These two computer titans aren't just admitting to being lazy; they're crediting laziness as part of their success. Surely, being a better programmer means learning how to be lazy in a better way.

So why are lazy people ruining all my projects? Reflecting on my last 10 years, lazy programmers have contributed the most to making polluted, convoluted messes of our shared code. How many times have you started on an existing project and said, "this should just be rewritten"? Sure, there are a million constraints and variables at play on a project, but you can always blame the previous developer at least partly, if not wholly.

My first language was C++. The easiest path to reuse in C++ was multiple inheritance. Need some piece of code shared between objects? Simply extract a superclass and subclass it willy-nilly as needed. This leads to the diamond inheritance anti-pattern of course, and eventually my code was a mess. Laziness wasn't my friend here; I needed foresight to see the long term ramifications of my design decisions.

Later I learned PHP. My pages started off simple, and it was easy to put PHP, HTML, and SQL all within a single file. I kept pages small by extracting and including shared modules. The millions of code samples on the Web made PHP easy; in fact, I never even bought a book to learn it. Six months later the project was a tangled mess of includes and untestable mixed-source files. Why? When I started the project I'd never heard of Pear::DB or the Smarty templating engine. I was lazy, I did no research, and I lacked knowledge of any of the standard PHP frameworks at the time. Oops.

Today I mostly program Java. There are more modern languages available, but the community, tool support, and consistency of Java make it a joy to use. Need an ORM framework? Pick one of 4 that have books published on them. Need to extract an interface? Ctrl+Alt+I. Need to work on a 10 year old module? The syntax and idioms haven't changed that much. Need to share code between two objects? The easiest solution of the lazy developer is almost universally bad: make a public static method. The next easiest solution is to move it into a superclass and tweak the inheritance hierarchy or create an adapter. The generally right thing to do is difficult: use composition so the two objects share a dependency on the new one, and use dependency injection to give the objects references. Just because Java makes it hard doesn't mean it's not the right thing to do. It takes discipline, not laziness, to keep Java code clean.
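
To make the composition option concrete, here is a minimal sketch. Every name in it (PriceFormatter, DollarFormatter, Invoice, Receipt) is hypothetical and invented for illustration; the point is only that the shared code lives in its own object and each user receives it through its constructor.

```java
import java.util.Locale;

// Hypothetical shared behavior, extracted into its own small object
// instead of a public static method or a common superclass.
interface PriceFormatter {
    String format(long cents);
}

class DollarFormatter implements PriceFormatter {
    public String format(long cents) {
        // Assumes non-negative amounts; a sketch, not production money handling.
        return String.format(Locale.US, "$%d.%02d", cents / 100, cents % 100);
    }
}

// Two otherwise unrelated objects share the formatter by composition.
// Each one is handed its dependency through the constructor (dependency injection).
class Invoice {
    private final PriceFormatter formatter;

    Invoice(PriceFormatter formatter) {
        this.formatter = formatter;
    }

    String totalLine(long cents) {
        return "Invoice total: " + formatter.format(cents);
    }
}

class Receipt {
    private final PriceFormatter formatter;

    Receipt(PriceFormatter formatter) {
        this.formatter = formatter;
    }

    String paidLine(long cents) {
        return "Paid: " + formatter.format(cents);
    }
}

class CompositionDemo {
    public static void main(String[] args) {
        PriceFormatter formatter = new DollarFormatter();
        System.out.println(new Invoice(formatter).totalLine(1999)); // Invoice total: $19.99
        System.out.println(new Receipt(formatter).paidLine(5));     // Paid: $0.05
    }
}
```

It is more typing than a static method, but a test can now hand Invoice or Receipt a stub formatter without touching any global state.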

Still think laziness is a virtue? Being a craftsman has nothing to do with being lazy. Anyone can be lazy, and for the most part it leads to crap. Larry Wall and Linus Torvalds aren't talking about run-of-the-mill easiest-way-out laziness. I'd say they're talking about knowing how to put forth just a little effort so that you don't have to see the problem again... they're talking about using foresight, knowledge, and discipline:

Foresight - To anticipate what today's decisions will become tomorrow
Knowledge - To learn, use, and apply the best work of others in the community
Discipline - To use foresight and knowledge to write the best code possible

Foresight, knowledge, and discipline have nothing to do with laziness; rather, they are traits of intelligence. Laziness is not something to which we should aspire. Instead of laziness, how about we aim for intelligence? Can we agree on that? Intelligence works for me if it works for you.

"Intelligence is the ability to avoid doing work, yet getting the work done." -Linus Torvalds

Wednesday, January 28, 2009

Groovy Compile Time Meta-Magic

I have no practical application for what I'm about to demonstrate. But the idea of rewriting the AST of Groovy classes as they compile is really freakin' cool. Check it out...

I'm a big fan of using GroovyShell to evaluate Strings as code. So simple.
assert 4 == new GroovyShell().evaluate("Math.min(4, 8)")
I knew the String got compiled behind the scenes into an object of type Script. But where did that script go? It turns out that you can write the generated class to disk by setting the target directory on the CompilerConfiguration object. An easy way to do this is subclass GroovyClassLoader and override createCompilationUnit. Setting the target directory to the current directory causes the Script to be written to disk with the name "script[time-in-milliseconds].class"
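import java.security.CodeSource
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.CompilerConfiguration

class MyGroovyClassLoader extends GroovyClassLoader {

    protected CompilationUnit createCompilationUnit(
            CompilerConfiguration config,
            CodeSource source) {

        // write the generated .class file into the current directory
        config.setTargetDirectory(new File("."))
        return super.createCompilationUnit(config, source)
    }
}

def loader = new MyGroovyClassLoader()
assert 4 == new GroovyShell(loader).evaluate("Math.min(4, 8)")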
So there's one new metaprogramming technique open to you: write a program that generates valid Groovy source and compile it into .class files. There are better ways to skin that cat, but it's good to know your options.

The part I really like is the AST transformations. The compiler uses a visitor pattern internally, and it's easy to hook into. The visitor is strongly typed, so you'll receive a callback during the compile that tells you exactly what type of Expression you're seeing. For instance, a MethodCallExpression or a DeclarationExpression. From there, you can futz with the AST to your heart's content. Consider this visitor that turns all calls to Math.max into calls to Math.min, and vice versa:

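import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.Expression
import org.codehaus.groovy.ast.expr.MethodCallExpression

private class MyAstVisitor extends CodeVisitorSupport {

    void visitMethodCallExpression(MethodCallExpression methodCall) {

        Expression receiver = methodCall.getObjectExpression()
        ClassNode receiverType = receiver.getType()

        // only rewrite calls whose receiver is java.lang.Math
        if (receiverType.typeClass == java.lang.Math) {
            ConstantExpression originalMethod = methodCall.getMethod()
            if (originalMethod.getValue() == "min") {
                methodCall.setMethod(new ConstantExpression("max"))
            } else if (originalMethod.getValue() == "max") {
                methodCall.setMethod(new ConstantExpression("min"))
            }
        }
        // keep walking so nested method calls are visited too
        super.visitMethodCallExpression(methodCall)
    }
}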
This visitor is invoked when a method call is encountered in the source. The method call has an object expression ("Math"). The method call has a method constant ("min"). And the method call has arguments, a compound expression representing "4" and "8". If you hook the visitor in early during the compile, say in the Conversion Phase, then the types of the object will be generalized as constants or just Objects. But if you hook the visitor into the Semantic Analysis phase, then the expressions will carry the full type information, and you can query for java.lang.Math as the receiver.

This example just swaps out any "min" constants for "max" for the java.lang.Math receiver. Math.min(4, 8) now returns 8! Sweet.

The big, unanswered question is, "So how do I hook into the compiler?" The answer is a little confusing, but luckily there are some examples. Last Fall I wrote the ArithmeticShell that demonstrates how to ban any code that isn't basic arithmetic. It is checked into the groovy examples. There is also the example below.

The ArithmeticShell subclasses GroovyClassLoader, subclasses PrimaryClassNodeOperation, and subclasses CodeVisitorSupport to wire in the visitor. The drawback to this approach is that it assumes that the String to be evaluated is a single Class. Nested or multiple classes won't work.

A better solution is to use the CompilerConfiguration object directly, which allows you to generate any number of classes. Given that you have the MyAstVisitor defined already, you still need to define the PrimaryClassNodeOperation. This just tells the visitor to start walking the tree:
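import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.ModuleNode
import org.codehaus.groovy.classgen.GeneratorContext
import org.codehaus.groovy.control.CompilationUnit.PrimaryClassNodeOperation
import org.codehaus.groovy.control.SourceUnit

private class MyClassNodeOperation extends PrimaryClassNodeOperation {

    public void call(SourceUnit source, GeneratorContext context, ClassNode classNode) {

        ModuleNode ast = source.getAST()
        ast.getStatementBlock().visit(new MyAstVisitor())
    }
}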
Then it's just a matter of configuring and calling the CompilerConfiguration correctly. I agree that this is totally scary code, but it works as you can tell from the assert at the bottom. This snippet does assume a Script class, but the point is that ClassCollector.getLoadedClasses will give you a list of everything compiled.
def evaluate(String script) {
    Properties settings = ["groovy.target.directory": new File(".")] as Properties
    String filename = "MySynthesizedClass.groovy"
    GroovyClassLoader classLoader = new GroovyClassLoader()
    CompilerConfiguration config = new CompilerConfiguration(settings)

    byte[] bytes = script.getBytes(config.getSourceEncoding())
    ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes)
    GroovyCodeSource codeSource = new GroovyCodeSource(inputStream, filename, "/groovy/script")

    CompilationUnit cu = new CompilationUnit(config, codeSource.getCodeSource(), classLoader)
    // wiring into the semantic analysis phase will provide more type information
    cu.addPhaseOperation(new MyClassNodeOperation(), Phases.SEMANTIC_ANALYSIS)

    SourceUnit su = cu.addSource(codeSource.getName(), codeSource.getInputStream())
    ClassCollector collector = new ClassCollector(new InnerLoader(classLoader), cu, su)
    cu.setClassgenCallback(collector)
    cu.compile(Phases.OUTPUT)
    Class clazz = collector.getLoadedClasses()[0]
    Script object = clazz.newInstance()
    object.run()
}

assert 8 == evaluate("Math.min(4, 8)")
This next one goes out to all you bytecode lovers... here is the javap -c output for the synthesized class:
   61:  putstatic  #66; //Field class$java$lang$Math:Ljava/lang/Class;
   64:  goto       70
   67:  getstatic  #66; //Field class$java$lang$Math:Ljava/lang/Class;
   70:  ldc        #70; //String max
As you can see at line 70... the .min() call was converted into a .max() in the bytecode.

Thanks for sticking around. I hope it was worth it. The code in the complete form can be downloaded from http://svn.assembla.com/svn/SampleCode/gep/src/gep/UpsideDownShell.groovy

Saturday, January 24, 2009

A Functional Approach to Java Managed Resources

Quick, what's your standard idiom for closing Closeable resources in Java? Is it this?

OutputStream stream = new FileOutputStream(file);
try {
    writeMessage(stream);
} finally {
    stream.close();
}
I promise this isn't about a lack of a try/catch around the close() invocation. I'd rather clutter the prose with a message to ignore it (ignore it) than clutter the code sample. No, this post is about safely closing resources using generics, an anonymous class, and almost twice as many characters... sweet, an orgy of complexity!
withClose(
    new FileOutputStream(file),
    new Action<OutputStream>() {
        public void call(OutputStream stream) throws IOException {
            writeMessage(stream);
        }
    });
Just because Java makes it hard doesn't mean it isn't the right thing to do. Consider this, what stops you from forgetting the close() on a resource? A unit test? A code review? A pairing partner? Or just an amazing attention to detail? Do you think your method is more or less reliable than the javac compiler?

Let's examine what's going on in the above examples. The first one is simple. You create a stream, write some data into it, and then close it. The second does the same thing slightly differently. You create a stream, and then create a little function that writes some data into a stream. And then you pass them both to a method that will presumably call your function back, passing you the stream, and then safely closing it when it's all done.

The first one is simpler. There's no argument about that. But here's why I prefer the second one:
  • Structure over Convention - Uncle Bob Martin's latest book Clean Code recommends using structure over convention. Odd advice in a post-Rails world, but there is value in having the design decision of closing resources in a single place, the withClose method. The point is that there is no way to incorrectly invoke withClose(). You either give it objects of the correct type or the compiler complains.

  • Lean on the Compiler - Michael Feathers coined the term "Lean on the Compiler" in his book, Working Effectively with Legacy Code. I say, if we can get the compiler to enforce something for us then we should. Why use a statically typed language if that isn't a principle? Unfortunately for me, Michael Feathers meant something totally different in this book. He recommended finding usages of a variable by renaming the definitions and then analyzing the resulting compiler errors. Programmers seem to have no problem with co-opting terms and overloading them with meaning. I'm sticking with my new, invented meaning; let the confusion begin now.

  • Dependency Injection - Another Bob Martin book, Agile Software Development, strongly advocates Dependency Injection and the Hollywood Principle. The idea is the movie producer refrain, "Don't call us, we'll call you." Interpreted here to mean, don't call our object to interact with it, give us a piece of code to execute and we'll give you what you need. Like dropping off your headshot at the casting agency, we're dropping off our Action function object at the withClose() service. You could further make this idiom more encapsulated by making the Action function part of the Closeable package and then making the methods on Closeable package-private, thus forcing usage of this idiom.

  • Monady Goodness? - Man, those functional programming guys get all the cool words: catamorphisms, continuations, and monads. While there is certainly more to monads than just inversion of control, the basic usage is, as Brian Hurt says, "this code needs to execute in a time and place where condition X is true." In this case, the time when an OutputStream is open and usable.

The problem with this approach is the verboseness. It's not just creating the anonymous class, it's the fact that Java requires you to define a named function object with explicitly named types (including Exception types!). In this case, Action:
interface Action<T> {
    void call(T input) throws IOException;
}
The Functional Java library helps here by providing a unary function called F1, but seriously, how many F1, F2, F3 types do you want in your codebase? I'd have more luck implementing these myself than convincing the gatekeepers to include Functional Java in the release. In fact, I did and no one noticed. Another problem is that the generics of a withClose() function can appear daunting. 3 Ts and an extends? Sounds like my shopping list at Target for my daughter.
static <T extends Closeable> void withClose(T stream, Action<T> delegate) throws IOException {
    try {
        delegate.call(stream);
    } finally {
        stream.close();
    }
}
I've memorized Josh Bloch's PECS (Producer Extends Consumer Super) as a mnemonic for remembering how to write wildcards and I still never write it correctly the first time. Arguably ever. Oh well, write it once and be done with it.
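
For reference, here is a minimal, self-contained sketch that puts the pieces together and applies PECS to the signature. The Action in it is a consumer of T, so the parameter is written as Action<? super T>; that wildcard is my assumption about what the mnemonic asks for here, not something the original signature used. The class names Resources, TrackingStream, and WithCloseDemo are hypothetical, and an in-memory stream stands in for the FileOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Same shape as the Action interface above.
interface Action<T> {
    void call(T input) throws IOException;
}

class Resources {
    // Action consumes a T, so PECS suggests "? super T": an Action<OutputStream>
    // can then be reused for any OutputStream subtype.
    static <T extends Closeable> void withClose(T stream, Action<? super T> delegate) throws IOException {
        try {
            delegate.call(stream);
        } finally {
            stream.close();
        }
    }
}

// Hypothetical in-memory stream that records whether close() was called.
class TrackingStream extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class WithCloseDemo {
    public static void main(String[] args) throws IOException {
        TrackingStream stream = new TrackingStream();
        // The stream is a TrackingStream, but the Action is written against OutputStream.
        Resources.withClose(stream, new Action<OutputStream>() {
            public void call(OutputStream out) throws IOException {
                out.write("hello".getBytes(StandardCharsets.UTF_8));
            }
        });
        String written = new String(stream.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(written + " closed=" + stream.closed); // hello closed=true
    }
}
```

With the plain Action<T> parameter the same call still compiles, because T can be inferred as OutputStream; the wildcard only starts to matter when T is pinned to the subtype.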

The last issue is combinatorics. How do you combine a withClose() action and a withOpen() action and a withFoo() action... Well, as static method invocations it would be darn hard. Java uses objects as the unit of composition, so you need to reference the withX() monad style method as an object not a global procedure. More objects means more verboseness. And combining them elegantly requires some sort of logical combination library, in the style of bin4j. How far down the functional programming road do you want to take your Java code? A functional language would "fix" all these problems:

  • Function objects do not need to be named or declared

  • The syntax for creating function objects is trivial

  • The generic types can be inferred rather than declared

  • And functions are the natural unit of composition, and object baggage can be discarded
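
To give a feel for the object baggage involved, here is a minimal sketch of combining with-style wrappers in plain Java. All names (Body, Wrapper, Traced, then) are hypothetical and invented for this illustration; Traced only records open and close events and stands in for real wrappers such as withClose().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names: Body is the callback, Wrapper is a withX() method turned into an object.
interface Body {
    void run() throws Exception;
}

abstract class Wrapper {
    // Run the body inside whatever condition this wrapper establishes.
    abstract void around(Body body) throws Exception;

    // Combine two wrappers: the result runs `inner` (and the body) inside `this`.
    Wrapper then(final Wrapper inner) {
        final Wrapper outer = this;
        return new Wrapper() {
            void around(final Body body) throws Exception {
                outer.around(new Body() {
                    public void run() throws Exception {
                        inner.around(body);
                    }
                });
            }
        };
    }
}

// Stand-in for withClose(), withOpen(), etc.: it only records when it opens and closes.
class Traced extends Wrapper {
    private final String name;
    private final List<String> log;

    Traced(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void around(Body body) throws Exception {
        log.add("open " + name);
        try {
            body.run();
        } finally {
            log.add("close " + name);
        }
    }
}

class WrapperDemo {
    public static void main(String[] args) throws Exception {
        final List<String> log = new ArrayList<String>();
        Wrapper both = new Traced("outer", log).then(new Traced("inner", log));
        both.around(new Body() {
            public void run() {
                log.add("body");
            }
        });
        System.out.println(log); // [open outer, open inner, body, close inner, close outer]
    }
}
```

Three types and two levels of anonymous classes just to nest one wrapper inside another: that is the verboseness the list above is complaining about.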

Perhaps it's time to try a functional language?

Monday, January 19, 2009

Test Driven Synchronization Policies with assertSynchronized


As you're surely aware, JConch 1.1 was released yesterday... it includes a tool I've used over the last two years at work to test drive synchronization policies in Java code: assertSynchronized. Even if you're not practicing TDD, assertSynchronized has proven to be a great aid in replicating concurrency defects and verifying that fixes work correctly. And, it's great to have test coverage on synchronization policies during refactoring time. I, and several team members, have genuinely found this assert useful, and this post walks you through how it works.


Here's the idea behind assertSynchronized:

  1. Wrap code snippets that should be thread safe in Callable objects

  2. Create another Callable that returns all your code snippets as a Collection

  3. then assertSynchronized runs your code snippets a whole bunch of times across several threads and makes sure there are no exceptions


This is not deterministic but is effective. My main concern is quickly replicating concurrency defects, which this assertion enables.


For clarity, let's name and describe a few things before seeing a code sample. Each (hopefully) synchronized code snippet is a "task". The Callable that produces the task list is a "task factory". assertSynchronized makes 1000 passes at the task factory. In each pass it gets the task list and queues up as many tasks as it can on multiple threads. When the queue is full, it releases the tasks and hopes for concurrency related exceptions. assertSynchronized moves on to the next iteration when all the tasks from the task factory are complete. Any errors are accumulated and reported at the end with an AssertionError taking the form of "java.lang.AssertionError: An exception was raised running the synchronization test. The test failed x out of y times. Last known error:" followed by a stack trace. There are zero dependencies on a testing framework, so it works in JUnit or TestNG.
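
The mechanics described above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not JConch's actual implementation: the class name SyncAssertSketch, the explicit passes parameter, the one-thread-pool-per-pass design, and the CountDownLatch start gate are all choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; JConch's real Assert class may be structured differently.
class SyncAssertSketch {

    static void assertSynchronized(Callable<List<Callable<Void>>> taskFactory, int passes) throws Exception {
        int failures = 0;
        Throwable lastError = null;

        for (int pass = 0; pass < passes; pass++) {
            // Each pass asks the factory for a fresh task list (and fresh shared state).
            List<Callable<Void>> tasks = taskFactory.call();
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
            final CountDownLatch startGate = new CountDownLatch(1);
            try {
                List<Future<Void>> results = new ArrayList<Future<Void>>();
                for (final Callable<Void> task : tasks) {
                    results.add(pool.submit(new Callable<Void>() {
                        public Void call() throws Exception {
                            startGate.await(); // hold each task until all are queued
                            return task.call();
                        }
                    }));
                }
                startGate.countDown(); // release every task at once

                boolean failed = false;
                for (Future<Void> result : results) {
                    try {
                        result.get(); // wait for the task; rethrows its exception wrapped
                    } catch (ExecutionException e) {
                        failed = true;
                        lastError = e.getCause();
                    }
                }
                if (failed) {
                    failures++;
                }
            } finally {
                pool.shutdown();
            }
        }

        if (failures > 0) {
            throw new AssertionError("An exception was raised running the synchronization test. "
                    + "The test failed " + failures + " out of " + passes
                    + " times. Last known error: " + lastError);
        }
    }
}
```

The start gate is what makes collisions likely: every worker thread is parked on the latch until all tasks are queued, so they hit the shared object at nearly the same moment. Like the real assertion, this only catches defects that surface as exceptions.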


So in practice, what does this look like? Answer: an orgy of generics and anonymous class cruft. Consider the case of ArrayList, whose add(T) methods are not synchronized. Calling ArrayList#add across multiple threads eventually breaks on my machine. The Groovy version later is much terser syntactically, but the Java version, although workable, might be a little shocking:


Assert.assertSynchronized(
    new Callable<List<Callable<Void>>>() {

        public List<Callable<Void>> call() throws Exception {
            final List<Object> unsafeObject = new ArrayList<Object>();
            final Callable<Void> unsafeInvocation = new Callable<Void>() {

                public Void call() throws Exception {
                    for (int x = 0; x < 1000; x++) {
                        unsafeObject.add(new Object());
                    }
                    return null;
                }
            };
            return Arrays.asList(
                unsafeInvocation,
                unsafeInvocation
            );
        }
    }
);

Starting from the bottom up... this attempts to add 2000 objects to an ArrayList on two different threads: 1000 on each thread. And assertSynchronized will do this all 1000 times by default. This fails roughly 40 out of 1000 times on my machine. The key to understanding assertSynchronized is to know that the task factory will be invoked 1000 times, so shared state needs to be instantiated within the task factory and not within the individual task.


A good IDE makes all these anonymous classes easy to write, but it's hard to deny that the Groovy version from the JConch examples is more elegant:


Assert.assertSynchronized(
    {
        def target = new Vector() //fail if ArrayList
        return [
            { (0..1000).each { target.add(new Object()) } } as Callable,
            { (0..1000).each { target.add(new Object()) } } as Callable,
        ]
    } as Callable
)

So the Twin Cities OTUG group started a concurrency SIG. I added assertSynchronized to JConch as preparation for this; hopefully it helps you out. Consider joining in the discussion here. I'd love to flesh out the JConch concurrent unit testing support more, so email any suggestions to me or leave a comment. I'd love to see what other people are doing for concurrent testing!

Sunday, January 18, 2009

Groovy and OSGi Resources

There was a lot of interest in the Beginner's Guide to OSGi on the Desktop post a few weeks ago. Here are a couple of follow-up items worth checking out...

The Groovy wiki is now updated with a lengthy article on using Groovy in an OSGi bundle. The Groovy+OSGi wikipage shows a few techniques, including:

  1. Loading Groovy as an OSGi service
  2. Writing a Groovy OSGi Service
  3. and Publishing a Service Written in Groovy
All of the sample code is now in Groovy 1.7, available for browsing online or downloadable from subversion:
svn co https://svn.codehaus.org/groovy/trunk/groovy/groovy-core/src/examples/osgi
Enjoy!

Wednesday, January 14, 2009

Expert F# by Don Syme Review

(Sorry if this sounds like a commercial, but I liked the book)...

Can a 650+ page book from 2007 on an evolving language be any good? There are only two or three other F# books available, so saying it's the best reference isn't a big commitment. But two years after its release, Expert F# continues to be a great reference and overview of the language. It has a number of good things going for it.

  1. The book's not a 650 page reference manual. Expert F# is really two books in one. The first 250 pages deal solely with the language, functional and OO programming techniques, and available libraries. Seemingly in its entirety. I find F# an incredibly clean, simple, and easy language to program in (at least from my reference points). The fact that it can be fully described in 250 pages speaks to its straightforward nature and lack of weird edge cases. Compare that to your JVM/CLR language du jour.

  2. The Appendix is a 7 page language guide to the F# syntax and language features. I printed this out from the eBook copy early on and it never left my side. This was a great aid in learning the language and all programming books should steal the idea.

  3. The functional programming chapter clearly explains the idioms of FP and does not use OO as a point of reference. I don't want an OO/FP hybrid and this is a good introduction to recursion, immutable data types, pattern matching, et al, yet it strictly avoids the common "here is how to use a functional syntax with the standard Java/C# mutable HashMap". In fact, I suggest you skip the OO chapter. Let's learn FP first, and then decide when it's required to resort to OO.

  4. The eBook has a searchable index. The book is 2 years old and it's still easiest for me to find answers in the pdf rather than searching Google and HubFS.

  5. The downloadable code samples are a valuable resource for exploring the concepts beyond code snippets. I have fond memories of using the PalmOS code samples to learn the platform, and these are of the same caliber. But seriously, isn't there a better build system for .NET than build.bat files? Ugh.

  6. An entire chapter on fslex and fsyacc? Actually, there are two chapters. Awesome, especially when coupled with the code samples. Language oriented programming gets a very good treatment here.

  7. The asynchronous workflow and concurrency chapter is worth reading just to see how elegant they made thread manipulation and wait/join actions. Yes there is a difference between let and let!, and it's cool. And who doesn't want yet another monad tutorial? Not me.
And the bad...

This is really two books in one: a language guide and a set of case studies on advanced topics. I'd have bought this book long ago if it were just the first 250 pages. Can anyone read a book that's too heavy to carry out of the house? My advice: buy the eBook and print out the chapters you want. Save a tree. Appreciate F# even if you don't plan on shipping code with it.