Monday, January 7, 2008

'this' is not the synchronization you're looking for

I'm no expert on multi-threaded applications. But I am quite good at juggling and making balloon animals. So if this post about Java synchronization strikes you as amateur, misguided, or wrong, then you have every right to call me a clown in the comments. That's what they're there for.

I suspect I'm not unique among Java developers in how I was introduced to concurrency issues. It started with a bug report, this time from some terribly misguided attempt at using threads to increase performance. Ugh. I worked out the easiest solution, which was to add method synchronization. It worked, that's good, so what's the issue? Liveness is the issue. Liveness and information hiding.

Consider this simple example of a class with a synchronized method:

class Counter {
private int count = 0

public synchronized void countUp() {
(1..10000).each {
count++
}
}
}
In Groovy the statement (1..10000).each {...} will perform something 10,000 times, so this method atomically increases the counter to 10000. (Also, the Java semi-colons aren't needed). So what's this about liveness? Consider the naive usage by two different threads:
def counter = new Counter()

//start a 2nd thread that synchronizes on counter
new Thread(
{
synchronized(counter) {
//pretend to do some work
Thread.sleep(5000);
}
}).start()

//try to count up on this thread
counter.countUp()
The problem is that the synchronized method from the first example and the synchronized statement in the second example are using the same lock (or "monitor"). If the worker thread that sleeps for 5 seconds acquires the lock first then it will starve the thread trying to call count Up( ). Sometimes the code here completes in 1 second, and sometimes it takes 6. Probably not the type of error you wanted to expose when you originally added the synchronized keyword. So how does this work?

A synchronized method uses the lock associated with the "this" reference. So any synchronized method:
synchronized void foo() {
//do something
};
Can just as equally be written like this:
void foo() {
synchronized (this) {
//do something
}
};
Using this approach it becomes pretty obvious that anyone with a reference to the object could create a synchronized block on that object, and thus interfere with the liveness of the method call. Any time you use method synchronization you are exposing your synchronization policy publicly. Is how your object is synchronized really something you want publicly visible? I'd say not. This is surely an implementation detail. And just like you don't expose object state by not using public fields, I suggest you hide your synchronization policy by not using method synchronization. This is simple information hiding, but I overlooked it for years.

So what do I do now? I synchronize on an internal field like so:
class Counter {
private int count = 0
private Object lock = new Object()

public void countUpSafely() {
synchronized (lock) {
(1..10000).each {
count++
}
}
}
}
The additional benefit is that your synchronization token can now carry with it an intent revealing name. Trust me, the maintenance programmers in the future will thank you. And if you ever want to find the usage of the object's lock, then the private variable is there for the searching. And that means you're free to change the implementation knowing that users of the code won't be affected.

I'm not sure who said it, but a reviewer of Brian Goetz's Java Concurrency in Practice said that, after reading the book, he was convinced that the only person in the world that really did understand concurrency was Goetz himself. It's an amazing book, worth several reads, and I echo the reviewer's thoughts entirely.

Now it's your turn to sound off. I'll give you a start. "Yo clown, stick to your circus skills... [insert concurrency edge case here]." And for this wanting to tinker with the code, the entire Groovy script is available here. Thanks!

3 comments:

Robert Fischer said...

Well, I still don't quite understand why double-checked locking is broke in Java, but aside from that, I grok concurrency pretty well.

I do agree that slapping the "synchronized" keyword into the method name is kinda exposing implementation, but it's better than "synchronized(this)" -- the reason is because there's a piece of state (specifically, a monitor and its locks) that you're obscuring from the user. And that's just begging for a concurrency bug.

Your monitors should either be purely internal (meaning no reference to a synchronization lock escapes the object), or clearly externalized in the API (either through a lock API or having methods be "synchronized").

In general, you should synchronize as tightly as possible in order to allow liveness, and if you can pawn the synchronization off into an member, all the better -- in this case, you'd be better off using something like AtomicInteger.

This guideline goes in the face of performance people, who will tell you that acquiring and releasing locks is an expensive proposition, and that you should have as coarse-grained locks as possible. As you discovered, though, this tends to result in crappy concurrency and (often) buggy code. So the tighter locks are better, even if they're less performant -- if you don't like that, use a language whose concurrency model doesn't suck.

The other option (and this is actually better) is to simply not share non-thread-safe objects between multiple threads. Better yet, only share read-only objects (incl. interfaces which are read-only) between threads: they don't need any locking at all, so that'll give you maximum liveness.

Hamlet D'Arcy said...

The AtomicInteger and AtomicBoolean classes in Java 5 are a nice addition, and worth a look. In this particular example AtomicInteger would actually change the behavior to not run the 10,000 increments atomically. In real life (tm) I find I commonly turn to AtomicBoolean as a way to track whether objects have been released. I've been hosting COM objects from Java code lately, so this is a frequent need for me.

Goetz's book recommends as fine grained locking as possible. And the presentation here says that acquiring locks in Java 5 is performant: www.scribd.com/doc/248581/Java-Tech-Performance-Myths

Robert Fischer said...

Well, to be fair, that links says that synchronization isn't terribly unslow. Their example -- the uncontended case -- is the boring one. That's like saying that GC is fast when there are no objects to clean: who cares?

Synchronization is always well worth it, though, since correctness trumps performance every time: I can always write super fast code that doesn't work. That's not the hard part.