Thursday, July 16, 2009

Big Flat Test Cases Suck (and what to do about it)

It seems like all 900 pages of xUnit Test Patterns were published just a little while ago... my how things have changed since May of 2007. Back then, when it came to organizing test methods, we only had three options. Beginners always created a test case per class: there was a one-to-one mapping between production classes and test classes, and I'd guess most people are still doing this. The enlightened testers were creating one test case per fixture, in which test methods lived in the same test class only if they shared the same setUp() method. This was a response to the problem with test case per class, which was complexity. As the number of test methods increases, the cohesion of the test case decreases: more and more fields get added to support different testing scenarios, and each test method ends up using only a fraction of the fields available in the test case. Test case per fixture solves the problem by breaking test methods out into a new test case whenever new fixture and setUp() data is needed. This was an improvement, and it was the recommended approach in Astels' Test-Driven Development book. The really cool kids, however, were doing test case per feature: grouping test methods together based on what feature or user story they exercise. Kinda sounds like an easyb story, huh? Yeah, but it hadn't really caught on yet back then.
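
To make "test case per fixture" concrete, here's a minimal sketch in Groovy (my sketch, not from the book), using the same hypothetical UserService that shows up in the easyb example later in this post. Each test class owns exactly the fixture its setUp() builds, instead of one giant test case holding fields for every scenario:

class ZeroUserServiceTest extends GroovyTestCase {
    def service

    void setUp() {
        service = new UserService()    // the only fixture these tests need
    }

    void testRemovingFromAnEmptyServiceThrows() {
        shouldFail(IllegalStateException) { service.removeUser(0) }
    }
}

class TenUserServiceTest extends GroovyTestCase {
    def service

    void setUp() {
        service = new UserService()    // a different fixture lives in a different class
        (1..10).each { service.addUser("fname$it", "lname$it") }
    }

    void testRemovingAUserDecrementsTheCount() {
        service.removeUser(0)
        assert service.userCount == 9
    }
}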

But don't be fooled! All three of these test organization methods are essentially the same: tests are going to be written as methods and enclosed in a class. How 2007... how xUnit-ish. See, all three approaches share the same deficiency.

Test cases grow. They grow really long. And they all grow in one direction: down the screen. Navigation isn't too bad an issue... any IDE test runner will jump you right to the correct line when a test fails. And you'll have a naming convention for tests, so keyboarding around the file isn't too tough. It's not effortless though, and some speakers suggest using hugely long test names with underscores to make the methods easier to tell apart. So this:

public void testInitialPositionSetCorrectlyUponInstantiation()
Is better as:
public void test_initial_position_set_correctly_upon_instantiation()
Sure, I guess that's nicer. But exactly what problem is this solving? For that matter, what is the problem with big, long test cases?

Pretend we do have tests grouped by feature and we have to make a change to one of those features. How many tests are going to fail? Is there any way to tell beyond scrolling through the file and reading all the tests? It would be much easier if you could clearly see how the tests relate to one another, but a flat list doesn't show relationships. A naming convention for test methods might help, but you're still reading every method signature and deciding based on names alone.

A hierarchical test case would let you do this, though. Test dependencies would be self-evident if you could define a test scenario and then nest test methods or other scenarios as children, as many levels deep as you want. You could still group test methods by class, because you'd keep a nice one-to-one mapping between test classes and production classes. You'd also get test case per feature, because the tests for each feature could just be child scenarios under the main scenario. And you'd get test case per fixture, because each scenario could have setUp() data that is shared with all of its child scenarios. Can you say best of all possible worlds? An easyb example it is then!
scenario "testing a simple user service", {

scenario "testing service with 0 users", {

given "a zero user service", {
service = new UserService()
}

scenario "cannot remove a user", {
then "removing users triggers exception", {
ensureThrows(IllegalStateException) {
service.removeUser(0)
}
}
}

scenario "can add a user", {
given "a user", {
user = ["August", "Schells"]
}

scenario "adding a user once", {
when "he is added", {
service.addUser(*user)
}
then "user count is 1", {
service.userCount.shouldBe 1
}
}

scenario "adding the same user twice", {
when "he is added again", {
service.addUser(*user)
}
then "count is 2", {
service.userCount.shouldBe 2
}
}
}
}

scenario "testing a service with 10 users", {

given "a 10 user service", {
service = new UserService()
(1..10).each {
service.addUser("fname$it", "lname$it")
}
}

scenario "can remove a user", {
when "user is removed", {
service.removeUser(0)
}
then "count is decremented", {
service.userCount.shouldBe 9
}
}

scenario "can remove all users", {
when "user is removed", {
service.removeAllUsers()
}
then "count remove all users", {
service.userCount.shouldBe 0
}
}
}
}
This is a long example. I'm sorry; it sort of needs to be if you're going to show scenarios (think: test methods) nested within one another. Any parent scenario can create test data that is visible only within that scenario's scope, including its child scenarios. No more being restricted to one setUp() and one tearDown() per test case! And if you want to see how test methods relate to each other, you can simply examine the indentation. The scoping rules of scenarios guarantee that related tests are grouped together in the file. No more scrolling through files reasoning about dependencies.

One note about this approach: scenario given blocks (think: setUp()) do not create a fresh fixture. The given block is run once and its data is reused by all the child scenarios, not rebuilt once for each child scenario. This is a shared fixture pattern and it creates chained tests, whereas xUnit's setUp() rebuilds the fixture before every test method. Perhaps there is a way for nested scenarios to get fresh fixtures from given blocks, but I'm unaware of it.
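
If that behavior isn't obvious, here's a small sketch of what it means in practice (again using the hypothetical UserService, and assuming the sharing works as just described): the service built in the parent's given block is carried from one sibling scenario to the next, so the second scenario observes whatever the first one did.

scenario "sharing one given block across siblings", {

    given "a user service", {
        service = new UserService()    // runs once, not once per child scenario
    }

    scenario "the first child adds a user", {
        when "a user is added", {
            service.addUser("Ada", "Lovelace")
        }
        then "count is 1", {
            service.userCount.shouldBe 1
        }
    }

    scenario "the second child sees what the first child did", {
        then "count is still 1, because the fixture was never rebuilt", {
            service.userCount.shouldBe 1    // chained tests in action
        }
    }
}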

So regardless of any skepticism toward behavior-driven development, using Groovy and easyb for testing can bring real benefits in test understandability and maintainability. Easy.

2 comments:

Eric said...

There is certainly a common theme here. The next specs version (http://scala-tools.org/repo-snapshots/org/scala-tools/testing/specs/1.6.0-SNAPSHOT/) will offer full support for nested examples (with non-shared fixtures by default).

Even though I haven't used it much in the past, I guess it's pretty natural to define scenarios/contexts as refinements of more general scenarios.

But thinking as I write, can't this be solved by having an inheritance tree for testing classes, with each setup refining the parent one?

Eric.

Hamlet D'Arcy said...

@Eric. Possibly inheritance can 'solve' this problem however inheritance brings its own set of problems. I've completely turned away from inheritance in test cases for several reasons: when a test case fails, you don't know which source file it failed in, reading a test case no longer explains how a feature should work, many private classes clutter a test case... I guess there's little you can do with inheritance that can't be done with composition.