"Every integration point will eventually fail in some way" - Michael T. Nygard
They sure do... and the book contains patterns to follow, like the Circuit Breaker, to ensure your system stays alive when the other system dies.
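For readers who have not met the pattern, here is a minimal sketch of a circuit breaker. It is not the book's implementation: the class name `CircuitBreaker`, the failure threshold, and the reset timeout are assumptions made for illustration.

```java
import java.util.function.Supplier;

/** Illustrative circuit breaker; names and thresholds are hypothetical, not from the book. */
final class CircuitBreaker {
    private final int failureThreshold;
    private final long resetTimeoutNanos;
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAtNanos = 0L;

    CircuitBreaker(int failureThreshold, long resetTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutNanos = resetTimeoutMillis * 1_000_000L;
    }

    // synchronized keeps the sketch simple, at the cost of serializing all calls.
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        // While open and inside the reset window, fail fast without touching the remote system.
        if (open && System.nanoTime() - openedAtNanos < resetTimeoutNanos) {
            return fallback.get();
        }
        try {
            // Either closed, or open with the window elapsed (a single trial call).
            T result = remoteCall.get();
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // A failed trial call, or too many failures in a row, (re)opens the breaker.
            if (open || consecutiveFailures >= failureThreshold) {
                open = true;
                openedAtNanos = System.nanoTime();
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

A caller would wrap each integration point, for example `breaker.call(() -> inventoryClient.lookup(sku), () -> cachedValue)`, where `inventoryClient` and `cachedValue` are hypothetical names.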
"Generating a slow response is worse than refusing a connection or returning an error" - Michael T. Nygard
The book explains how slow responses tend to propagate upward from layer to layer in a pattern he calls "cascading failure". There are several patterns to address this issue, especially Timeouts.
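To make the Timeouts idea concrete, here is a small sketch that bounds how long a caller waits on a slow dependency. The helper name `callWithTimeout` and the fallback value are assumptions for illustration; the book does not prescribe this code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative timeout wrapper; the names here are hypothetical. */
final class TimeoutCall {
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit, T fallback) {
        // One executor per call keeps the sketch short; real code would share a pool.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            // Wait at most `timeout`; a slow callee no longer ties up the caller indefinitely.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow task
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself failed
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The point is that the calling layer gets a fast, definite answer (here a fallback value) instead of inheriting the callee's slowness.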
"Hope is not a design method." - Michael T. Nygard
Quotable nuggets like this make the book an entertaining read from start to finish.
"Ohnosecond: that very short moment in time during which you realize that you have pressed the wrong key and brought down a server, deleted vital data, or otherwise damaged the peace and harmony of stable operations." - Michael T. Nygard
It'd be alright with me if I never had credentials to log onto another production server in my life.
"Data purging always gets the short end of the stick. It certainly doesn't demo as well as... well, anything else in the world demos better than data purging." - Michael T. Nygard
There's not a lot to say about purging except, "Why aren't you doing it?" Still, the book contains a good call-to-arms about the topic.
"Nothing is as permanent as a temporary fix" - Michael T. Nygard
The last chapter contains a wonderful description of Enterprise Architecture in which a crystal palace has been built and calcified, and now developers tiptoe through the code speaking in hushed tones, trying not to touch anything. This passage alone was worth the cost of the book.
"Storage is a service, not a device" - Michael T. Nygard
The real cost of hardware is thoroughly explained, and Michael shows how the real cost of ownership on local storage is sometimes seven times higher than the actual hardware cost. As for processing power, he introduces the concept of a "Forklift Upgrade"... adding CPUs to a box is pretty cheap, unless it requires a new box and your box weighs two tons. Makes you think differently.
"Some technologies just beg to be misused. Take bottle rockets. No matter how many warning labels manufacturers put on bottle rockets it's still a firecracker with a handle." - Michael T. Nygard
Or the combination of StringBuffer, SQL, and a JDBC connection. Oh God, please don't build up a SQL command by combining all these. The book's coverage of this one is a little more nuanced than mine.
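One well-known hazard of concatenated SQL is injection. The sketch below assumes that is the concern, which may not match the book's exact argument. The table, column, and class names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Illustrative only; the customers table and its columns are made up for this example. */
final class CustomerQueries {
    // Anti-pattern: user input is spliced directly into the SQL text.
    static String unsafeSql(String lastName) {
        StringBuffer sql = new StringBuffer("SELECT id FROM customers WHERE last_name = '");
        sql.append(lastName).append("'");
        return sql.toString();
    }

    // Safer: the SQL text is fixed, and the value travels separately as a bound parameter.
    static final String SAFE_SQL = "SELECT id FROM customers WHERE last_name = ?";

    static int countMatches(Connection connection, String lastName) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SAFE_SQL)) {
            statement.setString(1, lastName);
            try (ResultSet rows = statement.executeQuery()) {
                int count = 0;
                while (rows.next()) {
                    count++;
                }
                return count;
            }
        }
    }
}
```

Passing the input `x' OR '1'='1` to `unsafeSql` yields a WHERE clause that matches every row, while the prepared statement treats the same input as a plain string value.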
"Nobody deliberately selects a design with the purpose of harming the system's capacity; instead, they select a functional design without regard to its effect on capacity." - Michael T. Nygard
Design for Capacity... can you honestly claim this as one of your design principles? This is different from designing so you don't ruin your capacity. Mr. Nygard explains how this isn't premature optimization and provides concrete patterns to follow to build capacity into the design.
"Factor the cost of generating content out of individual requests and into the deployment process" - Michael T. Nygard
If web content is static between releases, then precompute that content as part of the build. This is a good example of how the book contains specific, concrete advice, and isn't just a collection of high-level maxims.
"The proper way to frame the availability decision is in straightforward financial terms: actual cost vs. avoided losses." - Michael T. Nygard
Service Level Agreements (SLAs) are given a fair and developer-focused treatment, complete with some light math. See his blog posts "Reliability Math" and "Getting Real About Reliability" for a flavor of how some math can actually help explain some of these concepts (although those posts get into a lot more math than the book did).
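To give a flavor of the "actual cost vs. avoided losses" framing, here is a small worked example. The figures (a move from 99.9% to 99.99% availability, and a loss of $10,000 per hour of downtime) are hypothetical numbers chosen for illustration, not taken from the book.

```latex
% Hypothetical figures for illustration only.
% A = availability, D(A) = expected downtime per year (8760 hours in a year).
\begin{align*}
D(A) &= (1 - A) \times 8760\ \text{h} \\
D(0.999) &= 0.001 \times 8760\ \text{h} = 8.76\ \text{h} \\
D(0.9999) &= 0.0001 \times 8760\ \text{h} = 0.876\ \text{h} \\
\text{avoided loss} &= (8.76 - 0.876)\ \text{h} \times \$10{,}000/\text{h} = \$78{,}840\ \text{per year}
\end{align*}
```

Under these assumptions, the extra "nine" is worth buying only if it costs less than about $78,840 a year.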
"Most of the time, the real culprit [for deployment and production defects] is a mismatch in topology between QA and production" - Michael T. Nygard
I spend a staggering amount of time trying to replicate defects in development. Part of the problem is the hidden dependencies that come from housing several components on the same machine in QA but not in production, or from differing network configurations, such as firewalls set up between servers in production but nowhere else.
"A running application can be interrogated for its internal state, but a halted one cannot." - Michael T. Nygard
For me, this was non-obvious. This quote comes in the context of building a clean start-up sequence for all of your system components, and not accepting work until the entire system is running successfully. Compare this to a system that aborts and exits if something fails during start-up. Which system would you rather troubleshoot? Which system has a better chance of being able to tell you what is wrong with it?
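One way to picture this is a start-up gate that records each component's status and refuses work until everything is up, without exiting on failure. This is only a sketch of the idea; the class `StartupGate` and its methods are hypothetical and not from the book.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Illustrative start-up gate; all names here are hypothetical. */
final class StartupGate {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();
    private final Map<String, String> status = new LinkedHashMap<>();
    private volatile boolean acceptingWork = false;

    synchronized void addCheck(String component, BooleanSupplier startsOk) {
        checks.put(component, startsOk);
    }

    /** Runs every check. A failure is recorded rather than ending the process. */
    synchronized void start() {
        boolean allUp = true;
        for (Map.Entry<String, BooleanSupplier> check : checks.entrySet()) {
            String result;
            try {
                result = check.getValue().getAsBoolean() ? "UP" : "FAILED";
            } catch (RuntimeException e) {
                result = "FAILED: " + e.getMessage();
            }
            status.put(check.getKey(), result);
            allUp &= result.equals("UP");
        }
        // Work is accepted only once every component reported UP.
        acceptingWork = allUp;
    }

    boolean isAcceptingWork() {
        return acceptingWork;
    }

    /** The process is still running, so it can still be asked what went wrong. */
    synchronized Map<String, String> status() {
        return new LinkedHashMap<>(status);
    }
}
```

Because the process stays alive after a failed start, an operator can still query `status()` to see which component is down.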
"Java GUIs make terrible administrative interfaces for long-term production operation." - Michael T. Nygard
GUIs make terrible administrative interfaces, whether they be desktop or web. Scriptability is advocated here, and exposing your application as JMX MBeans is held up as an excellent example of how to do it right without much effort.
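As a rough illustration of how little code a standard MBean needs, here is a sketch that exposes a queue depth through the platform MBean server. The `OrderQueue` component and the object name used with it are hypothetical; only the `javax.management` and `java.lang.management` calls are standard JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Illustrative JMX exposure; OrderQueue is a made-up component. */
final class JmxSketch {
    /** Standard MBeans pair an implementation class Foo with a public interface FooMBean. */
    public interface OrderQueueMBean {
        int getDepth();   // exposed as the read-only attribute "Depth"
        void clear();     // exposed as the operation "clear"
    }

    public static final class OrderQueue implements OrderQueueMBean {
        private final AtomicInteger depth = new AtomicInteger();

        public void enqueue() {
            depth.incrementAndGet();
        }

        @Override
        public int getDepth() {
            return depth.get();
        }

        @Override
        public void clear() {
            depth.set(0);
        }
    }

    /** Registers the queue with the platform MBean server under the given name. */
    static ObjectName register(OrderQueue queue, String name) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(queue, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads an attribute the way a script or console would, through the MBean server. */
    static Object readAttribute(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once registered, the attribute and operation are reachable from any JMX client, which is what makes the interface scriptable.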
"Without transparency, the system will drift into decay, functioning a bit worse with each release." - Michael T. Nygard
Many of the book's concepts are explained with analogies from outside computer science. For instance, eutrophication from biology, bulkheads from the shipbuilding craft, and circuit breakers from electronics are all used to explain some of the patterns and anti-patterns laid out. Part of the joy of reading this book is finding those quotes and concepts from other scientific fields that somehow apply to our craft. The writing goes beyond the obvious tie-ins to entropy and constraint systems. Readers love a diverse bibliography.
"Can we make it through the holiday season? (Notice that this requires a projection about a projection, which doesn't just double the possibility of error but squares it.)" - Michael T. Nygard
It's OK to think in mathematical terms. Really, it is. This book makes you think differently about capacity, availability, and reliability by bringing it back to some simple numbers.
"A startling number of business-level issues can be traced back to batch jobs failing invisibly for 33 days straight." - Michael T. Nygard
With better monitoring you'd have had 32 days to fix the issue. Now it's the 33rd and you're working the weekend. Enjoy it. By the way, this is a design issue, not a reporting issue.
"Logging should be aimed at production operations rather than development or testing. One consequence is that anything logged at level "ERROR" or "SEVERE" should be something that requires action on the part of operations." - Michael T. Nygard
You can show up at work Monday and start applying this advice in your code, and you can show up and start applying the principles in your designs. That's a unique combination for a book. Release It! blends enjoyable writing, high-level conceptual thinking, and lower-level implementation steps. My workplace ran a book club on Release It! and within a few weeks it had changed the flavor of our design sessions. I really enjoyed this book and highly recommend it to others.
Sunday, April 19, 2009