Wednesday, April 30, 2008

The Frictionless Architecture (and other lessons from the punchcard era)

Tom Evans gave an inspired talk at OTUG last week with the (possibly) uninspiring name of "Business Agility Principles and Architecture: Lessons from the Punch Card Era." I'm gun-shy about attending things with "agile" in the title, but this turned out to be a very practical session for software developers. The two points I took away from it were his ideas on designing a frictionless architecture and on designing the failure path first.

The frictionless architecture idea came from his days in the telecom industry, in which punch cards were used to transport customer data. Requesting a new line to your office? It gets encoded on a human-readable punchcard and sent via inter-office mail to the correct department. Requesting residential service? That punchcard is delivered the same way and goes to a different department. The point is that the transport layer doesn't care at all about what is being transferred. Inter-office mail will deliver anything as long as it is in a brown envelope. Did you forget to fill out your billing address? Well, the punchcard is still delivered. Did you fill out the request in Kanji? Inter-office mail doesn't care. It will wind up in someone's inbox, ready to be processed.

Compare this with my current B2B web services project. When a customer sends me a request and has forgotten to include a required field (the address, perhaps), that request errors out on DTD validation, and hopefully the information is written to a log file by my web server. If you send a request in a multi-byte script like Kanji and accidentally screw up the encoding, my web service is probably going to puke too. Hey, it's your fault for sending me "garbage" data, right? The point is that a customer is trying to contact me, and I'm just falling over and not fulfilling the request simply because they didn't know enough to fill out the form correctly.

In the punchcard era, that request gets delivered properly. As long as the punchcard machines get oiled (the frictionless part), every single transaction will be delivered. And that request is human readable, which leads to the next point: designing the failure path first.
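To make the "brown envelope" idea concrete, here is a minimal Python sketch of a receive step that behaves like inter-office mail. It is only an illustration under my own assumptions, not anything shown in Tom's talk: the names `Envelope` and `Inbox` are hypothetical, and the in-memory list stands in for whatever durable store a real system would use. The one rule it follows is that delivery never parses, validates, or decodes the payload.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """The 'brown envelope': opaque bytes plus delivery metadata."""
    payload: bytes
    sender: str
    envelope_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    received_at: float = field(default_factory=time.time)


class Inbox:
    """Hypothetical in-memory inbox; a real one would persist to disk."""

    def __init__(self):
        self._items = []

    def deliver(self, payload: bytes, sender: str) -> Envelope:
        # No schema check and no charset decoding happen here, so delivery
        # cannot fail because of what the customer wrote.
        envelope = Envelope(payload=payload, sender=sender)
        self._items.append(envelope)
        return envelope

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


inbox = Inbox()
# A request with no address: still delivered.
inbox.deliver(b"<order><phone>555-0100</phone></order>", "acme")
# Japanese text in an encoding the receiver did not expect: still delivered.
inbox.deliver("住所".encode("shift_jis"), "tokyo-office")
# Bytes that are not valid UTF-8 at all: still delivered.
inbox.deliver(b"\xff\xfe not valid utf-8", "unknown")
```

Interpreting the payload becomes a separate step that runs after the request is safely in hand, so a bad request costs processing effort instead of costing the order.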

The failure path for the punchcards was built into the system. A human received the punchcard. If that request was missing the address, then the telecom employee could just look at your telephone number on the request and call you to get the missing information. If the request was written in Kanji, then they could go find an interpreter to create a translated version. I have an apostrophe in my name (D'Arcy), and I can't tell you how many times websites and web services simply don't work because of that. I don't get a callback, or a human trying to fix it; I get an error message if I'm lucky. Punchcards had a great failure path. The business could almost always fulfill a request even if their customer totally screwed things up.

So what's this about the failure path first stuff? Well, by default the business was able to fulfill automated transactions that failed. If they rolled out an entirely new service that was not supported by IT, they could still have humans fulfill all of those orders by hand. By default, supporting a new service only required hiring temp workers to type in orders by hand. If my business wants to roll out a new service, there is no way for me to harness manual labor at all. Unless there is a new service and a DTD/schema (and maybe a WSDL, and an endpoint, etc.), I won't be able to handle any new services!

On one of Tom's projects, his company actually rolled out new services, and those orders arrived on punch cards or faxes. IT finally caught up with an automated Smalltalk system later, when the volume started exceeding 100 orders a day. It would be a huge value if, by default, all unrecognized incoming transactions were put into a queue in their entirety, where a fleet of temp workers could slowly process them by hand.
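The "queue everything you don't recognize" idea can be sketched in a few lines of Python. Treat this as a rough illustration under stated assumptions, not a description of Tom's system or of my project: `OrderDesk`, the `service` attribute, the `new_line` and `fiber_tv` service names, and the required fields are all hypothetical. The design choice it shows is that every failure in the automated path becomes a route to a human, and the human gets the entire original request plus the reason automation gave up.

```python
import xml.etree.ElementTree as ET
from collections import deque

REQUIRED_FIELDS = ("phone", "address")  # hypothetical required fields


class OrderDesk:
    def __init__(self):
        self.fulfilled = []          # orders handled by automation
        self.manual_queue = deque()  # everything else, waiting for a human
        # Services that IT has automated so far (hypothetical name).
        self.handlers = {"new_line": self._provision_line}

    def _provision_line(self, fields):
        self.fulfilled.append(("new_line", fields))

    def _automate(self, raw):
        root = ET.fromstring(raw)  # raises ParseError on malformed XML
        service = root.get("service")
        handler = self.handlers.get(service)
        if handler is None:
            raise LookupError(f"no automation yet for service {service!r}")
        fields = {child.tag: (child.text or "").strip() for child in root}
        missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
        if missing:
            raise ValueError("missing fields: " + ", ".join(missing))
        handler(fields)

    def process(self, raw: bytes) -> str:
        try:
            self._automate(raw)
            return "automated"
        except Exception as exc:
            # Deliberately broad: any failure is a route to a person, not a
            # crash. The raw request is kept whole so nothing is lost.
            self.manual_queue.append(
                {"raw": raw, "reason": f"{type(exc).__name__}: {exc}"}
            )
            return "manual"


desk = OrderDesk()
# Complete order, apostrophe and all: handled automatically.
desk.process(b"<order service='new_line'><name>D'Arcy</name>"
             b"<phone>555-0100</phone><address>1 Main St</address></order>")
# Missing address: a person can call the phone number on the request.
desk.process(b"<order service='new_line'><phone>555-0100</phone></order>")
# A service launched before IT built anything for it.
desk.process(b"<order service='fiber_tv'><phone>555-0100</phone></order>")
# Not even XML.
desk.process(b"not xml at all")
```

After these four calls, one order has been fulfilled automatically and three are sitting in `manual_queue` with their original bytes intact. A new service needs no code change to start taking orders; it only needs people to work the queue until automation is worth building.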

And business agility? The telecom could roll out new services without any IT support. Not even an iteration 1 to get the skeleton together. Talk about delivering value early; the architecture could deliver the value even before the first sprint started. Brilliant.

The slides aren't posted yet (c'mon Tom, email them over), but Tom did say he was preparing a longer published work later this year. Definitely looking forward to that. Lastly, if you're in the Twin Cities please come out to the May meeting to hear Ted Neward speak about Scala. Attendance and the speakers have been good this year... come join the party.

1 comment:

Peter Pascale said...

I wasn't sure about this concept from a technical/analysis approach, because it's building the exception cases first. "How can that add the appropriate business value?" And yet - that's the point - when the business value is immediate responsiveness. The concern in these cases is not how to handle massive load - it's how to respond today.

This has a nice side effect - so often in software we don't provide the appropriate level of tooling for the exception cases, and operational support folks can be hamstrung. In the Frictionless Architecture - that is built up front.

I'm not sure I think of it as architecture though - perhaps it's 'Frictionless Operations' or 'Frictionless Business'? I think we can do our business counterparts a service and raise concepts like this - but these are business-driven service approach concepts.