by John Allspaw
Getting tight operationally means strengthening the resilience of both your stack _and_ your organization's response to issues that arise. Outage postmortem meetings done wrong can be stress-filled blamefests; done right, they can be collaborative illustrations of Resilience Engineering. I'll use Etsy.com examples to illustrate sticky topics such as Root Cause Analysis and Human Error.
14th–16th June 2011