A large percentage of recovery time during an unexpected outage is often spent determining the extent of the problem and its source. Tools that help localize the problem and quickly measure its severity are extremely helpful. The last thing you need during an outage is to have your mail server fall over, too.
And yet, why don't we have a general-purpose solution to this problem?
This talk will explore designing error aggregation systems. We’ll cover effectively capturing events, efficiently processing them, and displaying the relevant information in real time. Error aggregators nicely complement your existing logging and email systems, taking the heat when there’s a problem and intelligently rolling that data up for easy analysis during a crisis.
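To make the "rolling up" idea concrete, here is a minimal sketch of one way an aggregator might group similar errors. It is an illustration under stated assumptions, not the design presented in the talk: the names `fingerprint`, `Rollup`, and `ErrorAggregator` are hypothetical, and the grouping rule (replace numbers and hex ids with a placeholder) is just one simple heuristic.

```python
import re
from dataclasses import dataclass


def fingerprint(error_type: str, message: str) -> str:
    """Hypothetical grouping key: mask numbers and hex ids so similar errors collapse."""
    normalized = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", message)
    return f"{error_type}: {normalized}"


@dataclass
class Rollup:
    """Summary kept per fingerprint instead of storing every event."""
    count: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0
    sample: str = ""  # one raw message kept for debugging


class ErrorAggregator:
    """In-memory sketch; a real system would need persistence and expiry."""

    def __init__(self) -> None:
        self.rollups: dict[str, Rollup] = {}

    def capture(self, error_type: str, message: str, timestamp: float) -> None:
        key = fingerprint(error_type, message)
        rollup = self.rollups.get(key)
        if rollup is None:
            rollup = Rollup(first_seen=timestamp, sample=message)
            self.rollups[key] = rollup
        rollup.count += 1
        rollup.last_seen = timestamp

    def top(self, n: int = 5) -> list[tuple[str, Rollup]]:
        """Most frequent error groups first, for display during an incident."""
        return sorted(
            self.rollups.items(), key=lambda kv: kv[1].count, reverse=True
        )[:n]


if __name__ == "__main__":
    agg = ErrorAggregator()
    agg.capture("TimeoutError", "timeout after 30s on host 12", 1.0)
    agg.capture("TimeoutError", "timeout after 45s on host 7", 2.0)
    agg.capture("KeyError", "missing key user_id", 3.0)
    for key, rollup in agg.top():
        print(rollup.count, key)
```

Keeping one counter and one sample per group, rather than one email or log line per event, is what lets a system like this absorb a burst of failures without flooding the channels people rely on during an outage.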
21st–24th June 2011