Tuesday 22nd March, 2016
4:55pm to 5:25pm
The 'Black Friday fail' is the greatest fear of every major online retailer. Downtime costs money, and on Black Friday it costs quite a lot of money.
But the sad truth is that service failure is inevitable, especially in a large distributed system. So how can we survive a service failure when it inevitably happens?
* In this lecture I will show how failures in large systems differ from failures in small systems.
* I will show examples of resilience engineering.
* Why simulate failures, and how to do it in your system.
* How to use gradual rollout, circuit breakers and automatic fallback to protect your system.
* The importance of failing fast and failing silently.
* And the misconceptions we all have about how a large-scale website failure unfolds.
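To make the circuit breaker and automatic fallback idea from the list above concrete, here is a minimal sketch in Python. It is not taken from the talk: the class name `CircuitBreaker`, its parameters (`max_failures`, `reset_after`, `clock`) and the `call` method are all hypothetical, chosen only for illustration. After a run of consecutive failures the breaker "opens" and fails fast by returning the fallback without calling the failing service; once a cool-down has passed it lets one trial call through.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker with automatic fallback (illustration only)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds before a trial call
        self.clock = clock                # injectable clock, which makes testing easy
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        # While the circuit is open, fail fast: skip the call until the cool-down ends.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open the circuit on too many failures, or re-open after a failed trial.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # A successful call closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_recommendations, lambda: [])`, where `fetch_recommendations` is an assumed function that talks to a remote service. The empty list is the silent fallback: the page still renders, just without that section.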