The Web is too large to fit on a single machine, so it's no surprise that searching the Web requires the coordination of many machines, too. A single Google query may touch over a thousand machines before the results are returned to the user, all in a fraction of a second.
With all those machines, the opportunities for parallelism and distributed computation are offset by the likelihood of hardware failure. If one machine breaks on average every few years, a pool of a thousand machines will have machines break on a daily basis. A key part of the Google story is that by designing a system to cope with breakage, we can provide not only robustness, but also parallelizability, efficiency, and economies of scale.
Sign in to add slides, notes or videos to this session