Saturday 29th September, 2012
12:00pm to 1:00pm
The Wayback Machine is a high-traffic website that has been online for over a decade. It was a mostly Java application. One of its components is the liveweb proxy, an HTTP proxy that archives each resource requested through it and serves as the core data source for the Wayback Machine.
The liveweb proxy was rearchitected from scratch in Python and deployed on the live website, where it has been running for a few months without a single hitch. Getting there meant working around limitations in the standard library, carefully tuning parameters to balance disk I/O against memory usage, and understanding and respecting the fine details of the HTTP protocol.
This talk discusses the architecture and design of the new system, how it handles the traffic and access patterns expected of an archiving proxy, and how it was deployed.
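As a rough illustration of the disk I/O versus memory trade-off mentioned in the abstract, here is a minimal sketch, not the proxy's actual code. It buffers a response body in memory up to a threshold and spills to a temporary file beyond it, using the standard library's `tempfile.SpooledTemporaryFile`. The function name `buffer_response` and the `MEMORY_LIMIT` value are hypothetical, chosen only for this example.

```python
import hashlib
import tempfile

# Hypothetical threshold: bodies up to this many bytes stay in memory,
# larger ones are spilled to a temporary file on disk.
MEMORY_LIMIT = 64 * 1024


def buffer_response(chunks, memory_limit=MEMORY_LIMIT):
    """Collect response chunks into a buffer that spills to disk when large.

    Returns (buffer, length, sha1_hexdigest). The buffer is rewound to the
    start so it can be read back, for example to write an archive record.
    """
    buf = tempfile.SpooledTemporaryFile(max_size=memory_limit)
    digest = hashlib.sha1()
    length = 0
    for chunk in chunks:
        buf.write(chunk)       # stays in memory until max_size is exceeded
        digest.update(chunk)   # hash incrementally, without a second pass
        length += len(chunk)
    buf.seek(0)
    return buf, length, digest.hexdigest()
```

A lower threshold keeps memory use flat under many concurrent requests at the cost of more disk writes; a higher one does the opposite. The right value would depend on the traffic the proxy actually sees.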
Passionate programmer, open source enthusiast, working for @internetarchive
Muslim, Father, Husband, Programmer, Trainer, Mapilla, Amateur calligrapher etc. *Not* a photographer. Working for @internetarchive. bio from Twitter