Rewriting the Wayback Machine's live web proxy in Python

A session at PyCon India 2012

Saturday 29th September, 2012

12:00pm to 1:00pm (HMT)

The Wayback Machine is a high-traffic website that has been online for over a decade. It was mostly a Java application. One of its components is the Liveweb proxy, an HTTP proxy that archives each resource requested through it and serves as the core data source for the Wayback Machine.
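To make the idea of an archiving proxy concrete, here is a minimal sketch: fetch the resource on the client's behalf, append a copy to an archive, and return the response unchanged. The abstract does not specify a storage format, so the simplified WARC-style record below is an assumption, and `proxy`, `write_record` and `serialize_response` are hypothetical names, not the Liveweb proxy's actual code. The fetcher is passed in as a plain callable so the sketch runs without a network.

```python
import datetime
import io


def serialize_response(status, reason, headers, body):
    """Rebuild raw HTTP/1.1 response bytes from the parsed parts."""
    lines = [f"HTTP/1.1 {status} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers]
    head = ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
    return head + body


def write_record(out, url, payload, when):
    """Append a simplified, WARC-style 'response' record to a binary stream.

    Hypothetical layout: a real WARC record also carries a WARC-Record-ID
    and other fields that are omitted here for brevity.
    """
    header = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {url}\r\n"
        f"WARC-Date: {when:%Y-%m-%dT%H:%M:%SZ}\r\n"
        "Content-Type: application/http; msgtype=response\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("utf-8")
    out.write(header + payload + b"\r\n\r\n")


def proxy(url, fetch, archive, now=None):
    """Fetch `url` for a client, archive a copy, and return the response.

    `fetch` is any callable returning (status, reason, headers, body);
    injecting it keeps the sketch independent of a real network client.
    """
    status, reason, headers, body = fetch(url)
    when = now or datetime.datetime.now(datetime.timezone.utc)
    write_record(archive, url, serialize_response(status, reason, headers, body), when)
    return status, reason, headers, body


if __name__ == "__main__":
    def fake_fetch(url):
        return 200, "OK", [("Content-Type", "text/plain")], b"hello"

    archive = io.BytesIO()
    print(proxy("http://example.com/", fake_fetch, archive)[3])
    print(archive.getvalue().decode("latin-1"))
```

A production proxy would stream the body to the client and the archive at the same time instead of buffering it whole, which is where the tuning discussed in the talk comes in.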

The Liveweb proxy was re-architected from scratch in Python, deployed on the production website, and has been running for a few months now without a single hitch. The work involved working around limitations in the standard library, carefully tuning parameters to balance disk I/O against memory usage, and understanding and respecting fine details of the HTTP protocol.
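The abstract does not say which HTTP details mattered most, but one well-known example for any proxy is that hop-by-hop headers describe a single connection and must not be forwarded. The sketch below uses the list from RFC 2616, section 13.5.1 (the HTTP/1.1 specification current in 2012); `end_to_end_headers` is a hypothetical helper for illustration, not the Liveweb proxy's code.

```python
# Hop-by-hop headers listed in RFC 2616, section 13.5.1. A proxy must not
# forward these; they describe the client-proxy or proxy-server connection.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
})


def end_to_end_headers(headers):
    """Return only the headers that are safe to forward (hypothetical helper).

    `headers` is a list of (name, value) pairs. Header names are compared
    case-insensitively, and any header named in a Connection header is
    treated as hop-by-hop too.
    """
    drop = set(HOP_BY_HOP)
    for name, value in headers:
        if name.lower() == "connection":
            drop.update(t.strip().lower() for t in value.split(",") if t.strip())
    return [(name, value) for name, value in headers if name.lower() not in drop]
```

Note the second rule: a header such as `X-Foo` becomes hop-by-hop simply by being listed in `Connection: keep-alive, X-Foo`, which is easy to miss if only the fixed list is checked.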

This talk discusses the architecture and design of the new system, which is built to handle the traffic volumes and access patterns expected of an archiving proxy, and how it was deployed.
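Responses passing through an archiving proxy range from a few bytes to very large files, which is where the disk-versus-memory balance mentioned earlier becomes concrete. One common standard-library approach (an assumption; the abstract does not say what the Liveweb proxy does) is `tempfile.SpooledTemporaryFile`, which keeps data in memory until it exceeds `max_size` and then rolls over to a file on disk. `spool_stream`, `CHUNK_SIZE` and `MAX_IN_MEMORY` are hypothetical names, and the values are illustrative rather than tuned.

```python
import io
import tempfile

CHUNK_SIZE = 64 * 1024        # hypothetical size of each read
MAX_IN_MEMORY = 1024 * 1024   # hypothetical threshold before spilling to disk


def spool_stream(stream, chunk_size=CHUNK_SIZE, max_in_memory=MAX_IN_MEMORY):
    """Copy a binary stream into a buffer that spills to disk when large.

    Returns (spooled_file, total_bytes), with the file rewound to the start.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        spool.write(chunk)
        total += len(chunk)
    spool.seek(0)
    return spool, total


if __name__ == "__main__":
    body, size = spool_stream(io.BytesIO(b"x" * 3_000_000))
    print(size)  # 3000000
    body.close()
```

Raising `max_in_memory` trades RAM for fewer disk writes on typical responses, while `chunk_size` controls how many read calls are made per response and how large each in-flight buffer is.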

About the speakers

Anand Chitipothu

Passionate programmer, opensource enthusiast, working for @internetarchive

Noufal Ibrahim

Muslim, Father, Husband, Programmer, Trainer, Mapilla, Amateur calligrapher etc. *Not* a photographer. Working for @internetarchive.


PyCon India 2012

Bangalore, India

28th to 30th September 2012
