by Erik Rose
Mozilla's projects have thousands of tests, so we've had to venture beyond vanilla test runners to keep things manageable. Our secret sauce can be used with your project as well. Reach beyond the test facilities that came with your project, harnessing pluggable test frameworks, dynamically reordering tests for speed, exploring various mocking libraries, and profiling your way to testing nirvana.
A partial outline:
Intro
Motivation: a test not run is no test at all.
For most web apps, the easiest test speed win is a conquest of I/O.
The nose testrunner
Test discovery lets you organize tests well.
Pluggability
Gluing to projects with custom testrunners: django-nose and test-utils
py.test
Compare to nose. Nose forked from it. Explain history.
Very cool assertion re-evaluation
Plugin compatibility between py.test and nose
Profiling
Start here. Premature optimization sucks.
time on the commandline to divide CPU from I/O
--with-profile
Killing I/O for speedy justice: case study of support.mozilla.com
Fixture speed hacks (a 5x improvement!)
Once-per-class setup
How to use DB transactions to avoid repetitive I/O
Dynamic test reordering and fixture sharing
DB reuse and other startup optimizations
37,583 queries to 4,116. Watch them fly by!
What to do instead of fixtures: the model-maker pattern
Lexical proximity
Lower coupling
Speed
Using mocking to kill the fixtures altogether
mock, the canonical lib
fudge, new declarative hotness
Syntax, capabilities
Example: oedipus, a better API for the Sphinx search engine. I used fudge to unit-test oedipus without requiring devs to set up and populate Sphinx.
Dangers of mocking
Don't mock out your caching unless your invalidation is perfect.
Some of our mistakes in oedipus
The nose-progressive display engine
Test results that are a pain to read don't get read.
Progress indication
Elision of junk frames
Easier round-tripping from test failure to source code
Continuous integration
Motivation
Jenkins
Buildbot
IRC bots
Next steps: what to do once you're CPU-bound
More parallelization.
Multithreading buys you no speedup for CPU-bound tasks in Python due to the GIL; I/O-bound tasks can still benefit, because the GIL is released during blocking I/O. (Ref: PyCodeConf talk by David Beazley.)
State of multiprocess plugins in various testrunners.
Mozilla's Jenkins test farm
QA's big stacks of Mac Minis
What global warming? ;-)
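As a rough illustration of the "Using mocking to kill the fixtures altogether" item in the outline above, here is a minimal sketch using the standard library's `unittest.mock`, which is the `mock` library after it was added to the standard library. The `count_hits` function and the `search_client` object are hypothetical stand-ins for code that would otherwise need a populated search backend; they are not taken from the talk or from oedipus.

```python
from unittest import mock


def count_hits(search_client, term):
    """Hypothetical code under test: asks a search backend for matches."""
    results = search_client.query(term)
    return len(results)


def test_count_hits_without_backend():
    # A Mock stands in for the real client, so no server or fixture is needed.
    fake_client = mock.Mock()
    fake_client.query.return_value = ["doc1", "doc2", "doc3"]

    assert count_hits(fake_client, "firefox") == 3
    # Check that the code talked to its collaborator as expected.
    fake_client.query.assert_called_once_with("firefox")


test_count_hits_without_backend()
```

The test touches no database and no search daemon, which is where the speed comes from; the trade-off, as the outline warns, is that the mock only proves the code matches your assumptions about the backend.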
For many DSLs such as templating languages it's important to use code generation to achieve acceptable performance in Python. The current version of Jinja went through many different iterations to end up where it is currently. This talk walks through the design of Jinja2's compiler infrastructure and why it works the way it works and how one can use newer Python features for better results.
Why Code Generation?
The general consensus on code generation in many dynamic language communities seems to be: eval is evil, do not use it. However, if done properly, code generation solves a lot of problems easily, securely, and with much better performance than an interpreter written on top of an interpreted language like Python.
Code generation is what powers most template languages in Python, what powers object relational mappers and more. It is also an excellent tool to simplify debugging.
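To make the idea concrete, here is a toy sketch of a template compiler that turns a template string into Python source and compiles it once, so rendering is a plain function call. It is an assumption-laden illustration only: the `{{ name }}` syntax, `compile_template`, and the generated `render` function are invented for this example and do not reflect how Jinja2 is actually implemented.

```python
import re

# Matches placeholders such as {{ name }}.
TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(source):
    """Translate a template string into a Python function and return it."""
    lines = ["def render(context):", "    parts = []"]
    pos = 0
    for match in TOKEN.finditer(source):
        literal = source[pos:match.start()]
        if literal:
            # Literal text becomes a constant in the generated code.
            lines.append("    parts.append(%r)" % literal)
        # Placeholders become dictionary lookups in the generated code.
        lines.append("    parts.append(str(context[%r]))" % match.group(1))
        pos = match.end()
    if source[pos:]:
        lines.append("    parts.append(%r)" % source[pos:])
    lines.append("    return ''.join(parts)")

    namespace = {}
    code = compile("\n".join(lines), "<template>", "exec")
    exec(code, namespace)
    return namespace["render"]


render = compile_template("Hello {{ name }}, you have {{ count }} messages.")
print(render({"name": "World", "count": 3}))
```

Literal text is embedded with `%r`, so it is always emitted as a quoted string constant and never as executable code; only the generator decides what code runs.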
Why Codegen is no Silver Bullet
Just because you generate code does not mean you're faster than an interpreter written in Python. This part of the talk focuses on why compiling Django templates to Python bytecode does not automatically make them fast.
Design of Jinja2
Jinja2 underwent multiple design iterations, most of which were made to improve either performance or debuggability. The internals, however, are largely undocumented and confusing unless you're familiar with the code. Hidden in them are a few gems and interesting tricks that make code generation work in the best possible way.
Python's Support for Code Generation
Over the years, Python's support for code generation has steadily improved, with different ways to access the abstract syntax tree and to compile it back to bytecode. This section highlights some alternative ways to do code generation that are not yet fully implemented in Jinja2 but are otherwise widely used.
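A small sketch of the AST route, using only the standard library `ast` module: parse source into a tree, rewrite a node, and compile the tree back to bytecode. The `ReplaceTwoWithThree` transformer and the sample expression are made up for illustration (and assume Python 3.8 or later for `ast.Constant`); this is not code from Jinja2.

```python
import ast

# Parse source into an abstract syntax tree instead of manipulating text.
tree = ast.parse("result = base * 2 + 1")


class ReplaceTwoWithThree(ast.NodeTransformer):
    """Illustrative rewrite: turn every constant 2 into 3."""

    def visit_Constant(self, node):
        if node.value == 2:
            return ast.copy_location(ast.Constant(value=3), node)
        return node


tree = ReplaceTwoWithThree().visit(tree)
# New nodes need line and column information before compiling.
ast.fix_missing_locations(tree)

# Compile the modified tree straight to a code object and run it.
code = compile(tree, "<generated>", "exec")
namespace = {"base": 10}
exec(code, namespace)
print(namespace["result"])
```

Working on the tree avoids quoting and indentation bugs that come with generating source text, at the cost of more verbose generator code.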
by Jeremiah Jordan
Using Apache Cassandra from Python is easy to do. This talk will cover setting up and using a local development instance of Cassandra from Python. It will cover using the low-level Thrift interface, as well as the higher-level pycassa library.
by Moshe Zadka
With Python, segmentation faults and the like simply don't happen -- programs do not crash. However, the world is a messy, chaotic place. What happens when your programs crash? I will talk about how to make sure that your application survives crashes, reboots and other nasty problems.
Handling crashes is divided into two parts -- resilience (making sure that your software maintains correctness in the face of crashes) and speed of recovery (minimizing the time it takes to get back to full working condition). I will talk about techniques that allow for resilience -- separating master data from cache data, minimizing the amount of master data, using atomic file operations, using databases and persisting structures in the right order. Then I will talk about speedy recovery techniques, among them process separation, working while restarting and more. I will conclude by surveying the options for testing all of these things, so that the crashes are made to happen in the development/testing environment.
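One of the resilience techniques named above, atomic file operations, can be sketched as follows: write to a temporary file in the same directory, flush it to disk, then rename it over the target. The helper name `atomic_write_json` and the `state.json` file are assumptions made for this example, not something taken from the talk.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # rename over the target (atomic on POSIX)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


state_file = os.path.join(tempfile.mkdtemp(), "state.json")
atomic_write_json(state_file, {"jobs_done": 1})
atomic_write_json(state_file, {"jobs_done": 2})
with open(state_file) as f:
    print(json.load(f))
```

If the process dies before `os.replace`, the previous `state.json` is still intact; at worst a stray `.tmp` file is left behind for cleanup on restart.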
Outline:
by Walker Hale
This talk covers the power and metaprogramming features of Python that cater to mad scientists and evil geniuses. It will also be of interest to others who just want to use Python in a more powerful (or power-hungry) way. The core concept is that you can synthesize functions, classes and modules without a direct correspondence to source code. You can also mutate third-party objects and apps.
Users of Python are not limited to the usual model of a one-to-one correspondence between source code and live objects. Python allows you to synthesize functions, classes and modules without a direct correspondence to source code. You can mutate third-party objects, classes, modules and applications through monkey patching -- changing their behavior without altering their source code. You can even "chop-up" third-party objects to create new objects from the pieces. Find out how to unleash your inner Mad Scientist!
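A minimal sketch of the three ideas in the paragraph above: synthesizing a class, synthesizing a module, and monkey patching. All names here (`Monster`, `lab`, `ThirdPartyGreeter`) are hypothetical and chosen only for illustration; `ThirdPartyGreeter` stands in for a class you do not control.

```python
import types


# 1. Synthesize a class at runtime with type(name, bases, namespace).
def describe(self):
    return "%s with %d volts" % (self.name, self.volts)


Monster = type("Monster", (object,), {"name": "Igor", "volts": 9000, "describe": describe})

# 2. Synthesize a module that has no file on disk.
lab = types.ModuleType("lab")
lab.Monster = Monster


# 3. Monkey patch: change a class's behavior without touching its source.
class ThirdPartyGreeter:  # hypothetical stand-in for third-party code
    def greet(self):
        return "hello"


original_greet = ThirdPartyGreeter.greet


def louder_greet(self):
    # Wrap the original method instead of replacing its logic outright.
    return original_greet(self).upper() + "!"


ThirdPartyGreeter.greet = louder_greet

print(lab.Monster().describe())
print(ThirdPartyGreeter().greet())
```

Keeping a reference to the original method, as `original_greet` does here, makes the patch reversible and lets the replacement build on existing behavior.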
Thesis: Python is an ideal language for both:
Although most of the material is presented from the point of view of the Mad Scientist, it is equally useful to the Evil Genius.
Since the Python community prides itself on diversity, I should emphasize that the sane, the non-evil, and "do-gooders" are all welcome.
by Adam Lowry
Understanding the internal state of a running system can be vital to maintaining a high performance, stable system, but conventional approaches such as logging and error handling only expose so much. This talk will touch on how to instrument Python programs in order to observe the state of the system, measure performance, and identify ongoing problems.
Something is wrong with your web application. The time it’s taking to serve requests is growing. Your logs don’t contain enough. Your database appears bored. How do you know what’s going wrong?
In high-performance production servers it’s vital to know as much about the internals of your system as possible. Traditionally this is done by simple methods like logging anything of potential interest or sending error emails with unexpected exceptions. These methods are insufficient, both due to the level of noise inherent in such systems and because of the difficulty in anticipating what metrics are important during an incident.
Environments such as the JVM and the .NET VM have advanced tools for communicating with the VM and for applications to expose internal state, but CPython has lacked similar tooling.
This talk will cover what options CPython application developers have for introspecting their programs; new tools for instrumenting, exposing, and compiling performance and behavior metrics; and techniques for diagnosing runtime issues without restarting the process.
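As a hedged sketch of what such instrumentation can look like with only the standard library: a decorator that records call durations in an in-process registry, and a helper that captures the stack of every running thread without stopping the process. The names `timed`, `timings`, `dump_thread_stacks` and `handle_request` are assumptions for this example and are not the tools presented in the talk.

```python
import collections
import functools
import sys
import threading
import time
import traceback

# In-process registry: metric name -> list of observed durations in seconds.
timings = collections.defaultdict(list)
_lock = threading.Lock()


def timed(name):
    """Decorator that records how long each call takes under `name`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                with _lock:
                    timings[name].append(elapsed)
        return wrapper
    return decorator


def dump_thread_stacks():
    """Return the current stack of every thread as text."""
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        chunks.append("Thread %d:\n%s" % (thread_id, stack))
    return "\n".join(chunks)


@timed("handle_request")
def handle_request():  # hypothetical request handler
    time.sleep(0.01)


handle_request()
handle_request()
print(len(timings["handle_request"]), "calls recorded")
print(dump_thread_stacks())
```

In a long-running server, a function like `dump_thread_stacks` would typically be wired to a signal handler or an admin endpoint so it can be triggered while the process keeps serving requests.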