Sessions at PyCon US 2012 with notes on Friday 9th March

  • Speedily Practical Large-Scale Tests

    by Erik Rose

    Mozilla's projects have thousands of tests, so we've had to venture beyond vanilla test runners to keep things manageable. Our secret sauce can be used with your project as well. Reach beyond the test facilities that came with your project, harnessing pluggable test frameworks, dynamically reordering tests for speed, exploring various mocking libraries, and profiling your way to testing nirvana.

    A partial outline:

    • Intro
    • Motivation: a test not run is no test at all.
    • For most web apps, the easiest test speed win is a conquest of I/O.
    • The nose testrunner
    • Test discovery lets you organize tests well.
    • Pluggability
    • Gluing to projects with custom testrunners: django-nose and test-utils
    • py.test
    • Compare to nose. Nose forked from it. Explain history.
    • Very cool assertion re-evaluation
    • Plugin compatibility between py.test and nose
    • Profiling
    • Start here. Premature optimization sucks.
    • time on the command line to divide CPU from I/O
    • --with-profile
    • Killing I/O for speedy justice: case study of support.mozilla.com
    • Fixture speed hacks (a 5x improvement!)
    • Once-per-class setup
    • How to use DB transactions to avoid repetitive I/O
    • Dynamic test reordering and fixture sharing
    • DB reuse and other startup optimizations
    • 37,583 queries down to 4,116. Watch them fly by!
    • What to do instead of fixtures: the model-maker pattern
    • Lexical proximity
    • Lower coupling
    • Speed
    • Using mocking to kill the fixtures altogether
    • mock, the canonical lib (a minimal sketch follows this outline)
    • fudge, new declarative hotness
    • Syntax, capabilities
    • Example: oedipus, a better API for the Sphinx search engine. I used fudge to unit-test oedipus without requiring devs to set up and populate Sphinx.
    • Dangers of mocking
    • Don't mock out your caching unless your invalidation is perfect.
    • Some of our mistakes in oedipus
    • The nose-progressive display engine
    • Test results that are a pain to read don't get read.
    • Progress indication
    • Elision of junk frames
    • Easier round-tripping from test failure to source code
    • Continuous integration
    • Motivation
    • Jenkins
    • Buildbot
    • IRC bots
    • Next steps: what to do once you're CPU-bound
    • More parallelization.
    • Multithreading buys you no speed bump for CPU-bound tasks in Python due to the GIL; I/O-bound tasks can still benefit. (Ref: PyCodeConf talk by David Beazley.)
    • State of multiprocess plugins in various testrunners.
    • Mozilla's Jenkins test farm
    • QA's big stacks of Mac Minis
    • What global warming? ;-)
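
    To make the mocking bullets concrete, here is a minimal sketch of my own (not code from the talk) that uses the mock library to keep real I/O out of a unit test; fetch_profile and the api object are hypothetical stand-ins:

        import unittest
        from unittest import mock  # the standalone `mock` package on older Pythons


        def fetch_profile(user_id, api):
            """Toy function under test: wraps a slow remote call."""
            data = api.get_user(user_id)
            return {"name": data["name"].title()}


        class FetchProfileTest(unittest.TestCase):
            def test_title_cases_the_name(self):
                # No network, no database: the fake API answers instantly.
                fake_api = mock.Mock()
                fake_api.get_user.return_value = {"name": "erik rose"}

                result = fetch_profile(42, fake_api)

                self.assertEqual(result, {"name": "Erik Rose"})
                fake_api.get_user.assert_called_once_with(42)


        if __name__ == "__main__":
            unittest.main()

    Nothing here touches a server or the database, which is the speed win the fixture-killing bullets are after.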

    At 12:10pm to 12:55pm, Friday 9th March

    In D5, Santa Clara Convention Center

  • Code Generation in Python: Dismantling Jinja

    by Armin Ronacher

    For many DSLs, such as templating languages, code generation is important for achieving acceptable performance in Python. The current version of Jinja went through many different iterations to end up where it is today. This talk walks through the design of Jinja2's compiler infrastructure, explains why it works the way it does, and shows how newer Python features can be used for better results.

    Why Code Generation?
    It seems like the general consensus on code generation in many dynamic language communities is: eval is evil, do not use it. However, if done properly, code generation solves a lot of problems easily, securely, and with much better performance than an interpreter written on top of an interpreted language like Python.

    Code generation is what powers most template languages in Python, what powers object relational mappers and more. It is also an excellent tool to simplify debugging.
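
    As a toy illustration of the idea (my sketch, not Jinja2's actual compiler), the snippet below turns a tiny "Hello {{ name }}" template into Python source, compiles it once, and then calls the resulting function; every name in it is made up for the example:

        TEMPLATE = "Hello {{ name }}!"


        def compile_template(source):
            # Split on the {{ ... }} markers and emit a Python function that
            # concatenates literal chunks with looked-up variables.
            parts = []
            for i, chunk in enumerate(source.replace("}}", "{{").split("{{")):
                if i % 2:                      # odd chunks are expressions
                    parts.append("str(context[%r])" % chunk.strip())
                else:                          # even chunks are literal text
                    parts.append(repr(chunk))
            code = "def render(context):\n    return " + " + ".join(parts)
            namespace = {}
            exec(compile(code, "<template>", "exec"), namespace)
            return namespace["render"]


        render = compile_template(TEMPLATE)
        print(render({"name": "PyCon"}))       # -> Hello PyCon!

    The generated function runs at plain-Python speed, whereas an interpreter would re-walk the template's parse tree on every render.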

    Why Codegen is no Silver Bullet
    Just because you generate code does not mean you're faster than an interpreter written in Python. This part of the talk focuses on why compiling Django templates to Python bytecode does not automatically make them fast.

    Design of Jinja2
    Jinja2 underwent multiple design iterations, most of which were made either to improve performance or debuggability. The internals, however, are largely undocumented and confusing unless you're familiar with the code. Hidden inside, though, are a few gems and interesting tricks that make code generation work in the best possible way.

    Python's Support for Code Generation
    Over the years Python's support for code generation has steadily improved, with different ways to access the abstract syntax tree and to compile it back to bytecode. This section highlights some alternative ways to do code generation that are not yet fully implemented in Jinja2 but are otherwise widely used.
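
    As one small example of the kind of support meant here (my illustration on a modern Python, not Jinja2 code), the ast module lets you parse source into a tree, rewrite it, and compile the modified tree straight back to bytecode:

        import ast

        source = "def double(x):\n    return x * 2\n"
        tree = ast.parse(source)


        class NumberDoubler(ast.NodeTransformer):
            # Rewrite every numeric constant in the tree.
            def visit_Constant(self, node):
                if isinstance(node.value, (int, float)):
                    return ast.copy_location(ast.Constant(node.value * 2), node)
                return node


        tree = ast.fix_missing_locations(NumberDoubler().visit(tree))

        namespace = {}
        exec(compile(tree, "<generated>", "exec"), namespace)
        print(namespace["double"](21))   # the constant became 4, so this prints 84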

    At 2:00pm to 2:40pm, Friday 9th March

    In E2, Santa Clara Convention Center

  • Apache Cassandra and Python

    by Jeremiah Jordan

    Using Apache Cassandra from Python is easy to do. This talk will cover setting up and using a local development instance of Cassandra from Python. It will cover using the low level thrift interface, as well as using the higher level pycassa library.

    • Very brief intro to Apache Cassandra
    • What is Apache Cassandra and where do I get it?
    • Using the Cassandra CLI to set up a keyspace (schema) to hold our data
    • Installing the Cassandra thrift API module
    • Using Cassandra from the thrift API
    • Connecting
    • Writing
    • Reading
    • Batch operations
    • Installing the pycassa module
    • Using Cassandra from the pycassa module (a short sketch follows this outline)
    • Connecting
    • Reading
    • Writing
    • Batch operations
    • Indexing in Cassandra
    • Automatic vs Rolling your own
    • Using Composite Columns
    • Setting them up from the CLI
    • How to use them from pycassa
    • Lessons learned
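
    As a rough sketch of the pycassa portion (my own example written against the public pycassa API, not the speaker's code; the keyspace, column family, and row keys are placeholders):

        import pycassa

        # Connect to a local development Cassandra and the keyspace created
        # earlier from the CLI.
        pool = pycassa.ConnectionPool('Keyspace1', server_list=['localhost:9160'])
        users = pycassa.ColumnFamily(pool, 'Users')

        # Writing: insert a row keyed by user id with a few columns.
        users.insert('jsmith', {'first': 'John', 'last': 'Smith'})

        # Reading: fetch that row's columns back as a dictionary.
        print(users.get('jsmith'))

        # Batch operations: queue several mutations and send them together.
        batch = users.batch()
        batch.insert('jdoe', {'first': 'Jane', 'last': 'Doe'})
        batch.insert('jsmith', {'email': 'jsmith@example.com'})
        batch.send()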

    At 2:40pm to 3:20pm, Friday 9th March

    In E2, Santa Clara Convention Center

  • Make Sure Your Programs Crash

    by Moshe Zadka

    With Python, segmentation faults and the like simply don't happen -- programs do not crash. However, the world is a messy, chaotic place. What happens when your programs crash? I will talk about how to make sure that your application survives crashes, reboots and other nasty problems.

    Handling crashes is divided into two parts -- resilience (making sure that your software maintains correctness in the face of crashes) and speed of recovery (optimizing the time it takes to get back to full working condition). I will talk about techniques that allow for resilience -- separating master data from cache data, minimizing the amount of master data, using atomic file operations, using databases and persisting structures in the right order. Then I will talk about speedy recovery techniques, among them process separation, working while restarting and more. I will conclude by surveying the options for testing all of these things so that the crashes are made to happen in the development/testing environment.
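
    To make the atomic-file-operations point concrete, here is a minimal sketch of my own (not the speaker's code): write master data to a temporary file in the same directory, fsync it, then rename it into place, so a crash mid-write can never leave a half-written file behind:

        import json
        import os
        import tempfile


        def atomic_write_json(path, data):
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
            try:
                with os.fdopen(fd, 'w') as tmp:
                    json.dump(data, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())       # make sure the bytes hit disk
                os.rename(tmp_path, path)        # atomic on POSIX filesystems
            except BaseException:
                os.unlink(tmp_path)
                raise


        atomic_write_json('state.json', {'counter': 42})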

    Outline:

    • Ways Python programs can crash
    • Infinite loops
    • Getting stuck
    • Memory leaks
    • Exceptions
    • Catching exceptions considered scary
    • Thread deadlocks
    • Minimizing effects of a crash
    • Atomic file operations
    • Databases
    • Vertical process splitting
    • Horizontal process splitting
    • Limiting process lifetime
    • Detecting crashes
    • Process death (a restart sketch follows this outline)
    • Process unresponsiveness
    • Test communication
    • Helper checker processes
    • Restarting processes
    • Minimize master data
    • Boot-up speed
    • Order of start-up and communication
    • Testing by killing processes
    • Testing by pausing processes
    • Conclusions
    • Python processes can still crash
    • Plan for crashes
    • Test your plan for crashes
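
    As a sketch of the detect-and-restart items above (mine, not the speaker's; worker.py stands in for whatever process you need to keep alive):

        import subprocess
        import time

        CHILD_CMD = ['python', 'worker.py']   # hypothetical worker script


        def supervise():
            # Restart the worker whenever it dies, with a short back-off so a
            # crash loop cannot spin at full speed.
            while True:
                child = subprocess.Popen(CHILD_CMD)
                child.wait()
                print('worker exited with %s; restarting in 1s' % child.returncode)
                time.sleep(1)


        if __name__ == '__main__':
            supervise()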

    At 3:20pm to 3:50pm, Friday 9th March

    In E1, Santa Clara Convention Center

  • Python Metaprogramming for Mad Scientists and Evil Geniuses

    by Walker Hale

    This talk covers the power and metaprogramming features of Python that cater to mad scientists and evil geniuses. This will also be of interest to others who just want to use Python in a more power (hungry) way. The core concept is that you can synthesize functions, classes and modules without a direct correspondence to source code. You can also mutate third-party objects and apps.

    Users of Python are not limited to the usual model of a one-to-one correspondence between source code and live objects. Python allows you to synthesize functions, classes and modules without a direct correspondence to source code. You can mutate third-party objects, classes, modules and applications through monkey patching -- changing their behavior without altering their source code. You can even "chop-up" third-party objects to create new objects from the pieces. Find out how to unleash your inner Mad Scientist!
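
    A few of these tricks in miniature (my own illustrations, not the speaker's examples):

        import types

        # A synthetic class: built by calling type() directly, with no class
        # statement or source file behind it.
        Monster = type('Monster', (object,), {'greet': lambda self: 'It lives!'})

        # A synthetic module: an empty module object populated at runtime.
        lab = types.ModuleType('lab')
        lab.creature = Monster()


        # Monkey patching: graft a new method onto the class after the fact,
        # without touching any source code.
        def cackle(self):
            return 'Mwahaha from %s' % type(self).__name__


        Monster.cackle = cackle

        print(lab.creature.greet())      # -> It lives!
        print(lab.creature.cackle())     # -> Mwahaha from Monster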

    Thesis: Python is an ideal language for both:

    • Mad Scientists
    • Evil Geniuses
    • Mad Scientist versus Evil Genius
    • Mad Scientist: creating new things because it's cool
    • Evil Genius: practical applications
    • Typical Mad Science Goals
    • Create new living code objects from scraps without corresponding source code.
    • Mutate third-party code to suit our purposes without modifying the third-party source code.
    • Synthetics
    • Synthetic Functions
    • Synthetic Classes
    • Synthetic Modules
    • Applications of Synthetics
    • Monkey Patching
    • Monkey Patching Modules
    • Monkey Patching Classes
    • Monkey Patching Instances
    • sitecustomize.py
    • Dealing with Angry Villagers
    • Limitations: When not to do this
    • For the Evil Geniuses

    Although most of the material is presented from the point of view of the Mad Scientist, it is equally useful to the Evil Genius.

    Since the Python community prides itself on diversity, I should emphasize that the sane, the non-evil, and "do-gooders" are all welcome.

    At 4:25pm to 5:20pm, Friday 9th March

    In E3, Santa Clara Convention Center

  • Introspecting Running Python Processes

    by Adam Lowry

    Understanding the internal state of a running system can be vital to maintaining a high-performance, stable system, but conventional approaches such as logging and error handling only expose so much. This talk will touch on how to instrument Python programs in order to observe the state of the system, measure performance, and identify ongoing problems.

    Something is wrong with your web application. The time it’s taking to serve requests is growing. Your logs don’t contain enough information. Your database appears bored. How do you know what’s going wrong?

    In high-performance production servers it’s vital to know as much about the internals of your system as possible. Traditionally this is done by simple methods like logging anything of potential interest or sending error emails with unexpected exceptions. These methods are insufficient, both due to the level of noise inherent in such systems and because of the difficulty in anticipating what metrics are important during an incident.

    Environments such as the JVM and .Net VM have advanced tools for communicating with the VM and for applications to expose internal state, but CPython has lacked similar tooling.

    This talk will cover what options CPython application developers have for introspecting their programs; new tools for instrumenting, exposing, and compiling performance and behavior metrics; and techniques for diagnosing runtime issues without restarting the process.
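
    One well-known trick in this space (my sketch, not necessarily one of the tools covered in the talk) is a signal handler that dumps every thread's stack, so that kill -USR1 <pid> shows what a live process is doing without restarting it:

        import signal
        import sys
        import threading
        import traceback


        def dump_stacks(signum, frame):
            # Print the current stack of every thread in the process.
            names = dict((t.ident, t.name) for t in threading.enumerate())
            for ident, stack in sys._current_frames().items():
                print('--- thread %s (%s) ---' % (ident, names.get(ident, '?')))
                traceback.print_stack(stack)


        signal.signal(signal.SIGUSR1, dump_stacks)   # POSIX only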

    At 5:20pm to 6:00pm, Friday 9th March

    In E1, Santa Clara Convention Center