Sessions at PyCon US 2012 with slides and video in Santa Clara Convention Center

Your current filters are…


Wednesday 7th March 2012

  • Introduction to Django

    by Chander Ganesan

    The Django framework is a fast, flexible, easy to learn, and easy to use framework for designing and deploying web sites and services using Python. In this session, we'll cover the fundamentals of development with Django, generate a Django data model, and put together a simple web site using the framework.

    • Detailed Tutorial Outline
    • Django Overview and Basic Introduction
    • Downloading & Installing Django
    • Creating a new project
    • Choosing a database
    • Creating a new application
    • Installing & Using Django contrib applications
    • Overview of Django flow (i.e., URLconf expression, view function, HTTPResponse object, etc.)
    • Generating Simple Django Views
    • Configuring a URLConf for basic views
    • Creating Django Templates (template syntax, common filters and tags, loops, etc)
    • Creating & using Template Context objects
    • Introduction to Django Models
    • Defining basic Django models
    • Understanding basic model fields & options
    • Generating & Reviewing Model SQL
    • Adding data to a model
    • Simple data retrieval using models
    • Working with QuerySets (filters, slicing, ordering, common methods)
    • Overview of Q objects)
    • Using the Admin interface
    • Using Generic views
    • Access control with sessions & users

    At 9:00am to 12:20pm, Wednesday 7th March

    In H3, Santa Clara Convention Center

  • IPython in-depth: high-productivity interactive and parallel python

    by Brian E. Granger, Min Ragan-Kelley and Fernando Pérez

    IPython provides tools for interactive and parallel computing that are widely used in scientific computing, but can benefit any Python developer. We will show how to use IPython in different ways, as: an interactive shell, an embedded shell, a graphical console, a network-aware VM in GUIs, a web-based notebook with code, graphics and rich HTML, and a high-level framework for parallel computing.

    IPython started as a better interactive Python interpreter in 2001, but over the last decade it has grown into a rich and powerful set of interlocking tools aimed at maximizing developer productivity with Python while using the language interactively.

    Today, IPython consists of a kernel that executes the user code and controls the user's namespace, and a collection of tools to control this kernel either in-process or out-of-process thanks to a well-specified communications protocol implemented over ZeroMQ. The kernel can do much more than execute user code, including introspection of objects in the user's namespace, detailed error reporting with rich tracebacks, history logging of inputs and outputs with an SQLite backend, a user-extensible system of commands for interactive control that don't collide with user variables, and much more.
    Our communications architecture allows these same features to be accessed via a variety of clients, each providing unique functionality tuned to a specific use case. We expose a number of directly usable applications:

    An interactive, terminal-based shell with many capabilities far beyond the default Python interactive interpreter (this is the default application opened by the ipython command that most users are familiar with).

    A Qt console that provides the look and feel of a terminal, but adds support for inline figures, graphical calltips, a persistent session that can survive crashes (even segfaults) of the kernel process, and more. A user-based review of some of these features can be found here.

    A web-based notebook that can execute code and also contain rich text and figures, mathematical equations and arbitrary HTML. This notebook controls the same kernel as the other two applications, but instead of offering a linear, terminal-like workflow, it presents a document-like view with cells where code is executed but that can be edited in-place, reordered, mixed with explanatory text and figures, etc. This model is a kind of literate programming environment popular in scientific computing and pioneered by the Mathematica system, that allows for the creation of rich documents that combine computational experimentation and results with other explanatory elements. A detailed review of this system can be found here.

    A high-performance, low-latency system for parallel computing that supports the control of a cluster of IPython engines communicating over ZeroMQ, with optimizations that minimize unnecessary copying of large objects (especially numpy arrays). These engines can be controlled interactively while developing and doing exploratory work, or can run in batch mode either on a local machine or in a large cluster/supercomputing environment via a batch scheduler.

    In this hands-on, in-depth tutorial, we will briefly describe IPython's architecture and will then show how to use and configure each of the above components. We will also discuss how to use the underlying IPython libraries in your own application to provide interactive control.

    An outline of the tutorial follows:

    • Introductory description of the project and architecture.
    • IPython basics: the magic command system, shell aliases, full shell access, the history system, variable caching, object introspection tools.
    • Development workflow: combining the interpreter session with python files via the %run command.
    • Effective use of IPython at the command-line for typical development tasks: timing, profiling, debugging.
    • Embedding IPython in terminal applications.
    • The IPython Qt console: unique features beyond the terminal.
    • Embedding an IPython kernel in a GUI app to expose network-based interactive control.
    • Configuring IPython: the profile and configuration system for multiple applications.
    • The IPython notebook: interactive usage of the application, the IPython display protocol, defining custom display methods for your own objects, generating HTML and PDF output.
    • Parallelism with IPython: basic architecture, interactive control of a cluster, standalone execution of applications, integration with MPI, blocking and asynchronous parallelism, execution in batch-controlled environments, IPython engines in the cloud (illustrated with Amazon EC2 instances).

    A short listing of other features not covered in this tutorial, as guidance for users to later learn about on their own.

    For full details about IPython including documentation, previous presentations and videos of talks, please see the project website.

    At 9:00am to 12:20pm, Wednesday 7th March

    In F2, Santa Clara Convention Center

  • Django in Depth

    by James Bennett

    A tutorial that goes beyond all other Django tutorials; we'll dive deep into the guts of the framework, and learn how each commonly-used component -- ORM, templates, HTTP handling, views and the admin -- work from the bottom up, covering both public and internal APIs in excruciating detail.

    At 1:20pm to 4:40pm, Wednesday 7th March

    In D1, Santa Clara Convention Center

Thursday 8th March 2012

  • High Performance Python I

    by Ian Ozsvald

    At EuroPython 2011 I ran a very hands-on tutorial for High Performance Python techniques. This updated tutorial will cover profiling, PyPy, Cython, numpy, NumExpr, ShedSkin, multiprocessing, ParallelPython and pyCUDA. Here's a 55 page PDF write-up of the EuroPython material: http://ianozsvald.com/2011/07/25...

    At EuroPython 2011 I ran a very hands-on tutorial for High Performance Python techniques. This updated tutorial will cover:

    • profiling with cProfile, run_snake and line_profiler
    • PyPy
    • Cython
    • numpy with and without vectors
    • NumExpr
    • ShedSkin Py->C++ compiler
    • multiprocessing for multi-core
    • ParallelPython for multi-machine
    • pyCUDA demos

    I plan to expand the original material and to maybe also cover other tools like execnet and PyPy-numpy.

    At 9:00am to 12:20pm, Thursday 8th March

    In D2, Santa Clara Convention Center

  • Optimize Performance and Scalability with Parallelism and Concurrency

    by Robert Hancock

    From how the operating system handles your requests through design principles on how to use concurrency and parallelism to optimize your program's performance and scalability. We will cover processes, threads, generators, coroutines, non-blocking IO, and the gevent library.

    How processes, threads, coroutines, and non-blocking IO work from the operating system through code implementation and design principles to optimize Python programs. The difference between parallelism and concurrency and when to use each.

    The premise is that to make an informed decision you need to know what is happening under the hood. Once you understand the low level functionality, you can make the correct decision in the design phase.

    The emphasis is on practical application to solve real world problems.


    • How the operating system handles traps and interrupts
    • Scheduling
    • Processes
    • Threads
    • The GIL
    • Generators
    • What is a coroutine?
    • What is a Python coroutine?
    • Blocking/Non-blocking I/O.
    • Parallelism versus Concurrency
    • How do these work with CPython, Pypy, and Stackless
    • Greenlets and libevent (gevent)
    • Design principles
    • Example networked application
    • Performance results
    • What are other the other options?

    At 9:00am to 12:20pm, Thursday 8th March

    In F2, Santa Clara Convention Center

Friday 9th March 2012

  • PBS KIDS: Building a login system for kids and teens in Python

    by Edgar Roman

    Our challenge was to create a login system for little people who might barely read, maybe no email, perhaps no home computer. And we had to watch out for privacy laws - especially tough for minors. But these kids want to play games, write stories, and create online avatars to share and compete against their buddies. Listen to how we developed the PBS KIDS login and moderation system in Django.

    At 10:50am to 11:30am, Friday 9th March

    In E1, Santa Clara Convention Center

  • Data, Design, Meaning

    by Idan Gazit

    The ultimate goal of data visualization is to tell a story and supply meaning. There are tools and science that can inform your choice of data to present and how best to present it. We reflexively evaluate data and fit it into a narrative which aids decisionmaking; learn how to take advantage of this tendency in order to deliver meaning, not just numbers and charts.

    Data visualization is a hot field right now—and for good reason. In our age of info-saturation, true value is found in distilling large amounts of data into a form that is easy to comprehend and act upon. This talk provides an overview of tools and techniques which you can use to level up your data presentation, regardless of application.

    As humans, we are adept at evaluating visual information. From an early age, we learn to make inferences about things based on their visual properties—large and small, near and far, motion, direction, and other attributes. Taking advantage of the visual process we’ve been practicing since birth is an easy way to optimize delivery of your data into the brains of your audience.

    Unfortunately, it isn’t enough to appeal to the part of our brains responsible for figuring out whether we can successfully hit an animal with a rock. A great visualization must appeal to our sense of beauty. Structure, layout, typography, and color are all tools which can be used (and abused) to delight your audience and direct their attention where you want it to go.

    Whether you’re building an information dashboard for a webapp or presenting scientific data, an understanding of these techniques will make your data more accessible to your audience, and more of a delight to read and learn from.

    At 12:10pm to 12:55pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Practical Machine Learning in Python

    by Matt Spitz

    There are a plethora of options when it comes to deciding how to add a machine learning component to your python application. In this talk, I'll discuss why python as a language is well-suited to solving these particular problems, the tradeoffs of different machine learning solutions for python applications, and some tricks you can use to get the most out of whatever package you decide to use.

    This is the age of data. As more companies expose their datasets through APIs, it's becoming increasingly easier to pull information about users, places, and things. But having this data isn't always enough; we want to understand it, find correlations, and identify trends. Fortunately, the area of computer science known as machine learning has a variety of algorithms specifically designed to do this sort of data wrangling. For the python application developer, there are many off-the-shelf toolkits that include implementations of these algorithms (Orange, NLTK, SHOGUN, PyML and scikit-learn to name just a few), but choosing which one to use can be daunting.

    There are a number of tradeoffs one makes when making a selection, depending on the specifics of the implementation and the needs of the application. In this talk, I'll give an overview of some of the packages available and discuss what factors might go into deciding which one to use. I'll also offer some python-specific tricks you can use to work with large amounts of data efficiently.

    At 12:10pm to 12:40pm, Friday 9th March

    In E1, Santa Clara Convention Center

  • Speedily Practical Large-Scale Tests

    by Erik Rose

    Mozilla's projects have thousands of tests, so we've had to venture beyond vanilla test runners to keep things manageable. Our secret sauce can be used with your project as well. Reach beyond the test facilities that came with your project, harnessing pluggable test frameworks, dynamically reordering tests for speed, exploring various mocking libraries, and profiling your way to testing nirvana.

    A partial outline:

    Motivation: a test not run is no test at all.
    For most web apps, the easiest test speed win is a conquest of I/O.
    The nose testrunner
    Test discovery lets you organize tests well.
    Gluing to projects with custom testrunners: django-nose and test-utils
    Compare to nose. Nose forked from it. Explain history.
    Very cool assertion re-evaluation
    Plugin compatibility between py.test and nose
    Start here. Premature optimization sucks.
    time on the commandline to divide CPU from I/O
    Killing I/O for speedy justice: case study of support.mozilla.com
    Fixture speed hacks (a 5x improvement!)
    Once-per-class setup
    How to use DB transactions to avoid repetitive I/O
    Dynamic test reordering and fixture sharing
    DB reuse and other startup optimizations
    37,583 queries to 4,116. Watch them fly by!
    What to do instead of fixtures: the model-maker pattern
    Lexical proximity
    Lower coupling
    Using mocking to kill the fixtures altogether
    mock, the canonical lib
    fudge, new declarative hotness
    Syntax, capabilities
    Example: oedipus, a better API for the Sphinx search engine. I used fudge to unit-test oedipus without requiring devs to set up and populate Sphinx.
    Dangers of mocking
    Don't mock out your caching unless your invalidation is perfect.
    Some of our mistakes in oedipus
    The nose-progressive display engine
    Test results that are a pain to read don't get read.
    Progress indication
    Elision of junk frames
    Easier round-tripping from test failure to source code
    Continuous integration
    IRC bots
    Next steps: what to do once you're CPU-bound
    More parallelization.
    Multithreading really buys you no speed bump for CPU-bound (or I/O bound?) tasks in Python due to the GIL. (Ref: PyCodeConf talk by David Beazley.)
    State of multiprocess plugins in various testrunners.
    Mozilla's Jenkins test farm
    QA's big stacks of Mac Minis
    What global warming? ;-)

    At 12:10pm to 12:55pm, Friday 9th March

    In D5, Santa Clara Convention Center

  • Build reliable, traceable, distributed systems with ZeroMQ

    by Jérôme Petazzoni

    We will show how to build simple yet powerful RPC code with ZeroMQ, with very few (if any!) modification to existing code. We will build fan-in and fan-out topologies with ZeroMQ special socket types to implement PUB/SUB patterns and scale up job-processing tasks. Thanks to introspection, the resulting services will be self-documented. Finally, we will show how to implement distributed tracing.

    We will show how to leverage ZeroMQ to build a simple yet powerful RPC for Python code. We will focus on simplicity, the goal being to expose almost any Python module or class to network calls – with very few (if any!) modification to existing code.

    We will then explain the purpose and show some use-cases for ZeroMQ special socket types (PUSH/PULL, PUB/SUB, ROUTER/DEALER) to build fan-in and fan-out topologies, as well as asynchronous processing (to avoid blocking when doing long-running requests). A by-product is the ability to scale up job-processing tasks with a message queue, which can even be made broker-less (you don’t have to deploy heavy machinery if you don’t need it).

    We will also demonstrate how introspection can make development and debugging easier, exposing docstrings, and provideing a few command-line helpers to poke, debug, and experiment directly from the shell.

    At the end of the talk (or in a separate talk), we will explain how to implement a tracing framework for distributed RPC. By hooking into the right places, we will show how to get full tracebacks and profiling information; more precisely:

    how each complex call (involving multiple subcalls) can be accurately traced;
    how to handle exceptions, and know easily when and where they happened (without checking dozens of log files);
    which complex calls take too long, and where they spend their time (distributed profiling).
    Those guidelines are the result of an on-going development work at dotCloud, and actively used and implemented at the core of our leading Platform-as-a-Service offering.

    We don’t expect the audience to be familiar with ZeroMQ or RPC. However, it will certainly help to have basic knowledge of serialization (e.g. pickle) and sockets.

    At 2:00pm to 2:40pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Code Generation in Python: Dismantling Jinja

    by Armin Ronacher

    For many DSLs such as templating languages it's important to use code generation to achieve acceptable performance in Python. The current version of Jinja went through many different iterations to end up where it is currently. This talk walks through the design of Jinja2's compiler infrastructure and why it works the way it works and how one can use newer Python features for better results.

    Why Code Generation?
    It seems like the general consensus for code generation in many dynamic language communities is: eval is evil, do not use it. However if done properly code generation solves a lot of problems easily, securely and with much better performance than an interpreter written on top of an interpreted language like Python.

    Code generation is what powers most template languages in Python, what powers object relational mappers and more. It is also an excellent tool to simplify debugging.

    Why Codegen is no Silver Bullet
    Just because you generate code does not mean you're faster than an interpreter written in Python. This part of the talk focuses on why compiling Django templates to Python bytecode does not automatically make it fast.

    Design of Jinja2
    Jinja2 underwent multiple design iterations, most of which were made to either improve performance or debug-ability. The internals however are largely undocumented and confusing unless you're familiar with the code. In it however are a few gems hidden and interesting tricks to make code generation work in the best possible way.

    Python's Support for Code Generation
    Over the years Python's support for code generation was steadily improved with different ways to access the abstract syntax tree and to compiling it back to bytecode. This section highlights some alternative ways to do code generation that are not yet fully implemented in Jinja2 but are otherwise widely used.

    At 2:00pm to 2:40pm, Friday 9th March

    In E2, Santa Clara Convention Center

  • Apache Cassandra and Python

    by Jeremiah Jordan

    Using Apache Cassandra from Python is easy to do. This talk will cover setting up and using a local development instance of Cassandra from Python. It will cover using the low level thrift interface, as well as using the higher level pycassa library.

    • Very brief intro to Apache Cassandra
    • What is Apache Cassandra and where do I get it?
    • Using the Cassandra CLI to setup a keyspace (table) to hold our data
    • Installing the Cassandra thrift API module
    • Using Cassandra from the thrift API
    • Connecting
    • Writing
    • Reading
    • Batch operations
    • Installing the pycassa module
    • Using Cassandra from the pycassa module
    • Connecting
    • Reading
    • Writing
    • Batch operations
    • Indexing in Cassandra
    • Automatic vs Rolling your own
    • Using Composite Columns
    • Setting them up from the CLI
    • How to using them from pycassa
    • Lessons learned

    At 2:40pm to 3:20pm, Friday 9th March

    In E2, Santa Clara Convention Center

  • Decorators and Context Managers

    by Dave Brondsema

    Learn how decorators and context managers work, see several popular examples, and get a brief intro to writing your own. Decorators wrap your functions to easily add more functionality. Context managers use the 'with' statement to make indented blocks magical. Both are very powerful parts of the python language; come learn how to use them in your code.

    Decorators wrap or modify your functions. They are functions themselves, so you can pass parameters to them. You can wrap a function with multiple decorators too. We'll cover how to use decorators in all those situations and how to write them too. Examples will be demonstrated with:

    • @property
    • @memoize
    • mock/patch
    • TurboGears
    • Django
    • Allura

    Context managers wrap a block of code using the 'with' statement to do something on the way into the block and something on the way out. Opening and closing a file is a very common case, but there is a lot more you can do. Examples include:

    • modifying state
    • capturing stdout
    • mock/patch
    • locks
    • timing
    • transactions

    More about decorators: http://www.python.org/dev/peps/p...

    More about context managers: http://www.python.org/dev/peps/p...

    At 3:20pm to 3:50pm, Friday 9th March

    In E3, Santa Clara Convention Center

  • Putting Python in PostgreSQL

    by Frank Wiles

    PostgreSQL is pretty powerful all on it's own, but did you know you can use Python as a stored procedure language? Not only does using a familiar language make development easier, but you get the power of the standard library and PyPi to boot. Come learn the ins and outs of putting Python in your DB.

    Pushing logic in your database can be a blessing or a curse. While these techniques aren't appropriate for most users in most situations, it's good to know what kind of power you have at your disposal with PL/Python if the need ever arises.

    Learn about how to: - Use triggers to off load data processing tasks - Fire off email and log alerts based on what is happening in your tables - Create more granular and detailed constraints such as verifying email address or credit card checksums at the database level - Replace batch jobs with in database triggers - Learn when and when NOT to use these techniques - Debugging techniques - Security concerns

    At 3:20pm to 4:05pm, Friday 9th March

    In E2, Santa Clara Convention Center

  • Introspecting Running Python Processes

    by Adam Lowry

    Understanding the internal state of a running system can be vital to maintaining a high performance, stable system, but conventional approaches such as logging and error handling only expose so much. This talk will touch on how to instrument Python programs in order to observe the state of the system, measure performance, and identify ongoing problems.

    Something is wrong with your web application. The time it’s taking to serve requests is growing. Your logs don’t contain enough. Your database appears bored. How do you know what’s going wrong?

    In high-performance production servers it’s vital to know as much about the internals of your system as possible. Traditionally this is done by simple methods like logging anything of potential interest or sending error emails with unexpected exceptions. These methods are insufficient, both due to the level of noise inherent in such systems and because of the difficulty in anticipating what metrics are important during an incident.

    Environments such as the JVM and .Net VM have advanced tools for communicating with the VM and for applications to expose internal state, but CPython has lacked similar tooling.

    This talk will cover what options CPython application developers have for introspecting their programs; new tools for instrumenting, exposing, and compiling performance and behavior metrics; and techniques for diagnosing runtime issues without restarting the process.

    At 5:20pm to 6:00pm, Friday 9th March

    In E1, Santa Clara Convention Center

Saturday 10th March 2012

  • Django Templating: More Than Just Blocks

    by Christine Cheung

    Django's template language is designed to strike a balance between power and ease of use; learn how to use this balance to create awesome looking websites. This talk will cover the basics and best practices of Django templating, from custom tag and filter creation, to the finer points of template rendering and loading, and even to replacing the default templating engine itself.

    Harness the power of Django templates to help present your data with ease! Learn about:
    Basic block formations, common patterns, and using includes wisely.
    Tips and tricks in using the built-in template tags and filters.
    How to make custom tags and filters: examples, what you should and shouldn’t do, and tools to help the process such as django-classy-tags.
    Different ways to load and render templates.
    Replacing Django’s default template language: pros and cons

    At 10:25am to 11:05am, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Django Form Processing Deep Dive

    by Nathan Yergler

    Django Form processing often takes a back seat to flashier, more visible parts of the framework. But Django forms, fully leveraged, can help developers be more productive and write more cohesive code. This talk will dive deep into the stock Django forms package, as well as discuss a strategy for abstracting validation for forms, and the use of unit and integration tests with forms.

    Django Form processing often takes a back seat to flashier, more visible parts of the framework. But Django forms are an integral part of the framework that can help developers be more productive and write more cohesive, well tested code. This talk will dive deep into the stock Django forms package, providing an examples of:

    • custom validation and validation patterns
    • processing multiple forms at once (form sets)
    • persisting validated form data to models (model forms)

    We'll also discuss ways to build on Django forms, including:

    * writing unit and integration tests for forms, and how writing tests can help you understand code cohesion
    & abstracting validation for forms to provide tiered validation (for example, one set of criteria to save, additional criteria to publish)
    * approaches to working with multiple, heterogeneous forms simultaneously

    At 11:05am to 11:45am, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Pragmatic Unicode, or, How do I stop the pain?

    by Ned Batchelder

    Python has great Unicode support, but it's still your responsibility to handle it properly. I'll do a quick overview of what Unicode is, but only enough to get your program working properly. I'll describe strategies to make your code work, and keep it working, without getting too far afield in Unicode la-la-land.

    Python has great Unicode support, but it's still your responsibility to handle it properly. Even expert programmers get tripped up with the encodings and decodings that can happen implicitly, throwing errors in unexpected places.

    This talk will present a quick overview of what Unicode is, why it exists, and how it works, but only enough to get your program working properly. Unicode can be intricate and fascinating, but really, who cares? You just want your code to work without throwing a UnicodeEncodeError every time an accented character sneaks in somehow.

    I'll describe strategies to make your code work, and keep it working, without getting too far afield in Unicode la-la-land.

    How Unicode is handled is one of the biggest changes in Python 3. I'll touch on what those changes are, and how you can use them to keep even your Python 2 code running smoothly.

    Bytes vs. text
    ASCII, 8859-1, etc.
    Python 2: str vs unicode
    encode and decode
    implicit conversions!!
    Python 3: bytes vs str
    Everybody's happy!

    At 11:45am to 12:15pm, Saturday 10th March

    In E3, Santa Clara Convention Center

  • Testing and Django

    by Carl Meyer

    A deep dive into writing tests with Django, covering Django's custom test-suite-runner and the testing utilities in Django, what all they actually do, how you should and shouldn't use them (and some you shouldn't use at all!). Also, guidelines for writing good tests (with or without Django), and my least favorite things about testing in Django (and how I'd like to fix them).

    Django has a fair bit of custom test code: a custom TestSuiteRunner, custom TestCase subclasses, some test-only monkeypatches to core Django code, and a raft of testing utilities. I'll cover as much of that code as I find interesting and non-trivial, taking a close look at what it's actually doing and what that means for your tests.

    This will be a highly opinionated talk. There are some things in Django's test code I really don't like; I'll talk about why, and how I'd like to see them changed. As a natural part of this, I'll also be outlining some principles I try to follow for writing effective and maintainable tests, and note where Django makes it easy or hard.

    This is an "extreme" talk, so I'll be assuming you've used Django and done some testing, and you're familiar with the basic concepts of each. This won't be an introductory "testing with Django" howto.

    At 11:45am to 12:30pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • A Gentle Introduction to GIS

    by Jason Scheirer

    Datums! Coordinate systems! Map projections! Topologies! Spatial applications are a nebulous, daunting concept to most Pythonistas. This talk is a gentle introduction into the concepts, terminology and tools to demystify the world of the world.

    This talk will have multiple parts:

    • What is GIS?
    • Cartography
    • Analysis
    • Geodatabases
    • Types of data
    • Vector
    • Raster
    • How do you talk about spots on earth?
    • What is a datum and why does it matter?
    • Spatial reference systems: WUT?
    • Map projections: what's appropriate and what's web mercator
    • Great circle distances and geodesy (or, your lines are wrong and you should feel bad)
    • Analysis
    • Overlay operators
    • Clementini operators
    • Buffer, clip their friends
    • Will demonstrate concepts with commercial tools (the ArcGIS stack) and some open source (GDAL, OGR, QGIS).

    At 1:20pm to 2:15pm, Saturday 10th March

    In E3, Santa Clara Convention Center

  • Web Server Bottlenecks And Performance Tuning

    by Graham Dumpleton

    New Python web developers seem to love running benchmarks on WSGI servers. Reality is that they often have no idea what they are doing or what to look at. This talk will look at a range of factors which can influence the performance of your Python web application. This includes the impact of using threads vs processes, number of processors, memory available, the GIL and slow HTTP clients.

    A benchmark of a hello world application is often what developers use to make the all important decision of what web hosting infrastructure they use. Worse is that in many cases this is the only sort of performance testing or monitoring they will ever do. When it comes to their production applications they are usually flying blind and have no idea of how it is performing and what they need to do to tune their web application stack.

    This talk will discuss different limiting factors or bottlenecks within your WSGI server stack and system that can affect the performance of your Python web application. It will illustrate the impacts of these by looking at typical configurations for the more popular WSGI hosting mechanisms of Apache/mod_wsgi, gunicorn and uWSGI, seeing how they perform under various types of traffic and request loads and then tweaking the configurations to see whether they perform better or worse.

    Such factors that will be discussed will include:

    Use of threads vs processes.
    Number of processors available.
    Python global interpreter lock (GIL)
    Amount of memory available.
    Slow HTTP browsers/clients.
    Browser keep alive connections.
    Need to handle static assets.
    From this an attempt will be made to provide some general guidelines of what is a good configuration/architecture to use for different types of Python web applications. The importance of continuous production monitoring will also be covered to ensure that you know when the performance of your system is dropping off due to changing traffic patterns as well as code changes you have made in your actual web application.

    At 1:35pm to 2:15pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Militarizing Your Backyard with Python: Computer Vision and the Squirrel Hordes

    by Kurt Grandis

    Has your garden been ravaged by the marauding squirrel hordes? Has your bird feeder been pillaged? Tired of shaking your fist at the neighbor children? Learn how to use Python to tap into computer vision libraries and build an automated sentry water cannon capable of soaking intruders.

    Using the Python bindings for the computer vision library, OpenCV, we will investigate the components and steps needed to power a sentry gun. In addition to basic object and motion tracking, concepts of object recognition (friend or foe) will be discussed. Communication and control of the underlying hardware is performed using Python and will also be covered.

    Additional peace-time applications of the above technology will be demonstrated.

    • OpenCV
    • Object detection
    • Motion tracking
    • Friend-or-foe object recognition
    • Stereo vision
    • Building training sets with Amazon Mechanical Turk
    • Python-Arduino communication
    • Python + Kinect

    At 2:15pm to 2:55pm, Saturday 10th March

    In D5, Santa Clara Convention Center

  • RESTful APIs With Tastypie

    by Daniel Lindsley

    Providing full-featured REST APIs is an increasingly popular request. Tastypie allows you to easily implement a customizable REST API for your Python or Django applications.

    Who am I? (Primary author of Tastypie)
    Why REST?
    A touch of philosophy

    Use HTTP the best we can
    Flexible serialization (not everyone wants JSON)
    What you can GET should be able to be POST/PUT
    Should be reasonable by default but easy to extend
    URIs Everywhere!
    Why Tastypie?

    Works with Django
    Any data source (Not just ORM)
    Designed to be extensible
    Supports a variety of serialization formats (JSON/XML/YAML/bplist)
    URIs everywhere by default
    Lots of hooks for customization
    Demonstrate a simple setup

    Then explore the API based on that trivial setup
    Demonstrate adding authentication/authorization

    Demonstrate adding custom serialization
    Demonstrate adding a different data source
    Demonstrate adding a custom endpoint

    At 2:15pm to 2:55pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Storm: the Hadoop of Realtime Stream Processing

    by Gabriel Grant

    Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!

    Storm is a high-volume, continuous, reliable stream processing system developed at BackType and recently open-sourced by Twitter. Though most of the system (and it's documentation) is written in Java-based languages, it is possible to use in a Python environment with Python-based analysis code. At DotCloud (our application-platform-as-a-service) we're doing just that, and we'll be showing how you can too.

    We collect a lot of data: we have tens of thousands of customers, many of whom have dozens of services running on our platform, each of which in turn produces dozens of metrics every second. All in all, we're dealing with millions of datapoints per minute. Storm will be the third iteration of our metrics system, an attempt at standardizing a number of previously-distinct pieces of our infrastructural software, to enable automated, real-time reactions to changes in the platform's state.

    We'll start by touching on what problems Storm is (and isn't) trying to solve and why it's model is so powerful, informed by our previous attempts to solve the stream processing problem. We'll then move on to a deep dive into how to get Storm up and running with the most Python and least Java-enduced-pain possible and finish up with tips to solve some of the challenges we've encountered while adopting Storm into our Python-based development process.

    The Problem:

    What is stream processing?
    High volume, Continuous, Reliable data analysis
    How do people solve this today?
    The Solution:

    Storm's overall model
    Automatic parts
    Why is this solution better?
    The Hard Part, Made Simple:

    Build a topology
    Code your processors
    The Simple Part, Made Hard (made simple):

    Java (ugh)
    Clojure (less ugh)

    At 2:15pm to 2:55pm, Saturday 10th March

    In E2, Santa Clara Convention Center

  • Spatial data and web mapping with Python

    by Paul Smith

    Spatial data are often seen as opaque to most developers, and while dealing with them does require a shift in approach from the data types we most regularly handle, they needn’t be the domain of specialists. High-quality Python libraries and Python-based applications exist for operating on and transforming spatial data, and for creating visualizations, including maps for presentation on the web.

    This talk will be an overview of the Python libraries and applications available for handling spatial and geospatial data and creating maps for the web. It will cover libraries for open and transforming spatial data formats and representations, spatial operators and predicates for queries and relationships, spatial indexes for efficient queries, and compositing and rendering map tiles, as well as desktop applications extensible with Python that replace much of the functionality of "enterprise" GIS software.

    At 4:00pm to 4:55pm, Saturday 10th March

    In E3, Santa Clara Convention Center

  • What Python can learn from Java

    by Jonathan Ellis

    Java is in some ways a bogeyman to the Python community -- the language that parents scare their children with, the Cobol of the 21st century. But if we look past the cesspool of JEE it turns out that Java has quietly become an excellent systems environment, one that is still in many ways ahead of its time.

    • Introduction
    • The difference between systems and web (and scientific) computing
    • Why Pythonistas should care about systems programming
    • Concurrency
    • Why "just use multiple processes" is inadequate
    • Why event loops are inadequate
    • java.util.concurrent: low level (collections, synch primitives)
    • j.u.c.: high level (executors, futures, fork/join)
    • The VM
    • State of the art GC
    • Built-in (and extensible) telemetry
    • More functional than Python?!
    • immutability
    • Guava: com.google.*

    At 4:55pm to 5:30pm, Saturday 10th March

    In E1, Santa Clara Convention Center

Sunday 11th March 2012

  • Parsing Horrible Things with Python

    by Erik Rose

    If you've ever wanted to get started with parsers, here's your chance for a ground-floor introduction. A harebrained spare-time project gives birth to a whirlwind journey from basic algorithms to Python libraries and, at last, to a parser for one of the craziest syntaxes out there: the MediaWiki grammar that drives Wikipedia.

    Some languages were designed to be parsed. The most obvious example is Lisp and its relatives which are practically parsed when they hit the page. However, many others—including most wiki grammars—grow organically and get turned into HTML by sedimentary strata of regular expressions, all backtracking and warring with one another, making it difficult to output other formats or make changes to the language.

    We will explore the tools and techniques necessary to attack one of the hairiest lingual challenges out there: MediaWiki syntax. Join me for an introduction to the general classes of parsing algorithms, from the birth of the field to the state of the art. Learn how to pick the right one. Have a comparative look at a dozen different Python parsing toolkits. And finally, learn some optimization tricks to get a grammar going at a reasonable clip.

    At 12:00pm to 12:30pm, Sunday 11th March

    In E3, Santa Clara Convention Center

  • Sketching a Better Product

    by Idan Gazit

    If writing is a means for organizing your thoughts, then sketching is a means for organizing your thoughts visually. Just as good writing requires drafts, good design requires sketches: low-investment, low-resolution braindumps. Learn how to use ugly sketching to iterate your way to a better product.

    Building a better mousetrap is rarely accomplished in one zenlike moment of clarity—like all creative processes, it is iterative. Sketches are those iterations for things we can see and interact with. They are the key tool used to explore ideas and decide if they have merit; they are just as important in deciding what not to pursue.

    A good sketch is crude and fast. It isn’t necessarily pretty, and more often than not, it consists of just boxes and lines, rarely with color. It doesn’t matter what tools you use, so long as you can do it fast and get your ideas out where everybody can see them. It doesn’t matter if you can’t draw, because everybody can draw boxes and lines.

    This talk will cover sketching as a tool in the interface design process, including both the why and the how of sketches. It will include practical techniques for sucking less at making the kind of sketches that are useful for decision making, as well as tips on simple sketching methods to make it feel like an interface.

    At 12:00pm to 12:30pm, Sunday 11th March

    In D5, Santa Clara Convention Center

  • Building A Python-Based Search Engine

    by Daniel Lindsley

    Search is an increasingly common request in all types of applications as the amount of data all of us deal with continues to grow. The technology/architecture behind search engines is wildly different from what many developers expect. This talk will give a solid grounding in the fundamentals of providing search using Python to flesh out these concepts in a simple library.

    • Core concepts
    • Terminology
    • Document-based
    • Show basic starting code for a document
    • Inverted Index
    • Show a simple inverted index class
    • Stemming
    • N-gram
    • Show a tokenizer/n-gram processor
    • Fields
    • Show a document handler which ties it all together
    • Searching
    • Show a simple searcher (& the whole thing working together)
    • Faceting (likely no demo)
    • Boost (likely no demo)
    • More Like This
    • Wrap up

    At 1:30pm to 2:10pm, Sunday 11th March

    In E3, Santa Clara Convention Center

  • Deep Freeze: building better stand-alone apps with Python

    by Ryan Kelly

    There's more to shipping a stand-alone python app than just running py2exe over your code. Want to deploy automatic updates? Want to be sure it runs on legacy platforms? Want to add professional touches like code signing? And want to do this all in a cross-platform manner? This talk will show you the tools you can use to make your frozen apps better in a variety of small yet important ways.


    Python has a powerful and mature suite tools of tools for "freezing" your python scripts into a stand-alone application, including py2exe, py2app, cxfreeze, bbfeezee and PyInstaller. But there's more to shipping a stand-alone app than just running py2exe over your code.

    This talk will show you the tools you can use to make your frozen apps better in a variety of small yet important ways.

    Want to deploy automatic updates? The "esky" package provides a simple API for building, publishing and installing updates, and jumps through all the hoops needed to ensure failed updates won't leave your program unusable.

    Need to run on older versions of OSX or ancient linux boxes? The "myppy" package can build a python runtime optimized for portable deployment and binary compatibility with older systems.

    Want to add code-signing for that professional touch? The "signedimp" package provides cross-platform hooks for code signing and extends the protection to code loaded at runtime.

    Each of these tools has been extracted from a real-life build process for a complex cross-platform application, and each is designed to help make your frozen applications just that little bit better.

    At 1:30pm to 2:10pm, Sunday 11th March

    In D5, Santa Clara Convention Center