Sessions at PyCon US 2012 in E4

Friday 9th March 2012

  • Graph Processing in Python

    by Van Lindberg

    Graphs are everywhere - from your distributed source code control to Twitter analytics. This session presents a set of three problems and shows how they can be decomposed into operations on graphs, and then demonstrates solutions using the various graph libraries available for (or accessible to) Python.

    Graphs are a fundamental computer science datatype, and graphs show up in all sorts of models in all sorts of places. So when you have a graph, what can you do with it? Particularly if it is really big?

    Thirty minutes isn't a lot of time to discuss graph processing as a topic, so there won't be much discussion of graph theory in general or of graph terminology. Instead, this talk is inspired by Raymond Hettinger's "mastering team play": a series of exercises showing how to lower a problem into a graph representation, followed by a demonstration of how the problem can be solved through graph processing. There will also be a little compare-and-contrast between the available graph libraries to show their differences. Each problem will be given 8-10 minutes.

    Problem 1: Python's (legal) history
    Python has developed over time under a number of organizations, each with its own license. What portions of Python's codebase are under each license?

    • The CVS/SVN/HG trees as graphs modeling change in time
    • Identifying and labeling node types
    • Graphing and reporting on results

    Problem 2: Development Cliques
    Linux is famously developed with "lieutenants" in charge of different subsystems of the kernel. Python doesn't have lieutenants... or does it? Put another way, if you have a patch, who should you submit it to?

    • Mailing list connections as a graph
    • Analysis of connections, cliques, and centrality
    • Graphing and reporting on results

    Problem 3: Let's get social
    Your employer has decided that its website should be turned into a social network (you know, because there aren't enough of those).

    • Bootstrapping a graph by looking at pairwise analysis of products
    • How to suggest who people "might know"?
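
    A toy sketch of the kind of analysis Problems 2 and 3 describe, using plain dictionaries rather than any of the graph libraries the talk compares. The reply pairs are invented for illustration; degree centrality stands in for the fuller centrality and clique analysis, and "might know" is approximated with the common friends-of-friends heuristic (counting shared neighbours), which is an assumption, not necessarily the talk's method.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs, ignoring self-loops."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def might_know(graph, person, limit=3):
    """Rank people who are not yet neighbours by how many neighbours they share."""
    friends = graph.get(person, set())
    shared = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            if candidate != person and candidate not in friends:
                shared[candidate] += 1
    return [name for name, _ in shared.most_common(limit)]


# Hypothetical "who replied to whom" pairs from a mailing list.
replies = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
           ("carol", "dave"), ("dave", "erin")]
graph = build_graph(replies)
print(degree_centrality(graph)["carol"])  # 0.75
print(might_know(graph, "alice"))         # ['dave']
```

    Carol touches three of the four other people, so she scores highest; on a real mailing list, the high-centrality names are the de facto "lieutenants".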

    At 10:50am to 11:30am, Friday 9th March

    In E4, Santa Clara Convention Center

  • How to make your websites more accessible

    by Robbie Clemons

    Is your website accessible? Have you tested it? What does it even mean for a website to be accessible? In this talk we'll show some of the most common problems disabled users have and demonstrate how to fix them. I'll also introduce you to some tools that are written in Python to help you determine how accessible your site is.

    There are several different types of Assistive Technology designed to help disabled users access the web. However, many websites don't work well with some Assistive Technologies, and in this talk we'll show how to uncover a website's accessibility problems and fix them. I'll demonstrate some common Assistive Technology on a few problematic example websites built with Python web frameworks, and then show the modifications needed to make each one accessible.
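
    One of the most common problems is an image with no text alternative, which leaves a screen reader with nothing useful to announce. The following is a minimal sketch of an automated check using only the standard library's html.parser. It is not one of the tools presented in the talk, and the class and function names are hypothetical. It flags only a missing alt attribute, because alt="" is the accepted way to mark an image as purely decorative.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collect the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also arrives here via handle_startendtag.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.problems.append(attributes.get("src", "<no src>"))


def find_missing_alt(html):
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


page = '<img src="logo.png"><img src="spacer.gif" alt=""><img src="chart.png" />'
print(find_missing_alt(page))  # ['logo.png', 'chart.png']
```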

    At 11:30am to 12:10pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Data, Design, Meaning

    by Idan Gazit

    The ultimate goal of data visualization is to tell a story and supply meaning. There are tools and science that can inform your choice of data to present and how best to present it. We reflexively evaluate data and fit it into a narrative which aids decision-making; learn how to take advantage of this tendency in order to deliver meaning, not just numbers and charts.

    Data visualization is a hot field right now—and for good reason. In our age of info-saturation, true value is found in distilling large amounts of data into a form that is easy to comprehend and act upon. This talk provides an overview of tools and techniques which you can use to level up your data presentation, regardless of application.

    As humans, we are adept at evaluating visual information. From an early age, we learn to make inferences about things based on their visual properties—large and small, near and far, motion, direction, and other attributes. Taking advantage of the visual process we’ve been practicing since birth is an easy way to optimize delivery of your data into the brains of your audience.

    Unfortunately, it isn’t enough to appeal to the part of our brains responsible for figuring out whether we can successfully hit an animal with a rock. A great visualization must appeal to our sense of beauty. Structure, layout, typography, and color are all tools which can be used (and abused) to delight your audience and direct their attention where you want it to go.

    Whether you’re building an information dashboard for a webapp or presenting scientific data, an understanding of these techniques will make your data more accessible to your audience, and more of a delight to read and learn from.

    At 12:10pm to 12:55pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Build reliable, traceable, distributed systems with ZeroMQ

    by Jérôme Petazzoni

    We will show how to build simple yet powerful RPC code with ZeroMQ, with very few (if any!) modifications to existing code. We will build fan-in and fan-out topologies with ZeroMQ special socket types to implement PUB/SUB patterns and scale up job-processing tasks. Thanks to introspection, the resulting services will be self-documented. Finally, we will show how to implement distributed tracing.

    We will show how to leverage ZeroMQ to build a simple yet powerful RPC layer for Python code. We will focus on simplicity, the goal being to expose almost any Python module or class to network calls, with very few (if any!) modifications to existing code.

    We will then explain the purpose and show some use-cases for ZeroMQ special socket types (PUSH/PULL, PUB/SUB, ROUTER/DEALER) to build fan-in and fan-out topologies, as well as asynchronous processing (to avoid blocking when doing long-running requests). A by-product is the ability to scale up job-processing tasks with a message queue, which can even be made broker-less (you don’t have to deploy heavy machinery if you don’t need it).

    We will also demonstrate how introspection can make development and debugging easier by exposing docstrings and providing a few command-line helpers to poke, debug, and experiment directly from the shell.
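
    The dispatch and introspection side of such an RPC layer can be sketched without any networking. In the talk's setting, the handle method would sit behind a ZeroMQ socket (for example REP or ROUTER). Here it simply takes and returns JSON strings, so it runs with the standard library alone. RPCServer, the reserved "help" method, and the JSON message shape are assumptions made for illustration, not the speaker's code.

```python
import inspect
import json


class RPCServer:
    """Expose the public callables of any object (class instance or module) by name."""

    def __init__(self, target):
        self.target = target

    def describe(self):
        """Introspection: map each public callable's name to its docstring."""
        return {
            name: inspect.getdoc(member) or ""
            for name, member in inspect.getmembers(self.target, callable)
            if not name.startswith("_")
        }

    def handle(self, raw_request):
        """Decode one JSON request, dispatch it, and return a JSON reply."""
        request = json.loads(raw_request)
        name = request.get("method", "")
        if name == "help":  # hypothetical built-in: list methods and their docstrings
            return json.dumps({"result": self.describe()})
        if name.startswith("_") or name not in self.describe():
            return json.dumps({"error": "unknown method %r" % name})
        try:
            result = getattr(self.target, name)(*request.get("args", []))
        except Exception as exc:  # report the failure to the caller instead of crashing
            return json.dumps({"error": "%s: %s" % (type(exc).__name__, exc)})
        return json.dumps({"result": result})


class Calculator:
    def add(self, a, b):
        """Return the sum of a and b."""
        return a + b

    def divide(self, a, b):
        """Return a divided by b."""
        return a / b


server = RPCServer(Calculator())
print(server.handle('{"method": "add", "args": [2, 3]}'))  # {"result": 5}
print(server.handle('{"method": "help"}'))
```

    Because the existing Calculator class is untouched, this mirrors the "very few (if any!) modifications" goal: the wrapper, not the wrapped code, does the work.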

    At the end of the talk (or in a separate talk), we will explain how to implement a tracing framework for distributed RPC. By hooking into the right places, we will show how to get full tracebacks and profiling information; more precisely:

    • how each complex call (involving multiple subcalls) can be accurately traced;
    • how to handle exceptions, and easily know when and where they happened (without checking dozens of log files);
    • which complex calls take too long, and where they spend their time (distributed profiling).

    These guidelines are the result of ongoing development work at dotCloud, and are implemented and actively used at the core of our leading Platform-as-a-Service offering.
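
    The tracing idea can be sketched in a single process. Every call gets a span that records its trace id, its parent span, and its duration, and an exception is recorded on each span it passes through. Across real RPC boundaries, the trace and parent ids would have to travel inside each request message. This sketch only propagates them in-process with contextvars, and traced, SPANS, and the span fields are illustrative names, not dotCloud's implementation.

```python
import contextvars
import functools
import time
import uuid

# The span currently executing in this context (None at top level).
_current_span = contextvars.ContextVar("current_span", default=None)
SPANS = []  # finished spans; a real system would ship these to a collector


def traced(func):
    """Record a span (trace id, parent id, duration, error) around each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        parent = _current_span.get()
        span = {
            "id": uuid.uuid4().hex[:8],
            "parent": parent["id"] if parent else None,
            "trace": parent["trace"] if parent else uuid.uuid4().hex[:8],
            "name": func.__name__,
            "error": None,
        }
        token = _current_span.set(span)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            span["error"] = repr(exc)  # remember which call saw the exception
            raise
        finally:
            span["seconds"] = time.perf_counter() - start  # per-call profiling
            _current_span.reset(token)
            SPANS.append(span)
    return wrapper


@traced
def lookup(key):
    if key == "missing":
        raise KeyError(key)
    return key.upper()


@traced
def handle_request(keys):
    return [lookup(k) for k in keys]


handle_request(["a", "b"])
for span in SPANS:
    print(span["name"], span["parent"], span["trace"])
```

    Child spans finish first, so they are appended before their parent; grouping by "trace" and following "parent" links reconstructs the call tree.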

    We don’t expect the audience to be familiar with ZeroMQ or RPC. However, it will certainly help to have basic knowledge of serialization (e.g. pickle) and sockets.

    At 2:00pm to 2:40pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Throwing Together Distributed Services With Gevent

    by Jeff Lindsay

    In this talk we learn how to throw together a distributed system using gevent and a simple framework called gservice. We'll go from nothing to a distributed messaging system ready for production deployment based on experiences building scalable, distributed systems at Twilio.

    As some have found, gevent is one of Python's best-kept secrets. It gives you fast, evented network programming without a mess of callbacks, lets you write more Pythonic code, and lets you use most regular Python networking libraries and protocol implementations. Now, let's build on this.

    In this talk we learn how to throw together distributed services using gevent and a simple framework called gservice. We'll go from nothing to a distributed messaging system based on experiences building scalable, distributed systems at Twilio.

    This talk will be full of code, live coding, and real production applications with guest appearances by other fun technologies like ZeroMQ, WebSocket, and Doozer.

    At 2:40pm to 3:20pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Static analysis of Python extension modules using GCC

    by Dave Malcolm

    Want to analyse C/C++ code using Python? I've written a plugin for GCC that embeds Python inside the compiler, allowing you to write new C/C++ compilation passes in Python. I've used this to build a static analysis tool that understands the CPython extension API and can automatically detect reference-counting bugs and other errors.

    I've written a plugin for GCC that embeds Python inside the compiler, allowing you to write new C/C++ compilation passes in Python.

    I've used this to build a static analysis tool that understands the CPython extension API, and can automatically detect various errors (e.g. reference counting mistakes).

    I'll be talking about how to use the GCC plugin to analyse C and C++ code with Python scripts, and giving a guided tour of the static analysis tool on some real-world Python extension modules.

    At 3:20pm to 4:05pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Non-Profit Centers of FLOSS Development

    by Bradley M. Kuhn

    Free, Libre & Open Source Software (FLOSS) began as a not-for-profit endeavor. FLOSS licenses permit commercial & non-commercial activity, but the heart of FLOSS remains in the not-for-profit space. Kuhn will discuss advantages of non-profit structure and how non-profits facilitate neutral territory. Kuhn will also present options for projects that seek to operate officially as a non-profit organization.

    At 4:40pm to 5:20pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • A resume-based WSGI Load Balancer

    by Jim Fulton

    When a web application is large, it's a good idea to send different kinds of requests to different servers to reduce the content corpus managed by each server. A decentralized load-balancing approach is presented in which each application server keeps track of what it's good at and presents its resume to load balancers, which use application server resumes to distribute load.

    We host newspaper web sites and content management systems for several hundred newspapers. The working set for all newspapers is too large to be effectively managed by individual application servers, and manual distribution of load is inflexible and ineffective. We created a resume-based, dynamic, decentralized load balancer that distributes work to application servers in a way that greatly reduces the working set on each server.

    Outline:

    • Problem
    • Previous work
    • Architecture
        • Request classification
        • Workers maintain their own resumes
        • LB serves work to workers (and responses to browsers)
        • Multiple load balancers
        • WSGI integration
    • Results
        • Compare work distribution before and after
        • Compare database cache utilization before and after
        • Work distribution as workers are added and removed
    • Limitations
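
    A toy, single-process sketch of the resume idea follows. The decaying-count scoring, the Host-header classifier, and the fallback rule for unseen request classes are all assumptions made for illustration, not the algorithm from the talk. In the system described, workers maintain their own resumes and the load balancer serves work to them over the network; here the balancer simply reads each worker's resume directly.

```python
def classify(environ):
    """Hypothetical request classifier: one class per newspaper site (Host header)."""
    return environ.get("HTTP_HOST", "unknown")


class Worker:
    """An application server that keeps its own resume of recent work."""

    def __init__(self, name, decay=0.9):
        self.name = name
        self.decay = decay
        self.resume = {}  # request class -> decaying count of requests served

    def record(self, request_class):
        # Age every entry, then credit the class just served.
        for key in self.resume:
            self.resume[key] *= self.decay
        self.resume[request_class] = self.resume.get(request_class, 0.0) + 1.0


class LoadBalancer:
    def __init__(self, workers):
        self.workers = workers

    def choose(self, request_class):
        """Prefer the worker whose resume scores this class highest."""
        best = max(self.workers, key=lambda w: w.resume.get(request_class, 0.0))
        if best.resume.get(request_class, 0.0) == 0.0:
            # Nobody has served this class yet: pick the worker with the lightest resume.
            best = min(self.workers, key=lambda w: sum(w.resume.values()))
        return best

    def dispatch(self, environ):
        request_class = classify(environ)
        worker = self.choose(request_class)
        worker.record(request_class)
        return worker.name


lb = LoadBalancer([Worker("app1"), Worker("app2")])
hosts = ["times.example", "herald.example", "times.example", "herald.example"]
assignments = [lb.dispatch({"HTTP_HOST": h}) for h in hosts]
print(assignments)  # ['app1', 'app2', 'app1', 'app2']
```

    Each site sticks to the worker that already has it warm, which is the working-set reduction the abstract describes.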

    At 5:20pm to 6:00pm, Friday 9th March

    In E4, Santa Clara Convention Center

Saturday 10th March 2012

  • Django Templating: More Than Just Blocks

    by Christine Cheung

    Django's template language is designed to strike a balance between power and ease of use; learn how to use this balance to create awesome looking websites. This talk will cover the basics and best practices of Django templating, from custom tag and filter creation, to the finer points of template rendering and loading, and even to replacing the default templating engine itself.

    Harness the power of Django templates to help present your data with ease! Learn about:

    • Basic block formations, common patterns, and using includes wisely.
    • Tips and tricks in using the built-in template tags and filters.
    • How to make custom tags and filters: examples, what you should and shouldn’t do, and tools to help the process such as django-classy-tags.
    • Different ways to load and render templates.
    • Replacing Django’s default template language: pros and cons.

    At 10:25am to 11:05am, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Django Form Processing Deep Dive

    by Nathan Yergler

    Django Form processing often takes a back seat to flashier, more visible parts of the framework. But Django forms, fully leveraged, can help developers be more productive and write more cohesive code. This talk will dive deep into the stock Django forms package, as well as discuss a strategy for abstracting validation for forms, and the use of unit and integration tests with forms.

    Django Form processing often takes a back seat to flashier, more visible parts of the framework. But Django forms are an integral part of the framework that can help developers be more productive and write more cohesive, well-tested code. This talk will dive deep into the stock Django forms package, providing examples of:

    • custom validation and validation patterns
    • processing multiple forms at once (form sets)
    • persisting validated form data to models (model forms)

    We'll also discuss ways to build on Django forms, including:

    • writing unit and integration tests for forms, and how writing tests can help you understand code cohesion
    • abstracting validation for forms to provide tiered validation (for example, one set of criteria to save, additional criteria to publish)
    • approaches to working with multiple, heterogeneous forms simultaneously
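
    The tiered-validation idea can be sketched independently of Django: the publish check reuses the save check and adds its own criteria. This plain-Python version uses a hypothetical Article dataclass and invented rules. In a Django form, the same split might live in the form's clean() method, selected by whichever action the user requested.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str = ""
    body: str = ""
    summary: str = ""


def save_errors(article):
    """Tier 1: the minimum needed to store a draft."""
    errors = {}
    if not article.title.strip():
        errors["title"] = "A title is required to save."
    return errors


def publish_errors(article):
    """Tier 2: everything tier 1 requires, plus the publishing criteria."""
    errors = save_errors(article)
    if not article.body.strip():
        errors["body"] = "A body is required to publish."
    if not article.summary.strip():
        errors["summary"] = "A summary is required to publish."
    return errors


draft = Article(title="City council meets")
print(save_errors(draft))             # {}
print(sorted(publish_errors(draft)))  # ['body', 'summary']
```

    Keeping each tier as a separate function also makes it easy to unit-test the criteria without rendering or submitting a form.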

    At 11:05am to 11:45am, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Testing and Django

    by Carl Meyer

    A deep dive into writing tests with Django, covering Django's custom test-suite-runner and the testing utilities in Django, what they actually do, and how you should and shouldn't use them (and some you shouldn't use at all!). Also, guidelines for writing good tests (with or without Django), and my least favorite things about testing in Django (and how I'd like to fix them).

    Django has a fair bit of custom test code: a custom TestSuiteRunner, custom TestCase subclasses, some test-only monkeypatches to core Django code, and a raft of testing utilities. I'll cover as much of that code as I find interesting and non-trivial, taking a close look at what it's actually doing and what that means for your tests.

    This will be a highly opinionated talk. There are some things in Django's test code I really don't like; I'll talk about why, and how I'd like to see them changed. As a natural part of this, I'll also be outlining some principles I try to follow for writing effective and maintainable tests, and note where Django makes it easy or hard.

    This is an "extreme" talk, so I'll be assuming you've used Django and done some testing, and you're familiar with the basic concepts of each. This won't be an introductory "testing with Django" howto.

    At 11:45am to 12:30pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Web Server Bottlenecks And Performance Tuning

    by Graham Dumpleton

    New Python web developers seem to love running benchmarks on WSGI servers. The reality is that they often have no idea what they are doing or what to look at. This talk will look at a range of factors which can influence the performance of your Python web application, including the impact of using threads vs processes, the number of processors, available memory, the GIL and slow HTTP clients.

    A benchmark of a hello world application is often what developers use to make the all-important decision of which web hosting infrastructure to use. Worse, in many cases this is the only performance testing or monitoring they will ever do. When it comes to their production applications, they are usually flying blind, with no idea how their applications are performing or what they need to do to tune their web application stack.

    This talk will discuss different limiting factors or bottlenecks within your WSGI server stack and system that can affect the performance of your Python web application. It will illustrate the impacts of these by looking at typical configurations for the more popular WSGI hosting mechanisms of Apache/mod_wsgi, gunicorn and uWSGI, seeing how they perform under various types of traffic and request loads and then tweaking the configurations to see whether they perform better or worse.

    Factors that will be discussed include:

    • Use of threads vs processes.
    • Number of processors available.
    • Python global interpreter lock (GIL).
    • Amount of memory available.
    • Slow HTTP browsers/clients.
    • Browser keep-alive connections.
    • Need to handle static assets.

    From this, an attempt will be made to provide some general guidelines on what is a good configuration/architecture to use for different types of Python web applications. The importance of continuous production monitoring will also be covered, so that you know when the performance of your system is dropping off due to changing traffic patterns or to code changes you have made in your web application.
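
    A back-of-the-envelope way to reason about several of these factors together is Little's law: the average number of requests in flight equals the arrival rate times the time each request occupies a worker. Below is a minimal sketch with invented traffic numbers and an assumed 25% headroom; it is not taken from the talk. A slot count says nothing about CPU: because of the GIL, extra threads in one process mostly help while requests wait on I/O or on slow clients.

```python
import math


def concurrency_needed(requests_per_second, seconds_per_request, headroom=1.25):
    """Little's law (L = lambda * W) plus a safety margin, rounded up to whole slots."""
    return math.ceil(requests_per_second * seconds_per_request * headroom)


def capacity(processes, threads_per_process):
    """Request slots a process/thread configuration provides (ignores GIL contention)."""
    return processes * threads_per_process


# Hypothetical numbers: 100 req/s, each holding a worker for 0.25 s.
print(concurrency_needed(100, 0.25))                 # 32
# If slow clients keep each worker busy for 0.5 s in total, the need roughly doubles.
print(concurrency_needed(100, 0.5))                  # 63
print(capacity(processes=4, threads_per_process=5))  # 20
```

    Here a 4 × 5 configuration would already be short of slots before slow clients are considered, which is the kind of mismatch a hello world benchmark never reveals.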

    At 1:35pm to 2:15pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • RESTful APIs With Tastypie

    by Daniel Lindsley

    Providing full-featured REST APIs is an increasingly popular request. Tastypie allows you to easily implement a customizable REST API for your Python or Django applications.

    • Who am I? (Primary author of Tastypie)
    • Why REST?
    • A touch of philosophy
        • Use HTTP the best we can
        • Flexible serialization (not everyone wants JSON)
        • What you can GET should be able to be POST/PUT
        • Should be reasonable by default but easy to extend
        • URIs Everywhere!
    • Why Tastypie?
        • Works with Django
        • Full GET/POST/PUT/DELETE/PATCH
        • Any data source (not just ORM)
        • Designed to be extensible
        • Supports a variety of serialization formats (JSON/XML/YAML/bplist)
        • URIs everywhere by default
        • Lots of hooks for customization
    • Demonstrate a simple setup
    • Then explore the API based on that trivial setup
    • Demonstrate adding authentication/authorization
    • Demonstrate adding custom serialization
    • Demonstrate adding a different data source
    • Demonstrate adding a custom endpoint
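
    To make "flexible serialization (not everyone wants JSON)" concrete, here is a standard-library sketch of choosing an output format from the client's Accept header. It is not Tastypie's Serializer API (Tastypie ships its own serializer classes). The function names, the flat-dict XML layout, and the JSON fallback are assumptions, and q-values in the header are ignored rather than ranked.

```python
import json
import xml.etree.ElementTree as ET


def to_json(data):
    return json.dumps(data, sort_keys=True)


def to_xml(data):
    # Flat dicts only: each key becomes a child element of <object>.
    root = ET.Element("object")
    for key, value in sorted(data.items()):
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


SERIALIZERS = {"application/json": to_json, "application/xml": to_xml}


def serialize(data, accept="application/json"):
    """Return (content_type, body) for the first supported type in an Accept header."""
    for media_type in accept.split(","):
        media_type = media_type.split(";")[0].strip()  # drop parameters such as q=0.9
        if media_type in SERIALIZERS:
            return media_type, SERIALIZERS[media_type](data)
    return "application/json", to_json(data)  # fallback when nothing matches


entry = {"id": 1, "title": "Hi"}
print(serialize(entry, "application/xml"))
print(serialize(entry, "text/html, application/json;q=0.9"))
```

    Adding a format is a matter of registering another function in SERIALIZERS, which is the "reasonable by default but easy to extend" property the outline lists.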

    At 2:15pm to 2:55pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Advanced Celery

    by Ask Solem

    This talk will delve deep into advanced aspects of the Celery task queue and ecosystem. Previous experience with task queues and message-oriented middleware is beneficial.

    • Tasks
        • We will look at task examples and rewrite them to better fit distributed paradigms.
    • Celery + Eventlet
    • Task Routing
        • direct/broadcast/topic
        • Optimization techniques.
    • Monitoring and troubleshooting.
        • Logging (syslog, Sentry, error emails).
        • Events.
        • Tracing memory leaks.
    • Writing a Celery worker in Ruby using celeryd as a proxy.
    • RabbitMQ
        • Clustering and HA
    • Introducing Cyme
    • Q/A
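
    Of the routing styles listed (direct/broadcast/topic), direct exchanges match the routing key exactly and broadcast (fanout) exchanges ignore it. Topic exchanges match dotted keys against patterns in which * stands for exactly one word and # for zero or more words. RabbitMQ performs this matching inside the broker; the recursive function below is only an illustrative re-implementation of those rules, and the routing keys are made up.

```python
def topic_matches(pattern, routing_key):
    """Match a dotted routing key against an AMQP-style topic pattern."""

    def match(words, keys):
        if not words:
            return not keys
        head, rest = words[0], words[1:]
        if head == "#":
            # '#' either matches nothing more, or swallows one key word and stays active.
            return match(rest, keys) or (bool(keys) and match(words, keys[1:]))
        if not keys:
            return False
        return (head == "*" or head == keys[0]) and match(rest, keys[1:])

    return match(pattern.split("."), routing_key.split("."))


print(topic_matches("feeds.*", "feeds.import"))       # True
print(topic_matches("feeds.*", "feeds.import.rss"))   # False
print(topic_matches("#.rss", "feeds.import.rss"))     # True
```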

    At 2:55pm to 3:40pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • Building a Robot that Can Play Angry Birds on a Smartphone, (or Robots are the Future of Testing)

    by Jason Huggins

    Can your robot play Angry Birds? On an iPhone? Mine can. I call it "BitbeamBot". It started as an art project, but it has a much more serious practical application: mobile web testing. To trust that your mobile app truly works, you need an end-to-end test on the actual device. BitbeamBot is an Arduino-powered open-source hardware CNC robot that can test any application on any mobile device.

    To be confident that your mobile app truly works, you need an end-to-end test on an actual device. This means the full combination of device manufacturer, operating system, data network, and application. And since mobile devices were meant to be handled with the human hand, you need something like a real hand to do real end-to-end testing. At some point, after lots of repetitive manual testing, the inevitable question is asked: "Can we / should we automate the testing of the old features, so I can focus the manual testing effort on the new features?"

    That's where the BitbeamBot comes in. BitbeamBot is an Arduino-powered open-source hardware CNC robot that can test any application on any mobile device -- touching the screen of a mobile device just like a user would. It also uses Python and the Selenium automation API to work its magic. In the future your testing will be automated... with robots.

    At the moment, BitbeamBot is just a prototype, but it can play games with simple mechanics, like Angry Birds. However, it's not very smart; it can't yet "see" where objects are on the screen. From my computer, I send electrical signals to two motors to move the pin over any point on an iPhone screen. I then use a third motor to move the pin down to the screen surface and click or drag objects. This open loop, flying-blind approach to automation is how automated software testing was done years ago. It's the worst way to automate. Without any sense of what's actually visible on screen, the script will fail when there's a discrepancy between what's actually on the screen and what you expected to be on screen at the time you wrote the automation script.

    A better approach to testing with BitbeamBot will involve closing the feedback loop and having software determine where to click based on what is actually on screen. There are two styles I'll experiment with: black box and grey box. Black box testing is easier to get started with, but grey box testing is more stable long term.

    Black box testing requires no internal knowledge of the application. It treats the application like a metaphorical "black box". If something is not visible via the user interface, it doesn't exist. To get this approach to work with BitbeamBot, I'll place a camera over the phone, send the image to my computer, and use an image-based testing tool like Sikuli. Sikuli works by taking a screenshot and then using the OpenCV image recognition library to find elements like text or buttons in the screenshot.

    The other style is grey box testing. It's a more precise approach, but it requires access to and internal knowledge of the application. I'll implement this approach by extending the Selenium library and tethering the phone to my computer via a USB cable. With the USB debugging interface turned on, I can ask the application precisely which elements are on screen, and where they are before moving the BitbeamBot pin to the right location.
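
    The difference between the open-loop and grey-box approaches comes down to where the target coordinates come from. Below is a hedged sketch in which the Calibration fields, the locate/move_to/press callables, and the element id are hypothetical stand-ins, not BitbeamBot's or Selenium's API. An open-loop script would pass hard-coded coordinates to point_to_steps; the closed-loop tap_element asks for the element's current position first.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """Hypothetical calibration: motor steps per screen point and the screen origin."""
    steps_per_point_x: float
    steps_per_point_y: float
    origin_x_steps: int = 0
    origin_y_steps: int = 0


def point_to_steps(x, y, cal):
    """Map a screen coordinate (in points) to absolute X/Y motor step targets."""
    return (cal.origin_x_steps + round(x * cal.steps_per_point_x),
            cal.origin_y_steps + round(y * cal.steps_per_point_y))


def tap_element(locate, element_id, cal, move_to, press):
    """Closed loop: ask where the element is now, then move the pin there and tap."""
    x, y = locate(element_id)  # e.g. answered by the app over the USB debug link
    move_to(*point_to_steps(x, y, cal))
    press()


# Stubs standing in for the real device query and motor driver.
calls = []
tap_element(
    locate=lambda element_id: (10, 20),
    element_id="play-button",
    cal=Calibration(2.0, 2.0, origin_x_steps=100, origin_y_steps=50),
    move_to=lambda sx, sy: calls.append(("move", sx, sy)),
    press=lambda: calls.append(("press",)),
)
print(calls)  # [('move', 120, 90), ('press',)]
```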

    BitbeamBot's home is bitbeam.org. The hardware, software, and mechanical designs are open source and available on GitHub.

    At 4:15pm to 4:55pm, Saturday 10th March

    In E4, Santa Clara Convention Center

  • The Pyed Piper: A Modern Python Alternative to awk, sed and Other Unix Text Manipulation Utilities

    by Toby Rosen

    "The Pyed Piper", or pyp, is a Linux command-line text manipulation tool similar to awk or sed, but which uses standard Python string and list methods as well as custom functions evolved to generate fast results in an intense production environment.

    Pyp is a Linux command-line text manipulation tool similar to awk or sed, but which uses standard Python string and list methods as well as custom functions evolved to generate fast results in an intense production environment. The Pyed Piper was developed at Sony Pictures Imageworks to facilitate the construction of complex image-manipulation Unix commands during visual effects work on Alice in Wonderland, Green Lantern, and the upcoming The Amazing Spider-Man.

    Because pyp employs its own internal piping syntax ("|") similar to Unix pipes, complex operations can be proceduralized by feeding the output of one Python command to the input of the next. This greatly simplifies the generation and troubleshooting of multistep operations without the use of temporary variables or nested parentheses.

    pyp output has been optimized for typical production scenarios. For example, if text is broken up into an array using the "split()" method, the output will be automatically numbered by field, making selecting a particular field trivial. Numerous other conveniences have been included, such as an accessible history of all inter-pipe sub-results, an ability to perform mathematical operations, and a complement of variables based on common metacharacter split/join operations.
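
    The piping idea can be illustrated with a few lines of Python: each stage is a Python expression evaluated with p bound to the previous stage's result. This is a hypothetical re-implementation for illustration, not pyp's code. The variable name p, the [n] field numbering in show(), and the naive split on | are assumptions.

```python
def run_pipeline(command, line):
    """Evaluate each '|'-separated stage with `p` bound to the previous result."""
    p = line
    # Naive split: a literal '|' inside a stage's string would break this.
    for stage in command.split("|"):
        # Empty builtins keep stages to plain expressions on p; this is NOT a sandbox.
        p = eval(stage.strip(), {"__builtins__": {}}, {"p": p})
    return p


def show(result):
    """Number list fields so one can be picked by index in the next stage."""
    if isinstance(result, list):
        return "\n".join("[%d]%s" % (i, field) for i, field in enumerate(result))
    return str(result)


path = "/shows/seq01/frame_0001.exr"
print(show(run_pipeline("p.split('/')", path)))
print(run_pipeline("p.split('/') | p[-1] | p.upper()", path))  # FRAME_0001.EXR
```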

    For power users, commands can be easily saved and recalled from disk as macros, providing an alternative to quick and dirty scripting. For the truly advanced user, additional methods can be added to the pyp class via a config file, allowing tight integration with larger facilities data structures or custom toolsets.

    At 4:55pm to 5:30pm, Saturday 10th March

    In E4, Santa Clara Convention Center

Sunday 11th March 2012

  • Patterns for building large Pyramid applications

    by Carlos de la Guardia

    Pyramid is a very flexible framework, but when dealing with large projects and multiple developers it pays to establish a few ground rules and follow some conventions. In this talk we'll discuss some patterns for organizing and developing a large Pyramid application.

    Karl is one of the largest Pyramid applications in production. It actually guided the development of repoze.BFG, the ancestor of Pyramid. We'll use the Karl code base to illustrate some of the patterns that were used both to organize the project and deal with a large user base.

    KARL is an open source web system for collaboration, organizational intranets, and knowledge management. Developed by the Open Society Foundations (OSF), it was first introduced to the market in 2008 and is now used by many international organizations, such as OXFAM GB and OSF.

    Not everything discussed will be based on Karl, though. There are some key questions about how to define a project that are common to any Pyramid development, such as which persistence system to use and whether to go with traversal or URL dispatch. I'll cover a few of those questions in the first segment.
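
    For readers weighing traversal against URL dispatch: traversal resolves a URL by walking a tree of resource objects, looking up each path segment with __getitem__ until a lookup fails. The first unconsumed segment becomes the view name and the rest the subpath. What follows is a simplified plain-Python sketch of that idea with no Pyramid import; Resource and traverse are hypothetical names, and Pyramid's real traverser handles more cases than this.

```python
class Resource(dict):
    """A node in a hypothetical resource tree; children are looked up by name."""

    def __init__(self, name, parent=None):
        super().__init__()
        self.__name__ = name
        self.__parent__ = parent

    def add(self, name):
        child = Resource(name, self)
        self[name] = child
        return child


def traverse(root, path):
    """Return (context, view_name, subpath) in the spirit of Pyramid traversal."""
    segments = [s for s in path.split("/") if s]
    context = root
    for i, segment in enumerate(segments):
        try:
            context = context[segment]
        except KeyError:
            # First segment with no matching child is the view name.
            return context, segment, segments[i + 1:]
    return context, "", []


root = Resource("")
blog = root.add("communities").add("blog")
context, view_name, subpath = traverse(root, "/communities/blog/edit")
print(context.__name__, view_name, subpath)  # blog edit []
```

    With URL dispatch, the same URL would instead be matched against an ordered list of route patterns; traversal suits deep, user-created hierarchies like KARL's communities.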

    Among the things that will be covered are:

    Hard questions you need to answer before beginning your project. Pyramid offers many configuration possibilities and I'll go into them briefly: tips for how to assemble your toolkit, how to choose a persistence backend, whether to use traversal or routes, how to handle authentication and authorization, and how to lay out the project. (8 min.)

    Views. Pyramid has a very strong view configuration system. I'll cover how to take advantage of it, how to use predicates effectively and how to create custom predicates. People usually find themselves having to pass lots of information to the templates, so I'll discuss a strategy for dealing with this. (7 min.)

    Code patterns. I'll single out some useful patterns from Karl and discuss them briefly. (8 min.)

    Deployment and maintenance. There are also many options for this (e.g. Nginx or mod_wsgi, whether to use buildout, useful deployment tools) and I'll go over them quickly. (5 min.)

    At 12:00pm to 12:30pm, Sunday 11th March

    In E4, Santa Clara Convention Center

  • web2py: ideas we stole and ideas we had

    by Massimo Di Pierro

    In this talk we will provide an overview of some of the web2py design decisions and its newest features. In particular we will discuss which design decisions were inspired by other frameworks (Django, Turbogears, Flask) and which were not and why.

    This talk will be an occasion to acknowledge the role played by other frameworks in the design of web2py and to thank them. It will also be a chance to explain the motivation behind some of the controversial design decisions and the unique web2py features that depend on them.

    At 1:30pm to 2:10pm, Sunday 11th March

    In E4, Santa Clara Convention Center
