Sessions at PyCon US 2012 about Python with video

Wednesday 7th March 2012

  • Bayesian statistics made (as) simple (as possible)

    by Allen Downey

    This tutorial is an introduction to Bayesian statistics using Python. My goal is to help participants understand the concepts and solve real problems. We will use material from my book, Think Stats: Probability and Statistics for Programmers (O’Reilly Media).

    Bayesian statistical methods are becoming more common and more important, but there are not many resources to help beginners get started. People who know Python can use their programming skills to get a head start.

    I will present simple programs that demonstrate the concepts of Bayesian statistics, and apply them to a range of example problems. Participants will work hands-on with example code and practice on example problems.

    Students should have at least basic level Python and basic statistics. If you learned about Bayes’s Theorem and probability distributions at some time, that’s enough, even if you don’t remember it! Students should be comfortable with logarithms and plotting data on a log scale.

    Students should bring a laptop with Python 2.x and matplotlib. You can work in any environment; you just need to be able to download a Python program and run it.

    Outline:

    • Bayes’s theorem.
    • Representing probability distributions.
    • Bayesian estimation.
    • Biased coins and student test scores.
    • Censored data.
    • The locomotive / German tank problem.
    • Hierarchical models and the hidden species problem.
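
The biased-coin item in the outline above can be illustrated with a minimal sketch of a Bayesian update over a discrete set of hypotheses. The function names, the two candidate coins, and the toss sequence are assumptions made up for this illustration; they are not taken from the tutorial material, and the code assumes Python 3.

```python
from fractions import Fraction


def bayes_update(prior, likelihood, datum):
    """Return the posterior distribution after observing one datum.

    prior: dict mapping hypothesis -> probability
    likelihood: function (datum, hypothesis) -> P(datum | hypothesis)
    """
    unnormalized = {h: p * likelihood(datum, h) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def coin_likelihood(datum, p_heads):
    """Probability of seeing 'H' or 'T' if the coin lands heads with p_heads."""
    return p_heads if datum == "H" else 1 - p_heads


# Hypothetical setup: the coin is either fair or biased towards heads.
hypotheses = [Fraction(1, 2), Fraction(3, 4)]
posterior = {h: Fraction(1, len(hypotheses)) for h in hypotheses}

# Update once per observed toss; the posterior of one step is the next prior.
for toss in "HHT":
    posterior = bayes_update(posterior, coin_likelihood, toss)

for h, p in posterior.items():
    print(f"P(p_heads={h} | data) = {p}")
```

Exact fractions are used here only so that the normalization step is easy to check by hand; floats would work the same way.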

    At 9:00am to 12:20pm, Wednesday 7th March

    In D2, Santa Clara Convention Center

    Coverage video

  • Faster Python Programs through Optimization

    by Mike Müller

    This tutorial provides an overview of techniques to improve the performance of Python programs. The focus is on concepts such as profiling, differences between data structures and algorithms, as well as a selection of tools and libraries that help to speed up Python.

    Objective

    This tutorial provides an overview of techniques to improve the performance of Python programs. The focus is on concepts such as profiling, differences between data structures and algorithms, as well as a selection of tools and libraries that help to speed up Python.

    Intended Audience

    Python programmers who would like to learn concepts for improving performance.

    Audience Level

    Programmers with good Python knowledge.

    Prerequisites

    Please bring your laptop with the operating system of your choice (Linux, Mac OS X, Windows). In addition to Python 2.6 or 2.7, we need:

    Method

    This is a hands-on course. Students are strongly encouraged to work along with the trainer at the interactive prompt. There will be exercises the students need to do on their own. Experience shows that this active involvement is essential for effective learning.

    Outline

    • How fast is fast enough? (10 min)
    • Optimization guidelines (10 min)
    • Premature optimization
    • Optimization rules
    • Seven steps for incremental optimization
    • Optimization strategy (30 min)
    • Measuring in stones
    • Profiling CPU usage
    • Profiling memory usage
    • Algorithms and Anti-patterns (40 min)
    • String concatenation
    • List and generator comprehensions
    • The right data structure
    • Caching
    • The example (5 min)
    • Testing speed (5 min)
    • Pure Python (15 min)
    • Meet Psyco, the JIT (5 min)
    • Using PyPy (15 min)
    • NumPy for numeric arrays (10 min)
    • Using multiple CPUs with multiprocessing (20 min)
    • Combination of optimization strategies (10 min)
    • Results of different example implementations (5 min)
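
Two of the outline items above, string concatenation and caching, can be sketched with the standard library alone. This is an illustrative example under assumptions, not the tutorial's own code: it targets Python 3 (the tutorial itself asked for Python 2.6 or 2.7, where `functools.lru_cache` is not available), and the function names are invented here. Timings depend on the interpreter and machine, so none are quoted.

```python
import timeit
from functools import lru_cache


def concat_with_plus(words):
    """Build a string by repeated concatenation."""
    result = ""
    for word in words:
        result += word
    return result


def concat_with_join(words):
    """Build the same string in a single pass with str.join."""
    return "".join(words)


@lru_cache(maxsize=None)
def fib(n):
    """Naive recursive Fibonacci; the cache avoids recomputing subproblems."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


words = ["spam"] * 10_000

# Measure before drawing conclusions: print the timing of each variant.
for func in (concat_with_plus, concat_with_join):
    seconds = timeit.timeit(lambda: func(words), number=50)
    print(f"{func.__name__}: {seconds:.4f} s for 50 runs")

print("fib(30) =", fib(30))
print(fib.cache_info())
```

The point of the sketch is the workflow (write both variants, check they agree, then time them) rather than any particular speed-up.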

    At 9:00am to 12:20pm, Wednesday 7th March

    In D1, Santa Clara Convention Center

    Coverage video

  • Introduction to Django

    by Chander Ganesan

    The Django framework is a fast, flexible, easy to learn, and easy to use framework for designing and deploying web sites and services using Python. In this session, we'll cover the fundamentals of development with Django, generate a Django data model, and put together a simple web site using the framework.

    • Detailed Tutorial Outline
    • Django Overview and Basic Introduction
    • Downloading & Installing Django
    • Creating a new project
    • Choosing a database
    • Creating a new application
    • Installing & Using Django contrib applications
    • Overview of Django flow (i.e., URLconf expression, view function, HTTPResponse object, etc.)
    • Generating Simple Django Views
    • Configuring a URLConf for basic views
    • Creating Django Templates (template syntax, common filters and tags, loops, etc)
    • Creating & using Template Context objects
    • Introduction to Django Models
    • Defining basic Django models
    • Understanding basic model fields & options
    • Generating & Reviewing Model SQL
    • Adding data to a model
    • Simple data retrieval using models
    • Working with QuerySets (filters, slicing, ordering, common methods)
    • Overview of Q objects
    • Using the Admin interface
    • Using Generic views
    • Access control with sessions & users

    At 9:00am to 12:20pm, Wednesday 7th March

    In H3, Santa Clara Convention Center

  • SQL for Python Developers

    by Brandon Rhodes

    Relational databases are often the bread-and-butter of large-scale data storage, yet they are often poorly understood by Python programmers. Organizations even split programmers into SQL and front-end teams, each of which jealously guards its turf. These tutorials will take what you already know about Python programming, and advance into a new realm: SQL programming and database design.

    The class will consist of six 25-minute lessons, each of which features a 10-minute lecture, 10 minutes of interesting exercises, and a 5-minute wrap-up in which the instructor recaps the exercises by giving his own answers. The focus will be on keeping things simple so that each building block is grasped clearly. The six lessons will be laid out something like this:

    1. Tables, INSERT, and SELECT.

    • Create a simple sqlite3 table with the DB-API interface provided by the Python Standard Library.
    • Use INSERT to fill the table with data.
    • Concatenate INSERT statements to increase the speed and reduce the number of database round-trips required during a bulk data load.
    • Read back table rows with SELECT.
    • Add dynamic expressions to the rows returned by SELECT.
    • Quote values correctly to avoid SQL injection attacks.
    • Avoid “gotcha” differences between Python and SQL data types, with particular attention to Unicode, date-times, and the behavior of NULL versus None.
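
The first lesson's steps can be sketched with the standard library `sqlite3` module. The table name, columns, and rows below are hypothetical and exist only for this illustration; the code assumes Python 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talk (id INTEGER PRIMARY KEY, title TEXT, room TEXT)")

rows = [
    ("SQL for Python Developers", "H2"),
    ("Introduction to Django", "H3"),
    ("Untitled; DROP TABLE talk", None),  # None is stored as SQL NULL
]
# Placeholders (?) let the driver quote values, which avoids SQL injection.
conn.executemany("INSERT INTO talk (title, room) VALUES (?, ?)", rows)

# A dynamic expression computed by the database for each returned row.
query = "SELECT title, length(title) FROM talk ORDER BY id"
for title, title_length in conn.execute(query):
    print(title, title_length)

# NULL never compares equal to anything, so '=' finds no rows; IS NULL does.
equals_null = conn.execute(
    "SELECT count(*) FROM talk WHERE room = NULL").fetchone()[0]
is_null = conn.execute(
    "SELECT count(*) FROM talk WHERE room IS NULL").fetchone()[0]
print(equals_null, is_null)
```

The third row shows why placeholders matter: the suspicious-looking title is stored as plain text instead of being interpreted as SQL.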

    2. WHERE and the importance of being indexed.

    • Run quick performance checks that demonstrate that WHERE usually requires the entire table to be read into memory and scanned.
    • Add a simple index to shortcut specific WHERE clauses and return their results more quickly.
    • Check whether an index is being used, and learn several reasons why apparently useful indexes get ignored by the database.
    • Add aggregate indexes that yield performance increases for very specific WHERE clauses.
    • Investigate how our data distribution (for example, whether a particular column has thousands of different values, or merely thousands of instances of a handful of values) can impact the wisdom and performance of various query plans.
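
Checking whether an index is used can be sketched in SQLite with `EXPLAIN QUERY PLAN`. The table, the index name, and the helper `query_plan` are assumptions for this example, and the exact wording of the plan text differs between SQLite versions, so the sketch only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO attendee (name, city) VALUES (?, ?)",
    [(f"person{i}", f"city{i % 100}") for i in range(10_000)],
)


def query_plan(sql, params=()):
    """Return SQLite's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT name FROM attendee WHERE city = ?"
plan_before = query_plan(query, ("city42",))

conn.execute("CREATE INDEX attendee_city_idx ON attendee (city)")
plan_after = query_plan(query, ("city42",))

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the plan reports a scan of the whole table; with it, the plan names the index it searches.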

    3. FOREIGN KEY and JOIN

    • Use a foreign key to relate rows in one table with rows in another.
    • Add JOIN clauses to a SELECT statement to assemble query-result rows that are built from pieces of several tables.
    • Diagnose performance problems with JOIN by observing the cost of full N×M scans that compare every row from one table with every row from another.
    • Think about the indexes that a query plan could take advantage of behind the scenes.
    • Create indexes that let the database take shortcuts when doing common JOINs.
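
A minimal sketch of a foreign key, a JOIN, and a supporting index, again using `sqlite3`. The schema is hypothetical and the speaker and talk names are borrowed from this listing purely as sample data. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE speaker (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE talk (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        speaker_id INTEGER NOT NULL REFERENCES speaker (id)
    );
    -- Index on the foreign key column so joins need not scan every talk.
    CREATE INDEX talk_speaker_idx ON talk (speaker_id);
""")

conn.executemany("INSERT INTO speaker (id, name) VALUES (?, ?)",
                 [(1, "Brandon Rhodes"), (2, "Mike Müller")])
conn.executemany("INSERT INTO talk (title, speaker_id) VALUES (?, ?)", [
    ("SQL for Python Developers", 1),
    ("Documenting Your Project With Sphinx", 1),
    ("Plotting with matplotlib", 2),
])

# Each result row is assembled from one talk row and its matching speaker row.
joined = conn.execute("""
    SELECT speaker.name, talk.title
    FROM talk JOIN speaker ON speaker.id = talk.speaker_id
    ORDER BY speaker.name, talk.title
""").fetchall()
for name, title in joined:
    print(name, "-", title)

# A talk that points at a missing speaker violates the foreign key.
try:
    conn.execute("INSERT INTO talk (title, speaker_id) VALUES (?, ?)",
                 ("Orphan talk", 99))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print("orphan rejected:", orphan_rejected)
```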

    4. Post-processing.

    • Use ORDER BY to control the rows which are returned first by a given query.
    • Combine OFFSET and LIMIT to return "paged" results suitable for displaying on a limited display, like a web page or GUI window.
    • Observe how indexes affect the performance of ORDER BY / LIMIT.
    • Use GROUP BY to support aggregate operations such as sums, averages, maxima, and minima.
    • Filter aggregate results with the HAVING clause.

    The exercises will present small Python scripts that post-process data, and ask students to write the equivalent GROUP BY / HAVING expressions to remove the need for the Python post-processing.
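
The kind of exercise described above might look like the following sketch: the same aggregation done first by post-processing rows in Python, then by a GROUP BY / HAVING query. The `sale` table and its data are invented for the illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?)", [
    ("north", 10), ("north", 25), ("south", 5), ("south", 7), ("west", 40),
])

# Version 1: fetch every row and aggregate in Python.
totals = defaultdict(int)
for region, amount in conn.execute("SELECT region, amount FROM sale"):
    totals[region] += amount
in_python = sorted((r, t) for r, t in totals.items() if t > 20)

# Version 2: let the database aggregate (GROUP BY) and filter (HAVING).
in_sql = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sale
    GROUP BY region
    HAVING SUM(amount) > 20
    ORDER BY region
""").fetchall()

print(in_python)
print(in_sql)
```

Both versions produce the same rows; the SQL version transfers only the aggregated results instead of every row.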

    5. Modifying tables.

    • Write WHERE clauses for UPDATE and DELETE using the same patterns already learned for SELECT.
    • Use transactions in combination with UPDATE and DELETE to prevent inconsistent database states from becoming visible to other clients.
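
As a rough illustration of combining UPDATE with a transaction, the sketch below uses a `sqlite3` connection as a context manager, which commits on success and rolls back when an exception escapes the block. The `account` table, the CHECK constraint, and the `transfer` helper are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts; both UPDATEs succeed or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target))
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source))


transfer(conn, "alice", "bob", 30)
try:
    # Would overdraw alice: the CHECK constraint fails on the second UPDATE,
    # and the rollback also undoes the first UPDATE.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    print("transfer rejected, nothing changed")

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```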

    6. ORMs, Objects, and Tables.

    • Create tables of objects using the SQLAlchemy declarative schema in combination with classes.
    • Understand the main differences between SQLAlchemy and the Django ORM, including the idea of explicit saves versus a unit-of-work pattern.
    • See how ORM query syntaxes mix down to SQL statements.
    • Determine when an ORM will be helpful, versus when straight SQL might be a better solution for a particular problem.

    Of course, mastery of these topics cannot be conveyed in a single three-hour course! The tutorial will have succeeded if students learn the main moving parts that are involved in a relationally-backed Python application, if they have gotten some practice with SQL and the kind of tasks that it seeks to simplify, and if they have a foundation upon which to build when they are next faced with writing or modifying Python code that interfaces with a SQL database.

    At 9:00am to 12:20pm, Wednesday 7th March

    In H2, Santa Clara Convention Center

    Coverage video

  • Writing a Pyramid application

    by Carlos de la Guardia

    Pyramid is the web framework at the core of the Pylons Project. It's a "pay only for what you eat" framework. You can get started easily and learn new concepts as you go, and only if you need them. It's simple, well tested, well documented, and fast. This course will present Pyramid and lead you through the creation of an application as the concepts from the framework are introduced.

    Pyramid is the web framework at the core of the Pylons Project. It’s a “pay only for what you eat” framework. You can get started easily and learn new concepts as you go, and only if you need them. It’s simple, well tested, well documented, and fast.

    Though it’s in part inspired by Zope and uses concepts and software that may be familiar to Zope programmers, no prior Zope experience is required to use it. Also, unlike Zope, you don’t need to understand many concepts and technologies fully before you can be truly productive.

    Pyramid is also inspired by Django and Pylons. It tries to learn valuable lessons from things that have gone well with different web frameworks and give the user great flexibility in applying them.

    This course will present Pyramid and lead you through the creation of an application as the concepts from the framework are introduced. The extensive Pyramid documentation will be used as the “textbook”.

    Proposed outline:

    • Installation
    • Scaffolds
    • Persistence options
    • URL dispatch
    • Views
    • View configuration
    • Renderers
    • Static views
    • Security
    • Declarative configuration
    • Testing
    • Deployment

    At 9:00am to 12:20pm, Wednesday 7th March

    In H1, Santa Clara Convention Center

    Coverage video

  • Data analysis in Python with pandas

    by Wes McKinney

    The tutorial will give a hands-on introduction to manipulating and analyzing large and small structured data sets in Python using the pandas library. While the focus will be on learning the nuts and bolts of the library's features, I also aim to demonstrate a different way of thinking regarding structuring data in memory for manipulation and analysis.

    The tutorial will teach the mechanics of the most important features of pandas. It will be focused on the nuts and bolts of the two main data structures, Series (1D) and DataFrame (2D), as they relate to a variety of common data handling problems in Python. The tutorial will be supplemented by a collection of scripts and example data sets for the users to run while following along with the material. As such, a significant part of the tutorial will be spent doing interactive data exploration and working examples from within the IPython console.

    The tutorial will also teach participants best practices for structuring data in memory and the do's and don'ts of high performance computing with large data sets in Python. For participants who have never used IPython, this will also provide a gentle introduction to interactive scientific computing with IPython.

    At 1:20pm to 4:40pm, Wednesday 7th March

    In D2, Santa Clara Convention Center

    Coverage video

  • How to get the most out of your PyPy

    by Alex Gaynor, Maciej Fijalkowski and Armin Rigo

    For many applications PyPy can provide performance benefits right out of the box. However, little details can push your application to perform much better. In this tutorial we'll give you insights on how to push PyPy to its limits. We'll focus on understanding the performance characteristics of PyPy, and learning the analysis tools in order to maximize your application's performance.

    We aim to teach people how to use performance tools available for PyPy as well as to understand PyPy's performance characteristics. We'll explain how different parts of PyPy interact (JIT, the garbage collector, the virtual machine runtime) and how to measure which part is eating your time. We'll give you a tour with jitviewer which is a crucial tool for understanding how your Python got compiled to assembler and whether it's performing well. We also plan to show potential pitfalls and usage patterns in the Python language that perform better or worse in the case of PyPy.

    We'll also briefly mention how to get your application running on PyPy and how to avoid common pitfalls there, like reference counting or relying on C modules.

    This tutorial is intended for people familiar with Python who have performance problems; no previous experience with PyPy is needed. We ask people to come with their own problems and we'll provide some example ones. Attendees should have the latest version of PyPy preinstalled on their laptops.

    At 1:20pm to 4:40pm, Wednesday 7th March

    In H1, Santa Clara Convention Center

    Coverage video

  • MongoDB and Python

    by Rick Copeland and Bernie Hackett

    This intermediate-level class will teach you techniques using the popular NoSQL database MongoDB, its driver PyMongo, and the object-document mapper Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer from basic PyMongo queries to high-level object-document mapping setups in Ming.

    The class will begin with a brief overview of MongoDB and its Python driver PyMongo. We will cover basic operations using PyMongo, including data manipulation, querying, and GridFS. Students will install MongoDB and PyMongo as part of this section.

    We will then describe the design philosophy and setup of Ming, a SQLAlchemy-inspired object-document mapper (ODM) for MongoDB developed at SourceForge.

    Next we will cover the base-level implementation of Ming, including schema design, the session and datastore, lazy migrations, data polymorphism, and GridFS support. We will also cover effective MongoDB index design, querying, and updating techniques, and how to use these with Ming. Students will install Ming as a part of this section, and have exercises covering schema design, lazy migrations, and GridFS.

    The final segment will cover the object-document mapper portion of Ming. We will cover the unit of work design pattern, object relations, ODM-level polymorphism, and how to drop down to the base layer (or even down to pymongo) when you really need to. This section will include exercises in designing your ODM model and effectively using the unit-of-work session.

    This talk targets Python 2.6-2.7 and MongoDB 2.0. Students should have Python 2.6 or 2.7 installed on their machines prior to the class and should be comfortable using virtualenv and pip or easy_install to install packages.

    At 1:20pm to 4:40pm, Wednesday 7th March

    In H2, Santa Clara Convention Center

    Coverage video note

  • Web scraping: Reliably and efficiently pull data from pages that don't expect it

    by Asheesh Laroia

    Exciting information is trapped in web pages and behind HTML forms. In this tutorial, you'll learn how to parse those pages and when to apply advanced techniques that make scraping faster and more stable. We'll cover parallel downloading with Twisted, gevent, and others; analyzing sites behind SSL; driving JavaScript-y sites with Selenium; and evading common anti-scraping techniques.

    • Basics of parsing
    • The website is the API
    • HTML is a mess, but we can parse it anyway
    • Why regular expressions are a bad idea
    • Extracting information, using XPath, CSS selectors, and the BeautifulSoup API
    • Expect exceptions: How to handle errors
    • Basics of crawling
    • A quick review of HTTP
    • Why cookies are necessary for maintaining a session
    • How servers can track you
    • How to submit forms with mechanize
    • Debugging the web
    • Comparing FireBug and Chrome's DOM Inspector
    • The "Net" tab
    • Using a logging HTTP proxy to record traffic
    • Counter-measures, and how to circumvent them
    • JavaScript
    • Hidden form fields (e.g., Django CSRF)
    • CAPTCHAs
    • IP address limitations
    • How to cover your scraping code with tests
    • Why you should store snapshotted pages
    • Using mock objects to avoid network I/O
    • Using a fake getPage for Twisted
    • Parallelism
    • A quick tour of different models:
    • Twisted
    • gevent
    • celery
    • Handling JavaScript
    • Automating a full web browser with Selenium RC
    • Running JavaScript within Python using python-spidermonkey
    • Conclusion
    • Use your power for good, not evil.
    • Q&A
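
Two ideas from the outline above, parsing messy HTML and testing against a stored snapshot instead of the network, can be sketched with the standard library. This example uses `html.parser` as a stand-in; the tutorial itself lists XPath, CSS selectors, and BeautifulSoup, which are not used here. The snapshot markup and the `LinkExtractor` class are made up for the illustration.

```python
from html.parser import HTMLParser

# A stored snapshot of a page, so the parser can be tested without network I/O.
SNAPSHOT = """
<html><body>
  <p>Unclosed paragraph
  <a href="/talks/1">First talk</a>
  <a href='/talks/2'>Second <b>talk</a>
  <a>No href here</a>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = LinkExtractor()
parser.feed(SNAPSHOT)
print(parser.links)
```

The snapshot deliberately contains an unclosed `<p>` and a mismatched `<b>`; an event-based parser keeps going where a strict XML parser or a regular expression would likely struggle.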

    At 1:20pm to 4:40pm, Wednesday 7th March

    In H3, Santa Clara Convention Center

    Coverage video

Thursday 8th March 2012

  • Documenting Your Project With Sphinx

    by Brandon Rhodes

    Python projects can succeed or fail because of their documentation. Thanks to Sphinx, Python now has a “documentation framework” with indexing, syntax highlighting, and integration with your code. Students will be given a small undocumented Python package, and during the exercises they will give the package a tutorial and reference manual. Plus: deployment and theming!

    Python projects can succeed or fail based on their documentation. Thanks to Sphinx, Python now has a "documentation framework" that provides convenient indexing and automatic syntax highlighting, and can also integrate your documentation with your code (your documentation can be run as a test, and your class and function docstrings can become your reference documentation). Students will be given an undocumented sample Python package, and be led through exercises that result, by the end of the tutorial, in their giving the package a full tutorial and reference manual. Deployment and theming will also be taught.

    Besides a 15-minute introduction and 15 minutes for questions and discussion at the end, the tutorial will be organized in six 25-minute sessions which each involve a short lecture and then an interactive exercise that asks the students to apply what they have just learned. Here are the major topics covered by each of the six sessions:

    • The reStructuredText markup language and its syntax; the standard doctools; and the two different conventions that Sphinx can support for laying projects out as directories and files.
    • The Sphinx documentation build process on both Unix and Windows; how to arrange your project documentation in a way that will make sense to novice, experienced, and expert users alike; and how Sphinx supports connections between different pages of documentation.
    • Running code examples in the documentation as doctests; the pros and cons of pulling docstrings from the code as API documentation (and how to do it if it proves necessary); and including non-doctest full code listings in the documentation.
    • Referencing headings in the same document; cross-referencing between documents; making class and method names automatically link to their entry in the API documentation; and how to make code objects appear in the index.
    • Theming with custom HTML and CSS, for students who happen to know web design; plugging in pre-made Sphinx themes; and how to integrate Sphinx into an entire web site for their product.
    • Shipping documentation with your package on PyPI; installing it on readthedocs.org; making sure that documentation gets included with a binary install; using a version control source browser to view documentation directly in their project trunk; and deploying Sphinx to a web site.

    The Sphinx approach will be linked to other successful documentation systems in our computing heritage, most notably in the practices it shares in common with the Unix Documenter's Workbench (DWB) of the 1970s.

    At 9:00am to 12:20pm, Thursday 8th March

    In H2, Santa Clara Convention Center

    Coverage video

  • High Performance Python I

    by Ian Ozsvald

    At EuroPython 2011 I ran a very hands-on tutorial for High Performance Python techniques. This updated tutorial will cover profiling, PyPy, Cython, numpy, NumExpr, ShedSkin, multiprocessing, ParallelPython and pyCUDA. Here's a 55 page PDF write-up of the EuroPython material: http://ianozsvald.com/2011/07/25...

    At EuroPython 2011 I ran a very hands-on tutorial for High Performance Python techniques. This updated tutorial will cover:

    • profiling with cProfile, run_snake and line_profiler
    • PyPy
    • Cython
    • numpy with and without vectors
    • NumExpr
    • ShedSkin Py->C++ compiler
    • multiprocessing for multi-core
    • ParallelPython for multi-machine
    • pyCUDA demos
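
Of the tools listed above, `cProfile` ships with Python, so the profiling step can be sketched without extra installs. The two functions being compared are hypothetical examples, and no timings are quoted because they depend on the machine and interpreter.

```python
import cProfile
import io
import pstats


def list_sum_of_squares(n):
    """Build a full list of squares, then sum it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def genexpr_sum_of_squares(n):
    """Sum the squares lazily with a generator expression."""
    return sum(i * i for i in range(n))


def workload():
    return list_sum_of_squares(200_000), genexpr_sum_of_squares(200_000)


# Profile only the workload call.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Render the five entries with the highest cumulative time into a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Sorting by cumulative time shows which top-level calls dominate; sorting by `"tottime"` instead highlights the functions where the time is actually spent.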

    I plan to expand the original material and to maybe also cover other tools like execnet and PyPy-numpy.

    At 9:00am to 12:20pm, Thursday 8th March

    In D2, Santa Clara Convention Center

  • Introduction to Game Development

    by Katie Cunningham and Richard Jones

    This tutorial will walk the attendees from some introductory game development theory (what makes a good game) and through development of a simple game (how to make a good game) with time left over for some experimentation and exploration of different types of games.

    The tutorial will start with Katie Cunningham giving an introduction to video games, covering the basic components of a game, and some general game genres. Some basic tropes in modern games will be explored, as well as pitfalls to avoid in making a game for today’s audience. Genres will be paired with inexpensive/free examples that can be explored by the tutorial attendees later.

    The baton will then pass to Richard Jones who will walk through the practicalities of building a simple video game from scratch, starting with presenting one approach to structuring the game code to keep it sane. He will talk about what libraries are available and then focus on the facilities present in the library used in the tutorial.

    We will then walk through the development of a simple game during which the attendees will code the game. Once the game is developed we will talk about potential further development possibilities and use the remaining tutorial time to encourage and assist attendees in their efforts to pursue them.

    The game developed will cover the key game-writing skills of controlling what appears on the screen (including animation), loading resources, handling user input and simulating the environment within the game.

    Participants should be familiar with Python, and must have pygame installed. We will not have time to deal with installation and compatibility issues so participants must check their laptops can run pygame applications.

    At 9:00am to 12:20pm, Thursday 8th March

    In H3, Santa Clara Convention Center

    Coverage video

  • Optimize Performance and Scalability with Parallelism and Concurrency

    by Robert Hancock

    This tutorial ranges from how the operating system handles your requests to design principles for using concurrency and parallelism to optimize your program's performance and scalability. We will cover processes, threads, generators, coroutines, non-blocking IO, and the gevent library.

    How processes, threads, coroutines, and non-blocking IO work from the operating system through code implementation and design principles to optimize Python programs. The difference between parallelism and concurrency and when to use each.

    The premise is that to make an informed decision you need to know what is happening under the hood. Once you understand the low level functionality, you can make the correct decision in the design phase.

    The emphasis is on practical application to solve real world problems.

    Outline

    • How the operating system handles traps and interrupts
    • Scheduling
    • Processes
    • Threads
    • The GIL
    • Generators
    • What is a coroutine?
    • What is a Python coroutine?
    • Blocking/Non-blocking I/O.
    • Parallelism versus Concurrency
    • How do these work with CPython, PyPy, and Stackless
    • Greenlets and libevent (gevent)
    • Design principles
    • Example networked application
    • Performance results
    • What are the other options?
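
The generator and coroutine items in the outline can be illustrated with a small sketch: a generator-based coroutine that receives values through `send()`, and a tiny round-robin scheduler that interleaves tasks in a single thread. All names here are invented for the example; it shows concurrency without parallelism and does not use gevent or greenlets.

```python
from collections import deque


def averager():
    """Coroutine: receives numbers via send() and yields the running average."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count


def countdown(name, n, log):
    """Task that hands control back to the scheduler after every step."""
    while n > 0:
        log.append((name, n))
        yield
        n -= 1


def run_round_robin(tasks):
    """Cooperative scheduler: tasks take turns, switching at each yield."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, do not reschedule it
        queue.append(task)


avg = averager()
next(avg)  # prime the coroutine so it is paused at its first yield
averages = [avg.send(x) for x in (10, 20, 60)]
print(averages)

log = []
run_round_robin([countdown("a", 2, log), countdown("b", 3, log)])
print(log)
```

The log shows the two tasks interleaving even though only one of them runs at any moment.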

    At 9:00am to 12:20pm, Thursday 8th March

    In F2, Santa Clara Convention Center

  • Plotting with matplotlib

    by Mike Müller

    When it comes to plotting with Python many people think about matplotlib. It is widely used and provides a simple interface for creating a wide variety of plots from very simple diagrams to sophisticated animations. This tutorial is a hands-on introduction that teaches the basics of matplotlib. Students will learn how to create publication-ready plots with just a few lines of Python.

    Target Audience

    This tutorial is for Python users who would like to create nice 2d plots with Python.

    Audience Level

    Students should have a working knowledge of Python. NumPy knowledge is helpful but not required.

    Prerequisites

    Please bring your laptop with the operating system of your choice (Linux, Mac OS X, Windows). In addition to Python 2.6 or 2.7, we need: - a current version of matplotlib (http://matplotlib.sourceforge.net) - IPython (http://ipython.org) and - NumPy (http://numpy.scipy.org).

    Method

    This is a hands-on course. Students are strongly encouraged to work along with the trainer at the interactive prompt. There will be exercises the students need to do on their own. Experience shows that this active involvement is essential for effective learning.

    Content

    The library matplotlib provides many different types of diagrams from within Python with only a few lines of code. Examples are used to exercise the use of this library. The tutorial provides an overview of how to create plots with matplotlib. IPython in combination with pylab from matplotlib provides an interactive environment for fast testing of ideas. We will be using this for most of the tutorial.

    With a simple plot we learn how to add axis labels, titles and a legend. The GUI offers zooming, panning, changing of plot sizes and other interactive ways to modify the plot. We will use Python to change properties of existing plots such as line colors, marker symbols, or line styles. There are several ways to place text on plots. You will learn about the different coordinate systems relative to the plot, the canvas or the figure. Another topic is ticks: where to put them and how to format them to achieve publication-quality plots. The concepts of figures, subplots, and axes and how they relate to each other will be explained with examples.

    matplotlib offers many different types of plots. The tutorial introduces several of them with an example. A more advanced topic will be creating your own plot types. We will build a stacked plot type. Finally, we will create a small animation to explore the possibilities to visualize changes.

    Outline

    • Introduction (5 min)
    • IPython (5 min)
    • pylab (5 min)
    • Simple plots (20 min)
    • Properties (15 min)
    • Text (20 min)
    • Ticks (25 min)
    • Figures, subplots, and axes (25 min)
    • Other types of plots (10 min)
    • The class library (15 min)
    • Creating New Plot Types (20 min)
    • Animations (15 min)

    At 9:00am to 12:20pm, Thursday 8th March

    In H1, Santa Clara Convention Center

    Coverage video

  • Python Epiphanies

    by Stuart Williams

    This tutorial is for software developers who've been using Python with success for a while but are looking for a deeper understanding of the language. It demystifies a number of language features that are often misunderstood.

    In many ways Python is very similar to other programming languages. However, in a few sometimes subtle ways it is quite different, and many software developers new to Python, after their initial successes, hit a plateau and can't figure out how to get past it. Others don't hit or perceive a plateau, but still find some of Python's features a little mysterious. This tutorial will help deconstruct your incorrect assumptions about Python and pull away the mists of confusion.

    If in your use of Python you sometimes feel like an outsider, like you're missing the inside jokes, or like you have most of the puzzle pieces but they don't quite fit together yet, this may be a good tutorial for you.

    After completing this tutorial you'll have a deeper understanding of many Python features. Here are some of the topics we'll cover:

    • How namespaces really work, after which you'll understand:
    • most of the differences between variables in other languages and Python, including
    • why Python has neither pass-by-value nor pass-by-reference function call semantics, or why sometimes variables passed to a function can be changed by it, and sometimes they cannot.
    • Iterables, iterators, and the iterator protocol, including how to add it to a class
    • How generators can make your code easier to read and understand
    • Creating classes without a class statement in order to better understand how they work
    • Bound versus unbound methods and interesting uses of the former
    • How and why you might want to create or use a partial function
    • Other use-cases of functions as first-class citizens
    • Unpacking and packing arguments with * and ** on function call and definition
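
Several of the topics above fit into one short sketch: rebinding versus mutating an argument, the iterator protocol, building a class with `type()` instead of a class statement, and `functools.partial` together with `*` and `**` unpacking. The names are invented for illustration and the code assumes Python 3 (in Python 2 the iterator method is spelled `next` rather than `__next__`).

```python
from functools import partial


def rebind(items):
    """Rebinding the local name does not affect the caller's object."""
    items = items + ["rebound"]


def mutate(items):
    """Mutating the shared object is visible to the caller."""
    items.append("mutated")


data = ["original"]
rebind(data)
mutate(data)
print(data)


class Countdown:
    """Implements the iterator protocol by hand."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def describe(self):
    return f"Point({self.x}, {self.y})"


# Creating a class without a class statement: type(name, bases, namespace).
Point = type("Point", (object,), {"x": 0, "y": 0, "describe": describe})


def power(base, exponent):
    return base ** exponent


# partial pre-fills arguments; * and ** unpack them at call time.
square = partial(power, exponent=2)
args, kwargs = (2,), {"exponent": 10}

print(list(Countdown(3)), Point().describe(), square(7), power(*args, **kwargs))
```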

    Attendee Requirements

    Bring a laptop computer with a recent version of Python 2.7 or Python 3 installed.

    Prerequisites

    Intermediate ability in Python and little or no fear of iterators, generators, classes, methods, and how to call a function that's stored in a data structure.

    At 9:00am to 12:20pm, Thursday 8th March

    In F1, Santa Clara Convention Center

    Coverage video

  • Social Network Analysis with Python

    by Maksim Tsvetovat

    Social Network data permeates our world -- yet we often don't know what to do with it. In this tutorial, I will introduce both theory and practice of Social Network Analysis -- gathering, analyzing and visualizing data using Python and other open-source tools. I will walk the attendees through an entire project, from gathering and cleaning data to presenting results.

    SNA techniques are derived from sociological and social-psychological theories and take into account the whole network (or, in case of very large networks such as Twitter -- a large segment of the network). Thus, we may arrive at results that may seem counter-intuitive -- e.g. that Justin Bieber (7.5 mil. followers) and Lady Gaga (7.2 mil. followers) have relatively little actual influence despite their celebrity status -- while a middle-of-the-road blogger with 30K followers is able to generate tweets that "go viral" and result in millions of impressions.

    In this tutorial, we will conduct social network analysis of a real dataset, from gathering and cleaning data to analysis and visualization of results. We will use Python and a set of open-source libraries, including NetworkX, NumPy and Matplotlib.

    Outline:

    • Introduction. Why should we do this? What is the data like? Why is this different from other techniques? What can we learn?
    • Centralities: Degree, closeness, betweenness, PageRank, Klout Score
    • Beyond Klout Score: Finding communities of interest, finding clusters in networks
    • Information diffusion in networks -- how do things go viral?
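
    To make the first centrality in the outline concrete, here is a small pure-Python sketch of degree centrality (the fraction of other nodes a node is directly connected to). The edge list is made-up example data, and the function is an illustration rather than the tutorial's own code; the tutorial itself uses NetworkX for this kind of computation.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        # With fewer than two nodes there is nothing to normalise against.
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / float(n - 1) for node, adj in neighbors.items()}


# Hypothetical "who follows whom" data, treated as undirected.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(edges)
```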

    At 9:00am to 12:20pm, Thursday 8th March

    In D3, Santa Clara Convention Center

    Coverage video

  • High Performance Python II

    by Travis Oliphant

    In this tutorial, I will cover how to write very fast Python code for data analysis. I will briefly introduce NumPy and illustrate how fast code for Python is written in SciPy using tools like Fwrap / F2py and Cython. I will also describe interesting new approaches to creating fast code that are leading to changes in NumPy on a fundamental level.

    In this tutorial, I will cover how to write very fast Python code for data analysis, including making use of NumPy and using GPUs. I will largely focus on writing extensions to Python using hand-wrapping and Cython, but will also touch on tools like weave, Instant, and ShedSkin and compare them to PyPy. I will spend the last part of the tutorial on using GPUs with Python and discuss the performance trade-offs of the technology. This will be a high-level overview of the space with deep dives into Cython and GPUs.

    Outline:

    • Brief Introduction to NumPy, SciPy and array-oriented computing with Python including exercises (1 hour)
    • Introduction to hand-wrapping and extending Python (1 hour)
    • Detailed description of Cython and how to use it to connect to machine-compiled code (1 hour)
    • Detailed description of GPUs and how to use them best with NumPy (45 minutes)
    • Summary and overview of using Python to write super fast code (15 minutes)

    At 1:20pm to 4:40pm, Thursday 8th March

    In D2, Santa Clara Convention Center

    Coverage video

  • Introduction to Interactive Predictive Analytics in Python with scikit-learn

    by Olivier Grisel

    The goal of this tutorial is to give the attendee a first experience of machine learning tools applied to practical software engineering tasks such as language detection of tweets, topic classification of web pages, sentiment analysis of customer product reviews and facial recognition in pictures from the web or from your own webcam.

    The demand for software engineers with Data Analytics and Machine Learning skills is rapidly growing, and Python / NumPy is one of the best environments for quickly prototyping scalable data-centric applications or interactively exploring your data, especially thanks to tools such as IPython and Matplotlib.

    scikit-learn is a very active open source project that implements a variety of state-of-the-art machine learning algorithms. The goal of this project and tutorial is to take the algorithms out of the academic papers and make them work on a selection of real world tasks to unleash the value of your data.

    We will focus on providing hints to perform the right data preprocessing steps and on how to select algorithms and parameters suitable for the task at hand. We will also introduce tools and methodologies to measure the performance of the trained models as objectively as possible.

    At 1:20pm to 4:40pm, Thursday 8th March

    In F1, Santa Clara Convention Center

    Coverage video

Friday 9th March 2012

  • Graph Processing in Python

    by Van Lindberg

    Graphs are everywhere - from your distributed source code control to Twitter analytics. This session presents a set of three problems and shows how they can be decomposed into operations on graphs, and then demonstrates solutions using the various graph libraries available for (or accessible to) Python.

    Graphs are a fundamental computer science datatype, and graphs show up in all sorts of models in all sorts of places. So when you have a graph, what can you do with it? Particularly if it is really big?

    Thirty minutes isn't a lot of time to discuss graph processing as a topic, so there won't be a lot of discussion relative to graph theory generally or the terminology of graphs. Instead, this is inspired by Raymond Hettinger's "mastering team play" - a series of exercises showing the lowering of a problem into a graph representation, followed by a demonstration of how the problem can be solved through graph processing. There will also be a little bit of compare-and-contrast between the available graph libraries to show differences. Each problem will be given 8-10 minutes.

    Problem 1: Python's (legal) history

    Python has developed over time under a number of organizations - each with their own license. What portions of Python's codebase are under each license?

    • The CVS/SVN/HG trees as graphs modeling change in time
    • Identifying and labeling node types
    • Graphing and reporting on results

    Problem 2: Development Cliques

    Linux is famously developed with "lieutenants" in charge of different subsystems of the kernel. Python doesn't have lieutenants... or does it? Put another way, if you have a patch, who should you submit it to?

    • Mailing list connections as a graph
    • Analysis of connections, cliques, and centrality
    • Graphing and reporting on results

    Problem 3: Let's get social

    Your employer has decided that its website should be turned into a social network - you know, because there aren't enough of those.

    • Bootstrapping a graph by looking at pairwise analysis of products
    • How to suggest who people "might know"?

    At 10:50am to 11:30am, Friday 9th March

    In E4, Santa Clara Convention Center

    Coverage video

  • Introduction to Metaclasses

    by Luke Sneeringer

    Python's metaclasses grant the Python OOP ecosystem all the power of more complex object inheritance systems in other languages, while retaining for most uses the simplicity of the straightforward class structures most developers learn when being introduced to object-oriented programming. This talk is an explanation of metaclasses: first, what they are, and second, how to use them.

    • Metaclasses
    • Introduction (2.5m)
    • Python's metaclasses grant the Python OOP ecosystem all the power of more complex object inheritance systems in other languages, while retaining the simplicity of the straightforward class structure that traditional C++ and Java programmers learned, and is taught in programming courses.
    • Classes are Objects, Too! (5m)
    • Classes are first-class objects in Python, like functions/methods
    • Classes, like other objects, can be assigned to variables and passed as arguments
    • ...and this ability is one of the tricks in the reusable code toolbox
    • Concept: Metaclasses generate classes. (5m)
    • The hierarchy starts with "type"
    • Classes are themselves instances of their metaclasses
    • By extension, classes provide code that runs when instances are created, while metaclasses provide code that runs when classes are created.
    • Remember the "analogies" section on standardized tests in the United States (and many other countries)?
    • Babylon 5 : J. Michael Straczynski :: Star Trek : ___
    • Instances : Classes :: Classes : Metaclasses
    • Think about a self-enclosed machine that creates, say, t-shirts. The machine is the class; the individual shirts are the instances. The guy who builds the t-shirt machines is the metaclass.
    • Concrete Code Examples (10m)
    • will cover 3.0 and 2.7
    • (stub: I haven't decided what my example will be yet)
    • Is metaclassing wise? (2.5m)
    • There's nothing inherently wrong or bad about it. Furthermore, sometimes it's by far the best way to solve a problem.
    • Beware, though: Some people find metaclassing confusing.
    • "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." (Brian Kernighan)
    • Questions (5m)
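
    The outline leaves its concrete example undecided, so the following is only an illustrative sketch of the "Instances : Classes :: Classes : Metaclasses" analogy, not the speaker's code. Registry and Shirt are hypothetical names. It uses Python 3 syntax; in Python 2.7 the metaclass would be set with a `__metaclass__ = Registry` attribute in the class body instead.

```python
class Registry(type):
    """Hypothetical metaclass: its __new__ runs whenever a class is created."""

    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)  # record each class as it is built
        return cls


class Shirt(metaclass=Registry):
    """An ordinary class: its __init__ runs whenever an instance is created."""

    def __init__(self, size):
        self.size = size


shirt = Shirt("M")

print(type(shirt) is Shirt)     # an instance's type is its class
print(type(Shirt) is Registry)  # a class's type is its metaclass
print(type(Registry) is type)   # the hierarchy starts with "type"
print(Registry.created)         # names of classes built by the metaclass
```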

    At 10:50am to 11:30am, Friday 9th March

    In E3, Santa Clara Convention Center

    Coverage video

  • Stop Mocking, Start Testing

    by Augie Fackler and Nathaniel Manista

    Project Hosting at Google Code is a large, well-established system written mostly in Python. We'll share our battle-born convictions about creating tests for test-unfriendly code and the larger topic of testing.

    When launched, Project Hosting’s testing consisted of the stock Subversion test suite and a handful of ad hoc smoke test scripts that required starting the entire system and manually inspecting the test’s output.

    Over six years of codebase evolution, tests have been added with varying degrees of coverage and maintainability. Early system design decisions made adding tests difficult: the first tests added to the system used mock objects unwisely and large numbers of mock objects made refactoring costly in time and effort.

    Frustration with the difficulty of enhancing the service led us to reevaluate our testing practice and led to the discovery of better ways to test applications of this complexity. We will share our experiences with testing and discuss designing for maintainability and testability and appropriate use of testing tools such as frameworks and mocks.

    At 10:50am to 11:30am, Friday 9th March

    In D5, Santa Clara Convention Center

    Coverage video

  • Extracting musical information from sound

    by Adrian Holovaty

    Music Information Retrieval technology has gotten good enough that you can extract musical metadata from your sound files with some degree of accuracy. Find out how to use Python (along with third-party APIs) to determine everything from the key/tempo of a song to the pitch/timbre of individual notes. Then we'll do some amusing analysis of popular tunes.

    • Getting basic data about sounds.
    • Visualizing waveforms.
    • Parsing musical information at the level of song.
    • Detecting individual notes ("segments").
    • What fun can we have?

    At 11:30am to 12:10pm, Friday 9th March

    In E2, Santa Clara Convention Center

    Coverage video

  • Scalability at YouTube

    by Shannon -jj Behrens and Mike Solomon

    This talk covers scalability at YouTube. It's given by one of the original engineers at YouTube, Mike Solomon. It's a rare glimpse into the heart of YouTube, which is one of the largest websites in the world and one of the few extremely large websites to be written in Python.

    Abstract

    Every day, people watch an average of 3 billion videos on YouTube. Every minute, people upload an average of 48 hours of video to YouTube. YouTube operates at a scale that few other websites will ever see, and it's written mostly in Python.

    Mike Solomon is one of the original engineers at YouTube. In this informal, high-level talk, he'll give an overview of the lessons he's learned as he's brought YouTube to scale. He'll also point out ways in which his philosophy on scaling, testing, and writing Python flies in the face of accepted wisdom. Last of all, we'll give a very short introduction to YouTube APIs and how you can integrate your application with YouTube.

    At 11:30am to 12:10pm, Friday 9th March

    In E1, Santa Clara Convention Center

  • The Art of Subclassing

    by Raymond Hettinger

    All problems have simple, easy-to-understand, logical wrong answers. Subclassing in Python is no exception. Avoid the common pitfalls and learn everything you need to know about making effective use of inheritance in Python.

    Avoid the common pitfalls and learn everything you need to know about how to subclass in Python.

    • Overriding and extending
    • Calling your parents
    • The ellipse / circle problem
    • What does a subclass mean?
    • Liskov Substitution Principle
    • Open Closed Principle
    • Facts of life when subclassing builtin types
    • Cooperative Multiple Inheritance
    • Common subclassing patterns
    • Use of the double underscore
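
    As a rough illustration of "calling your parents" and cooperative multiple inheritance from the list above, the sketch below uses hypothetical classes (Base, Left, Right, Child) that are not from the talk. Each method calls super(), so every class in the method resolution order runs exactly once. Python 3 syntax is assumed.

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        # super() here means "next class in the MRO", not necessarily Base.
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Child(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()


order = Child().describe()
mro_names = [cls.__name__ for cls in Child.__mro__]
```

    In this sketch, Left's super() call reaches Right rather than Base when the instance is a Child, which is what makes the chain cooperative.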

    At 11:30am to 12:10pm, Friday 9th March

    In E3, Santa Clara Convention Center

    Coverage video

  • Practical Machine Learning in Python

    by Matt Spitz

    There is a plethora of options when it comes to deciding how to add a machine learning component to your Python application. In this talk, I'll discuss why Python as a language is well-suited to solving these particular problems, the tradeoffs of different machine learning solutions for Python applications, and some tricks you can use to get the most out of whatever package you decide to use.

    This is the age of data. As more companies expose their datasets through APIs, it's becoming increasingly easy to pull information about users, places, and things. But having this data isn't always enough; we want to understand it, find correlations, and identify trends. Fortunately, the area of computer science known as machine learning has a variety of algorithms specifically designed to do this sort of data wrangling. For the Python application developer, there are many off-the-shelf toolkits that include implementations of these algorithms (Orange, NLTK, SHOGUN, PyML and scikit-learn to name just a few), but choosing which one to use can be daunting.

    There are a number of tradeoffs one makes when making a selection, depending on the specifics of the implementation and the needs of the application. In this talk, I'll give an overview of some of the packages available and discuss what factors might go into deciding which one to use. I'll also offer some Python-specific tricks you can use to work with large amounts of data efficiently.

    At 12:10pm to 12:40pm, Friday 9th March

    In E1, Santa Clara Convention Center

  • Stepping Through CPython

    by Larry Hastings

    Ever wondered how CPython actually works internally? This talk will show you. We start with a simple Python program, then slowly step through CPython, showing in exhaustive detail what happens when it runs that program. Along the way we'll examine the design and implementation of various major CPython subsystems and see how they fit together. The audience should be conversant in C and Python.

    The goal of the talk is to sufficiently familiarize the audience with CPython's internal structure such that a programmer versed in C and Python but having never dealt with an interpreter would be able to comfortably dive in and start hacking on CPython.

    The program examined will be simple but deliberately designed to exercise most of CPython's runtime behavior. This will include loading modules implemented in C and in Python, loading bytecode cached on disk, and a cross-section of bytecodes. (For example, I only need to examine one of the BINARY_* math opcodes; I don't need to walk through every single one.)
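
    For readers who want to look at bytecode before diving into the C source, the standard library's dis module lists the instructions CPython compiles a function to. This is a small sketch, not part of the talk; the function add is a made-up example, dis.get_instructions requires Python 3.4 or later (newer than the 3.2 the talk is based on), and the exact opcode names vary between CPython versions.

```python
import dis


def add(a, b):
    return a + b


# Collect the opcode names for the compiled function body.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```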

    Areas I expect to examine:

    • built-in modules, including ones that are automatically loaded before your program starts
    • bytecode, including
    • the various implementations of the inner loop (switch statement, labels-as-values)
    • the peephole optimizer
    • on-disk format
    • marshal
    • the magic version number
    • mention lnotab but probably skip the gory details
    • the stack machine
    • unwinding the stack after an exception (and producing tracebacks)
    • contrast CPython's approach with Stackless
    • All the possible fields of PyObject, an overview of fields in PyType
    • built-in types
    • the implementations of a few key internal types
    • list, dict, tuple, str, byte, int, bool, None
    • though not to the level of detail that Hettinger or Rhodes did in past talks
    • interned values
    • the GIL and reference counting
    • weakrefs
    • garbage collection
    • Py_TRASHCAN
    • CPython's small-block and arena allocators
    • The parser, though I don't want to spend a lot of time on it (runtime is where the fun is ;)
    • Internal utility functions like PyArg_Parse

    I'll be giving the talk based on CPython 3.2.

    At 12:10pm to 12:55pm, Friday 9th March

    In E2, Santa Clara Convention Center

    Coverage video

  • Stop Writing Classes

    by Jack Diederich

    Classes are great but they are also overused. This talk will describe examples of class overuse taken from real world code and refactor the unnecessary classes, exceptions, and modules out of them.

    Classes must be nouns but not every noun must be a class. If your class only has two methods and one of them is __init__, you probably meant to write a function.
    MuffinMail recently refactored their API; it went from 20 classes scattered in 22 modules down to 1 class just 15 lines long. It was a welcome change, but we'll further refactor that down to a single function 3 lines long.
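
    The kind of refactoring described above can be sketched as follows. This is a hypothetical example, not the MuffinMail API: Greeting, greet and say_hello are invented names used only to show a two-method class collapsing into a function.

```python
from functools import partial


class Greeting:
    """Hypothetical class whose only methods are __init__ and greet."""

    def __init__(self, greeting):
        self.greeting = greeting

    def greet(self, name):
        return "%s, %s!" % (self.greeting, name)


def greet(greeting, name):
    """The same behaviour written as a plain function."""
    return "%s, %s!" % (greeting, name)


# If a pre-configured callable is needed, functools.partial covers it.
say_hello = partial(greet, "Hello")

from_class = Greeting("Hello").greet("World")
from_function = greet("Hello", "World")
```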

    The Python stdlib is an example of a namespace that is relatively flat. You won't find packages that consist of a single module defining an exception, and you won't find many exceptions at all - just 165 kinds in 200k lines of code. That's a tiny ratio compared to most projects including Django.

    Of course there are things, like containers, that should be classes. As a final example we'll add a Heap type to the heapq module (admit it, you already have one in your utils.py).
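
    A container like the one mentioned above might look roughly like this. It is a minimal sketch assuming a thin wrapper over the existing heapq functions; the class and method names are hypothetical and this is not the implementation shown in the talk.

```python
import heapq


class Heap:
    """Hypothetical min-heap container built on the heapq module."""

    def __init__(self, items=()):
        self._items = list(items)
        heapq.heapify(self._items)

    def push(self, item):
        heapq.heappush(self._items, item)

    def pop(self):
        # Removes and returns the smallest item; raises IndexError when empty.
        return heapq.heappop(self._items)

    def peek(self):
        # The smallest item is always at index 0.
        return self._items[0]

    def __len__(self):
        return len(self._items)


heap = Heap([5, 1, 4])
heap.push(2)
smallest = heap.pop()
```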

    At 12:10pm to 12:40pm, Friday 9th March

    In E3, Santa Clara Convention Center

    Coverage video

  • Advanced Security Topics

    by Paul McMillan

    If your Python application has users, you should be worried about security. This talk will cover advanced material, highlighting common mistakes. Topics will include hashing and salts, timing attacks, serialization, and much more. Expect eye-opening demos and an urge to go fix your code right away.

    If your Python application has users (even if it's used offline), you should be worried about security. This talk will cover advanced material, highlighting common mistakes.

    Hashing and encryption can be tricky to get right. We'll discuss when to use hashing to sign data, and how to choose the right encryption algorithm (spoiler: don't). We'll demonstrate length extension attacks, and discuss how to prevent them.

    Another common mistake is the incorrect use of pseudo-random number generators. We'll discuss the fix, and some of the dangers associated with it.

    Timing attacks are relatively exotic, but as applications move into shared data centers (and shared virtual machines) they have become easier to implement and more dangerous. They're a very common class of bugs, but fixing them (and proving they're fixed) can be difficult.

    Pickle is a common and easy to use serialization format for Python objects. Unfortunately, it's also insecure when attackers can send or modify the pickled data. We'll discuss strategies for signing pickled objects, and alternate serialization formats.
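
    One common way to sign pickled data is to attach an HMAC and verify it with a constant-time comparison before unpickling. The sketch below illustrates that general approach and is not the code from the talk; SECRET_KEY, dumps_signed and loads_signed are hypothetical names, and a real key must be kept secret and generated randomly. An HMAC is used rather than a bare hash of key plus data because the latter is open to length extension attacks, and hmac.compare_digest is used rather than == to avoid leaking timing information.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-random-secret"  # hypothetical placeholder key
DIGEST_SIZE = hashlib.sha256().digest_size    # 32 bytes for SHA-256


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle obj and prefix the payload with its HMAC-SHA256 signature."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload


def loads_signed(data, key=SECRET_KEY):
    """Verify the signature first; unpickle only if it matches."""
    signature, payload = data[:DIGEST_SIZE], data[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"user": "alice"})
restored = loads_signed(blob)
```

    Signing only protects against tampering by parties who do not hold the key; data from anyone who does hold it is still unpickled as trusted.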

    The final portion of the talk will discuss a meta security problem within the Python community. I'll be demonstrating live code that can compromise even the most locked down of servers, and discussing the steps we need to take as a community to mitigate this threat moving forward.

    At 1:45pm to 2:40pm, Friday 9th March

    In E1, Santa Clara Convention Center

    Coverage video

  • Build reliable, traceable, distributed systems with ZeroMQ

    by Jérôme Petazzoni

    We will show how to build simple yet powerful RPC code with ZeroMQ, with very few (if any!) modifications to existing code. We will build fan-in and fan-out topologies with ZeroMQ special socket types to implement PUB/SUB patterns and scale up job-processing tasks. Thanks to introspection, the resulting services will be self-documented. Finally, we will show how to implement distributed tracing.

    We will show how to leverage ZeroMQ to build a simple yet powerful RPC for Python code. We will focus on simplicity, the goal being to expose almost any Python module or class to network calls – with very few (if any!) modifications to existing code.

    We will then explain the purpose and show some use-cases for ZeroMQ special socket types (PUSH/PULL, PUB/SUB, ROUTER/DEALER) to build fan-in and fan-out topologies, as well as asynchronous processing (to avoid blocking when doing long-running requests). A by-product is the ability to scale up job-processing tasks with a message queue, which can even be made broker-less (you don’t have to deploy heavy machinery if you don’t need it).

    We will also demonstrate how introspection can make development and debugging easier, exposing docstrings and providing a few command-line helpers to poke, debug, and experiment directly from the shell.

    At the end of the talk (or in a separate talk), we will explain how to implement a tracing framework for distributed RPC. By hooking into the right places, we will show how to get full tracebacks and profiling information; more precisely:

    • how each complex call (involving multiple subcalls) can be accurately traced;
    • how to handle exceptions, and know easily when and where they happened (without checking dozens of log files);
    • which complex calls take too long, and where they spend their time (distributed profiling).

    These guidelines are the result of ongoing development work at dotCloud, and are actively used and implemented at the core of our leading Platform-as-a-Service offering.

    We don’t expect the audience to be familiar with ZeroMQ or RPC. However, it will certainly help to have basic knowledge of serialization (e.g. pickle) and sockets.

    At 2:00pm to 2:40pm, Friday 9th March

    In E4, Santa Clara Convention Center

  • Code Generation in Python: Dismantling Jinja

    by Armin Ronacher

    For many DSLs such as templating languages it's important to use code generation to achieve acceptable performance in Python. The current version of Jinja went through many different iterations to end up where it is currently. This talk walks through the design of Jinja2's compiler infrastructure and why it works the way it works and how one can use newer Python features for better results.

    Why Code Generation?
    The general consensus on code generation in many dynamic language communities seems to be: eval is evil, do not use it. However, if done properly, code generation solves a lot of problems easily, securely, and with much better performance than an interpreter written on top of an interpreted language like Python.

    Code generation is what powers most template languages in Python, what powers object relational mappers and more. It is also an excellent tool to simplify debugging.
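
    To give a feel for what template code generation means, here is a deliberately tiny sketch. It is not Jinja2's compiler and does not use Jinja2's API; the `{{ name }}` syntax, the compile_template function and the generated render function are assumptions made for illustration. The template is translated once into Python source, compiled to bytecode, and then reused for every render.

```python
import re

# Matches placeholders such as {{ name }}; the variable name is captured.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(source):
    """Translate a template string into a Python function and return it."""
    # re.split with one capture group alternates: literal, name, literal, ...
    parts = PLACEHOLDER.split(source)
    lines = ["def render(context):", "    out = []"]
    for index, part in enumerate(parts):
        if index % 2 == 0:
            if part:
                lines.append("    out.append(%r)" % part)
        else:
            lines.append("    out.append(str(context[%r]))" % part)
    lines.append("    return ''.join(out)")
    namespace = {}
    # The generated source is compiled once; calls to render() run bytecode.
    exec(compile("\n".join(lines), "<template>", "exec"), namespace)
    return namespace["render"]


render = compile_template("Hello {{ name }}, you have {{ count }} messages.")
message = render({"name": "Ada", "count": 3})
```

    Only the placeholder names matched by the regular expression and repr()-quoted literal text end up in the generated source, so template text cannot inject arbitrary code in this sketch.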

    Why Codegen is no Silver Bullet
    Just because you generate code does not mean you're faster than an interpreter written in Python. This part of the talk focuses on why compiling Django templates to Python bytecode does not automatically make it fast.

    Design of Jinja2
    Jinja2 underwent multiple design iterations, most of which were made to improve either performance or debuggability. The internals, however, are largely undocumented and confusing unless you're familiar with the code. Hidden in them are a few gems and interesting tricks that make code generation work in the best possible way.

    Python's Support for Code Generation
    Over the years, Python's support for code generation has steadily improved, with different ways to access the abstract syntax tree and to compile it back to bytecode. This section highlights some alternative ways to do code generation that are not yet fully implemented in Jinja2 but are otherwise widely used.

    At 2:00pm to 2:40pm, Friday 9th March

    In E2, Santa Clara Convention Center
