Sessions at EuroPython 2011 about Performance


Monday 20th June 2011

  • High-performance computing on gamer PCs

    by Yann Le Du

    In Electron Paramagnetic Resonance Imaging, we are faced with a deconvolution problem that has a strong impact on the reconstructed image. Needing to map the distribution of organic matter in terrestrial and Martian rock samples for applications in exobiology, we had to extract the maximum amount of information from our data: our approach uses reservoir computing artificial neural networks coupled to a particle swarm algorithm that evolves the reservoirs’ weights.

    The code runs on the Hybrid Processing Units for Science (HPU4Science) cluster located at the Laboratoire de Chimie de la Matière Condensée de Paris (LCMCP). The cluster is composed of a central data storage machine and a heterogeneous ensemble of 6 decentralized nodes. Each node is equipped with a Core2 Quad or i7 CPU and 3-7 NVIDIA Graphics Processing Units (GPUs), including the GF110 series. Each of the 28 GPUs independently explores a different parameter-space sphere of the same problem. Our application shows a sustained real performance of 15.6 TFLOPS. The HPU4Science cluster cost $36,090, resulting in a cost performance of 432.3 MFLOPS/$.
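    The quoted cost-performance figure follows directly from the two numbers above; a quick sanity check (variable names are illustrative):

```python
# Sanity-check the quoted cost-performance figure:
# sustained FLOPS divided by cluster cost, expressed in MFLOPS/$.
sustained_flops = 15.6e12   # 15.6 TFLOPS sustained
cluster_cost = 36090        # cluster cost in USD
mflops_per_dollar = sustained_flops / cluster_cost / 1e6
print(round(mflops_per_dollar, 1))  # → 432.3
```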

    This talk demonstrates, on a practical case, how consumer-grade computer hardware coupled to a very popular programming language can be used to tackle a difficult yet very elementary scientific problem: how do you go from formulating the problem, to choosing the right hardware and software, all the way to programming the algorithms using the appropriate development tools and methodologies (notably Literate Programming)? On the math side, the talk requires a basic understanding of matrix algebra and of the discretization process involved in computing integrals.
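    The particle-swarm side of the pipeline can be sketched in a few lines of numpy. Everything below (function names, inertia and acceleration constants, the sphere test function) is a generic textbook particle swarm optimiser, not the authors' actual code:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, seed=0):
    """Minimise f with a basic particle swarm: each particle is pulled
    toward its own best position and toward the swarm-wide best."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))  # particle positions
    v = np.zeros_like(x)                        # particle velocities
    pbest = x.copy()                            # per-particle best positions
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()    # swarm-wide best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia term + pull toward personal best + pull toward global best
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Minimise the sphere function as a stand-in for the real fitness function
# (in the talk, the fitness evaluates reservoir weights on the GPU).
best = pso(lambda p: float((p ** 2).sum()), dim=3)
```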

    At 2:30pm to 3:30pm, Monday 20th June

    Coverage video

Tuesday 21st June 2011

Wednesday 22nd June 2011

  • Exploit your GPU power with PyCUDA (and friends)

    by Stefano Brilli

    CUDA technology makes it possible to exploit the power of modern NVIDIA GPUs. In this talk, after a brief introduction to GPU architecture, we will focus on how CUDA is made available to Python through libraries like PyCUDA and others…

    Through a series of examples we will show the main concepts and techniques of good GPU programming.

    This talk targets anyone who wants to know how to exploit this technology from Python: the suitable use cases, and the techniques to use (and to avoid) to get the best from your own GPU.

    At 12:15pm to 1:15pm, Wednesday 22nd June

    Coverage video

  • Experiences making CPU-bound tasks run much faster

    by Ian Ozsvald

    As a long-time R&D consultant I'm often working to make slow, experimental code run faster for tasks like physics simulation, flood modeling and natural language processing. Python allows a smooth progression from rough-and-ready (but slow) algorithms through to finely tuned tasks that efficiently use as much CPU power as you can bring to bear. Speed-ups of 10-500× can be expected for the Mandelbrot code we'll use.
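    As a taste of the kind of rewrite involved, here is a pure-Python Mandelbrot escape-time loop next to a vectorised numpy version. Function names and grid bounds are illustrative, not the tutorial's actual files:

```python
import numpy as np

def mandelbrot_python(w, h, max_iter=20):
    # Naive version: one Python-level complex iteration per pixel.
    out = [[0] * w for _ in range(h)]
    for j in range(h):
        for i in range(w):
            c = complex(-2.0 + 3.0 * i / w, -1.5 + 3.0 * j / h)
            z = 0j
            n = 0
            while abs(z) <= 2 and n < max_iter:
                z = z * z + c
                n += 1
            out[j][i] = n
    return out

def mandelbrot_numpy(w, h, max_iter=20):
    # Vectorised version: iterate all pixels at once as numpy arrays,
    # masking out pixels that have already escaped.
    re = np.linspace(-2.0, 1.0, w)
    im = np.linspace(-1.5, 1.5, h)
    c = re[np.newaxis, :] + 1j * im[:, np.newaxis]
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=int)
    for _ in range(max_iter):
        active = np.abs(z) <= 2
        z[active] = z[active] ** 2 + c[active]
        counts += active
    return counts
```

The two versions use slightly different grids and escape bookkeeping, but the point is the shape of the change: the inner per-pixel loop disappears into whole-array operations.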

    In this talk I'll cover a set of libraries that make CPU-bound tasks run much faster. We'll begin with a look at profiling using RunSnakeRun and line_profiler to identify our bottleneck. We'll take a look at slow algorithms in Python and how they can run faster using numpy and numexpr.
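    A minimal profiling round-trip looks roughly like this (cProfile produces the stats data that RunSnakeRun visualises; the function being profiled is a made-up stand-in):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately slow: a Python-level loop instead of sum() or numpy.
    total = 0
    for i in range(n):
        total += i * i
    return total

prof = cProfile.Profile()
prof.enable()
slow_sum(100000)
prof.disable()

# Print the five most expensive calls. RunSnakeRun renders the same
# data graphically from a stats file saved with prof.dump_stats("out.prof").
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```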

    Next we'll cover the use of multiprocessing to utilise multiple CPU cores, along with Cython or ShedSkin to easily use C code in a friendly Python wrapper. Multiprocessing on a quad-core system can often provide a 4× speed-up for the right tasks. Next, ParallelPython will let us run our code on a network of machines.
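    The multiprocessing step looks roughly like this; `count_primes` is a stand-in CPU-bound task, and the 4× figure assumes four otherwise idle cores:

```python
from multiprocessing import Pool

def count_primes(limit):
    # CPU-bound stand-in task: trial-division prime count below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    chunks = [20000] * 4
    serial = [count_primes(c) for c in chunks]   # one core, one chunk at a time
    with Pool(processes=4) as pool:              # four worker processes
        parallel = pool.map(count_primes, chunks)
    assert parallel == serial                    # same answers, less wall time
```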

    Finally we'll look at pyCUDA to utilise an NVIDIA GPU. CUDA can give the best improvements for mathematical problems (over 100× on the right tasks) but works on a narrower set of problems.

    How it'll work:
    The tutorial will be hands-on: you'll be converting example files from normal Python to faster variants using the tools below. All of it is optional, but you'll get the most benefit by having everything installed. We'll work in groups, and open discussion is encouraged.

    NOTE - you are expected to have all these tools installed *before* the tutorial (if you don't, you might find it hard to follow what's going on!).

    I'll be using Python 2.7.1 on a MacBook (Snow Leopard). All of these tools run on Windows and Linux; as long as your versions are fairly recent, everything should run just fine.

    My versions (roughly ordered by importance):
    Python 2.7.1
    RunSnakeRun 2.0.1b6 (with wxPython Unicode)
    line_profiler (1.0b2)
    Cython 0.14.1
    ShedSkin 0.7.1
    numpy 1.5.1
    numexpr 1.4.2
    ParallelPython 1.6.1
    pyCUDA HEAD from git as of 14th June 2011 (with CUDA 4.0 drivers)
    PyPy 1.5

    Some background reading:

    At 2:30pm to 6:30pm, Wednesday 22nd June

Thursday 23rd June 2011

  • Python for High Performance and Scientific Computing

    by Andreas Schreiber

    Python is an accepted high-level scripting language with a growing community in academia and industry. It is used in many scientific applications across different scientific fields and in more and more industries, for example in engineering or life sciences. In all fields, the use of Python for high-performance and parallel computing is increasing. Several organizations and companies provide tools or support for Python development, including libraries for scientific computing, parallel computing, and MPI. Python is also used on many-core architectures and GPUs, for which specific Python interpreters are being developed. A related topic is the performance of the various interpreter and compiler implementations for Python.

    The talk gives an overview of Python’s use in HPC and scientific computing, covering topics such as Python on massively parallel systems, GPU programming with Python, scientific libraries in Python, and Python interpreter performance issues. It will include examples of scientific codes and applications from many domains.

    At 2:30pm to 3:30pm, Thursday 23rd June

  • PyPy in production

    by Antonio Cuni and Armin Rigo

    The PyPy project has recently gathered a lot of attention for its
    progress in speeding up the Python language -- it is the fastest,
    most compatible and most stable 'alternative' Python interpreter.
    No longer merely a research project, PyPy is now suitable for
    production use. We are working on improving calls into C libraries
    and generally integrating with the existing Python extensions
    ecosystem.

    We will give an overview on how the tracing Just-in-Time compiler
    works in PyPy. From there, we will then focus on what the PyPy
    project has achieved, particularly in the past two years:

    • most Python benchmarks run much faster than with CPython or Psyco
    • the real-world PyPy compiler toolchain itself (200 KLocs) runs twice as fast
    • already supports 32- and 64-bit x86, and is in the process of supporting ARM
    • full compatibility with CPython (more than Jython/IronPython)
    • full (and JIT-ed) ctypes support to call C libraries from Python
    • supports Stackless Python (in-progress)
    • new "cpyext" layer which integrates existing CPython C extensions
    • an experimental super-fast JIT-compilation of calls to C++ libraries
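    The ctypes bullet means that ordinary ctypes code like the following runs unchanged under PyPy, where the call path is handled by the JIT (the math-library lookup below assumes a POSIX system):

```python
import ctypes
import ctypes.util

# Load the C math library and call its sqrt() directly from Python;
# the same code runs on CPython and PyPy, but PyPy JIT-compiles the call.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))  # → 3.0
```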

    We want to reserve time for discussing potential future work like SWIG
    and/or Cython compatibility and other areas brought up by the audience.
    There are many interesting details that can be explored further;
    we will focus on the points the audience is most interested in.

    For more info:

    [1] Eurostars Eureka has been our funding source since 2009. It is a
    cross-European funding collaboration that targets small firms
    which produce research.

    At 3:30pm to 4:30pm, Thursday 23rd June

    Coverage video

Friday 24th June 2011

  • Making CPython Fast Using Trace-based Optimisations

    by Mark Shannon

    CPython can be made faster by implementing the sort of
    optimizations used in the PyPy VM, and in my HotPy VM.
    All the necessary changes can be made without modifying the language or the API.

    The CPython VM can be modified to support optimizations by adding
    an effective garbage collector and by separating the
    virtual-machine state from the real-machine state (like Stackless).

    Optimizations can be implemented incrementally.
    Since almost all of the optimizations are implemented in the interpreter,
    all hardware platforms can benefit.
    JIT compiler(s) can then be added for common platforms (Intel, ARM, etc.).

    For more information see http://hotpy.blogspot.com/

    At 2:30pm to 3:30pm, Friday 24th June

    Coverage video