Sessions at EuroPython 2012 about Data Analysis on Friday 6th July

  • Cubes - Light-weight OLAP Framework and Server

    by Štefan Urbánek

    Cubes is a light-weight Python framework for Online Analytical Processing (OLAP), multidimensional analysis and (in the future) pre-aggregated cube computation. Main features are:

    • aggregation browser of multidimensional hierarchical data
    • logical model metadata (end user layer): a description of how data are being analysed and reported, independent of the physical data implementation
    • hierarchical dimensions (attributes that have hierarchical dependencies, such as category-subcategory or country-region)
    • localizable metadata and data
    • OLAP server
    • modular back-end architecture

    The intended use is aggregated browsing, slicing and dicing of the data.
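
    The aggregation, slicing and drill-down over a hierarchical dimension described above can be illustrated with a small, framework-independent sketch. It deliberately does not use the Cubes API: the fact rows, the `HIERARCHY` tuple and the `aggregate` helper are hypothetical names chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: a two-level hierarchy (category > subcategory)
# plus one numeric measure ("amount"). Not taken from the talk.
FACTS = [
    {"category": "Hardware", "subcategory": "Laptops", "amount": 1200},
    {"category": "Hardware", "subcategory": "Monitors", "amount": 300},
    {"category": "Software", "subcategory": "Licenses", "amount": 450},
    {"category": "Hardware", "subcategory": "Laptops", "amount": 800},
]

# Levels of the hierarchical dimension, from coarsest to finest.
HIERARCHY = ("category", "subcategory")


def aggregate(facts, depth, cut=None):
    """Sum "amount" grouped by the first `depth` levels of HIERARCHY.

    `cut` optionally restricts the facts to a path prefix (a "slice"),
    for example ("Hardware",).
    """
    cut = tuple(cut or ())
    totals = defaultdict(int)
    for row in facts:
        path = tuple(row[level] for level in HIERARCHY)
        if path[:len(cut)] != cut:
            continue  # row lies outside the requested slice
        totals[path[:depth]] += row["amount"]
    return dict(totals)


# Aggregate at the top level of the hierarchy.
by_category = aggregate(FACTS, depth=1)

# Slice on "Hardware" and drill down one level.
hardware_drilldown = aggregate(FACTS, depth=2, cut=("Hardware",))
```

    Keeping the hierarchy in a separate tuple, apart from the rows themselves, loosely mirrors the idea of a logical model that is independent of how the data are physically stored.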

    I would like to briefly introduce OLAP and then the framework, explain the motivation for the framework (an alternative to corporate and over-bloated tools), explain its basic principles, show how it can be used for multi-dimensional aggregations, and demonstrate how one can set up an OLAP HTTP server “in 15 minutes”.

    References:

    • Github
    • Documentation
    • Blog
    • Project home
    • Example of an application using Cubes Slicer Server as backend
    • The project uses SQLAlchemy for relational database access and Werkzeug for the HTTP server.

    At 9:45am to 10:30am, Friday 6th July

    In Tagliatelle, Grand Hotel Mediterraneo

  • Introduction to Number Crunching

    by Valerio Maggio and Enrico Franchi

    As computer scientists and geeks, we hate repetitive, manual operations and usually prefer to automate as much of the processing as possible (http://jonudell.net/images/geeks...). Manual operations are boring, time-consuming and error-prone, and they do not allow for any kind of replication or reuse. Automatic processing, on the other hand, promotes the reuse of common operations and scales easily to problems of different sizes, from small to very large amounts of data.

    Such data analysis processes are usually referred to as “crunching”: the automatic analysis of large amounts of information, together with the corresponding set of “complex” operations.

    Many tools used for data analysis are not particularly geek-friendly, as they involve a great deal of repetitive work. Consider the simple case in which we collect the values obtained in an experimental trial and have to compute their mean, minimum and maximum. A typical solution is to copy all the data into an Excel file and perform the analysis there. However, this approach becomes infeasible in real-world scenarios, where we have to deal with huge amounts of data and “doing things manually” means copying and pasting data from several different files.
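
    As a minimal sketch of how this simple case can be automated, the snippet below computes the mean, minimum and maximum for several trials using only the Python standard library. The trial names and values are invented for illustration, and the `summarize` helper is a hypothetical name, not code from the talk.

```python
import statistics


def summarize(values):
    """Return the mean, minimum and maximum of a sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("no values to summarize")
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical measurements; in practice each list would be read from a file.
trials = {
    "trial_a": [2.5, 3.0, 4.5, 2.0],
    "trial_b": [10.0, 12.0, 14.0],
}

# One loop replaces the manual copy-and-paste step for every trial.
summaries = {name: summarize(values) for name, values in trials.items()}

for name, summary in summaries.items():
    print(name, summary)
```

    Adding another trial only requires another entry in `trials`; the analysis code itself is reused unchanged.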

    While other tools, such as MATLAB, allow better automation and offer a more programmer-friendly environment, Python offers extremely interesting solutions for these kinds of problems. In particular, Python lets us exploit the benefits of a general-purpose programming language in combination with a large number of libraries for crunching (NumPy, SciPy), data storage (PyTables, NoSQL interfaces), data visualization (matplotlib), and an easy-to-use interactive environment (IPython, IPython Notebook).

    In this talk we present some of the powerful tools available in the Python environment to automatically analyze, filter and process large amounts of data. In particular, we present several real-world case studies along with the corresponding working Python code.

    Basic maths skills and basic knowledge of the Python programming language are the only suggested prerequisites.

    At 2:30pm to 3:30pm, Friday 6th July

    In Spaghetti, Grand Hotel Mediterraneo
