As computer scientists and geeks, we hate repetitive and manual operations and usually prefer making all processing as automatic as possible (http://jonudell.net/images/geeks...). Manual operations are boring, time-consuming and mostly error-prone, and they do not allow for any kind of replication or reuse. Automatic processing, on the other hand, strongly promotes the reuse of common operations and scales easily to problems of different sizes, from a few values to very large amounts of data.
Such data analysis processes are usually referred to by the term “crunching”, which indicates the analysis of large amounts of information in an automatic fashion, together with the corresponding set of “complex” operations.
Many tools used for data analysis are not overly geek-friendly, as they require a great deal of repetitive work: consider, for example, the simple case in which we have collected values from an experimental trial and we have to compute the mean, the minimum and the maximum of those values. A typical solution is to copy all the data into an Excel file and perform the analysis of interest from there. However, all of these operations become infeasible in real-world scenarios, where we have to deal with huge amounts of data and where “doing things manually” means copying and pasting data from many different files.
While other tools, such as Matlab, allow better automation and offer a more programmer-friendly environment, Python offers extremely interesting solutions to this kind of problem. In particular, Python lets us exploit the benefits of a general-purpose programming language in combination with a huge number of capabilities for crunching (NumPy, SciPy), data storage (PyTables, NoSQL interfaces), data visualization (matplotlib) and an easy-to-use interactive environment (IPython, IPython Notebook).
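As a taste of what this looks like in practice, the mean/min/max computation from the Excel example above takes only a few lines with NumPy (the values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical experimental measurements (illustrative values only)
values = np.array([4.1, 3.9, 4.5, 4.0, 3.8])

# The summary statistics that would otherwise be computed by hand in Excel
print("mean:", values.mean())
print("min: ", values.min())
print("max: ", values.max())
```

The same three calls work unchanged whether `values` holds five numbers or five million, which is exactly the kind of scaling that manual copy-and-paste cannot offer.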
In this talk we present some of the powerful tools available in the Python environment for automatically analyzing, filtering and processing large amounts of data. In particular, we present several real-world case studies along with the corresponding working Python code.
Basic maths skills and basic knowledge of the Python programming language are the only suggested prerequisites.
Ph.D. Student, University of Naples Federico II
Valerio Maggio obtained both his BSc. and MSc. degrees in Computer Science with honours (cum laude) at the University of Naples "Federico II", where he is currently enrolled in the Ph.D. course on Computational Science and Information Technology. His research interests are mainly focused on the definition and application of Information Retrieval and Machine Learning techniques to Software Maintenance tasks such as Mining Software Repositories, Software Clustering and Clone Detection.
He is also a truly passionate Python programmer and an enthusiastic Django developer, who enjoys drinking good tea and listening to good music.
Ph.D. Student, University of Parma
Enrico Franchi is currently enrolled in a Ph.D. course on Information Technologies at the University of Parma, after obtaining his degrees in Maths and Computer Science. His main research interests are Multi-Agent and Distributed Systems, Social Networks, AI and Software Engineering.
His main computer-related interests are functional and freedom languages, agile development and programming in the Unix environment. He has been interested in Python since the early noughties; in Python he especially appreciates the essentialist aesthetics, the formal purity and the natural predisposition to agile development methods.