Thursday 6th August, 2015
11:40am to 12:20pm
The tutorial shows a smarter process for cleaning and ingesting dirty data. The PySemantic module validates and cleans data based on human-readable rules and constraints, which significantly simplifies data ingestion. It provides a simple schema for defining constraints on a dataset, and these constraints are enforced before and during ingestion. The data read by a program is therefore guaranteed to conform to the stated restrictions, which saves a lot of repetitive effort. The process makes data cleaning efficient and provides a scalable system for data ingestion.
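As a rough illustration of the idea described above (declare constraints once, enforce them while reading), here is a minimal sketch in plain Python. It does not use PySemantic and does not reflect its actual schema format or API; the `RULES` layout, the `load_clean` and `_coerce` helpers, and the sample data are all hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical rule set: NOT PySemantic's schema format, just an illustration.
RULES = {
    "name": {"type": str, "required": True},
    "age": {"type": int, "min": 0, "max": 120},
}

# Made-up sample input containing several kinds of dirty rows.
RAW = "name,age\nAlice,30\nBob,-5\n,41\nCarol,abc\nDave,\n"


def load_clean(text, rules):
    """Parse CSV text and split rows into (clean, rejected) using the rules."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            clean.append(_coerce(row, rules))
        except ValueError:
            # Any failed conversion or violated constraint rejects the row.
            rejected.append(row)
    return clean, rejected


def _coerce(row, rules):
    """Convert one row to typed values, raising ValueError on any violation."""
    out = {}
    for column, rule in rules.items():
        raw = (row.get(column) or "").strip()
        if not raw:
            if rule.get("required"):
                raise ValueError(f"{column} is required")
            out[column] = None  # optional and missing
            continue
        value = rule["type"](raw)  # int("abc") raises ValueError
        if "min" in rule and value < rule["min"]:
            raise ValueError(f"{column} is below the minimum")
        if "max" in rule and value > rule["max"]:
            raise ValueError(f"{column} is above the maximum")
        out[column] = value
    return out


clean, rejected = load_clean(RAW, RULES)
```

With this sketch, code downstream of `load_clean` only sees rows that satisfy the declared rules, while the rejected rows stay available for inspection rather than being silently dropped.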
Data Scientist at DataCulture
Jaidev is a data scientist at DataCulture and was previously a software developer at Enthought, Inc., where he worked on data analysis and visualization. His research interests are in the fields of machine learning and signal processing. He is interested in most things computational. He is a Python programmer who keeps hurting himself with feeble attempts at building FOSS packages. Take a look at his FOSS wreckage at www.github.com/jaidevd