Building automated, efficient, and accurate data pipelines from the often noisy, disparate, and busy data flows of today's enterprises is a difficult task. Data science and engineering teams may be asked to work together to build (or install) a management platform that funnels these streams into the company's so-called data lake. But how are these pipelines managed? Who is in charge of maintaining services and reducing costs? How do we ensure data is not lost, not duplicated, and is factually accurate? These concerns, among others, will be discussed alongside implementation decisions, for those looking for practical recommendations on the what and how of data automation workflows.
Director of Technology at @hyfn. Lover of all things Unix and Pythonista extraordinaire. (Bio from Twitter.)