In this session you will see how BigML makes machine learning more
accessible than ever thanks to its well-defined workflow, insightful
visualizations, and fully featured REST API. Concepts demonstrated
will include predictive analytics with decision trees, how to address
overfitting with ensembles, how to evaluate a predictive model, how
to find patterns with clustering, and how to detect anomalies.
Don't miss the opportunity to learn first-hand how to easily create
powerful predictive applications with BigML.
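The ensemble idea mentioned above can be illustrated with a minimal, stdlib-only sketch (this is not BigML's implementation): train many weak "trees" (here, depth-1 stumps) on bootstrap resamples of the data and combine them by majority vote, which smooths out the noise a single model might latch onto.

```python
import random

def train_stump(data):
    """Fit a depth-1 'decision tree': pick the threshold on x that
    best separates the labels in this (possibly resampled) data."""
    best = None
    for t, _ in data:
        preds = [(1 if xi >= t else 0) for xi, _ in data]
        acc = sum(p == y for p, (_, y) in zip(preds, data))
        if best is None or acc > best[0]:
            best = (acc, t)
    thresh = best[1]
    return lambda x: 1 if x >= thresh else 0

def ensemble(data, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap resample, then
    predict by majority vote across all stumps."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    return lambda x: int(sum(s(x) for s in stumps) > n_models / 2)

# Toy 1-D dataset: label is 1 when x >= 5, with one flipped label as noise.
data = [(x, int(x >= 5)) for x in range(10)]
data[2] = (2, 1)  # the noisy point
model = ensemble(data)
print(model(9), model(0))
```

The vote over 25 resampled stumps recovers the true cutoff despite the mislabeled point.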
Indico is a Boston-based company working to democratize machine learning. We are building intelligent tools for smart data science, and hope to blur the lines between developer and data scientist: making data science and machine learning truly ubiquitous.
Need text and image analytics? Come learn how to use Indico’s suite of machine learning tools — from sentiment detection to image similarity to political analysis. We’ve got wrappers for most major languages (Python, R, Node, Ruby, Java, Objective-C, and PHP). The tutorial will include an explanation of how to build a robust image-search functionality.
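As a sketch of what calling such a text-analytics service looks like from plain Python (the endpoint URL and payload shape below are illustrative assumptions, not Indico's documented API; their language wrappers hide these details):

```python
import json
import urllib.request

# Hypothetical base URL for illustration only.
API_BASE = "https://apiv2.indico.io"

def build_sentiment_request(text, api_key):
    """Build (but do not send) a JSON POST request asking the
    service to score the sentiment of a piece of text."""
    payload = json.dumps({"data": text, "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        url=API_BASE + "/sentiment",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_sentiment_request("I love this talk!", api_key="YOUR_KEY")
print(req.full_url)  # https://apiv2.indico.io/sentiment
```

In practice one would send the request and parse the JSON response into a sentiment score.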
This session will demonstrate how starting with a linear R script, a data scientist can create an interactive web application. The script is first split into stateless functions that do things like obtaining data, cleaning it, filtering it, transforming it, running descriptive and predictive algorithms, and generating plots. A graphical interface will then be created by dragging and dropping widgets (components like text boxes, drop-downs, tables, images). The widgets are configured to link the underlying R functions into a workflow. The resulting interactive application is immediately put into production, without any IT involvement. Computing resources are dynamically allocated for each user that runs the applications. The use of the application is then demonstrated, including sharing the state with other users.
by Greg Lamp
Building the predictive aspect of applications is the fun, sexy part. New tools like scikit-learn, pandas, and R have made building models less painful, but deploying/embedding models into production applications is challenging. We'll show how Yhat makes deploying predictive models written in Python or R fast and easy by building a beer recommendation system and an accompanying webapp.
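A beer recommender of the kind deployed in the demo can be sketched in a few lines (toy data and a deliberately simple nearest-neighbour approach, not Yhat's deployment API):

```python
from math import sqrt

# Toy user -> {beer: rating} data, purely illustrative.
ratings = {
    "ann": {"ipa": 5, "stout": 3, "lager": 1},
    "bob": {"ipa": 4, "stout": 5, "porter": 4},
    "cat": {"lager": 5, "pilsner": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    num = sum(u[b] * v[b] for b in common)
    den = sqrt(sum(x * x for x in u.values())) * \
          sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def recommend(user):
    """Suggest beers the most similar user rated but `user` hasn't tried."""
    mine = ratings[user]
    peer = max((u for u in ratings if u != user),
               key=lambda u: cosine(mine, ratings[u]))
    new = {b: r for b, r in ratings[peer].items() if b not in mine}
    return sorted(new, key=new.get, reverse=True)

print(recommend("ann"))  # ['porter']
```

Deploying means wrapping a function like `recommend` behind an HTTP endpoint, which is exactly the step Yhat automates.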
The open-source based predictive analytics solution RapidMiner offers an API that lets users extend its functionality easily and integrate it into predictive applications. As a result, many research organizations, universities, and companies have built their work on this platform and extended its applicability to new domains. This presentation highlights the top extensions found on the RapidMiner Marketplace and how developers can use the RapidMiner API to create their own extensions as well as to integrate RapidMiner into their solutions.
by Danny Bickson
by Thomas Stone
PredictionIO is an open source machine learning server for software developers to create predictive features. Traditionally, this has included personalization, recommendation and content discovery in domains such as e-commerce and media. The latest version of PredictionIO will open our platform to many more use cases, such as churn analysis, trend detection and more, allowing developers to use the power of machine learning in any web or mobile app. We will also discuss the new software design pattern, DASE, for building machine learning engines on top of PredictionIO's scalable infrastructure. It's time to see what an open source community can build by re-imagining software with machine learning.
The data that goes into and comes out of Predictive APIs is crucial. Integrating these APIs and staying on top of data quality can be a challenge: with the proliferation of third-party web APIs, managing the various integrations usually takes a lot of effort. This talk will demonstrate a proxy-based approach where data can be transformed, harmonized, or customized between the API provider and the app, before it ever hits the app. For this I will use APItools, a free and open-source set of tools designed specifically to solve the integration pain for developers.
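The transformation step such a proxy performs can be sketched as a pure function over the provider's response body (field names below are made up for illustration and are not APItools configuration):

```python
import json

def harmonize(raw_body):
    """Example middleware step: rename provider-specific fields to
    the schema the app expects, before the response reaches the app."""
    data = json.loads(raw_body)
    mapping = {"sentiment_score": "score", "lang_code": "language"}
    return json.dumps({mapping.get(k, k): v for k, v in data.items()})

provider_response = '{"sentiment_score": 0.91, "lang_code": "en"}'
print(harmonize(provider_response))  # {"score": 0.91, "language": "en"}
```

The app then consumes a single harmonized schema no matter which provider sits behind the proxy.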
Jeroen Janssens is the author of Data Science at the Command Line. In this exclusive tutorial, he will show us how to use predictive APIs to make predictions from the command line.
by Andy Thurai
The birth of a sophisticated Internet of Things has catapulted hybrid data collection, which mixes structured and unstructured data, to new heights. The goal of any analytics software is to find and improve better data sets rather than to spend time identifying, prepping, and cleaning the data. It is not enough to predict a future issue and prescribe an action to anticipate it: if the action is ignored, a forward-thinking system should automatically suggest an advanced course correction based on the action items that were not acted upon. Predictive analytics algorithms should recalibrate themselves: as the incoming data evolves, so do the algorithms – they must re-fit, re-predict and re-prescribe.
Listen to Andy Thurai, Program Director at IBM (API, IoT and Connected Cloud), talk about how the time has come for machines and humans to work together to make each other smarter. The combination of APIs, IoTs, big data, smarter analytics, and cognitive computing is transforming the way we see the future — and more importantly, what we do about it.
Recycling centers were designed 40 years ago and now have trouble managing the huge demand they are facing. Cars often queue for hours, and people often find containers that are already full. The result is a huge amount of waste being buried or burned when it could have been recycled.
A contextual predictive model was developed in order to provide citizens with this information: what is the best moment to go to which recycling center, in terms of waiting time and bin availability?
This predictive model depends on sensors deployed in each recycling center and on various open data sources.
The API here is the web that links every part:
- the sensors: pushing new versions of the software to them and sourcing their measurements in real time
- the predictive models: feeding them with fresh measurements and fresh data
- the web/mobile app: serving it the predictions
- the users' demands: crowdsourcing them to a server
- the BI tools of the waste management authorities
We hope to demonstrate how a predictive API like ours can solve real life problems.
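The kind of answer such an API could return can be sketched as follows (the per-center numbers are invented placeholders standing in for the output of the sensor-fed model, not real predictions):

```python
# Hypothetical per-center predictions: queue time and remaining bin capacity.
predictions = {
    "center_north": {"wait_min": 45, "free_bins": 2},
    "center_south": {"wait_min": 10, "free_bins": 0},
    "center_east":  {"wait_min": 15, "free_bins": 5},
}

def best_center(preds):
    """Pick the center with the shortest predicted wait among those
    that still have room in their containers."""
    usable = {c: p for c, p in preds.items() if p["free_bins"] > 0}
    return min(usable, key=lambda c: usable[c]["wait_min"])

print(best_center(predictions))  # center_east
```

Note that the shortest queue (center_south) is not recommended because its containers are predicted to be full.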
Bikeshare schemes are present in more than 700 cities and they're expanding rapidly (more than 200 cities are currently building such schemes). They allow people to move freely around town by bike. Using 4 years of data, consisting of snapshots of the Bordeaux bike network taken every minute along with detailed weather data, we are able to predict load up to 12 hours in advance. Predictions are made available to end users through an API, which is used by the popular Bordeaux bikes mobile app. Predictions are constantly updated; they can help users plan their trips, but they can also help operators anticipate bike shortages (or, conversely, bike surpluses) at each station and thus optimize load-balancing strategies.
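A toy version of the prediction task gives the flavour of the approach (the snapshot data and the nearest-context averaging below are illustrative stand-ins, not the actual Bordeaux model):

```python
from statistics import mean

# Toy history: (hour_of_day, raining, bikes_available) snapshots for
# one station -- a stand-in for four years of minute-level data.
history = [
    (8, False, 2), (8, False, 3), (8, True, 9),
    (18, False, 14), (18, True, 15), (18, False, 12),
]

def predict_load(hour, raining):
    """Nearest-context average: mean availability over past snapshots
    with the same hour and weather (a crude stand-in for a real model)."""
    matches = [n for h, r, n in history if h == hour and r == raining]
    return mean(matches) if matches else mean(n for _, _, n in history)

print(predict_load(8, raining=False))  # 2.5
```

Even this crude baseline captures the pattern the real model exploits: load depends strongly on time of day and weather.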
Office 365 Health API enables developers to build predictive mobile and web applications to monitor application and service health. Developers can use the data streams from the API to provide custom information and quick actions for our customers, partners, and internal IT stakeholders. The API is powered by the Office 365 Health Engine that performs data curation and analytics in real time using statistical and machine learning models on top of multiple signals.
by Marc Torrens
Banks are increasingly facing competition from the giant Internet companies in the financial sector. To meet this challenge they must take advantage of their unique position: they have been collecting financial behavioral data for decades. At Strands we have implemented a platform that channels relevant commercial offers from merchants to consumers within financial institutions. The relevancy of the offers is optimized by predicting how likely consumers are to buy products from a given industry or merchant. In this talk, we'll share some of our methods and we'll show how merchants can create targeted audiences for their campaigns without having to deal with the complexity of predictive modeling.
In this session, we will discuss the different technical options that have been considered in our lab to build predictive apps within a large organization. They rely either on open source or commercial products. We will provide some feedback on ongoing experimentations and thoughts on deployment strategies, types of platforms and frameworks.
Real-time bidding, in the context of digital marketing, refers to the purchase of advertising impressions one at a time, responding to tens of thousands of messages per second, paying a different price for each via an auction mechanism. This talk will cover in detail how Datacratic’s RTB Optimizer Prediction API predicts the outcome of buying a given impression, then computes the economic value of that outcome to produce optimal bidding behaviour.
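The core economics of such a bidder can be sketched in a few lines (the margin parameter and the numbers are illustrative assumptions, not Datacratic's actual pricing logic):

```python
def optimal_bid(p_click, value_per_click, margin=0.2):
    """Bid the expected economic value of the impression, minus a
    margin: E[value] = P(click) * value-per-click. The prediction API
    supplies P(click); the campaign supplies the value."""
    expected_value = p_click * value_per_click
    return expected_value * (1 - margin)

# A 0.4% predicted click rate on a $2.00-per-click campaign:
print(optimal_bid(0.004, 2.00))  # ~0.0064 per impression
```

The hard part, of course, is producing a well-calibrated `p_click` tens of thousands of times per second, which is what the talk covers.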
Human sentiments are broad and diverse. Products, brands and people can make us feel a wide range of emotions: happiness, anger, disappointment… Users and consumers express these feelings and opinions on Twitter, Facebook, blogs and comments, within the reach of companies and organizations. Understanding language is the key to learning what the community thinks about a particular product and to making predictions about it. Mastering language, however, is tricky: the diversity of vocabulary, the differences between regions and the use of irony and metaphors make sentiment analysis a complex and fascinating task.
In this session we will see a practical case of how sentiment analysis allows detecting opinions, the challenges that arise when trying to understand language automatically, and the lessons learned when working on languages that are spoken in different countries, such as English, Spanish or French.
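To see why the task is harder than it looks, consider a deliberately naive lexicon-based scorer (the tiny lexicon and negation handling below are illustrative only; production systems use far richer resources):

```python
# A tiny hand-made polarity lexicon; real systems must also handle
# irony, metaphors, and regional vocabulary differences.
LEXICON = {"love": 1, "great": 1, "happy": 1,
           "angry": -1, "disappointed": -1, "terrible": -1}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Sum word polarities, flipping the sign of the word that
    immediately follows a negation word."""
    score, flip = 0, False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATIONS:
            flip = True
            continue
        if word in LEXICON:
            score += -LEXICON[word] if flip else LEXICON[word]
        flip = False
    return score

print(sentiment("I love this brand"))       # 1
print(sentiment("I am not happy with it"))  # -1
```

Irony ("oh great, another delay") defeats this scorer immediately, which is precisely the kind of challenge the session addresses.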
Real, production-grade big data apps require integration between data ingestion components, beyond-Hadoop technologies, data science and modeling tools, publishing data to low-latency, high-throughput REST APIs backed by SQL or NoSQL stores, and a visualization/application layer – as well as monitoring, instrumentation, security and administration tools. Adding intelligence to such an application also requires a broad set of machine learning algorithms, the ability to train and measure experiments in parallel, the ability to publish models as robust, low-latency front-end APIs, and having online measurements in place. All this adds up to man-months of work on an average project just to glue these pieces together into a production system. In this talk we'll describe some of the main gaps we've had to address building dozens of such systems over the years, the design patterns and reference architecture we've come to adopt, and some handy tools to automate common tasks. We'll demonstrate the challenges by building an end-to-end scalable, intelligent app during the session.
Big Data gets all the headlines but the larger shift will come from machine learning becoming ubiquitous. API services like Datagami enable non-specialist web developers to use machine learning in unexpected ways. We will discuss a particular example of house price modelling as a component of a larger real estate website.
At the end of 2013, Yandex organised a Learning-to-Rank competition on Kaggle. Dataiku offered to sponsor a team of 4 people (2 data scientists, one product manager, and one developer) for the contest. Our team won the first prize. This talk will provide insights on how we did it as a team:
by Danny Bickson and Shawn Scully
One of the most exciting areas in Big Data is the development of new predictive applications: apps used to drive product recommendations, predict machine failures, forecast airfares, match-make socially, identify fraud, predict disease outbreaks, and repurpose pharmaceuticals. These applications output real-time predictions and recommendations in response to user and machine input to directly derive business value and create cool experiences. These hold the true promise of Big Data.
The most interesting apps utilize multiple types of data (tables, graphs, text, and images) in creative ways. Typically, these are developed using data that's larger than a single machine's memory, but smaller than the petabytes some companies brag about housing. This “Medium Data” regime of more than 5 GB and less than 10 TB is where data science magic happens. In this talk, we'll share the trends we're seeing at GraphLab in predictive application development, show how to build and deploy a predictive app that exploits the power of combining different data types and representations (like graphs and tables), and, through customer case studies, share some key lessons for data scientists and developers.
The first user of the Datagami API built an app to predict Bitcoin prices. We will demo the app and discuss the architecture of the underlying prediction engine.
17th–18th November 2014