by Bruce Smith
Social media applications encounter messy user-generated data in blog posts, status updates, tweets, user profiles, etc. These documents contain free-form text that obeys no particular rules of grammar, punctuation or spelling.
If the data is so messy, how can a computer program recognize adult content or hate speech or spam? How can a computer program tell the difference between an advertisement and a product review? How can a computer program distinguish between a positive and a negative product review?
Machine learning offers some solutions. For example, given sample tweets labeled (by people) as spam or non-spam, machine learning tools can generate a program (or model) that attempts to duplicate the human judgments. You could use this kind of model in your application to filter out tweet spam.
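The abstract does not name a particular algorithm. As an illustration only, here is a minimal sketch using one common choice, a multinomial naive Bayes spam filter with add-one smoothing, written with the Python standard library. The `tokenize` helper, the `NaiveBayesSpamFilter` class and the six training tweets are hypothetical and made up for this example. A real filter would need far more labeled data and better feature extraction.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, crude feature extraction: lowercase, keep runs of letters/digits/apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, labeled_texts):
        # labeled_texts: iterable of (text, label) pairs judged by people.
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label); logs avoid floating-point underflow.
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        label_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            count = self.word_counts[label][token]  # 0 for unseen words; +1 keeps log defined
            score += math.log((count + 1) / (label_words + vocab_size))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))


# Made-up training tweets, labeled by hand.
training = [
    ("WIN a FREE iPhone now!!! click here", "spam"),
    ("free followers click this link", "spam"),
    ("cheap pills, click now", "spam"),
    ("Had a great lunch with the team today", "ham"),
    ("Reading a good book about music history", "ham"),
    ("Anyone going to the conference today?", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
spam_filter.train(training)
print(spam_filter.predict("click here for free stuff"))  # → spam
print(spam_filter.predict("great lunch with the team"))  # → ham
```

Scoring in log space avoids underflow when many small probabilities are multiplied, and add-one smoothing keeps a word never seen under one label from zeroing out that label's score entirely.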
In this talk we will describe:
• Some common machine learning algorithms
• Machine learning tools – free and commercial
• Acquiring and managing training data
• Extracting useful features from your documents
• Choosing the right technique for a problem
• Measuring quality and improving your model over time
• Integrating a machine learned model with your application
By the end of this session, you will know where machine learning might fit in your applications and how to get started.
Big Data creates problems and opportunities that do not exist with smaller datasets. You will learn how to scale, use, and visualize Big Data, and how to create and integrate Big Data APIs. We will talk about how to scale your data, expose it through APIs, integrate existing data from the data marketplace, and communicate it through visualization. You will find out which techniques and strategies work best with Big Data. Many developers have learned how to scale their systems for high levels of concurrency, but scaling for Big Data has its own unique challenges: strategies that would make no sense for smaller systems sometimes work well for larger datasets. This workshop is geared toward PHP developers, but all are welcome.
In the old days, DJs, A&R folks, labels and record store owners were the gatekeepers to music. Today, we are seeing a new music gatekeeper emerge: the developer. Using open APIs, developers are creating new apps that change how people explore, discover, create and interact with music. But developers can't do it alone. They need data and services such as gig listings, lyrics, recommendation tools and, of course, music! And they need them from reliable, structured and legitimate sources.
In this presentation we'll explore what is happening right now in the thriving music developer ecosystem. We'll describe some of the novel APIs that are making this happen and the building blocks being put into place by a variety of sources. We'll demonstrate how companies within this ecosystem are working closely together in a spirit of co-operation, each providing its own pieces to an expanding pool of resources that developers can use to play, develop and create new music apps across different media: web, mobile, software and hardware. We'll highlight some of the next generation of music apps being created in this ecosystem.
Finally, we'll look at how music developers are coming together at events like Music Hack Day, where participants have just 24 hours to build the next generation of music apps. Someone once said, "APIs are the sex organs of software. Data is the DNA." If this is true, then Music Hack Days are orgies.
Faced with the costs of vertically scaling their relational database systems, developers are increasingly turning to Apache Cassandra as an alternative. Cassandra solves the scaling problem by partitioning data across nodes, expanding horizontally as nodes are added, and replicating each partition with tunable consistency. Using Cassandra effectively requires developers to model their application data differently than they would in a relational database. This presentation will explain how Cassandra achieves scale and reliability, and give an example of porting a SQL schema to Cassandra.
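The abstract says Cassandra scales by partitioning data but does not show how. Below is a toy Python sketch, assuming a consistent-hashing token ring in the spirit of Cassandra's partitioners and replica placement similar to its SimpleStrategy (the owning node plus the next nodes clockwise). `TokenRing`, the node names, and deriving tokens by hashing node names are hypothetical simplifications. This is not Cassandra's code or API, and real clusters assign tokens (or virtual nodes) through configuration.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hashing ring (hypothetical; not Cassandra's code or API).

    Each node owns the token range ending at its own token; replicas of a
    row go to the owner and the next nodes clockwise around the ring.
    """

    def __init__(self, nodes, replication_factor=2):
        # Illustrative assumption: derive each node's token by hashing its name.
        self.ring = sorted((self._token(name), name) for name in nodes)
        self.tokens = [token for token, _ in self.ring]
        self.replication_factor = replication_factor

    @staticmethod
    def _token(key):
        # MD5 yields a stable 128-bit integer for any string (chosen only for illustration).
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def replicas(self, row_key):
        # The first node whose token is >= the key's token owns the key; wrap past the end.
        start = bisect.bisect_left(self.tokens, self._token(row_key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes, adjacent on the ring
```

Because each node owns only the range ending at its token, adding a node moves only the keys in the range it takes over from its clockwise neighbor; every other key keeps its owner. That locality is what lets a cluster grow horizontally without reshuffling all of its data.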
Big Data solutions, such as Apache Hadoop and Apache Cassandra, are growing up and moving from a grassroots movement to widespread adoption. Unfortunately, most of the technical expertise still lies with the open source project contributors, and most solutions are tackled from the bottom up, starting with the technical problems. The material presently available comes largely from social media giants touting solutions built on 10,000-node clusters that process petabytes of data a day. The reality? The average person cannot relate to these examples or intuitively draw parallels to their own business problems.
Big Data solutions are worthwhile long before you reach petabyte-scale data, but just getting started can be a challenge in itself. New open source projects tackling a variety of Big Data issues are released regularly, some of them only slightly different from existing technologies. How does one navigate this plethora of technologies to design workable solutions to business problems? What if you only have gigabytes or terabytes of "medium" data on a small cluster? This panel features Solution Architects from key companies in the Big Data space, who will lead deep-dive technical discussions of real solutions we've built for our customers across a variety of industries, starting with the business problems.
Open APIs are sweeping through public media, just as they are through the rest of the world, but folks at NPR, PBS and elsewhere are thinking even bigger. Public media is engaged in an unprecedented project to build an open API called the Public Media Platform (PMP), which will help developers create applications that bring personalized public media content to new platforms. Come learn from the leaders of the PMP about how this project is rolling out, where it is headed and how it can benefit you. We will discuss how public media is creating the right technology layer while balancing business rules to build new opportunities for our media to be For, By and Of the People.
11th–15th March 2011