by David Strom
Drawing on ReadWriteWeb.com’s coverage of Big Data technologies in the enterprise, we’ll see examples of how FedEx, the Associated Press, and others are using Big Data to drive their decisions.
by Rob Peglar
This talk will cover several current topics in big data and specific analytic use cases in financial services and healthcare. The use of Hadoop and associated toolsets, along with optimal HDFS architecture for analysis problems at scale, will be discussed and best practices outlined.
by Bill Eldredge
At Nokia, we expect to save millions in avoided license fees this year on a single “Big Data” project by creating a symbiotic relationship between our traditional RDBMS storage and our newer Hadoop cluster. This hybrid approach enables us to manage the convergence of structured and unstructured data while saving money. In our case, we use Hadoop to process data and import it into traditional systems. We have found that using Hadoop as a preprocessing engine lets us derive maximum value from our systems, our data, and our people.
Social media “listening research” has emerged as a powerful alternative to the more traditional “asking research.” Through a number of examples, you’ll find out how to research important brand topics, gain in-depth insights into new product development, perform segment analysis, and explore broader topics that you might not previously have had the funds to research. Using a mixture of “paid” and “unpaid” tools, you’ll learn how to apply this unique method to your important research questions.
by Erich Hochmuth
Hadoop is quickly becoming the preferred platform for performing analysis over large datasets. We will explore opportunities for using MapReduce to process genomic data in an enterprise system.
We will discuss how MapReduce is being used to scale existing data processing workflows, along with lessons learned migrating existing algorithms and workflows to MapReduce. We will also touch on advanced MapReduce capabilities such as composite keys, secondary sorting, and data serialization.
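The composite-key/secondary-sort pattern mentioned above can be illustrated in plain Python (this is a conceptual sketch of the idea, not Hadoop code; the sample records and field names are invented for illustration). The trick is to put the secondary sort field into the key so the framework’s shuffle/sort orders values before the reducer sees them:

```python
from itertools import groupby

def map_phase(records):
    # Emit a composite key (sample_id, position): the secondary sort field
    # rides along with the natural key, so the framework's sort does the
    # per-group ordering for us.
    for sample_id, position, value in records:
        yield (sample_id, position), value

def shuffle_and_sort(pairs):
    # Stand-in for Hadoop's shuffle/sort: order by the full composite key.
    return sorted(pairs, key=lambda kv: kv[0])

def reduce_phase(sorted_pairs):
    # Group on the natural key only (first key component); each group's
    # values arrive already sorted by the secondary key (position).
    for sample_id, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        yield sample_id, [value for _key, value in group]

records = [("s2", 7, "T"), ("s1", 3, "G"), ("s1", 1, "A"), ("s2", 2, "C")]
result = dict(reduce_phase(shuffle_and_sort(map_phase(records))))
# result == {"s1": ["A", "G"], "s2": ["C", "T"]}
```

In real Hadoop the same split is expressed with a custom partitioner and grouping comparator, so records with the same natural key land on one reducer while the full composite key drives the sort.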
by Scott Fines
Recent years have seen a sudden and rapid introduction of new technologies for scaling applications to essentially arbitrary levels. The variety and depth of these systems has grown to match, and it can be a challenge just to keep up. In this talk, I’ll discuss some of the more common systems such as Hadoop, HBase, and Cassandra, and some of the different scenarios and pitfalls of using them. I’ll cover when MapReduce is powerful and helpful, and when it’s better to use a different approach. Putting it all together, I’ll mention ZooKeeper, Flume, and some of the surrounding smaller projects that can help make a usable system.
by Pritam Damania
Reliable backup and recovery is one of the main requirements for any enterprise-grade application. HBase has been widely embraced by enterprises needing random, real-time read/write access to huge volumes of data with ease of scalability. As such, they are looking for backup solutions that are reliable, easy to use, and able to coexist with existing infrastructure. HBase comes with several backup options, but there is a clear need to improve the native export mechanisms. This talk will cover the options available out of the box, their drawbacks, and what various companies are doing to make backup and recovery efficient. In particular, it will cover what Facebook has done to improve the performance of the backup and recovery process with minimal impact on the production cluster.
by Alex Miller
Many big data use cases involve moving many data sources into Hadoop where the data can be merged, summarized, and transformed. However, due to the volume and variety of data being poured into Hadoop, we need better tools for describing and connecting the data outside Hadoop, the data inside Hadoop, and the transformations between a variety of domains.
Semantic web standards like RDF (Resource Description Framework) and the SPARQL query language provide flexible tools for describing and querying virtually any kind of data or metadata. Traditionally these tools are used with RDF “triple stores”; however, we can also apply them to describing the data inside and outside Hadoop. These technologies can be used to load data into Hadoop, transform it while it’s there, query it, and export it, all in terms defined by the business and the data owners.
This talk will demonstrate how RDF can be used to describe a variety of data and metadata, how data stored in Hadoop can be transformed or virtualized as an RDF graph, and how queries and transformations can be defined by SPARQL and R2RML (the RDB to RDF Mapping Language).
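To make the triple-pattern idea concrete, here is a toy in-memory sketch in Python (not rdflib or a real triple store; the `ex:` terms and sample data are invented). A SPARQL basic graph pattern is essentially a join of (subject, predicate, object) patterns, where terms starting with `?` are variables:

```python
# Tiny in-memory triple set; pattern terms starting with "?" are variables.
triples = {
    ("ex:order1", "ex:customer", "ex:acme"),
    ("ex:order1", "ex:total",    "120"),
    ("ex:order2", "ex:customer", "ex:acme"),
    ("ex:acme",   "ex:region",   "ex:midwest"),
}

def match(pattern, bindings=None):
    # Yield extended variable bindings for one (s, p, o) pattern,
    # substituting any already-bound variables first.
    bindings = bindings or {}
    s, p, o = (bindings.get(term, term) for term in pattern)
    for ts, tp, to in triples:
        if ((s.startswith("?") or ts == s) and
                (p.startswith("?") or tp == p) and
                (o.startswith("?") or to == o)):
            new = dict(bindings)
            for var, val in zip((s, p, o), (ts, tp, to)):
                if var.startswith("?"):
                    new[var] = val
            yield new

def query(patterns):
    # Join patterns left to right, like a SPARQL basic graph pattern.
    results = [{}]
    for pat in patterns:
        results = [b for r in results for b in match(pat, r)]
    return results

# "Which orders belong to a customer in ex:midwest?"
rows = query([("?order", "ex:customer", "?c"),
              ("?c",     "ex:region",   "ex:midwest")])
orders = sorted(r["?order"] for r in rows)
# orders == ["ex:order1", "ex:order2"]
```

The same pattern-join model is what lets SPARQL and R2RML describe data wherever it lives: the query is written against the graph vocabulary, not against any particular storage layout.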
by Jim Duey
Big data requires big resources, which cost big money. But if you only need answers that are good enough, rather than precisely right, probabilistic data structures can get you those answers with a fraction of the resources and cost. In this talk I’ll survey several such data structures, give some of the theory behind them, and point out use cases.
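The classic example of this trade-off is the Bloom filter: it answers “definitely not present” or “probably present” using a small bit array instead of storing every element. Below is a minimal Python sketch (parameters and hashing scheme chosen for clarity, not tuned for production):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: set membership with a small chance of
    false positives and zero chance of false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int doubles as an arbitrary-size bit array

    def _positions(self, item):
        # Derive k bit positions by salting one hash function k ways.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
for word in ["hadoop", "hbase", "cassandra"]:
    bf.add(word)
bf.might_contain("hbase")  # True: probably present
```

With 1024 bits and three hashes, the filter above holds three items in 128 bytes; a lookup that returns False is guaranteed correct, while a True answer is wrong only with small probability, which is exactly the “good enough” bargain the talk describes.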
1st August 2012