StampedeCon 2012 schedule

Wednesday 1st August 2012

  • Opening Remarks

    At 8:15am to 8:30am, Wednesday 1st August

  • How Big Data Can Help Your Business: Case Studies from ReadWriteWeb

    by David Strom

    Pulling from ReadWriteWeb.com’s coverage of Big Data technologies in the Enterprise, we’ll see examples of how FedEx, Associated Press and others are using Big Data to drive their decisions.

    At 8:30am to 9:10am, Wednesday 1st August

  • Big Data and the Analysis Conundrum - Challenges and Opportunities

    by Rob Peglar

    This talk will cover several current topics in big data and specific analytic use cases in financial services and healthcare. The use of Hadoop and associated toolsets, along with optimal HDFS architecture for analysis problems at scale, will be discussed and best practices outlined.

    At 9:10am to 9:50am, Wednesday 1st August

  • Making your Analytics Investment Pay Off

    by Bill Eldredge

    At Nokia, we expect to save millions on avoided license fees this year on a single “Big Data” project by creating a symbiotic relationship between our traditional RDBMS storage and our newer Hadoop cluster. Our hybrid approach to data enables us to manage the convergence of structured and unstructured data, and save money. In our case we use Hadoop to process and import data into traditional systems. We have found that this use of Hadoop as a preprocessing engine has enabled maximum value to be derived from our systems, our data and our people.

    At 10:10am to 10:50am, Wednesday 1st August

  • Listening for Insights: The Power of Social Media Listening

    by Frank Cotignola

    Social media “listening research” has emerged as a powerful alternative to more traditional “asking research.” Through a number of examples, you’ll find out how to research important brand topics, provide in-depth insights for new product development and segment analysis, and explore broader topics that you might not previously have had the funds to research. Using a mixture of “paid” and “unpaid” tools, you’ll learn how to apply this method to your important research questions.

    At 10:50am to 11:30am, Wednesday 1st August

    Coverage write-up

  • MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets

    by Erich Hochmuth

    Hadoop is quickly becoming the preferred platform for performing analysis over large datasets. We will explore opportunities for utilizing MapReduce to process genomic data in an enterprise system.

    We will discuss how MapReduce is being used to scale existing data processing workflows, and lessons learned while migrating existing algorithms and workflows to MapReduce. We will also touch on advanced capabilities of MapReduce such as composite keys, secondary sorting, and data serialization.

    At 11:30am to 12:10pm, Wednesday 1st August

  • Welcome to the Jungle: Distributed Systems for Fun and Profit

    by Scott Fines

    Recent years have seen a sudden and rapid introduction of new technologies for distributing applications to essentially arbitrary levels. The variety and depth of these systems have grown to match, and it can be a challenge just to keep up. In this talk, I’ll discuss some of the more common systems such as Hadoop, HBase, and Cassandra, and some of the different scenarios and pitfalls of using them. I’ll cover when MapReduce is powerful and helpful, and when it’s better to use a different approach. Putting it all together, I’ll mention ZooKeeper, Flume, and some of the surrounding small projects that can help make a usable system.

    At 1:30pm to 2:10pm, Wednesday 1st August

  • HBase Backups

    by Pritam Damania

    Reliable backup and recovery is one of the main requirements for any enterprise-grade application. HBase has been widely embraced by enterprises that need random, real-time read/write access to huge volumes of data along with easy scalability. As such, they are looking for backup solutions that are reliable, easy to use, and can co-exist with existing infrastructure. HBase comes with several backup options, but there is a clear need to improve the native export mechanisms. This talk will cover the options that are available out of the box, their drawbacks, and what various companies are doing to make backup and recovery efficient. In particular, it will cover what Facebook has done to improve the performance of the backup and recovery process with minimal impact on the production cluster.

    At 2:10pm to 2:50pm, Wednesday 1st August

  • Big Data with Semantics

    by Alex Miller

    Many big data use cases involve moving many data sources into Hadoop where the data can be merged, summarized, and transformed. However, due to the volume and variety of data being poured into Hadoop, we need better tools for describing and connecting the data outside Hadoop, the data inside Hadoop, and the transformations between a variety of domains.

    Semantic web standards like RDF (Resource Description Framework) and the SPARQL query language provide flexible tools for describing and querying virtually any kind of data or metadata. Traditionally these tools are used with RDF “triple stores”; however, we can also apply these technologies to describing the data inside and outside Hadoop. These technologies can be used to load data into Hadoop, transform it while it’s there, query it, and export it, all in terms defined by the business and the data owners.

    This talk will demonstrate how RDF can be used to describe a variety of data and metadata, how data stored in Hadoop can be transformed or virtualized as an RDF graph, and how queries and transformations can be defined by SPARQL and R2RML (the RDB to RDF Mapping Language).

    At 3:10pm to 3:50pm, Wednesday 1st August

  • A Survey of Probabilistic Data Structures

    by Jim Duey

    Big data requires big resources which cost big money. But if you only need answers that are good enough, rather than precisely right, probabilistic data structures can be a way to get those answers with a fraction of the resources and cost. In this talk I’ll survey some different data structures, give some theory behind them and point out some use cases.

    At 3:50pm to 4:30pm, Wednesday 1st August
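
As a rough illustration of the trade-off described in the last abstract (answers that are "good enough" for a fraction of the resources), here is a minimal Bloom filter sketch in Python. The abstract does not say which structures the talk surveys, so the choice of a Bloom filter is an assumption, and the class name, method names, and parameters below are hypothetical and not taken from the talk.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: false positives possible, false negatives not."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Illustrative usage: 10,000 keys with a 1% target false-positive rate.
seen = BloomFilter(expected_items=10_000, false_positive_rate=0.01)
for i in range(10_000):
    seen.add(f"user-{i}")

false_hits = sum(f"other-{i}" in seen for i in range(10_000))
print(f"bytes used: {len(seen.bits)}, false positives: {false_hits} / 10000")
```

The filter stores about 12 KB of bits for these 10,000 keys rather than the keys themselves. The cost is that a lookup for a key that was never added will occasionally return True.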