Sessions at ApacheCon North America 2011 with slides

Friday 11th November 2011

  • Whirr: Open Source Cloud Services

    by Tom White

    Apache Whirr is an Incubator project which provides a way to run distributed systems - such as Hadoop, HBase, Cassandra, and ZooKeeper - in the cloud. Whirr provides a simple API for starting and stopping clusters for evaluation, test, or production purposes. Whirr is cloud neutral, so services may be run on a wide variety of cloud providers (such as Amazon EC2 or Rackspace), simply by changing a configuration property. This talk explains Whirr's architecture and shows how to use it.

    At 10:00am to 10:50am, Friday 11th November

    Coverage slide deck

  • Keynote | Watson, a Reasoning System: based on Apache Inside!

    by David Boloker

    IBM Watson is a reasoning system with a question and answer front end that processes natural language coming from both structured and unstructured data.

    Watson additionally incorporates analytics that the system learns to derive answer confidence and scoring. Boloker will discuss the Watson System and some of its key foundations that came from the Apache Software Foundation.

    At 11:30am to 12:20pm, Friday 11th November

    Coverage slide deck

  • Using JMeter For Testing A Data Center

    by Siegfried Goeschl

    This presentation gives first-hand information on how JMeter was used to test web applications and web services at a large-scale data center consisting of 300 application servers and multiple database clusters. JMeter and the testing approach are introduced first, before tackling the hard problems of setting up a scalable performance test infrastructure consisting of JMeter, Ant, Hudson and Git. During the course of the project a new reporting backend for JMeter was developed to overcome the limitations of the current XSLT approach, because an SLA (Service Level Agreement) performance report was required based on huge JMeter result files (e.g. exceeding 2 gigabytes).

    At 1:30pm to 2:20pm, Friday 11th November

    Coverage slide deck

  • Breaking Down Widget Silos with a friendly Wookie

    by Ross Gardler

    Widgets/gadgets are mini applications written in HTML + JavaScript. They offer cool and dynamic content that can be placed on any page on the web and, in some cases, on your desktop or your mobile device. Unfortunately, there is not just one way to create and package widgets. We have Google Gadgets, W3C Widgets, OpenSocial Gadgets and Wave Gadgets, to name just a few. Whilst widgets are an important part of web content delivery, particularly mobile web, the plethora of available widget/gadget standards could limit innovation by creating incompatible silos. This is where Apache Wookie (Incubating) comes in. Using Wookie we can harmonize all of these widget/gadget standards behind the W3C Widget specification, thus freeing the user from concerns about implementation details.

    At 2:00pm to 2:50pm, Friday 11th November

    Coverage slide deck

  • Deployment With Apache Karaf and ACE

    by Jean-Baptiste Onofré

    Deployment of applications based on reusable components in a large network environment can become complicated very quickly. In this session, Jean-Baptiste Onofré from Talend will describe how to manage deployment tasks using Apache Karaf and Apache ACE.

    Apache Karaf is a flexible, lightweight, enterprise-ready OSGi container that provides a runtime for a wide variety of components, including pure web applications and ESB-oriented services. Karaf's flexible tooling makes it suitable for large-scale deployment, and it supports multiple instances through high-availability and clustering.

    Apache ACE is a software distribution framework that provides centralized management of multi-node component deployments.

    Jean-Baptiste will begin with an introduction to Karaf, covering the Karaf shell and the basics of multiple instance management. Then, he will demonstrate how to use ACE to provision applications running inside Karaf.

    At 2:00pm to 2:50pm, Friday 11th November

    Coverage slide deck

  • Instrumenting Hadoop Jobs for Fun and Profit

    by Shevek Mankin

    Instrumentation is a general purpose technique to automatically gather detailed information about the execution of a process.

    The distributed nature of a Hadoop job makes both the engineering of the instrumentation and the presentation of the output harder.

    However, instrumentation can also take advantage of a detailed knowledge of the code paths within Hadoop to build a much deeper insight into the behaviour of the user code.

    We will present our approach to general purpose instrumentation for Hadoop, which uses Hadoop-specific insights to profile, debug and diagnose faults in a job.

    We will describe techniques using attempt success/failure, internal exception rates and differential analysis, amongst others, to help us localize badly performing code or malformed input data without user intervention.

    At 2:00pm to 2:50pm, Friday 11th November

    Coverage slide deck

  • Kafka - A distributed publish/subscribe messaging system

    by Neha Narkhede

    Kafka is a distributed publish-subscribe messaging system aimed at providing a scalable, high-throughput, low-latency solution for log aggregation and activity stream processing for LinkedIn. Written in Scala and built on Apache ZooKeeper, Kafka aims at providing a unified stream for both real-time and offline consumption. We provide a mechanism for parallel data load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines. Kafka combines the benefits of traditional log aggregators and messaging systems and has been used successfully in production for 8 months. It provides an API similar to that of a messaging system and allows applications to consume log events in real time. Written by the SNA team at LinkedIn, Kafka is open sourced under the Apache 2.0 License and is being prepared for submission as an Apache Incubator project. In this presentation, we will highlight the core design principles of this system and show how it fits into LinkedIn's data ecosystem, as well as some of the products and monitoring applications it supports in our usage.

    At 2:00pm to 2:50pm, Friday 11th November

    Coverage slide deck

  • .NET @ Apache.org

    by Ted Husted

    Like it or not, many open source developers are moving to the Microsoft .NET platform, and we're bringing our favorite tools with us! In this session, we look inside ASF projects that are creating software for .NET and Mono -- like ActiveMQ, Chemistry, Logging, Lucene, Qpid, and Thrift -- and show how to create leading-edge ASP.NET applications with ASF open source libraries. We'll also look at integrating other .NET open source projects, like Spring.NET, NVelocity, and JayRock, into your C# application to create a complete open source .NET stack.

    At 3:00pm to 3:50pm, Friday 11th November

    Coverage slide deck

  • Cassandra 1.0 and beyond

    by Jake Luciani

    The Cassandra distributed database has added many new features this year based on real-world needs of developers at Twitter, Netflix, Openwave, and others building massively scalable systems.

    This talk will cover the motivation and use cases behind features such as secondary indexes, Hadoop integration, SQL support, bulk loading, and more.

    Introduction
    ------------
    * Project history and goals

    Recap: Cassandra through 2010
    -----------------------------
    * Bulletproof reliability
    * Best-in-class support for multiple datacenters
    * High-performance storage engine based on Bigtable

    New in Cassandra 1.0
    --------------------
    * Dynamic column indexes
    * Distributed counters for realtime analytics
    * CQL/SQL and JDBC support
    * Bulk loading
    * Off-heap allocation for GC performance
    * Hadoop support

    At 3:00pm to 3:50pm, Friday 11th November

    Coverage slide deck

  • From Dev to DevOps

    by Carlos Sanchez

    The DevOps movement aims to improve communication between developers and operations teams to solve critical issues such as fear of change and risky deployments. But in the same way that Agile development would likely fail without continuous integration tools, the DevOps principles need tools to make them real and to provide the automation required to actually implement them. Most of the so-called DevOps tools focus on the operations side, and there should be more than that: the automation must cover the full process, from Dev to QA to Ops, and be as automated and agile as possible.

    Tools in each part of the workflow have evolved in their own silos, with the support of their own target teams. But a true DevOps mentality requires a seamless process from the start of development to production deployment and maintenance, and for a process to be successful there must be tools that take the burden off humans.

    Apache Maven has arguably been the most successful tool for development, project standardization and automation introduced in recent years. On the operations side we have open source tools like Puppet or Chef that are becoming increasingly popular for automating infrastructure maintenance and server provisioning.

    In this presentation we will introduce an end-to-end development-to-production process that takes advantage of Maven and Puppet, each at its strong points, together with open source tools that automate the handover between them. The process automates continuous build, deployment and delivery, from source code to any number of application servers managed with Puppet, running either on physical hardware or in the cloud. It handles new continuous integration builds and releases automatically through several stages and environments such as development, QA, and production.

    At 3:00pm to 3:50pm, Friday 11th November

  • NoSQL at work with JCR and Apache Jackrabbit

    by Carsten Ziegeler

    A Java content repository avoids content silos but also enables persisting of all application data ranging from small objects to audio or movie files. In contrast to relational databases the data is stored in a hierarchical way. This session enables a quickstart into JCR and demonstrates content modeling and handling content by developing a sample application.

    At 3:00pm to 3:50pm, Friday 11th November

  • Provisioning distributed OSGi applications in a cloud

    by Guillaume Nodet

    OSGi has become a key technology for modularity and this very dynamic platform is the best choice for creating containers as proven by the migration of all JEE servers toward OSGi. However, when it comes to deploying and managing huge deployments of OSGi based applications, the tools available are quite limited. This presentation will give you an overview of a solution for provisioning and configuring distributed OSGi based applications in such environments using Apache Karaf and Apache ZooKeeper.

    At 3:00pm to 3:50pm, Friday 11th November

    Coverage slide deck

  • Apache Commons Nabla: on the fly bytecode transformations for algorithmic differentiation

    by Phil Steitz

    This talk presents an innovative use of the Java platform: using bytecode transformations to perform mathematical differentiation. This kind of operation is at the basis of numerous algorithms. It is straightforward for small equations but becomes a daunting task when applied to complex simulation models.
    Nabla (named after the differentiation operator) attempts to do it directly on compiled code and on the fly at runtime.
    The various issues related to these transformations are explained (class creation, instance creation, access to private parts, data sharing between differentiated and primitive instances ...).

    At 4:00pm to 4:50pm, Friday 11th November

    Coverage slide deck

  • Building a state of the art Content Repository with Apache Chemistry and CMIS

    by Stephan Klevenz

    This talk will provide a global overview of Content Repository technology and explain in more detail how implementing the CMIS standard (Content Management Interoperability Services) within the Apache Chemistry project significantly changed the landscape of content management technology and enabled developers to build more sophisticated content repositories.

    This talk will be of interest to any developer or architect curious to discover how a Content Repository can be a great middleware to help develop faster and better applications dealing with structured and unstructured data. It is also recommended to anyone who is interested in standards and who would like to have a better understanding of the OASIS CMIS standard and how it is implemented within the Apache Chemistry project.

    The talk will provide:
    - A global understanding of what a Content Repository is from a functional standpoint: exploring all the services it offers, identifying the main standards and technologies integrated (such as CMIS, which is a key one), and understanding the main technical challenges to be resolved, such as high scalability and high performance.
    - An introduction and presentation of the Apache Chemistry project, which became an Apache top level project earlier this year.
    - A retrospective on the evolution of this project and what it can bring compared to other technologies such as JCR.

    At 4:00pm to 4:50pm, Friday 11th November

    Coverage slide deck

  • Hot HA for Hadoop NameNode

    by Konstantin Shvachko

    The current HDFS design assumes that a single server, the NameNode, dedicated to maintaining the file system metadata, controls the work of the other cluster nodes, the DataNodes, which handle the actual file data blocks. The system is designed to survive and recover in minutes from the loss of multiple DataNodes. But a NameNode failure makes the entire cluster unavailable, since there is no other place to obtain metadata information immediately. Although this design simplifies the overall architecture of HDFS, it also makes the NameNode a single point of failure, which is considered a serious deficiency for production-grade systems.
    The primary goal of the proposed architecture is to build a highly available NameNode, which can fail over to a Standby node in seconds, and which requires minimal changes to the existing code base.
    The architecture introduces a StandbyNode, which is an evolutionary modification of the BackupNode that already exists in HDFS. This is the only major change required to the current Hadoop code base. The approach further utilizes standard HA software like LinuxHA, and existing functionality of load balancing hardware or software platforms. The system is prototyped on eBay clusters.

    At 4:00pm to 4:50pm, Friday 11th November

    Coverage slide deck

  • Using OSGi to Build Better Software: Lessons from a Telemedicine Software for Smartphones and Desktop Systems

    by Doreen Seider

    OSGi brings benefits to Java applications and can even enable software to meet specified requirements. The talk will show how we applied OSGi to develop telemedicine software for smartphones and desktop systems. This software captures vital signs from patients' medical devices via Bluetooth and sends them to medical expert centers. With this real-world example the talk will illustrate how we used dependency injection with OSGi Declarative Services (DS) to build an easy-to-use plugin and registry mechanism. It will demonstrate how we used the modularity of OSGi to have different deployments for different platforms without rewriting all of the code, and how we used loose coupling between components via services to abstract hardware layers like Bluetooth.
    In this context the talk will also give a general introduction to the development of OSGi applications for smartphones using an OSGi stack for mobile systems and the appropriate development environment, called mBS mobile. Problems we encountered during development on the different mobile systems will also be described.

    At 4:00pm to 4:50pm, Friday 11th November

    Coverage slide deck