ApacheCon North America 2011 schedule

Thursday 10th November 2011

  • Security Problems (and Solutions) for Service Oriented Applications

    by Daniel Kulp

    Security is an important aspect of Service Oriented Architecture that is often challenging to address and implement. Providing robust and scalable security solutions throughout highly distributed applications is a difficult problem to solve. For traditional web services, standards like WS-Security, WS-Trust, WS-SecureConversation, and WS-SecurityPolicy have emerged to ease some of those problems. However, those standards don't always provide the best solutions for modern distributed applications that may include REST-based services in addition to traditional SOAP-based applications.

    This session will cover various options for providing security to your services and will demonstrate how Apache projects such as CXF, WSS4J, Camel, and Karaf can work together to provide a complete security solution.

    At 5:00pm to 5:50pm, Thursday 10th November

Friday 11th November 2011

  • Becoming a content-driven, modular application: A Case Study

    by Brett Porter

    Building further on the session from ApacheCon NA 2010, see how Apache Archiva revitalised itself by becoming content-driven and more modular. The discussion covers moving away from a database and ORM solution, how we defined the content model for the application and began migrating the data and the architecture, and the natural benefits we immediately found in the process. We also look at how the application was modularised more effectively, and at the possibility of making it more dynamic with other component models and OSGi. We will discuss other technologies we used and evaluated, such as Apache Jackrabbit and Apache Felix, and what we learned about them along the way.

    At 9:00am to 9:50am, Friday 11th November

  • OSGi for mere mortals

    by Bertrand Delacretaz

    In the last few years, OSGi has become "the" module system for Java - but is OSGi just for gurus, or are mere mortals actually able to use it?
    The simple tutorial RESTful server application presented in this talk aims to demonstrate that the average Java developer can actually use OSGi, and greatly benefit from it. It is built from scratch based on a set of standard and custom OSGi services, in a simple and understandable way.
    Taking advantage of a number of build plugins and runtime tools provided by the Apache Felix and Apache Sling projects allows us to write little code in our example application, while exposing the advantages of an OSGi-based architecture in a simple and convincing way.
    Our walkthrough of the example application will give developers a way to get started with OSGi without getting bogged down in unnecessary details.

    At 9:00am to 9:50am, Friday 11th November

  • Rapid application development for dynamic cloud applications

    by Marcel Offermans and Bram de Kruijff

    In this presentation we will introduce and demonstrate Amdatu, a new open source community project that is building an application platform for dynamic, composite, service-oriented applications in the cloud.

    The Amdatu Platform consists of a set of enabling services and application layer services, and can run in public, private or hybrid (cloud-burst) environments. These services provide the foundation for any application built on the platform; they consist of an OSGi framework that is cluster-ready and multi-tenant-aware and that can be accessed via a comprehensive set of REST-based services. Applications can be dynamically assembled, monitored and managed.

    The application layer services are available to any application running on the platform and include functionality like software and configuration provisioning, semantic indexing, search and recommendation, authentication and authorization, an OpenSocial implementation, extensive user and social network profiling, and big data storage.

    Amdatu leverages many Apache projects and hopes to join the Apache Incubator in the near future.

    At 9:00am to 9:50am, Friday 11th November

  • Serving "Web" Over IPv6

    by Issac Goldstand

    Over the course of 2011, pressure has been rising steadily to migrate from IPv4 to IPv6, but many developers, IT engineers and even users have no idea how it all works.

    This presentation will introduce IPv6 to newcomers, and explain basic setup for popular Apache technologies, including the Apache Web Server and Apache Tomcat.

    At 9:00am to 9:50am, Friday 11th November

  • State of the Elephant: Hadoop yesterday, today and tomorrow

    by Owen O'Malley

    Apache Hadoop is rapidly gaining usage across the enterprise market and has become the primary framework for processing large datasets. It helps companies derive more value from the data that they already have and enables them to collect and analyze more data. Spreading from early adopters among internet sites (Yahoo, Facebook, Amazon, LinkedIn and eBay) to a much wider audience, Hadoop is disrupting the business of analyzing data. The presentation will describe the current state of the project, lessons learned by deploying it at scale, and the roadmap for the future of the project.

    At 9:00am to 9:50am, Friday 11th November

  • Apache Celix - Universal OSGi?

    by Alexander Broekhuis

    Systems that require dynamics and interoperability need a good architecture and clear design principles. OSGi provides this for Java-based systems, but no alternative is currently available for embedded/native systems.

    Apache Celix was created to fill this gap. It is an implementation of the OSGi specification adapted to C, focused on embedded and distributed platforms. Celix is a new Apache project currently starting up in the Incubator.

    The goal is to follow the OSGi specification as closely as possible and to grow towards an implementation of most of the OSGi Core and Compendium.

    To be able to support distributed platforms, the Remote Services from the Enterprise Specification will be implemented. Using Remote Services also makes it possible to create interoperability with Java OSGi.

    For distributed systems, deployment/distribution is an important aspect. For OSGi, Apache ACE can be used to maintain and manage deployments to multiple targets. This makes Apache ACE a perfect candidate for deploying Celix bundles and artifacts.

    This presentation shows how Celix solves the dynamic aspects of OSGi services in C. It will detail differences between the OSGi Specification and the Celix solution.

    At 10:00am to 10:50am, Friday 11th November

  • Apache Mahout for intelligent data analysis

    by Isabel Drost

    "Searching the internet" has become a common pattern when looking for information. However, with current tools, finding the relevant piece of data often turns out to be like searching for a needle in a haystack of unstructured information: social networks, corporate content management systems and microblogging platforms generate an ever-increasing flow of online data.

    This talk gives an introduction to Apache Mahout, a framework of scalable implementations of algorithms for data mining and machine learning. After motivating the need for machine learning, the talk gives an overview of the features of Apache Mahout and shows how to integrate Mahout into your application. It also shows the tremendous improvements that have been implemented in the recent past, including the addition of several algorithms, performance improvements and better APIs for integration.

    At 10:00am to 10:50am, Friday 11th November

  • Bridging traditional Open Source Content Management and the Web of Data with the Apache Stanbol Semantic Engine

    by Olivier Grisel

    This talk will introduce the Stanbol project and showcase how it can be integrated in traditional Enterprise Content Management solutions.

    Stanbol is an Open Source project under incubation at the Apache Software Foundation. Its goal is to provide Web and CMS developers with a set of HTTP / RESTful services to help them integrate semantic technologies into their products and web sites.

    The following Stanbol services are currently under active development:

    - Enhancement engines: use Natural Language Processing tools such as Apache OpenNLP to extract knowledge (topics, named entities, facts) from unstructured content and link it to unambiguous URIs from reference knowledge bases;

    - Entity Hub: a Linked Data indexing cache built on top of Apache Solr, Clerezza and Jena that comes with precomputed indexes and live connectors to popular knowledge bases such as DBpedia, Geonames, YAGO...

    - Content Hub: a faceted search engine based on Solr to search for content using the knowledge automatically extracted by the enhancement engines;

    - CMS bridges to lift the structured content of document repositories using the JCR and CMIS access protocols (using Apache Chemistry) and store the result in a triple store suitable for SPARQL access;

    - Rules engine based on Apache Jena for knowledge refactoring (e.g. convert extracted knowledge into the rich snippet vocabulary for SEO), integrity checks, merging rules, deductive inference...

    The Semantic Web has made significant progress over the last few years, and while it has long held a lot of promise, it can now be used concretely in enterprise solutions.

    If you are curious about the web of data and want to see how it can concretely be used and integrated today in enterprise solutions thanks to software like the Stanbol project, this session is for you.

    You should also attend if you are interested in emerging technologies but have no prior knowledge of semantic technologies: this session will provide good insight into how they can disrupt the usual way of developing applications.

    At 10:00am to 10:50am, Friday 11th November

  • Inside the Apache Infrastructure Team

    by Philip M. Gollucci

    Have you ever wondered how a team of barely 10 rotating volunteers and 2 paid staff can manage a 24x7x365 infrastructure that spans 3 continents and is used by millions of people across the globe? All without any offices and with the bare minimum of paperwork?
    Well, now's your chance! Hear it directly from the camel's mouth, the VP of Apache Infrastructure. We'll tell you about all the technologies, how staff is managed, how responsibilities are delegated, how we deal with 3rd party vendors, and, best of all, how we leverage the Apache Way to accomplish our goals.

    At 10:00am to 10:50am, Friday 11th November

  • Open Development in the Enterprise

    by Phil Steitz

    In this talk, we will explore the question: what can corporate IT organizations learn from leading OSS communities? We will look at how open development concepts such as transparency, meritocracy and community oversight can be applied in corporate settings and what the quality, speed, flexibility and human resource development benefits can be. We will also discuss how collaborative development infrastructure and processes used by leading OSS communities can be leveraged inside the enterprise. We will discuss challenges and opportunities in establishing open development infrastructure and practices in a corporate setting. Finally, we will discuss strategies for influencing corporate culture to accept and embrace, rather than reject, open development concepts.

    At 10:00am to 10:50am, Friday 11th November

  • Whirr: Open Source Cloud Services

    by Tom White

    Apache Whirr is an Incubator project that provides a way to run distributed systems, such as Hadoop, HBase, Cassandra, and ZooKeeper, in the cloud. Whirr provides a simple API for starting and stopping clusters for evaluation, test, or production purposes. Whirr is cloud-neutral, so services may be run on a wide variety of cloud providers (such as Amazon EC2 or Rackspace) simply by changing a configuration property. This talk explains Whirr's architecture and shows how to use it.

    At 10:00am to 10:50am, Friday 11th November

  • Keynote | Watson, a Reasoning System: based on Apache Inside!

    by David Boloker

    IBM Watson is a reasoning system with a question and answer front end that processes natural language coming from both structured and unstructured data.

    Watson also incorporates analytics through which the system learns to derive answer confidence and scoring. Boloker will discuss the Watson system and some of its key foundations that came from the Apache Software Foundation.

    At 11:30am to 12:20pm, Friday 11th November

  • Using JMeter For Testing A Data Center

    by Siegfried Goeschl

    This presentation gives you first-hand information on how JMeter was used to test web applications and web services at a large-scale data center consisting of 300 application servers in total and multiple database clusters. First, JMeter and the testing approach are introduced, before tackling the hard problems of setting up a scalable performance test infrastructure consisting of JMeter, Ant, Hudson and Git. During the project, a new reporting backend for JMeter was developed to overcome the limitations of the current XSLT approach, because an SLA (Service Level Agreement) performance report had to be produced from huge JMeter result files (e.g. exceeding 2 gigabytes).

    At 1:30pm to 2:20pm, Friday 11th November

  • Breaking Down Widget Silos with a friendly Wookie

    by Ross Gardler

    Widgets/gadgets are mini applications written in HTML + JavaScript. They offer cool and dynamic content that can be placed on any page on the web and, in some cases, on your desktop or your mobile device. Unfortunately, there is not just one way to create and package widgets: we have Google Gadgets, W3C Widgets, OpenSocial Gadgets and Wave Gadgets, to name just a few. Whilst widgets are an important part of web content delivery, particularly on the mobile web, the plethora of available widget/gadget standards could limit innovation by creating incompatible silos. This is where Apache Wookie (Incubating) comes in. Using Wookie, we can harmonize all of these widget/gadget standards behind the W3C Widget specification, freeing the user from concerns about implementation details.

    At 2:00pm to 2:50pm, Friday 11th November

  • Deployment With Apache Karaf and ACE

    by Jean-Baptiste Onofré

    Deployment of applications based on reusable components in a large network environment can become complicated very quickly. In this session, Jean-Baptiste Onofré from Talend will describe how to manage deployment tasks using Apache Karaf and Apache ACE.

    Apache Karaf is a flexible, lightweight, enterprise-ready OSGi container that provides a runtime for a wide variety of components, including pure web applications and ESB-oriented services. Karaf's flexible tooling makes it suitable for large-scale deployment, and it supports multiple instances through high-availability and clustering.

    Apache ACE is a software distribution framework that provides centralized management of multi-node component deployments.

    Jean-Baptiste will begin with an introduction to Karaf, covering the Karaf shell and the basics of multiple instance management. Then, he will demonstrate how to use ACE to provision applications running inside Karaf.

    At 2:00pm to 2:50pm, Friday 11th November

  • Instrumenting Hadoop Jobs for Fun and Profit

    by Shevek Mankin

    Instrumentation is a general purpose technique to automatically gather detailed information about the execution of a process.

    The distributed nature of a Hadoop job makes both the engineering of the instrumentation and the presentation of the output harder.

    However, instrumentation can also take advantage of a detailed knowledge of the code paths within Hadoop to build a much deeper insight into the behaviour of the user code.

    We will present our approach to general purpose instrumentation for Hadoop, which uses Hadoop-specific insights to profile, debug and diagnose faults in a job.

    We will describe techniques using attempt success/failure, internal exception rates and differential analysis, amongst others, to help us localize badly performing code or malformed input data without user intervention.

    At 2:00pm to 2:50pm, Friday 11th November

  • Kafka - A distributed publish/subscribe messaging system

    by Neha Narkhede

    Kafka is a distributed publish-subscribe messaging system aimed at providing a scalable, high-throughput, low-latency solution for log aggregation and activity stream processing at LinkedIn. Built in Scala on top of Apache ZooKeeper, Kafka aims to provide a unified stream for both real-time and offline consumption. We provide a mechanism for parallel data load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines. Kafka combines the benefits of traditional log aggregators and messaging systems and has been used successfully in production for 8 months. It provides an API similar to that of a messaging system and allows applications to consume log events in real time. Written by the SNA team at LinkedIn, Kafka is open sourced under the Apache 2.0 License and is being prepared for submission to the Apache Incubator. In this presentation, we will highlight the core design principles of this system, how it fits into LinkedIn's data ecosystem, and some of the products and monitoring applications it supports in our usage.

    At 2:00pm to 2:50pm, Friday 11th November

  • Turning the Knobs - Real world Java, server and application performance tuning

    by Filip Hanik

    Where do you start troubleshooting a performance problem? At the network level, database queries, slow code?
    Join Apache Tomcat committer Filip Hanik, Staff Systems Engineer at VMware, in an informative session on how to tackle and resolve performance problems using a systematic approach.

    Performance problems can manifest themselves at all layers of an application and its infrastructure. Troubleshooting and tuning efforts are often focused on the components in which an administrator or developer feels a core competency, leading to less than optimal results. In this session, we will focus on detecting a problem, identifying its source and correctly solving it with predictable results.

    After this session, you will have a better understanding of how to identify problems with hardware, network topologies, load balancers, web servers, Java applications and database technologies using tools available in various parts of the system, as well as how to identify the right knobs to turn and how to interpret the results of various performance tuning metrics.

    At 2:00pm to 2:50pm, Friday 11th November

  • .NET @ Apache.org

    by Ted Husted

    Like it or not, many open source developers are moving to the Microsoft .NET platform, and we're bringing our favorite tools with us! In this session, we look inside ASF projects that are creating software for .NET and Mono (like ActiveMQ, Chemistry, Logging, Lucene, Qpid, and Thrift) and show how to create leading-edge ASP.NET applications with ASF open source libraries. We'll also look at integrating other .NET open source projects, like Spring.NET, NVelocity, and JayRock, into your C# application to create a complete open source .NET stack.

    At 3:00pm to 3:50pm, Friday 11th November

  • Cassandra 1.0 and beyond

    by Jake Luciani

    The Cassandra distributed database has added many new features this year based on real-world needs of developers at Twitter, Netflix, Openwave, and others building massively scalable systems.

    This talk will cover the motivation and use cases behind features such as secondary indexes, Hadoop integration, SQL support, bulk loading, and more.

    Introduction
    ------------
    * Project history and goals

    Recap: Cassandra through 2010
    -----------------------------
    * Bulletproof reliability
    * Best-in-class support for multiple datacenters
    * High-performance storage engine based on Bigtable

    New in Cassandra 1.0
    --------------------
    * Dynamic column indexes
    * Distributed counters for realtime analytics
    * CQL/SQL and JDBC support
    * Bulk loading
    * Off-heap allocation for GC performance
    * Hadoop support

    At 3:00pm to 3:50pm, Friday 11th November

  • From Dev to DevOps

    by Carlos Sanchez

    The DevOps movement aims to improve communication between developers and operations teams to solve critical issues such as fear of change and risky deployments. But just as Agile development would likely fail without continuous integration tools, DevOps principles need tools to make them real and to provide the automation required to implement them. Most of the so-called DevOps tools focus on the operations side, but there should be more to it than that: automation must cover the full process, from Dev to QA to Ops, and be as agile as possible.

    Tools in each part of the workflow have evolved in their own silos, supported by their own target teams. But a true DevOps mentality requires a seamless process from the start of development to production deployment and maintenance, and for such a process to be successful there must be tools that take the burden off humans.

    Apache Maven has arguably been the most successful tool for development, project standardization and automation introduced in recent years. On the operations side, open source tools like Puppet or Chef are becoming increasingly popular for automating infrastructure maintenance and server provisioning.

    In this presentation we will introduce an end-to-end development-to-production process that takes advantage of Maven and Puppet, each at its strong points, along with open source tools that automate the handover between them. The process automates continuous build and deployment and continuous delivery, from source code to any number of application servers managed with Puppet, running either on physical hardware or in the cloud, and handles new continuous integration builds and releases automatically through several stages and environments such as development, QA, and production.

    At 3:00pm to 3:50pm, Friday 11th November

  • NoSQL at work with JCR and Apache Jackrabbit

    by Carsten Ziegeler

    A Java content repository avoids content silos and also enables persisting all application data, ranging from small objects to audio or movie files. In contrast to relational databases, the data is stored hierarchically. This session provides a quick start with JCR and demonstrates content modeling and content handling by developing a sample application.

    At 3:00pm to 3:50pm, Friday 11th November

  • Provisioning distributed OSGi applications in a cloud

    by Guillaume Nodet

    OSGi has become a key technology for modularity and this very dynamic platform is the best choice for creating containers as proven by the migration of all JEE servers toward OSGi. However, when it comes to deploying and managing huge deployments of OSGi based applications, the tools available are quite limited. This presentation will give you an overview of a solution for provisioning and configuring distributed OSGi based applications in such environments using Apache Karaf and Apache ZooKeeper.

    At 3:00pm to 3:50pm, Friday 11th November

  • Apache Commons Nabla: on the fly bytecode transformations for algorithmic differentiation

    by Phil Steitz

    This talk presents an innovative use of the Java platform: using bytecode transformations to perform mathematical differentiation. This kind of operation underlies numerous algorithms. It is straightforward for small equations but becomes a daunting task when applied to complex simulation models.
    Nabla (named after the differentiation operator) attempts to do it directly on compiled code and on the fly at runtime.
    The various issues related to these transformations are explained (class creation, instance creation, access to private parts, data sharing between differentiated and primitive instances ...).

    At 4:00pm to 4:50pm, Friday 11th November

  • Building a state of the art Content Repository with Apache Chemistry and CMIS

    by Stephan Klevenz

    This talk will provide a global overview of Content Repository technology and explain in more detail how implementing the CMIS standard (Content Management Interoperability Services) within the Apache Chemistry project significantly changed the landscape of content management technology and enabled developers to build more sophisticated content repositories.

    This talk will be of interest to any developer or architect curious to discover how a Content Repository can be a great middleware to help develop faster and better applications dealing with structured and unstructured data. It is also recommended to anyone who is interested in standards and who would like to have a better understanding of the OASIS CMIS standard and how it is implemented within the Apache Chemistry project.

    The talk will provide:
    - A global understanding of what a Content Repository is from a functional standpoint: exploring all the services it offers, identifying the main standards and technologies integrated (such as CMIS, which is a key one), and understanding the main technical challenges to be resolved, such as high scalability and high performance.
    - An introduction and presentation of the Apache Chemistry project, which became an Apache top level project earlier this year.
    - A retrospective on the evolution of this project and what it can bring compared to other technologies such as JCR.

    At 4:00pm to 4:50pm, Friday 11th November

  • FreeBSD + ASF software & philosophy + ZFS == large $$ bonuses from your boss

    by Philip M. Gollucci

    A record-holding TCP/IP stack, *the* cutting-edge file system, and the most respected software collection... What do you get? A much happier system admin and a much better workplace. The cloud might be hot stuff, but it's not for everyone or every task. How can you take your existing data center infrastructure and make it better?

    At 4:00pm to 4:50pm, Friday 11th November

  • Hot HA for Hadoop NameNode

    by Konstantin Shvachko

    The current HDFS design assumes that a single server, the NameNode, dedicated to maintaining the file system metadata, controls the work of the other cluster nodes, the DataNodes, which handle the actual file data blocks. The system is designed to survive and recover in minutes from the loss of multiple DataNodes. But a NameNode failure makes the entire cluster unavailable, since there is no other place to obtain metadata immediately. Although this design simplifies the overall architecture of HDFS, it also makes the NameNode a single point of failure, which is considered a serious deficiency for production-grade systems.
    The primary goal of the proposed architecture is to build a highly available NameNode that can fail over to a Standby node in seconds and that requires minimal changes to the existing code base.
    The architecture introduces a StandbyNode, an evolutionary modification of the BackupNode already existing in HDFS. This is the only major change required to the current Hadoop code base. The approach further uses standard HA software like LinuxHA and the existing functionality of load-balancing hardware or software platforms. The system has been prototyped on eBay clusters.

    At 4:00pm to 4:50pm, Friday 11th November

    Coverage slide deck

  • Using OSGi to Build Better Software: Lessons from a Telemedicine Software for Smartphones and Desktop Systems

    by Doreen Seider

    OSGi brings benefits to Java applications and can even enable software to meet specified requirements. The talk will show how we applied OSGi to develop telemedicine software for smartphones and desktop systems. This software captures patients' vital signs from medical devices via Bluetooth and sends them to medical expert centers. With this real-world example, the talk will illustrate how we used dependency injection with OSGi Declarative Services (DS) to build an easy-to-use plugin and registry mechanism. It will demonstrate how we used the modularity of OSGi to create different deployments for different platforms without rewriting all of the code, and how we used loose coupling between components via services to abstract hardware layers like Bluetooth.
    In this context, the talk will also give a general introduction to developing OSGi applications for smartphones using an OSGi stack for mobile systems and the corresponding development environment, mBS mobile. Problems we encountered during development on the different mobile systems will also be described.

    At 4:00pm to 4:50pm, Friday 11th November
