Sessions at ApacheCon North America 2011 with audio

Thursday 10th November 2011

  • The Secret Life of Open Source

    by Ted Husted

    Apache, GNU, Mozilla, Ubuntu, PHP, LibreOffice, Wikipedia -- Today, there are hundreds of open source groups, each with its own culture, methodology, and governance model.

    • How are these groups alike?
    • How are they different?
    • Is there one true path to open source enlightenment, or do many paths converge around a common singularity?

    Join open source insider Ted Husted as we look behind the curtain to see who's pulling the strings that steer your favorite open source projects.

    At 10:00am to 10:50am, Thursday 10th November

  • Apache Tika: 1 point Oh!

    by Chris Mattmann

    Apache Tika, an ASF top-level project since April 2010 and a thriving Apache community, has made tremendous strides over the past 4 years, growing and maturing into a leading text extraction library and content detection framework. Tika is used in a number of search projects and data management systems, across a number of domains.

    Those domains range from the technology industry to the sciences and the federal government.

    Tika has been used as a teaching platform for computer science graduate students, has been used to unlock information from NASA images, and from the National Cancer Institute, and has also been used to provide rich meaning and information representation of content captured in pervasive document repositories and warehouses. These are only some of Tika's broad applications.

    In November, we hope to have released Tika 1.0. This will coincide with a number of other properties that demonstrate Tika has reached the point of a mature community, including:

    1. Concrete, stable features, and core interfaces.
    2. Tika's use in multiple programming languages and environments.
    3. Our growth in Apache, and election of new committers and PMC members (and ASF members).
    4. Developer articles appearing quite frequently on Tika.
    5. The culmination of a wealth of knowledge in the form of a book that will be published on Tika at the time of the ApacheCon meeting.

    This talk will focus on how we got here, and what's next for this thriving Apache community.

    At 11:30am to 12:20pm, Thursday 10th November

  • mod_lua for beginners

    by Eric Covener

    I will present a beginner's guide to using the Lua scripting language inside Apache HTTP Server through mod_lua, a new mod_perl-like module that will accompany the release of Apache HTTP Server 2.4.

    At 11:30am to 12:20pm, Thursday 10th November

  • Scaling Hadoop Applications

    by Milind Bhandarkar

    Apache Hadoop makes it extremely easy to develop parallel programs based on the MapReduce programming paradigm by taking care of work decomposition, distribution, assignment, communication, monitoring, and handling intermittent failures. However, developing Hadoop applications that scale linearly to hundreds, or even thousands, of nodes requires extensive understanding of Hadoop architecture and internals, in addition to hundreds of tunable configuration parameters. In this talk, I illustrate common techniques for building scalable Hadoop applications, and pitfalls to avoid. I will explain the seven major causes of sublinear scalability of parallel programs in the context of Hadoop, with real-world examples based on my experiences with hundreds of production applications at Yahoo! and elsewhere. I will conclude with a scalability checklist for Hadoop applications, and a methodical approach to identify and eliminate scalability bottlenecks.

    At 11:30am to 12:20pm, Thursday 10th November

  • Talking people into creating patches

    by Isabel Drost

    "Contributing to open source projects is trivial: Make a change, create a patch, review and revise, have it accepted." When heavily involved with open source projects it's easy to forget what developers interested in contributing have to learn before even making the smallest first change.

    The talk summarises some of the issues and questions that students, long-time developers, and researchers have when faced with free software development. The talk mainly focuses on the technical issues, touching only briefly on the (at least) equally large space of cultural differences between open development communities and corporate or even research environments.

    Instead of providing pre-baked solutions for filling this gap, the goal of the talk is to initiate a discussion on how best to talk your friends and colleagues into creating patches: Which strategies worked for you, and which failed? Which resources do you generally use when mentoring interested peers? Where do you see the most problems?

    At 11:30am to 12:20pm, Thursday 10th November

  • ManifoldCF for Content Acquisition

    by Karl Wright

    I'll introduce ManifoldCF, and describe the general enterprise content acquisition and indexing problem which led to its development. I will discuss accessing multiple repositories, enforcing repository security, and incrementally keeping indexes up to date. I'll give an overview of its architecture, and demonstrate simple crawls and a secure integration with Apache Solr.

    At 2:30pm to 3:20pm, Thursday 10th November

  • Navigating the Apache Incubator

    by Brett Porter

    Looking to bring an open source project to the Apache Software Foundation? Already a member of a podling? Looking to get involved?

    Inspired by the popular Q&A session at BarCampApache Sydney, this session will walk through all aspects of navigating the Apache Incubator, including:
    * bringing a project to Apache
    * is Apache the right home for a project?
    * the Incubator's procedures and requirements
    * when and how to graduate
    * what makes a successful Apache project
    * examples of successful and less successful podlings

    At 2:30pm to 3:20pm, Thursday 10th November

  • The Power of the mod_proxy Modules

    by Paul Weinstein

    This presentation reviews the concepts of web proxies and load balancing, covers the creation and maintenance of proxies (forward and reverse) for HTTP, HTTPS and FTP using Apache and mod_proxy, and shows how mod_proxy_balancer can be used to provide a basic load balancing solution. Configuration examples for implementing proxies and load balancers will be discussed, including how and when mod_proxy modules can help, configuring mod_proxy for forward or reverse proxying, and configuring mod_proxy_balancer for one or more backend web servers.

    At 2:30pm to 3:20pm, Thursday 10th November

  • Hardening Enterprise Apache Installations Against Attacks

    by Sander Temme

    Enterprise installations of Apache are particularly attractive targets for malicious attacks including Denial of Service, defacement, theft of data or service and installation of zombies or viruses.

    Hardening your deployment against such attacks calls for some special techniques and tactics.

    Come to this session to learn about attack detection techniques, server protection, secure deployment of multiple servers, configuration of firewall "demilitarized zones" and judicious use of SSL encryption.

    How do you deploy an off-the-shelf application that insists on writing to the file system?

    And what steps do you take to securely deploy Apache on Windows or UNIX?

    This presentation will explore solutions to these very real situations.

    At 4:00pm to 4:50pm, Thursday 10th November

  • Interoperability with CMIS and Apache Chemistry

    by Florian Müller

    Content Management Interoperability Services (CMIS) is a specification for improving interoperability between Enterprise Content Management systems. The standard was ratified in May 2010 and is now supported by many ECM vendors.

    The number of applications using CMIS to access and manage documents of all kinds is steadily growing. Many applications use libraries provided by Apache Chemistry, which provides four implementations of CMIS: OpenCMIS (Java), cmislib (Python), phpclient (PHP) and DotCMIS (.NET).

    This presentation will explain the standard, its relevance and acceptance in the ECM industry and, of course, it will present Apache Chemistry and its role in the increasing use of CMIS.

    At 4:00pm to 4:50pm, Thursday 10th November

  • Life in Open Source communities

    by Bertrand Delacretaz

    Open Source communities often seem to have their own unwritten rules of operation and communication, their own jargon and their own etiquette, which sometimes make them appear obscure and closed to outsiders. In this talk, we'll provide recommendations on how to get in touch with, and how to join, Open Source communities. Based on ten years of experience in various Open Source projects, we will provide practical information on how to communicate effectively on mailing lists, how to formulate questions in an effective way, how to contribute in ways that add value to the project, and generally how to interact with Open Source communities in ways that are mutually beneficial. This talk will help Open Source beginners get closer to the communities that matter to them, and help more experienced community members understand how to welcome and guide newcomers.

    At 4:00pm to 4:50pm, Thursday 10th November

  • Chefs with Feathers: The Sakai Project

    by Carl Hall

    The Sakai Project is not only a collaboration of learning institutions but also a marriage of many open source projects. It is this ecosystem of software systems that drives the success of Sakai's products. Our interaction with the various Apache communities that we build upon has strengthened our product and given us the chance to contribute back. We would like to present our experiences building a large collaborative learning environment on top of open-source software from the ASF.

    At 5:00pm to 5:50pm, Thursday 10th November

  • Dr. Mahout: Analyzing clinical data using scalable and distributed computing

    by Shannon Quinn

    Of the few realms in which cloud computing has not solidly taken root, one in which it has great potential is medicine. Clinicians generate massive amounts of data during the diagnostic process, the analysis of which, whether manual or computational, can take a great deal of time. For example, the rare genetic disease primary ciliary dyskinesia (PCD) affects the cilia on cells, causing them to behave erratically and leading to breathing problems at best, necessitating lung transplants at worst. Cutting-edge diagnostic tools capture the ciliary motions with high-speed video and use automated methods to quantitatively describe the motion patterns. These methods, however, are computing-intensive and would benefit from parallelization. Here we propose using the Mahout framework to efficiently learn models that capture the motion patterns observed in the videos and aid in objective diagnoses. Additionally, Hadoop's storage system will allow us to construct and preserve libraries of these motion models in the cloud for later comparison. The library will be in constant flux as new patterns are added and existing patterns are retrained, requiring a scalable and distributed architecture to handle the data and integrate it into the existing library. Ultimately this framework will be a boon for clinicians: they need only take biopsies, gather data as images or videos, upload them to a Mahout/Hadoop cluster, and wait for the results. Patient privacy is maintained by perpetuating only the low-dimensional motion models, computational time is reduced by parallelizing the model learning and comparison process, and models are available to clinicians everywhere through the cloud.

    At 5:00pm to 5:50pm, Thursday 10th November

  • Handling RDF data with Apache Jena

    by Paolo Castagna

    Apache Jena, currently in incubation, is a Java framework for building semantic web applications. It provides developers with a library to handle RDF, RDFS, RDFa, OWL and SPARQL in line with the relevant W3C recommendations.

    Jena was developed by researchers at HP Labs, Bristol (UK), starting in 2000. It has been an open source project since its beginning and is used extensively within the semantic web community.

    This talk introduces the fundamentals of the RDF data model and SPARQL query language as well as the basic usage patterns of Jena: how to parse and write data in RDF format, how to store it using TDB, Jena's native RDF database, how to query it with SPARQL using ARQ, and how to integrate free text searches with SPARQL using Apache Lucene or Solr.

    At Talis we use Apache Jena, in particular TDB, ARQ and LARQ, together with Apache Hadoop in our ingestion pipeline and many other open source projects, to process RDF data, store it, and implement our services and APIs.

    At 5:00pm to 5:50pm, Thursday 10th November

  • Out and About with Apache Traffic Server

    by Leif Hedstrom

    Apache Traffic Server is an ASF Open Source project implementing a fast, scalable and feature-rich HTTP proxy and caching server. We will examine the technical details behind TS, what it is good for, and how you can configure it to accelerate your web traffic and make complex problems easier to solve. Traffic Server was originally a commercial product from Inktomi Corporation, and has been actively used inside Yahoo! for many years, as well as by many other large web sites. As of 2009, Traffic Server is an Open Source project under the Apache umbrella, and is rapidly being developed and improved by an active community. The community is vibrant, with well over 150 active users, contributors and committers.

    This talk will explain the details behind the Traffic Server technology: What makes it fast? Why is it scalable? And how does it differ from other HTTP proxy servers? We will discuss several use cases, and show how to configure and operate TS for common tasks. As an HTTP proxy server and cache, TS has many use cases in the areas of forward, reverse and transparent proxying.

    Traffic Server is designed using a hybrid processing model, combining an event-driven engine (state machine) with a multi-threaded process approach. This allows Traffic Server to scale on modern multi-core systems, taking advantage of available CPUs. In our view, we've combined the best features of traditional designs, solving many difficult problems while avoiding some of the pitfalls associated with existing solutions. This approach gives us:

    • Scalability on SMP
    • Predictable and low latency characteristics
    • Lightweight on system resources (few threads, little memory wasted)
    • Efficient and reliable disk I/O

    After introducing the technical details behind TS, we will discuss the common applications of a proxy and cache, when and why they would be applicable, and how to configure and use Apache Traffic Server effectively. Focusing on how to use Traffic Server in a production environment, we'll walk the audience through:

    • Installation process
    • Configuration files
    • Operations and monitoring

    The goal is to give a solid foundation in web proxying and caching, and to show why Apache Traffic Server is a contender in this space. No previous experience with Apache Traffic Server is necessary, but familiarity with the general areas of HTTP and HTTP servers will help in following the presentation.

    At 5:00pm to 5:50pm, Thursday 10th November

Friday 11th November 2011

  • Apache Mahout for intelligent data analysis

    by Isabel Drost

    "Searching the internet" has become a common pattern when looking for information. However, with current tools, finding the relevant piece of data often turns out to be like searching for a needle in a haystack of unstructured information: social networks, corporate content management systems, and microblogging platforms generate an ever-increasing flow of online data.

    This talk gives an introduction to Apache Mahout - a framework of scalable implementations of algorithms for data mining and machine learning. After motivating the need for machine learning, the talk gives an overview of the features of Apache Mahout. The talk shows how to integrate Mahout into your application. It also shows the tremendous improvements that have been implemented in the recent past, including the addition of several algorithms, performance improvements and better APIs for integration.

    At 10:00am to 10:50am, Friday 11th November