by Hilary Mason
Machine learning has come a long way in recent years — from a long-marginalized field so old it still has the word “machine” in the name, to the last, best hope for making sense of our massive flows of data.
The art of ‘data science’ is asking the right questions; the answers are generally trivial or impossible. This talk will focus more on questions than on answers. I’ll give a brief history of the field with a focus on the fundamental math and algorithmic tools that we use to address these kinds of problems, then walk through several descriptive and predictive scenarios.
Finally, I’ll show one example system using bit.ly data in-depth, from the backend infrastructure through the algorithms and data processing layer to show a functioning product.
Attendees should expect to hear some good stories of data gone right and data gone awry, and walk away with a few new clever tricks.
by Yehuda Katz
Ruby on Rails has shown the world that it's possible to build an open-source project that is optimized for developer happiness. In the five years since it was originally announced, Rails has gone through the typical hype cycle: enthusiastic early adopters followed by negative press attention followed by a quieter period of productivity.
What challenges did Rails face as it made its way to being a successful, well-regarded framework, and how did it overcome them? In this talk, Yehuda will talk about how the Rails team faced its challenges, and what lessons you can take away when working on your own open source project.
by Kyle Cordes
This talk will show how and WHY to use Lua (as opposed to the zillion other scripting languages) for embedded scripting inside larger, non-Lua projects. Lua is safe, fast, simple, easy to learn, and more popular than you might expect.
by Chris Houser
If you've done much work in any language with polymorphism, you've probably encountered the expression problem, whether you knew its name or not. Chances are you've even come up with a solution or two yourself. I'll define the expression problem, demonstrate some common solutions, then dig into how Clojure's multimethods and protocols each solve the problem while avoiding weaknesses of other solutions. Along the way you'll get a sense of how Clojure's approach to datatypes differs from classic object-oriented languages.
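As a rough illustration of the expression problem (not Clojure code, and not taken from the talk), here is a minimal Python sketch using the standard library's functools.singledispatch, which dispatches on the type of the first argument. The shape classes and the area and perimeter operations are hypothetical names chosen for the example. The point is that a new operation and a new type can each be added without editing code that already exists.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Square:
    side: float


# First operation, defined outside the data types it works on.
@singledispatch
def area(shape):
    raise NotImplementedError(f"area is not defined for {type(shape).__name__}")


@area.register
def _(shape: Circle):
    return math.pi * shape.radius ** 2


@area.register
def _(shape: Square):
    return shape.side ** 2


# A new operation added later: no existing class is edited.
@singledispatch
def perimeter(shape):
    raise NotImplementedError(f"perimeter is not defined for {type(shape).__name__}")


@perimeter.register
def _(shape: Circle):
    return 2 * math.pi * shape.radius


@perimeter.register
def _(shape: Square):
    return 4 * shape.side


# A new type added later: no existing function body is edited.
@dataclass(frozen=True)
class Rect:
    width: float
    height: float


@area.register
def _(shape: Rect):
    return shape.width * shape.height


@perimeter.register
def _(shape: Rect):
    return 2 * (shape.width + shape.height)
```

Clojure's multimethods are more general than this sketch, since they can dispatch on the result of an arbitrary function rather than only on the type of one argument.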
Square enables users to accept card payments on Android devices. Square reads magnetic stripe data through the microphone port using a free reader and sends receipts via email or SMS. Square has been featured in the Android Market and at Google I/O.
Bob and Eric, the programmers behind Square, will demonstrate how magnetic stripe decoding works. They’ll describe Square's unique approach to Intent-based APIs used in the point-of-sale API. They’ll share tips for taming the activity stack and building device-independent user interfaces.
Finally, they’ll give a sneak peek into their upcoming open source Android library, Retrofit. Retrofit provides utilities for dependency injection (using Google Guice), simple and fast persistence, REST communication and dialog management.
by Mike Malone
Recently a new class of database technologies has developed offering massively scalable distributed hash table functionality. Relative to more traditional relational database systems, these systems are simple to operate and capable of managing massive data sets. These characteristics come at a cost though: an impoverished query language that, in practice, can handle little more than exact-match lookups at scale.
This talk will explore the real world technical challenges we faced at SimpleGeo while building a web-scale spatial database on top of Apache Cassandra. Cassandra is a distributed database that falls into the broad category of second-generation systems described above. We chose Cassandra after carefully considering desirable database characteristics based on our prior experiences building large scale web applications. Cassandra offers operational simplicity, decentralized operations, no single points of failure, online load balancing and re-balancing, and linear horizontal scalability.
Unfortunately, Cassandra fell far short of providing the sort of sophisticated spatial queries we needed. We developed a short term solution that was good enough for most use cases, but far from optimal. Long term, our challenge was to bridge the gap without compromising any of the desirable qualities that led us to choose Cassandra in the first place.
The result is a robust general purpose mechanism for overlaying sophisticated data structures on top of distributed hash tables. By overlaying a spatial tree, for example, we’re able to durably persist massive amounts of spatial data and service complex nearest-neighbor and multidimensional range queries across billions of rows fast enough for an online consumer facing application. We continue to improve and evolve the system, but we’re eager to share what we’ve learned so far.
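To make the idea of overlaying a spatial structure on an exact-match store concrete, here is a minimal Python sketch. It is not SimpleGeo's implementation: it assumes a fixed-depth grid whose cells are named by quadtree paths, and a plain dictionary stands in for the distributed hash table. DEPTH, cell_of, cell_key and SpatialOverlay are hypothetical names introduced for the example.

```python
from collections import defaultdict

DEPTH = 8  # assumed number of quadtree levels, giving a 256 x 256 grid


def cell_of(lat, lon, depth=DEPTH):
    """Map a coordinate to integer (row, col) grid indices at the given depth."""
    n = 1 << depth
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return row, col


def cell_key(row, col, depth=DEPTH):
    """Interleave row and col bits into a quadtree path, one digit (0-3) per level."""
    digits = []
    for level in range(depth - 1, -1, -1):
        quadrant = (((row >> level) & 1) << 1) | ((col >> level) & 1)
        digits.append(str(quadrant))
    return "".join(digits)


class SpatialOverlay:
    """Spatial index layered over a store that only supports exact-match keys."""

    def __init__(self):
        self.store = defaultdict(list)  # stand-in for the distributed hash table

    def insert(self, name, lat, lon):
        self.store[cell_key(*cell_of(lat, lon))].append((name, lat, lon))

    def query_box(self, lat_min, lat_max, lon_min, lon_max):
        """Return sorted names inside the box, reading only the cells that overlap it."""
        row_lo, col_lo = cell_of(lat_min, lon_min)
        row_hi, col_hi = cell_of(lat_max, lon_max)
        hits = []
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                # One exact-match lookup per candidate cell.
                for name, lat, lon in self.store.get(cell_key(row, col), []):
                    if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                        hits.append(name)
        return sorted(hits)
```

A range query becomes a bounded set of exact-match lookups, which is the only operation the underlying store is assumed to do well. A production design would also need to adapt cell depth to data density, which this sketch ignores.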
by Steve Harris
In talking to our users it is clear that applications are getting more and more data hungry. According to IDC, data requirements are growing at an annual rate of 60 percent. There is good news though. Server class machines purchased this year have a minimum of 8 Gig of RAM and likely have 32 Gig. Cisco is now selling mainstream UCS boxes with over 380 Gig of RAM. Memory has gotten big and extremely cheap compared to things like developer time and user satisfaction.
Unfortunately a problem exists as well. For Java/JVM applications it is becoming an ever increasing challenge to use all that data and memory due to GC Pauses.
In this talk I'm going to cover the problems we identified and the technology we built to solve those problems.
A bit about its history and the history of the problem
The where, when, and why of BigMemory
Throughput, latency, Garbage Collection, SLA and scaling characteristics
Configuration of Ehcache with BigMemory in an existing application with just a few lines of config code
Ehcache's tiered storage architecture: MemoryStore, the OffHeapMemoryStore and the DiskStore
Ehcache BigMemory with scale-out
Implications for your caching architecture
by Eben Hewitt
The Cassandra database is distributed, highly-available, fault-tolerant, and offers an elastic scaling model—all of which make Cassandra a powerful proposition for mission-critical applications. It’s used by many of the world’s biggest web properties, including Facebook, Twitter, Digg, StumbleUpon, Reddit, Cisco, and others.
This is all fantastic, but there’s no free lunch—Cassandra is not a relational database, but rather follows in the footsteps of columnar data stores such as Google BigTable and Amazon’s Dynamo. As such, getting your head around how Cassandra works can be daunting to say the least: there’s a lot of new terminology (what’s a Hinted Handoff? What’s a SuperColumn?? What do I need to know about Vector Clocks??? Argh!). There are some complex algorithms in Cassandra, and new ways of handling basic operations in order to achieve the benefits mentioned above. Cassandra only recently emerged from Incubator status, and there aren’t a lot of tools available yet to smooth your path toward adoption. This talk can help you understand everything you need to know to get started using Cassandra. We’ll sort out all the terminology and foundational concepts, and then dive into a practical set of ways to get started putting Cassandra to work in your applications today.
System provisioning is all too often a cardboard house held together by duct tape. Different versions of the same software live on each machine, leading to difficult to diagnose, "not on my machine" bugs. Without automation, system provisioning is an impossibly hard problem that can destroy a business. Thankfully, Chef helps solve this problem through declarative build scripts that bring all machines into alignment. In this talk, we'll discuss how to get started with chef-solo, what to do when it's not enough, and how to fit it into your organization. Come hear how Chef can make your life better, and how easy it is to use.
Hibernate is the most popular object-relational mapping library for Java (and for most other JVM-based languages), providing a framework for mapping an object-oriented domain model to a traditional relational database. Applying Hibernate to simple models is straightforward and almost effortless. With more complex models, however, we usually run into issues relating to performance or correctness.
We will show several flaws in the demo from 'Java Persistence with Hibernate' - CaveatEmptor (yes, it has several bugs including a serious locking-related issue!) and other open-source projects.
The Hibernate-related flaws will be accompanied by alternative solutions and best practices that help improve the performance and quality of both the database and the object-oriented models. We will explore patterns and practices mainly in the context of the object-oriented model, specifically how to honor object-oriented principles while still ensuring the correctness and efficiency of Hibernate mappings.
Additionally, we will present a free online tool that helps automate the discovery of concurrency-related issues with Hibernate and database transactions. The tool uses static analysis to analyze the bytecode of any JVM-based application and to find bugs related to Hibernate.
Upon completion of this presentation, attendees should better understand the potential Hibernate issues, along with patterns for using Hibernate in a correct and elegant way. Moreover, attendees will learn how to automatically discover a certain class of Hibernate-related bugs.
by Jeff Brown
In this session Jeff Brown, core member of the Grails development team and a senior engineer at SpringSource, will demonstrate how the basics of Twitter can be built using Grails and JMS in only 40 minutes. In this fast-paced, code-driven presentation, Jeff will build a Twitter-like application from scratch using Grails and its rapid application development capability. By bringing together Spring, JMS and Java persistence techniques, Jeff will also provide advanced tips and techniques for constructing Grails applications that can be deployed onto the Java EE platform.
Attendees will learn:
How to construct a basic Grails project with Spring-based domain objects
How to incorporate messaging and persistence into your Grails application
How to adapt basic configuration to suit the needs of your application
by Ted Neward
Android is a new mobile development platform, based on the Java language and tool set, designed to allow developers to get up to speed writing mobile code on any of a number of handsets quickly. In this presentation, we'll go over the basic setup of the Android toolchain, how to deploy to a device, and basic constructs in the Android world.
Attendees should be intermediate to advanced Java developers, as no time will be spent on Java basics, just the Android parts. Attendees are encouraged to bring laptops to the session (and your Android-based device, if you have one) to fill out code as we go, but the limited time frame means a focus on fast delivery of content and example code; have your fingers warmed up (and the SDK downloaded!) before you get here.
How can software developers change cities, states, and countries for the better? Last year, we saw an explosion of interest around government transparency. The Open Government movement, spearheaded by open source developers, seeks to make government more accountable and responsible by turning open government data into citizen-focused, civic-minded applications. This talk will guide you through the Gov 2.0 landscape. You'll learn about the data sets and APIs freely available for your use and the tools and skills you'll need to be a successful civic hacker, and you'll get a thorough overview of the current civic apps out there. Civic hacking will enhance your open source portfolio while making a difference in your community and country.
by Justin Love
by Bryan Weber
This talk is not only for Ninjas! One of the most frequently cited benefits of Clojure is being able to take advantage of the Java libraries and ecosystem. This talk will cover calling Java from Clojure and just as importantly calling Clojure from Java. Any ninjas that attend will come out of the session knowing how to use Clojure on a Java project without being detected... well, almost anyway.
by Tim Berglund
Some systems are too large to be understood entirely by any one human mind, and can't readily be modeled using traditional mathematical tools. They are composed of a diverse array of individual components capable of interacting with each other and adapting to a changing environment. As systems, they produce behavior that differs in kind from the behavior of their components. Complexity Theory is an emerging discipline that seeks to describe such phenomena previously encountered in biology, sociology, economics, and other disciplines.
Beyond new ways of looking at ant colonies, fashion trends, and national economies, complexity theory promises powerful insights into software development. The Internet, perhaps the most valuable piece of computing infrastructure of the present day, may fit the description of a complex system. Large corporate organizations in which developers are employed have complex characteristics. Even the code base you're working on right now may share characteristics with a complex system. In this session, we'll explore what makes a complex system, what advantages complexity has to offer us, and how to harness these benefits in the systems we build.
by Jim Duey
Writing applications that are distributed across multiple machines implies sending messages between the different logical portions of the code. The book "Enterprise Integration Patterns" went a long way towards documenting the various standard ways this message passing could be envisioned. Libraries like Apache Camel provide concrete implementations of these ideas, but have limitations that come from the languages they are implemented in or target.
I introduce a library, called Conduit, that provides a clean conceptual framework for thinking about and composing distributed applications. EIP patterns can easily be constructed, reasoned about and connected using a small number of basic operators that hide the complexity of sending and receiving messages across various transports. The library can be extended easily to implement any transport that a user might require. An AMQP transport will be demonstrated and methods to extend to other transports explained. Establishing a foundation for thinking about distributed applications is the primary thrust of the talk so that developers will have a different perspective to approach such problems with.
You should attend if you want to stop doing distributed and multi-threaded apps the "hard way". This talk will show you a better way of thinking about and then implementing your designs.
by Kyle Simpson
Whether you know it or not, every web application platform has UI Architecture, the stuff between the front-end and the back-end (aka, the "middle-end"). You know -- things like Templating, URL Routing, Data Validation/Formatting, Ajax, Compression/Optimization, etc. The problem is, you probably didn't realize it was there, and worse, you probably have no exposure to or control over those pieces.
by Billy Newport
NoSQL has become the latest darling technology. We will examine its roots, why it became popular in that context, and whether it can extend its reach into mainstream enterprise applications.
Come learn all about Behavior Driven Development and see it in action as it helps define system behavior from the top level down to the unit level, describing the need for code to exist and then writing the code to meet that need. This will include a live demo of creating a complete feature from the outside and working our way in, one piece at a time. The concepts demonstrated can be applied in other languages.
by Tim Berglund
You love Groovy and you're a believer in cloud computing. For a larger project you might choose Grails and hosting on Amazon EC2, but what if you want to take advantage of the nearly massless deployments of a cloud provider like the Google App Engine? You could make Grails work, but it's not always the best fit. Enter Gaelyk.
Gaelyk is a lightweight Groovy web application framework built specifically for the Google App Engine. In this session, we'll talk through the simple abstractions it offers, then show how easy it is to code and deploy a useful application to the cloud.
The GoLightly library is a toolkit for building flexible virtual machines. Instead of limiting you to a particular execution model it defines basic primitives which are useful in implementing stack-, register- and vector-based execution models along with simple communications models for multicore designs.
GoLightly started as a research tool to help design a high-performance virtual machine for Ruby written in Go but its scope has since expanded to be a general-purpose library with the aim of allowing any high-level language runtime to easily exploit multicore concurrency and the vector-processing features of modern consumer processors.
In this session we'll draw on the GoLightly and related codebases to explore how Go supports concurrency as well as examining its approach to object-orientation and type safety - including the dirty tricks occasionally required to override it.
Here we'll also encounter the Go testing and benchmarking framework and use code examples to illustrate the many trade-offs involved in VM design including some or all of: dispatch models; bytecode and threaded interpretation; control flow and activation records; interrupts; memory allocation; stack- and register- architectures; vector-processing (SIMD) extensions; JIT and AOT compilation; system calls, blocking and processor-inspired weirdness.
By the end of the session attendees should be comfortable reading Go source code, have a basic feel for developing with the language and the necessary background to write their own VMs in whatever language they happen to prefer.
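For readers who have not written a virtual machine before, here is a minimal sketch of one of the trade-offs mentioned above: a stack-based execution model with table-driven dispatch. It is written in Python rather than Go, it is not GoLightly code, and the opcode names and the run function are hypothetical choices made for the example.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs and return the final stack."""
    stack = []

    def push(arg):
        stack.append(arg)

    def add(_arg):
        right, left = stack.pop(), stack.pop()
        stack.append(left + right)

    def mul(_arg):
        right, left = stack.pop(), stack.pop()
        stack.append(left * right)

    # Table-driven dispatch: one handler per straight-line opcode.
    handlers = {"PUSH": push, "ADD": add, "MUL": mul}

    pc = 0  # program counter
    while pc < len(program):
        opcode, arg = program[pc]
        if opcode == "HALT":
            break
        if opcode == "JMP":
            pc = arg
            continue
        if opcode == "JZ":
            # Pop the top of the stack and branch when it is zero.
            if stack.pop() == 0:
                pc = arg
                continue
        else:
            handlers[opcode](arg)
        pc += 1
    return stack


# (2 + 3) * 4 expressed as stack code.
arithmetic = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None),
    ("PUSH", 4), ("MUL", None), ("HALT", None),
]
```

Control flow is handled in the loop itself because it changes the program counter, while arithmetic goes through the handler table. A register-based design would replace the shared stack with operands that name storage slots explicitly.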
by Ken Sipe
In the Java build space, first there was Ant, which provided a reliable way to build without an IDE. Then there was Maven, which provided standardization in build life cycles and dependency management. Yet there still seem to be frustrations with maintaining a good build system, whether it is just too much XML or too many POMs. Frankly, XML is limiting as a DSL for describing a build for anything that falls outside of what the original builders of the framework envisioned. Gradle offers a convention-over-configuration approach to the build process and a way of building that isn't based on XML.
This session assumes no familiarity with Gradle as it introduces this new approach to building projects. Being able to read and understand Groovy will help you get the most from the session. This session will look at multi-language or polyglot projects, as well as integration with Ant and Maven. It will conclude with building custom plugins for the Gradle build process.
by Paul King
This talk looks at using Groovy for multi-threaded, concurrent and grid computing. It covers everything from using processes, multiple threads, the concurrency libraries ear-marked for Java 7, functional programming, actors including GPars, as well as map/reduce, grid and cloud computing frameworks. We'll look at leveraging Java techniques as well as Groovy specific approaches.
Multiple Processes with Ant, Java and Groovy
Multiple threads - Java and Groovy support
The java.util.concurrent APIs, Fork/Join, Atomicity and more
Useful Java libraries: Google collections and others
Actor/Dataflow libraries: Jetlang, GPars
Polyglot solutions with Scala and Clojure
Grid computing and cloud solutions
Testing multi-threaded programs
by Douglas Crockford
by Scott Davis
The hard line between web pages (pure presentation) and web services (pure data) is finally beginning to blur. Companies as varied as Best Buy, Twitter, Facebook, LinkedIn, Flickr, TripIt, O'Reilly, and even People magazine have decorated their web pages with hidden, semantic metadata. The results are impressive: a 30% increase in traffic for Best Buy, a 15% increase in click-through rate reported by Yahoo!, and dramatic Google PageRank improvements.
In this talk, we'll explore popular microformats such as hCard (the HTML equivalent of vCard) for contact information, hCalendar (the equivalent of iCalendar) for events, hAtom for syndication, and much more. We'll use Java and Groovy to tease out the hidden data in plain old HTML pages for use in everyday applications. You'll also see how Firefox and Safari plug-ins integrate the browser with your address book and your calendar in unprecedented ways.
This is not yet another staid, academic discussion of the future of the semantic web -- this is a pragmatic discussion of how the technology is being used right now to deliver real web services AND web pages at the same time.
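The talk itself uses Java and Groovy. As a rough sketch of the same idea, here is a small Python example that pulls hCard properties out of ordinary HTML using only the standard library's html.parser module. The HCardParser class, the subset of property names it looks for, and the sample markup are assumptions made for illustration. A real hCard parser handles many more properties and rules.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect a few hCard properties from elements nested inside class="vcard"."""

    PROPERTIES = {"fn", "org", "tel", "email"}  # small subset of hCard properties
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.cards = []        # one dict per vcard found
        self._depth = 0        # nesting depth inside the current vcard; 0 = outside
        self._current = None   # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
            found = self.PROPERTIES.intersection(classes)
            if found:
                self._current = sorted(found)[0]
        elif "vcard" in classes:
            self._depth = 1
            self.cards.append({})

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if self._depth:
            self._depth -= 1
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._depth and self._current and text:
            # Keep the first text found for each property.
            self.cards[-1].setdefault(self._current, text)


SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span><br>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
</div>
<p class="fn">not inside a card</p>
"""

parser = HCardParser()
parser.feed(SAMPLE)
```

After feeding the sample, parser.cards holds one dictionary with the fn, org and email values from the vcard block, and the stray paragraph outside it is ignored.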
Scala is an intensely powerful language. One of the most obvious ways in which this manifests is the syntax, which is wonderfully amenable to internal DSLs and flexible APIs (not to mention endless reams of obfuscated sources and fanciful operators). However, despite the superficial flash of Scala's syntactic skin, its true power lies in the type system and in the language's deep semantic constructs.
This talk will dive into some of the more remote regions of the kingdom of Scala. Specifically, we will cover the following topics:
Higher-Kinds (what they are and how they can be applied)
Type-Level Encodings (*really* exploiting Scala's type system)
Typeclasses (just like Haskell...except not)
Delimited Continuations (and you thought kinds were confusing!)
Please note that this is an advanced talk targeted at the Scala practitioner who is already fairly comfortable with the language. With that said, we hope the talk will remain reasonably accessible to the Scala beginner - so long as they don't object to the presentation of odd and esoteric language features with disturbing enthusiasm.
by Guy Steele
Anyone remember the old days, when for good performance you had to worry carefully about which register should hold which variable, and when? Sometimes we still do this to get extremely high performance from critical inner loops, especially when using specialized processing hardware such as GPUs.
On the other hand, we have been able to write ever more complex and ever more capable software systems only by sacrificing such micromanagement and using general-purpose tools and abstractions for coding the bulk of our software. Along the way, we have discovered that code generated by automated tools often does a better job than hand-crafted code.
And we learn to code in such a way that the behavior of our code does not depend critically on the detailed optimization decisions that we have delegated to the tools. If we want to let a compiler's register allocator have the freedom to put variables in registers, we stop writing code that takes the address of a variable, as in the C expression &myvar. If we want to allow an automatic storage allocator to do its job, we must write code that works properly independently of where an object or array happens to have been allocated, and perhaps independently of whether the object or array happens to be automatically relocated in the middle of a computation. Once we do this, we don't have to think about memory placement. Good programming language design can get us from the place where we must remember "don't use this difficult feature" to the place where it's not even on the radar screen because the language provides other, better ways to think and get things done. (Example: Java doesn't even have a way to take the address of a variable.)
Likewise, the best way to write code for multiple processors is not to have to think about multiple processors. We need to get to the point where we worry about the assignment of tasks to processors just about as much as we worry about the assignment of data to memory---which is to say, only for truly critical portions of the code---and for the most part leave such decisions to automated tools.
This will require further adjustments in our programming habits---adjustments that, we argue, in the end will make programs easier to understand and maintain as well as easier to run on parallel processors. The key is not to focus on a particular technology but on useful invariants. Here, as in the past, good programming language design can help to encourage good programming habits.
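The abstract does not say which invariants are meant. As one plausible example, assumed here purely for illustration, consider associativity: if a combining operator is associative and has an identity, a reduction gives the same answer however the work is split into pieces. The Python sketch below runs sequentially, but each partial reduction is independent and could be handed to a separate processor. The name chunked_reduce is hypothetical.

```python
from functools import reduce


def chunked_reduce(op, items, identity, chunks):
    """Reduce items by splitting them into contiguous pieces.

    Each piece is reduced on its own, then the partial results are
    combined in their original order.
    """
    size = max(1, -(-len(items) // chunks))  # ceiling division, at least 1
    partials = [
        reduce(op, items[start:start + size], identity)
        for start in range(0, len(items), size)
    ]
    return reduce(op, partials, identity)


def concat(left, right):
    """String concatenation: associative, but not commutative."""
    return left + right


letters = list("strangeloop")
```

Calling chunked_reduce(concat, letters, "", k) returns the same string for any positive k, because regrouping an associative operator never changes the result. Code written against that invariant does not need to know how many pieces the tools decided to use.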
While the problem of handling massive amounts of data has been at the forefront of database research both in industry and in academia, addressing the complexity of domain models has remained solely a concern of application architects forced to align often highly incompatible problem and solution domains.
HyperGraphDB is a database with a unique memory/data model based on generalized hypergraphs. Those are graphs where edges can point to an arbitrary number of nodes and even to other edges. Thus higher order relationships are expressed naturally which automatically solves most headaches related to domain data modeling. Entities (nodes and edges) have arbitrary values managed by a comprehensive type system embedded as a hypergraph itself.
In a sense, HyperGraphDB is a dynamic-schema database general enough to easily accommodate any meta-model and integrate entities of different formal representations while maintaining high performance through aggressive indexing. In that respect, it is as much a knowledge management system suitable for AI applications as it is a database for conventional enterprise systems. Key to such capability are its open architecture and extremely general formal basis.
In this talk, I will present some of the more interesting aspects of the HyperGraphDB architecture and discuss some of the subtleties in balancing generality, practicality and efficiency in such an open-ended, yet highly organized memory model. I will compare it to other graph databases and put it in the larger context of the recent NOSQL movement.
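To illustrate the data model described above, in which an edge can point to any number of nodes and even to other edges, here is a minimal in-memory Python sketch. It is not the HyperGraphDB API. The HyperGraph class, its method names and the integer handles are hypothetical, and the sketch ignores persistence, typing and indexing.

```python
import itertools


class HyperGraph:
    """Toy hypergraph: every entity has a handle, a value, and zero or more targets."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.values = {}     # handle -> value
        self.targets = {}    # handle -> tuple of target handles (empty for nodes)
        self.incidence = {}  # handle -> set of link handles pointing at it

    def add_node(self, value):
        return self._add(value, ())

    def add_link(self, value, *targets):
        # Targets may be nodes or other links; both are just handles.
        for target in targets:
            if target not in self.values:
                raise KeyError(target)
        return self._add(value, targets)

    def _add(self, value, targets):
        handle = next(self._ids)
        self.values[handle] = value
        self.targets[handle] = tuple(targets)
        self.incidence[handle] = set()
        for target in targets:
            self.incidence[target].add(handle)
        return handle

    def links_to(self, handle):
        """Return the handles of all links that point at the given entity."""
        return sorted(self.incidence[handle])


graph = HyperGraph()
alice = graph.add_node("Alice")
bob = graph.add_node("Bob")
carol = graph.add_node("Carol")
meeting = graph.add_link("met", alice, bob, carol)        # one edge, three nodes
report = graph.add_link("reported_by", meeting, alice)    # an edge pointing at an edge
```

Because links and nodes share one handle space, a higher-order relationship such as "Alice reported the meeting" needs no extra join entity. It is simply another link.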
Maintaining state is commonplace among today's complex systems, and choosing how systems interact with this state is one of the earliest design decisions that is made. With the rise of the multi-core processor, concurrency is becoming more and more commonplace and dealing with state transforms into a potential debugging nightmare. In this session we will discuss the difference between mutable and immutable state; how your systems behave when dealing with mutable versus immutable state; as well as learn when and where the best fits are for mutable and immutable state. Finally we will finish up with some common mutable (and immutable) anti-patterns and learn how to avoid them.
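As a small illustration of the difference, here is a Python sketch contrasting a mutable record with an immutable one built on the standard dataclasses module. The MutableAccount and Account classes and the deposit function are hypothetical names, not examples from the session.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass
class MutableAccount:
    balance: int


@dataclass(frozen=True)
class Account:
    balance: int


def deposit(account, amount):
    """Return a new Account; the one passed in is left untouched."""
    return replace(account, balance=account.balance + amount)


# Mutable state: any holder of a reference can change what everyone else sees.
shared = MutableAccount(100)
alias = shared
alias.balance += 50

# Immutable state: an update produces a new value and the old one stays valid.
before = Account(100)
after = deposit(before, 50)
```

With the mutable version, shared.balance has changed even though the code only touched alias. With the frozen version, before still holds 100, after holds 150, and an attempt to assign to before.balance raises FrozenInstanceError. That makes immutable values safe to hand to concurrent code without locking, at the cost of allocating a new object per update.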
14th–15th October 2010