NoSQL matters 2012 schedule

Tuesday 29th May 2012

  • Keynote - Scalable NoSQL – Past, Present and Future

    by Doug Judd

    Over the past five years, there has been an explosion of new NoSQL database technologies whose primary innovation has been the ability to break capacity barriers by harnessing the collective power and resources of large clusters of server-class PCs. In this keynote address, Doug Judd, the original creator of Hypertable, will present a past, present, and future look at scalable NoSQL databases, providing insights into the forces that led to their creation, a review of the current state of the field, and factors that will influence their future evolution.

    At 10:00am to 11:30am, Tuesday 29th May

    In KOMED

  • Joomla, MongoDB and You

    by Mitch Pirtle

    Itching to try your favorite new NoSQL technology with your long-time favorite CMS? What’s this going to do to your code? How will this affect your development process, and what will this look like in the data center? Joomla founder Mitch Pirtle goes through several case studies to demonstrate some unexpected impacts of plugging NoSQL into relational applications.

    At 11:50am to 12:35pm, Tuesday 29th May

    In KOMED

  • MySQL Cluster: Scaling to Billion Database Queries per Minute

    by Bernd Ocklin

    MySQL Cluster is the high availability, low latency storage engine with autosharding and real-time capabilities, used worldwide to ensure that hundreds of millions of mobile phone users are always reachable.

    This mature and proven technology is gaining rapid adoption on the web due to its simple scaling model, performance, and integrated HA, which are needed to ensure that web services can evolve rapidly.

    MySQL Cluster has many attributes that make it ideal for new generations of dynamic, highly scalable applications, including:

    • Auto-sharding across commodity hardware for write-scalability
    • Cross-data center geographic synchronous and asynchronous replication with eventual consistency
    • Online scaling and schema evolution
    • SQL and NoSQL interfaces

    This session looks at the existing NoSQL access methods for MySQL as well as the latest developments for the MySQL Cluster storage engine. You can get the best of both worlds – persistence, consistency, rich queries, high availability, scalability and simple, flexible APIs and schemas for agile development.

    Learn how Cluster, as a globally distributed database engine, is used in mobile phone networks, and how web services profit from its scalable, fully event-driven architecture.

    At 11:50am to 12:35pm, Tuesday 29th May

    In KOMED

  • NoSQL. A Technology for Real Time Enterprise Applications?

    by Dirk Bartels

    The session focuses on the evolving value proposition of NoSQL database types such as key-value, document, and graph/object stores, and on the influence of enterprise application architectures and requirements. After a brief introduction to the principles and the evolution of NoSQL, the presentation discusses critical requirements to make NoSQL work for enterprise application development in contrast to web applications.

    • Evaluating the technological foundation that led to the success of NoSQL technologies,
    • Considering the boundaries and trade-offs of NoSQL compared to typical enterprise requirements, and
    • Discussing additional capabilities for NoSQL stores for successful data management in real time enterprises.

    The session addresses project managers, software architects and database developers.

    At 11:50am to 12:35pm, Tuesday 29th May

    In KOMED

  • Building Hybrid Applications with MongoDB, RDBMS & Hadoop

    by Chris Harris

    Business moves fast, very fast. It’s driven by making decisions based on facts, but how do you deal with facts that are hours or days old? Periodic data warehouse loads and ETL processes no longer scale to meet the increasing business needs in the web economy. Today, modern organizations are combining high performance operational data stores like MongoDB with traditional business intelligence RDBMS solutions to create a hybrid that solves the business need to have fast, accurate data for decision making with the ability to interpret and analyze data in every way.

    This talk will try to answer the questions:

    • What are the business benefits of using this approach?
    • Which problems can now be solved using this approach?
    • How can this solution meet the needs of business?
    • Customer use cases

    At 1:40pm to 2:25pm, Tuesday 29th May

    In KOMED

    Coverage slide deck

  • Designing for Concurrency with Riak

    by Mathias Bacon Meyer

    Riak provides a solid storage foundation to build highly concurrent applications. But its simplistic approach to resolving and detecting concurrent modifications to data leaves quite a few people scratching their heads. We’ll look at how you can build applications and data structures to fully utilize Riak’s strengths while still allowing for eventually consistent data.

    At 1:40pm to 2:25pm, Tuesday 29th May

    In KOMED

  • Neo4J, Gremlin, Cypher: Graph Processing for Everybody

    by Peter Neubauer

    With property graph databases and NoSQL coming into fashion in recent years, the interest in graph algorithms, graph processing and recommender systems has increased considerably. In this talk, Peter Neubauer outlines the differences in approach between imperative scripting of graph traversals with Gremlin and nested iterators over a data structure, and declarative techniques like Cypher, a pattern matching language over Neo4j.

    Peter will show basic recommendation algos, spreading activation and min/max flow calculations on live demo graphs. Be aware that things might break, but fun and puns are guaranteed.

    At 1:40pm to 2:25pm, Tuesday 29th May

    In KOMED

  • Hypertable - The Storage Infrastructure behind One of the World’s Largest Email Services

    by Doug Judd

    Rediff.com India (Nasdaq: REDF) is one of India’s top Internet portals, providing email, search, news, entertainment and shopping services to India and the global Indian community. With over 100 million registered users, Rediff.com is one of the largest Indian Internet portals and one of the top email providers worldwide. Rediffmail is Rediff.com’s popular email service and has experienced steady growth ever since it was launched in 1998. Architected for high availability, Rediffmail is a geo-distributed application served out of three separate data centers. By 2011, the request load generated by the application had begun to overwhelm the underlying storage system, causing frequent outages. Metadata updates in connection with inbox management turned out to be the primary culprit, generating over 75 percent of the storage system request volume. In late 2011, the Rediffmail engineering team re-architected the system on top of Hypertable, solving the inbox management problem and eliminating the associated outages. In this talk, Doug Judd, the CEO of Hypertable Inc., will present the new Rediffmail architecture and describe how it leverages the Hypertable scalable database as a core component.

    At 2:45pm to 3:30pm, Tuesday 29th May

    In KOMED

  • Structr – A CMS Implementation based on a Graph Database

    by Axel Morgner

    This session offers a close look at the implementation and benefits of utilizing a graph database, namely Neo4j, to serve as a CMS back-end. As we will see, the classic CMS ORM approach has several drawbacks, such as the inherent complexity of mapping CMS content to tables, or the inflexibility of relational databases for semi-structured CMS data. Graph databases address these challenges by directly storing CMS data as a graph structure in a natural and trivial way, and by allowing the CMS schema to be altered at runtime. Along with a live demonstration, the talk will provide a short introduction to general CMS use cases and requirements, and will cover the specific strategy and data model used to build the open-source CMS structr, which can be applied to other cases as well.

    At 2:45pm to 3:30pm, Tuesday 29th May

    In KOMED

  • The Apache Cassandra Storage Engine

    by Sylvain Lebresne

    Apache Cassandra is a distributed database built to handle massive amounts of data on large clusters of commodity servers. This talk will present the storage engine at the core of Cassandra, motivating the use of a structure similar to a Log-Structured Merge Tree rather than a conventional B-Tree, and its implications for the data model. We will also introduce most of the current features of that engine (secondary indexes, integrated caching, TTL etc.), including recent developments introduced in Cassandra 1.0 like compression/checksumming and the new leveled compaction.

    At 2:45pm to 3:30pm, Tuesday 29th May

    In KOMED

  • ArangoDB

    by Martin Schönert

    This talk will give an introduction to ArangoDB. ArangoDB is a new open source NoSQL database whose goal is to be a powerful universal database. It allows flexible data modelling: as key-value pairs, documents or graph data. It allows complex queries through an elegant query language. An embedded JavaScript interpreter allows you to add functionality to the database for transactions and complex manipulations of the objects, treating ArangoDB as a complete application server – database stack. ArangoDB guarantees durability through MVCC with append-only journals and zero-administration availability through synchronous replication. It utilizes main memory and SSDs to deliver maximal performance. Even though ArangoDB is still a young project, it has already attracted international attention, especially in Japan and the USA. The community is currently working on APIs for Python, D, Ruby, Java and node.js.

    At 4:00pm to 4:45pm, Tuesday 29th May

    In KOMED

  • From Tables to Graph. Recommendation Systems, a Graph Database Use Case Analysis

    by Pere Urbón-Bayes

    Recommendation engines have changed a lot in recent years, and the latest big change is NoSQL, especially Graph Databases. With this presentation we intend to show how to build a Graph Processing technology, based on our experience in doing that for environments like Digital Libraries and Movies and Digital Media.

    First, we will introduce the state of the art in context-aware Recommendation Engines, with special interest in how people are using Graph Processing and NoSQL systems to scale this kind of solution. After an introduction to the ecosystem, the next step is to have something to work with. So we will show the audience how to build a Recommendation Engine in a few steps.

    The demonstration part will use the following technology stack: Sinatra as a simple web framework; Ruby as the programming language; and OrientDB, Neo4j, Redis, etc. as the NoSQL layer. The result of our demonstration will be a simple engine, accessible through a REST API, to play with and extend, so that attendees can learn by doing.

    In the end our audience will have a full introduction to the field of Recommendation Engines, with special interest in Graph Processing and NoSQL systems. Based on our experience building this technology for large scale architectures, we think the best way to learn this is by doing it and having an example to play with.

    At 4:00pm to 4:45pm, Tuesday 29th May

    In KOMED

  • Wakanda: NoSQL for Model-driven Web Applications

    by Alexandre Morgaut

    Developing a business web application is still a long process in 2012. Model-Driven Development is at the heart of:

    • Requirements design for the contractor and the product manager
    • Productivity for the developer
    • Consistency and security for the end-user
    • Evolution toward future applications

    The Wakanda platform – with its NoSQL object datastore – intends to let you create such model-driven applications. The presentation will explain and show how to create the application model, with its business and security rules, coded once, then made available everywhere without being “bypassable”. You will see enhancements to the maintenance and debugging process, as everything is centralized in the model on the server-side (while maintaining potential pre-validation on the client to decrease the load on the server).

    To add even more consistency, the same language is used everywhere: JavaScript.

    At 4:00pm to 4:45pm, Tuesday 29th May

    In KOMED

  • Welcome to Redis 2.6

    by Salvatore Sanfilippo

    Salvatore Sanfilippo will present the new Redis 2.6 release for the first time.

    The new Lua scripting feature and the other major changes in this release will be explained showing use cases and comparing the solutions with the ones available in Redis 2.4.

    At 5:05pm to 5:50pm, Tuesday 29th May

    In KOMED

    Coverage slide deck

Wednesday 30th May 2012

  • NoSQL Adoption – What’s the Next Step?

    by Luca Garulli

    Today, many companies are already using a NoSQL solution to handle a piece of data. What is the next step? Probably the adoption of NoSQL as a total replacement of Relational DBMS. To reach this goal, NoSQL solutions have to support multiple models and features that are already available in Relational DBMS. The challenge has started.

    At 9:00am to 10:30am, Wednesday 30th May

    In KOMED

    Coverage slide deck

  • Data Access 2.0? Please Welcome: Spring Data!

    by Oliver Gierke

    Spring has always provided sophisticated support for various Java data access technologies. The recently created Spring Data project now takes the next step and introduces a consistent programming model for non-relational data stores and helps implement data access layers in a consistent and easy-to-grasp fashion - for both the NoSQL stores as well as more traditional APIs like JPA.

    The talk introduces the umbrella project, foundational concepts and abstractions and dives down into specialties of particular modules using MongoDB and Neo4J as examples.

    At 10:50am to 11:35am, Wednesday 30th May

    In KOMED

    Coverage slide deck

  • NoSQL – Not Only a Fairy Tale

    by Sebastian Cohnen and Timo Derstappen

    Whether it is CouchDB, Redis, Riak or MongoDB, we have investigated, implemented, tested and run several NoSQL databases at Adcloud. Although we are happy with most of our decisions, we have learned our lessons, which we would like to share. We try to choose the right database for the right job. But sometimes it is hard to define the job.

    At 10:50am to 11:35am, Wednesday 30th May

    In KOMED

    Coverage slide deck

  • NoSQL: Back to the Future or is it Simply yet Another Database Feature?

    by Martin Scholl

    NoSQL is about to reach the tip of the hype cycle. Therefore, it is time to take a break and thoroughly analyze the NoSQL trend.

    This talk will trace NoSQL back to the early days of database approaches and will then reflect on NoSQL’s current state and future: Is NoSQL as a technological approach to data storage just a feature or is it a unique category of its own? Is NoSQL a fundamentally new thing that will stay as a trend? What are current NoSQL systems lacking? Will there be a technology or trend that could disrupt NoSQL?

    At 10:50am to 11:35am, Wednesday 30th May

    In KOMED

  • Crazy NoSQL Data Integration with Pentaho

    by Matt Casters

    With the arrival of a new armada of NoSQL databases, chances are increasing constantly that you will have to integrate data from one of them. So in this talk I’ll be going over crazy data integration between various relational and NoSQL databases.

    • Parallel extraction and load of MySQL data into Hadoop
    • Real-time update of PostgreSQL tables based on changes in CouchDB
    • Extracting JSON information from MongoDB to load it into MySQL
    • Populating elasticsearch results with product information

    Obviously, I’ll be giving a short introduction to Pentaho Data Integration (Kettle) and the graphical programming and design involved. I’ll also be giving a short overview of the NoSQL landscape so that you can get an idea of what’s going on in that space.

    At 11:55am to 12:40pm, Wednesday 30th May

  • NewSQL Database for New Real-time Applications

    by Peter Idestam-Almquist

    New real-time applications require databases that can process larger volumes of transactions than what is possible with current market-leading relational databases. And while more recent NoSQL databases are able to scale out so that they can handle extremely large volumes of data, they cannot guarantee consistency. While this is acceptable for some types of data such as social network bulletin boards, it does not work for most business critical data such as stock quantities or money. But what if there was a happy medium? A database that could process millions of transactions on a single machine, while maintaining the consistency that these business critical applications require? There is! In this presentation, Peter will describe a different kind of a NewSQL database, which will process millions of ACID-compliant transactions while scaling in on a single machine to fully ensure consistency. He will also delve into the technical differentiators between traditional relational databases, NoSQL and NewSQL databases and what applications they are best-suited for.

    At 11:55am to 12:40pm, Wednesday 30th May

  • The No-Marketing Bullshit Introduction to Couchbase Server 2.0

    by Jan Lehnardt

    Couchbase Server 2.0 is a love child of memcached and Apache CouchDB. This talk introduces the Couchbase architecture and features; it also demonstrates integration challenges as well as our solutions. Couchbase is an Apache 2.0 licensed open-source project, based on the existing memcached, Membase and CouchDB technologies, that aims to solve your data storage problems in a flexible, fast and reliable manner. In this talk, you will learn how it all works.

    Couchbase provides clustered storage management, in-memory, disk-based and hybrid working sets, and a high-performance key-value store with dynamic query capabilities, as well as cross-cluster replication. This allows you to mix and match the aspects of CAP your application needs, all with familiar memcached-compatible APIs and no need for you to adjust your applications.

    At 11:55am to 12:40pm, Wednesday 30th May

  • Apache Cassandra: Real-World Scalability, Today

    by Jonathan Ellis

    The Cassandra distributed database has added many new features over the last year based on real-world needs of developers at Twitter, Netflix, Openwave, and others building massively scalable systems. This talk will cover the motivation and use cases behind features such as secondary indexes, Hadoop integration, SQL support, bulk loading, and more.

    At 1:40pm to 2:25pm, Wednesday 30th May

  • Into The Void: From MySQL to NoSQL to “Nothing”

    by Tim Lossen

    Wooga creates casual social games, and we approach each new game as a greenfield development project. This gives us the chance to experiment with different approaches and to try out promising new technologies.

    In this talk, I will explain how our back-end architecture has evolved over time, and how our search for the “perfect backend stack” has (so far) progressed through three distinct phases: from a sharded relational database, through various NoSQL databases, to the question: “Do we really need a database, after all?”

    I will share what worked well, what didn’t, what lessons we have learned on the way - and which other directions we would like to explore in the future.

    At 1:40pm to 2:25pm, Wednesday 30th May

    Coverage slide deck

  • Rocket U2 Databases & The MultiValue Model

    by Daniel McGrath

    Automatic encryption of data at rest, RESTful web services, JSON support: The Rocket U2 databases (UniData & UniVerse) provide an enterprise NoSQL solution already in use everywhere from emergency systems to major financial institutions. Come learn about this proven, high-performance database technology running in thousands of organizations around the globe.

    At 1:40pm to 2:25pm, Wednesday 30th May

  • Design your Application Using Persistent Graphs and OrientDB

    by Luca Garulli

    This talk presents the OrientDB NoSQL Open Source project and its document-graph capabilities. NoSQL products promise big performance and scalability at the cost of many compromises in areas like transactions, an easy query language and constraints. OrientDB offers a flexible model where it can be used in different ways depending on the use case.

    This presentation deals with the OrientDB features and some different use cases where it can be applied.

    At 2:45pm to 3:30pm, Wednesday 30th May

    Coverage slide deck

  • NoNoSQL@Google

    by Olaf Bachmann

    Relational Database Management Systems were hardly ever used at Google to store logs data. But SQL as a data exploration language has become increasingly popular in recent years. In fact, nowadays the vast majority of data analysis jobs at Google are expressed by SQL queries. In this talk I give a brief history of data storage and exploration techniques deployed at Google and describe the current state of the art. I furthermore explore why SQL became so popular and illustrate my main conclusion: SQL’s simplicity and conciseness rule over all other (currently available) alternatives.

    At 2:45pm to 3:30pm, Wednesday 30th May

  • Tarantool/Box: Transactional NoSQL Database for most volatile Web data

    by Konstantin Osipov

    Tarantool/Box is an open-source, transactional database server. Its key property is high efficiency (hundreds of thousands of requests per second on a commodity server) combined with a high level of customization, achievable with Lua stored procedures.

    Tarantool utilizes a lock-free cooperative multi-tasking environment and asynchronous I/O to achieve minimal overhead per request. Each request, which can be either a simple GET/PUT, classical for NoSQL systems, or a custom data access script in Lua, is run in a multi-version transactional environment.

    The key advantage of Tarantool/Box, compared to existing open-source NoSQL solutions, is the ability to execute parts of application business logic automatically on the server, with sufficiently high performance. Sufficiently means that you may never need to shard.

    Tarantool is written in C and Objective-C. The server code and drivers are available under the terms of the simplified BSD license. In the talk I’ll demonstrate key features of the server using examples in Lua, Ruby and Erlang.

    At 2:45pm to 3:30pm, Wednesday 30th May

  • Maximize your Data with Real-time Big Data Analytics using NoSQL and Graph Technologies

    by Brian Clark

    Join Brian Clark, the VP of Product Management at Objectivity, in a discussion of the latest trends in Big Data Analytics, defining Big Data and understanding how to maximize your existing architectures by utilizing NoSQL technologies to improve functionality and provide real-time results. There will be a focus on relationship analytics as well as an introduction to NoSQL data stores, object and graph databases, such as the architecture behind Objectivity/DB and InfiniteGraph.

    At 4:00pm to 4:45pm, Wednesday 30th May

  • Theoretical Aspects of Distributed Systems, Playfully Illustrated

    by Pavlo Baron

    If you want to know about the principles and pitfalls of building distributed systems, you can either make your way through a whole pile of pretty prosaic theoretical books or you can just come to this talk and understand some of them real fast by watching a little playful demonstration. It’s your choice.

    At 4:00pm to 4:45pm, Wednesday 30th May

    Coverage slide deck

  • UML as a Schema Candidate for Graph Databases

    by Vincent Delfosse

    During the lifetime of a building, a huge quantity of information will be captured by multiple actors: owners, managers, users, architects, electricians, plumbers, etc. As most of this information is not shared between these persons, the same analysis, verifications or measurements have to be done multiple times. SpatioData is a research project aiming at the development of a collaborative platform to support the effective sharing of such diverse data.

    In order to reach this ambitious goal, it is important to provide the users with devices suiting their needs (mobile or station) and interfaces adapted to their specific activities. These various client applications will communicate with a centralized database through WebServices exhibiting a common data model. The data model of such an application has to address many problems. Amongst those are:

    • Complexity of the data model to represent the different aspects of the buildings, their actors, and the activities of the actors.
    • Communicability, to ensure that developers willing to build client applications will understand and use the provided model in the same way.
    • Extensibility, to provide client-side developers with a mechanism to add their own data schemes in the database.
    • Flexibility, required in a research environment, where multiple questions are being co-resolved in parallel.

    UML has been chosen for the representation of our data model, as it is a formalized and well-known format in the developer community. But instead of fully developing the data model, then implementing it and providing dedicated services to it, an original approach has been adopted, in the form of UML-oriented WebServices independent of any specific model. This solution has been built on top of a graph database (Neo4J) in a simple but powerful way, providing answers to the problems listed above.

    This communication will detail the complete system architecture and the process designed on top of it, to make sure third-party developers can take full advantage of this platform.

    At 4:00pm to 4:45pm, Wednesday 30th May