Four defining characteristics of a social application are: (1) people and things are both present in the data model; (2) a person can have a nearly unbounded number of relations to people and things; (3) a new relationship can be created at any time and from many sources; and (4) users expect the network of relations to be exposed in fine detail as well as in the aggregate. In combination, these properties introduce read and write bottlenecks that make scaling and performance more challenging than in non-social applications.
At Clipboard, we’ve tackled this exact problem and found a solution that begins with Riak. We model our social network so that both nodes and edges are realized as Riak objects, and we use Riak Search to retrieve arbitrary slices of the network. To improve performance and scalability, we use a custom method for indexing objects, maintain multiple caching layers, and rely on four different methods for reducing write contention. The result is a fast and robust service that is easy to maintain and evolve. This presentation describes our architecture and techniques, with the goal of making the primary lessons applicable to other domains.
Since the beginning, Riak has supported high write-availability using Dynamo-style multi-valued keys – also known as conflicts or siblings. The tradeoff for this type of availability is that the application must include logic to resolve conflicting updates. While it is convenient to say that the application can reason best about conflicts, ad hoc resolution is error-prone and can result in surprising anomalies, like the reappearing item problem in Dynamo’s shopping cart.
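The reappearing-item anomaly is easy to reproduce with ad hoc resolution logic. In this minimal Python sketch (hypothetical cart values and resolution function, not Riak's actual API), resolving concurrent sibling carts by set union silently resurrects a deleted item:

```python
# Hypothetical sketch: naive set-union resolution of sibling shopping carts.

def resolve_siblings(siblings):
    """Ad hoc resolution: take the union of all concurrent cart values."""
    merged = set()
    for cart in siblings:
        merged |= cart
    return merged

# Replica A saw the user remove "book"; replica B wrote concurrently
# and never saw the removal, so both values survive as siblings.
sibling_a = {"pen"}            # cart after the deletion
sibling_b = {"pen", "book"}    # stale cart, written concurrently

print(resolve_siblings([sibling_a, sibling_b]))  # "book" reappears
```

The union-based merge cannot distinguish "never added" from "added, then removed," which is exactly why a more principled approach to conflict resolution is needed.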
What is needed is a more formal and general approach to the problem of conflict resolution for complex data structures. Luckily, there are some formal strategies in recent literature, including Conflict-Free Replicated Data Types (CRDTs) and BloomL lattices. We’ll review these strategies and cover some recent work we’ve done toward adding automatically convergent data structures to Riak.
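As a taste of the CRDT approach, here is a minimal grow-only counter (G-Counter) sketch in Python. The class and method names are illustrative, not Riak's data type API; the per-actor map and pairwise-max merge follow the standard G-Counter construction:

```python
# Minimal G-Counter CRDT sketch: each actor tracks its own contribution,
# and merging takes the per-actor maximum.

class GCounter:
    def __init__(self):
        self.counts = {}  # actor id -> count contributed by that actor

    def increment(self, actor, amount=1):
        self.counts[actor] = self.counts.get(actor, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Pairwise max is commutative, associative, and idempotent,
        so replicas converge regardless of merge order or repetition."""
        merged = GCounter()
        for actor in set(self.counts) | set(other.counts):
            merged.counts[actor] = max(self.counts.get(actor, 0),
                                       other.counts.get(actor, 0))
        return merged

# Two replicas increment concurrently, then merge in either order:
a, b = GCounter(), GCounter()
a.increment("a", 3)
b.increment("b", 2)
assert a.merge(b).value() == b.merge(a).value() == 5
```

Because the merge is a join on a lattice, siblings can be combined automatically and deterministically, with no application-specific resolution code.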
Replicating data inside a Riak cluster and replicating it between clusters may sound like the same problem, but they aren't. Latency, bandwidth, security, and other factors make the latter a significantly different challenge. Basho has invested significant time and effort in building the masterless multi-datacenter replication support that is part of Riak Enterprise. This talk will cover the problems, solutions, and evolution of Riak's multi-datacenter support, as well as our plans for continued development and enhancement.
Each of us operates distributed systems. Some of us operate traditional infrastructure with database, web, and load-balancing tiers. Others require infrastructure that is more bespoke and may incorporate non-traditional storage solutions (such as Riak). Regardless of where each of us falls on this spectrum, the network closely describes the behavior of our applications. Furthermore, it is the only place we can look to understand emergent behavior of applications working together in concert. In this talk, we take a radiological view of network-derived imagery and discuss what it can tell us about our systems as a whole.
Riak’s highly available nature makes it ideal for cloud environments, where any of your resources may disappear without notice but your database must still be up. Unfortunately, databases need I/O, and the cloud, where everything is virtualized, is not the most performant place to run one. "Riak in the Cloud" brings a different set of assumptions than hardware-based installations, and we’ll go through them together.
Come hear the tale of two engineers who have no hardware but decided to try Riak for a production-critical application. We’ll tell you how we ported our application from MySQL, rolled it to production, and the lessons we learned while running Riak in the Cloud.
With regard to the CAP Theorem, Riak is an eventually consistent database with AP semantics. But this may soon change.
This talk will present ongoing research and development to add true strongly consistent (CP) semantics to Riak. When discussing Dynamo-inspired datastores (Riak, Cassandra, Voldemort), people often use the term "strong consistency" to describe accesses where R + W > N. But this is not true strong consistency: concurrent requests still produce non-deterministic results, while node failures and network partitions can lead to partial write failures that provide no guarantees on value consistency.
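The R + W > N overlap property, and its limits, can be checked directly. In this hypothetical Python sketch (the function name is ours, not from Riak), replicas are modeled as a set, and we verify that every read quorum intersects every write quorum exactly when R + W > N. That intersection guarantees a read contacts at least one replica holding the latest acknowledged write, but it says nothing about concurrent writers racing each other:

```python
# Sketch of Dynamo-style quorum overlap: N replicas, W write acks, R read acks.
from itertools import combinations

def quorums_overlap(n, r, w):
    """True iff every possible R-replica read set intersects
    every possible W-replica write set."""
    replicas = range(n)
    return all(set(read) & set(write)
               for read in combinations(replicas, r)
               for write in combinations(replicas, w))

assert quorums_overlap(n=3, r=2, w=2)      # R + W = 4 > N = 3: overlap guaranteed
assert not quorums_overlap(n=3, r=1, w=2)  # R + W = 3 = N: a read can miss the write
```

Overlap ensures visibility of acknowledged writes, not linearizability: two clients writing concurrently through overlapping quorums can still leave replicas holding conflicting values.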
This work aims to enable true sibling-free access to Riak that is both immediately consistent as well as tolerant to node failures and network partitions. This work enables use cases such as atomic counters and the use of non-monotonic/non-convergent data types with Riak. Finally, unlike previous work in this area (riakual/riak_zab), the work does not require two-phase commit for every operation, but instead relies upon optimistic commits and read-time consistency resolution in the case of partial write failures. In the common case, the same number of round-trip messages are needed for both the existing AP semantics and new CP semantics.
This talk will present an in-depth, but easy-to-understand discussion on this new approach, and is the first time this topic has been presented outside of Basho.
by Matt Ranney
We've put Riak through its paces at Voxer. Anything we do that ends up on a disk drive is put there by Riak. We store data of all types, from simple JSON to crazy JSON and even raw audio bytes. We've got over 60 physical machines dedicated just to Riak.
In this talk, we'll go over the Voxer architecture a bit and dig into some of the ways that we use Riak in our system, including some interesting data structures. In spite of our deep, tender love and commitment to Riak, we'll also talk about some of the things that Riak hasn't done so well for us.
by Pat Helland
For a number of decades, I've been saying "Computing Is Like Hubble's Universe, Everything Is Getting Farther Away from Everything Else". It used to be that everything you cared about ran on a single database and the transaction system presented you the abstraction of a singularity; your transaction happened at a single point in space (the database) and a single point in time (it looked like it was before or after all other transactions).
Now, we see a more complicated world. Across the Internet, we put up HTML documents or send SOAP calls and these are not in a transaction. Within a cluster, we typically write files in a file system and then read them later in a big map-reduce job that sucks up read-only files, crunches, and writes files as output. Even inside the emerging many-core systems, we see high-performance computation on shared memory but increasing cost to using semaphores. Indeed, it is clear that "Shared Memory Works Great as Long as You Don't Actually SHARE Memory".
There are emerging solutions which are based on immutable data. It seems we need to look back to our grandparents and how they managed distributed work in the days before telephones. We realize that "Accountants Don't Use Erasers" but rather accumulate immutable knowledge and then offer interpretations of their understanding based on the limited knowledge presented to them. This talk will explore a number of the ways in which our new distributed systems leverage write-once, read-many immutable data.
Legacy RDBMS solutions such as Oracle provide almost prophetic levels of instrumentation around system usage and performance. As adoption of distributed databases grows for both availability and performance reasons, the demand for robust, mature instrumentation grows with it. In this session, we'll talk about a real-time, highly available social application's use of Riak and how the eyes of the engineering team were opened with precision instrumentation and new visualization techniques. We'll discuss both the need for and the approach to visualization of the sea of granular performance data that comes from distributed database infrastructure.
by Eric Brewer
The NoSQL movement is essentially about giving developers more control at the expense of less pre-packaged functionality. Over time, the missing functionality of full relational databases will partially or completely return, but in a new way: driven bottom-up with a layered architecture rather than top-down with a tightly integrated monolithic architecture. We take a look at what's missing and discuss this evolution using Bitcask as a starting point.
10th–11th October 2012