Tuesday 29th October, 2013
11:40am to 12:40pm
With the proliferation of data sources and growing user bases, the amount of data being generated calls for new approaches to storage and processing. Hadoop opened new possibilities, yet it falls short of instant delivery. Adding stream processing with Nathan Marz’s Storm can overcome this delay and bridge the gap to real-time aggregation and reporting. The Batch layer keeps all master data, which is immutable. Once the base data is stored, a recurring process indexes it: the process reads all master data, parses it, and creates new views that replace all previously created views. The Speed layer stores only the data not yet absorbed by the Batch layer: hours of data instead of years. Once the data has been indexed in the Batch layer, it can be discarded from the Speed layer. The Query Service merges the data from the Speed and Batch layers. This presentation focuses on the Lambda architecture, which combines multiple technologies to process vast amounts of data while still reacting in a timely manner and reporting near-real-time statistics.
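The interplay of the three layers can be illustrated with a minimal in-memory sketch. It is not taken from the talk: the class name `LambdaSketch`, its methods, and the page-view counting example are all hypothetical, and plain Python containers stand in for Hadoop (Batch layer) and Storm (Speed layer).

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the Batch layer, Speed layer and Query Service.

    All names here are hypothetical; in-memory containers stand in
    for Hadoop and Storm.
    """

    def __init__(self):
        self.master_data = []        # append-only master data, never updated in place
        self.batch_view = Counter()  # view rebuilt from scratch by every batch run
        self.speed_view = Counter()  # covers only events since the last batch run

    def ingest(self, page):
        # Every event is appended to the master data and also
        # counted incrementally in the Speed layer.
        self.master_data.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # The recurring process reads all master data and creates a new
        # view that replaces the previous one.
        self.batch_view = Counter(self.master_data)
        # Data now absorbed by the Batch layer is discarded from the Speed layer.
        self.speed_view = Counter()

    def query(self, page):
        # The Query Service merges the Batch and Speed views.
        return self.batch_view[page] + self.speed_view[page]


stats = LambdaSketch()
for page in ["/home", "/home", "/about"]:
    stats.ingest(page)
print(stats.query("/home"))  # served from the Speed view only

stats.run_batch()
stats.ingest("/home")
print(stats.query("/home"))  # Batch view plus one fresh Speed-layer event
```

In this sketch the batch run and the reset of the Speed view happen in a single step; a real deployment would have to handle events that arrive while a batch run is still in progress.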
Nathan Bijnens is a developer with a passion for great code, the web and Big Data. He is interested in programming and system administration, especially where they meet, from scaling platforms to designing the architecture of new and existing products and everything in between. He focuses on data analysis and building Big Data applications using Hadoop in combination with Pig, Hive and Cascading. He follows the rise of real-time Big Data closely, actively developing applications on top of Storm and designing Lambda-like architectures. He advises on Big Data strategies and evangelises Big Data to clients and at conferences. Nathan is a Big Data consultant for DataCrunchers.