by David McNeil
The foundation of our query processing engine is a concurrent data stream processor. It must efficiently perform parallel, non-blocking processing of multiple data streams that are too large to fit in memory, and many such executions must run simultaneously and fairly. The ideas in this talk are relevant to anyone who works with large-scale, parallel data processing within a single process. A central theme of the talk is building layers of abstraction that culminate in a language tailored to the problem. The talk discusses characteristics of the concurrent stream processor, including: core data structures that represent processing nodes connected by data streams; processing plans represented as s-expressions; compiling s-expressions into processing nodes and streams; processing plan optimizations via s-expression manipulation; concurrent processing via a fork/join pool; facilities for debugging and cancelling executions; and using the data stream processor as the core of a federated query processor.
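To make the idea of plans as s-expressions concrete, here is a minimal Python sketch, not the talk's actual implementation. It assumes a plan is a nested tuple whose first element names an operator, compiles that plan into lazy iterators (standing in for processing nodes connected by streams), and applies one rewrite that fuses adjacent filters. The operator names (`source`, `filter`, `map`, `limit`) and the functions `compile_plan` and `optimize` are hypothetical, and the fork/join concurrency described in the abstract is deliberately left out.

```python
from itertools import islice


def compile_plan(plan, sources):
    """Compile a nested-tuple plan into a lazy iterator.

    Each operator wraps its child's iterator, so items flow through one
    at a time and the full stream never has to fit in memory.
    Operator names are illustrative assumptions, not the talk's vocabulary.
    """
    op, *args = plan
    if op == "source":
        return iter(sources[args[0]])
    if op == "filter":
        pred, child = args
        return (x for x in compile_plan(child, sources) if pred(x))
    if op == "map":
        fn, child = args
        return (fn(x) for x in compile_plan(child, sources))
    if op == "limit":
        n, child = args
        return islice(compile_plan(child, sources), n)
    raise ValueError(f"unknown operator: {op!r}")


def optimize(plan):
    """Rewrite the plan tree: fuse a filter whose child is also a filter."""
    op, *args = plan
    if op == "source":
        return plan
    *params, child = args
    child = optimize(child)
    if op == "filter" and child[0] == "filter":
        outer, inner = params[0], child[1]
        # The inner predicate ran first in the original plan; keep that order.
        return ("filter", lambda x: inner(x) and outer(x), child[2])
    return (op, *params, child)


def naturals():
    """An unbounded source: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


plan = ("limit", 3,
        ("map", lambda x: x * x,
         ("filter", lambda x: x % 3 == 0,
          ("filter", lambda x: x % 2 == 0,
           ("source", "nums")))))

optimized = optimize(plan)
result = list(compile_plan(optimized, {"nums": naturals()}))
print(result)  # [0, 36, 144]
```

Because the plan is plain data, the optimization is an ordinary tree rewrite that runs before any stream is opened. The unbounded source shows why laziness matters here: only as many items are pulled as the `limit` node requests.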
10th–12th November 2011