When you want to measure fractions of a millimeter, you get a micrometer. When you want to measure centimeters, you get a ruler. When you want to measure kilometers, you might use a laser beam. The abstract task is the same in all cases, but the tools differ significantly based on the size of the measurement.
Likewise, some computations can be done quickly on data structures that fit into memory. Other data sets won't fit into memory but will fit on the direct-attached disk of a single computer. But when you've got many terabytes or even petabytes of data, you need tooling adapted to the scale of the task. Enter Hadoop.
Hadoop is a widely used open source framework for storing massive data sets across distributed clusters of computers and efficiently distributing computational tasks around the cluster. Come learn about the Hadoop Distributed File System (HDFS), the MapReduce pattern and its implementation, and the broad ecosystem of tools, products, and companies that have grown up around this ground-breaking project.
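For a taste of what the MapReduce pattern looks like in practice, here is a minimal sketch of the canonical word-count job written against Hadoop's Java MapReduce API: the map step emits a (word, 1) pair for every token it sees, and the reduce step sums the counts for each word. The input and output locations are assumed to be HDFS directories passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each line into tokens and emit (word, 1) for each.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory (assumed)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (assumed)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework handles the rest: splitting the input across the cluster, shuffling all values for a given word to the same reducer, and writing the results back to HDFS.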
Speaker bio (via Twitter): Full-stack JVM generalist. Passionate teacher. GitHubber. Husband of one, father of three. Believer in Christ.