Securing Hadoop HA Yarn: Part 1, TLS & PKI
First, let's go over the basics of the Hadoop stack. The storage layer, the HDFS filesystem, is separate from the compute layer, Yarn. hadoop-yarn handles container orchestration. It has two key components: the ResourceManager (RM), which is the leader, and the NodeManager (NM), which is the worker. This is a common, tried-and-true design: the leader handles a small number of discrete tasks, and the workers scale out to hundreds or thousands of nodes.

RM <- - - [ NM1, NM2, NM3, NM4, NM... ]

The design is not peer-to-peer. The NodeManager nodes start up and announce themselves to the RM. When an application or job is submitted to the cluster, the RM enforces quotas and ultimately provides the mechanism for the application to request containers and use them for computation.

I have been working hard on a series of compositions for running Hadoop on containers (docker/k8s) and on bare metal (ansible). You can find all the material (githu...
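The leader/worker split described above can be illustrated with a toy model. This is a minimal sketch, not Hadoop code: the class names mirror Yarn's roles, but `register`, `request_containers`, the per-queue `quotas` dict, and the slot-counting `capacity` field are hypothetical simplifications chosen only to show which way registration flows and where the quota check lives.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy worker: it only knows how to reach the leader, never its peers."""
    name: str
    capacity: int  # hypothetical: number of container slots this node offers


@dataclass
class ResourceManager:
    """Toy leader: tracks registered workers and enforces a per-queue quota.

    Hypothetical model for illustration only; this is not the Yarn API.
    """
    quotas: dict[str, int]                                 # queue -> max containers
    nodes: dict[str, int] = field(default_factory=dict)   # node -> free slots
    used: dict[str, int] = field(default_factory=dict)    # queue -> containers in use

    def register(self, nm: NodeManager) -> None:
        # Workers announce themselves; the leader does not go looking for them.
        self.nodes[nm.name] = nm.capacity

    def request_containers(self, queue: str, count: int) -> list[str]:
        """Grant up to `count` containers, capped by the queue quota and free slots."""
        remaining_quota = max(0, self.quotas.get(queue, 0) - self.used.get(queue, 0))
        wanted = min(count, remaining_quota)
        granted: list[str] = []
        for node in self.nodes:
            # Fill each node's free slots before moving on to the next node.
            while self.nodes[node] > 0 and len(granted) < wanted:
                self.nodes[node] -= 1
                granted.append(node)
        self.used[queue] = self.used.get(queue, 0) + len(granted)
        return granted


if __name__ == "__main__":
    rm = ResourceManager(quotas={"default": 3})
    for i in range(1, 5):
        rm.register(NodeManager(f"NM{i}", capacity=2))
    print(rm.request_containers("default", 5))  # ['NM1', 'NM1', 'NM2'] (quota caps it at 3)
    print(rm.request_containers("default", 1))  # [] (quota exhausted)
```

Real Yarn scheduling (queues, heartbeats, locality, container release) is far richer; the point here is only that workers register with a single leader and the leader decides what each application gets.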