Posts

Showing posts from December, 2025

Securing Hadoop HA YARN: Part 1 TLS & PKI

First, let's go over the basics of the Hadoop stack. The storage layer, the HDFS filesystem, is separated from the compute layer, YARN. The hadoop-yarn component handles container orchestration. It has two key components: the ResourceManager (RM), which is the leader, and the NodeManager (NM), which is the worker. The design is a common, tried-and-true approach: the leader handles a small number of discrete tasks, and the workers scale out to hundreds or thousands of nodes.  RM <- - - [ NM1, NM2, NM3, NM4, NM... ]  The design is not peer-to-peer. The NodeManager nodes start up and announce themselves to the RM. When an Application or Job is submitted to the group, the RM enforces quotas and ultimately provides the mechanism for the Application to request containers and use them for computation. I have been working hard on a series of compositions for running Hadoop on containers (docker/k8s) and on bare metal (ansible). You can find all the material (githu...

Testing Resilience4j retry with mockable fault injection

One thing that separates a well-engineered project from a so-so one is the attention paid to error handling and retry. This isn't always as easy as it sounds: first you need to taxonomize exceptions, then come up with appropriate retry and backoff strategies. To that end, I created some interfaces and classes to make a good showing of how to do this. The link to the code is here, but in the blog we will walk through it step by step. Resilience4j (https://resilience4j.readme.io/docs/getting-started) helps with a great deal of this by providing a small, purpose-built library with nice building blocks like retry, bulkhead, and circuit breaker. If you read about Hystrix from Netflix years back, this library was inspired by that one. You cannot go in "half-baked": some things are useless to retry, such as a NullPointerException caused by bad input. One of my favorite ways to design APIs is to introduce a clear exception hierarchy: things you can retry and things you cannot. Below I used Interf...
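To make the retryable versus non-retryable split concrete, here is a minimal sketch. It is not the code from the linked post: the class names (RetryableException, NonRetryableException, RetrySketch) are hypothetical, and a hand-rolled loop with linear backoff stands in for Resilience4j's Retry so the example runs with only the JDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical hierarchy for illustration; these names are assumptions.
class RetryableException extends RuntimeException {
    RetryableException(String message) { super(message); }
}

class NonRetryableException extends RuntimeException {
    NonRetryableException(String message) { super(message); }
}

final class RetrySketch {
    // Runs the task up to maxAttempts times. Only RetryableException is
    // retried; anything else (NonRetryableException, NullPointerException,
    // and so on) propagates on the first failure.
    static <T> T callWithRetry(Callable<T> task, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RetryableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (RetryableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff: wait longer after each failed attempt.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        // Every attempt failed with a retryable error; surface the last one.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated fault injection: fail twice, then succeed.
        String result = callWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RetryableException("transient failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

The point of the split is that the retry policy only needs to know one type. A real project would likely hand that type to the retry library's configuration instead of writing the loop by hand.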