Securing Hadoop HA YARN: Part 1, TLS & PKI
First, let's go over the basics of the Hadoop stack. The storage or filesystem, HDFS, is separated from the compute, YARN. hadoop-yarn handles container orchestration. It has two key components: the ResourceManager (RM), which is the leader, and the NodeManager (NM), which is the worker. The design is a common, tried-and-true approach of having the leader handle a small number of discrete tasks while scaling out the workers to hundreds or thousands of nodes.
RM <- - - [ NM1, NM2, NM3, NM4, NM... ]
The design is not peer-to-peer. The NodeManager nodes start up and announce themselves to the RM. When an Application or Job is submitted to the cluster, the RM enforces quotas and ultimately provides the mechanism for the Application to request containers and use them for computation.
I have been working hard on a series of compositions for running Hadoop on containers (Docker/k8s) and on bare metal (Ansible). You can find all the material (GitHub) here. This tutorial bounces back and forth, but ultimately the configuration only requires altering a few files, which a "pro" can change in one shot.
Transport security starts with cryptography, SSL/TLS, and that brings us to the first challenge: PKI. It is really hard to run distributed tech with self-signed certificates. You effectively spend more time finding workarounds in every component, e.g. 'curl --insecure https://bla bla', than figuring it out.
Edgy has a nice Certificate Authority concept, edgy-simple-ca. For the blog we have built the keystores, truststores, etc. and checked them in. Your org probably has a better way to make the material, but my goal is to give you something that works and is easily repeatable. To that end, here is how the material is generated:
- https://github.com/edwardcapriolo/edgy-ansible/blob/main/examples/create_ca.yml
- https://github.com/edwardcapriolo/edgy-ansible/blob/main/examples/install-edgy-client.sh
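If you just want a rough idea of what that material looks like, here is a minimal keytool sketch. The aliases, file names, and passwords are placeholders; the real flow (and the CA side of it) lives in the playbooks above.
# Per-node keystore: generate a key pair with a SAN matching the hostname, then get it signed by the CA
keytool -genkeypair -alias rm1 -keyalg RSA -keystore rm1.jks -storepass ssshhh -dname "CN=rm1" -ext "SAN=dns:rm1"
keytool -certreq -alias rm1 -keystore rm1.jks -storepass ssshhh -file rm1.csr
# ...the CA signs rm1.csr and hands back rm1-signed.pem...
keytool -importcert -alias edgy-ca -file ca.pem -keystore rm1.jks -storepass ssshhh -noprompt
keytool -importcert -alias rm1 -file rm1-signed.pem -keystore rm1.jks -storepass ssshhh
# Shared truststore: everyone just trusts the CA certificate once
keytool -importcert -alias edgy-ca -file ca.pem -keystore myTruststore.jks -storepass itssecret -noprompt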
Great, next: we did mention this was a High Availability setup, so let's revise our architecture. The YARN ResourceManager has many options for redundancy, but a tried-and-true approach is to use Apache Zookeeper for leader election. This makes the new architecture look like this:
[zoo1, zoo2, zoo3]
^
[rm1 <-> rm2]
Let's start with yarn-site.xml. Hadoop has a clever way of allowing multiple things to be configured in the "flat" configuration format.
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>rm1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>rm2:8088</value>
</property>
<property>
  <name>hadoop.zk.address</name>
  <value>zoo1:2182,zoo2:2182,zoo3:2182</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>rmhazk</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
One thing to watch out for: do not put _ in the cluster-id. I got burned by this once, as I tend to use _ for spaces; it blows up later in a weird way that is hard to understand.
One of the slightly frustrating parts of SSL/PKI is that you (almost always) end up jumping between multiple configuration sources, because some settings sit at the Java Virtual Machine level. Anyway, we have a "half configured" ResourceManager now, ready to talk to a secure Zookeeper, so let's set one (or three) up. Code here.
zoo2:
  image: zookeeper:3.9.4-jre-17
  restart: always
  hostname: zoo2
  volumes:
    - ./pki/zoo2:/mnt/pki-node
    - ./pki/shared:/mnt/pki-shared
  environment:
    ZOO_MY_ID: 2
    ZOO_SERVERS: "server.1=zoo1:2888:3888;2182 server.2=zoo2:2888:3888;2182 server.3=zoo3:2888:3888;2182"
    ZOO_CFG_EXTRA: >
      secureClientPort=2182
      serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
      ssl.enabled=true
      ssl.keyStore.location=/mnt/pki-node/zoo2.jks
      ssl.keyStore.password=ssshhh
      ssl.trustStore.location=/mnt/pki-shared/myTruststore.jks
      ssl.trustStore.password=itssecret
      ssl.trustStore.type=jks
      sslQuorum=true
      ssl.quorum.keyStore.location=/mnt/pki-node/zoo2.jks
      ssl.quorum.keyStore.password=ssshhh
      ssl.quorum.trustStore.location=/mnt/pki-shared/myTruststore.jks
      ssl.quorum.trustStore.password=itssecret
Notice there are two separate sets of SSL settings here: those starting with ssl. and those starting with ssl.quorum. The quorum settings are relatively new; they dictate the SSL between the ZK nodes:
[zoo1 <- ssl.quorum. -> zoo2 ]
While the original set dictates the settings for clients:
clientapp -> ssl. [zoo1 <-> zoo2 <-> zoo3]
For this setup we shared the same material across both. Designing PKI with nested sub-CAs for the Zookeeper peer list is a way you could do it, but it is probably not the best way. For our secure setup, PKI is giving us encryption and trust that the client is "who they say they are". Limiting which applications can connect isn't a focus for this tutorial.
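A handy way to sanity check the secure client port is to point zkCli at it with the same system properties the clients will use. This is a sketch assuming you run it inside one of the Zookeeper containers, reusing the mounted key material from the compose file above:
# Tell the ZK CLI to use Netty + TLS against the secure port
export CLIENT_JVMFLAGS="-Dzookeeper.client.secure=true \
  -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
  -Dzookeeper.ssl.keyStore.location=/mnt/pki-node/zoo2.jks \
  -Dzookeeper.ssl.keyStore.password=ssshhh \
  -Dzookeeper.ssl.trustStore.location=/mnt/pki-shared/myTruststore.jks \
  -Dzookeeper.ssl.trustStore.password=itssecret"
zkCli.sh -server zoo1:2182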
Now we hop back to our ResourceManager. Remember up above we said that not all the SSL settings go in the same place? Here is an example: the Zookeeper properties are Java system properties. Luckily we can place them in the HADOOP_OPTS environment variable.
rm1:
  image: tiny-yarn:latest
  hostname: rm1
  container_name: rm1
  environment:
    JAVA_HOME: "/usr"
    HADOOP_LOG_DIR: "/yarn-root/rm1logs"
    HADOOP_CONF_DIR: "/hd_conf"
    HADOOP_OPTS: >
      -Dzookeeper.client.secure=true
      -Dzkclient.ssl.protocol=TLSv1.3 -Dzookeeper.ssl.protocol=TLSv1.3
      -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
      -Dzookeeper.ssl.keyStore.location='/mnt/pki-node/rm1.p12'
      -Dzookeeper.ssl.keyStore.password=ssshhh
      -Dzookeeper.ssl.trustStore.location=/mnt/pki-shared/myTruststore.jks
      -Dzookeeper.ssl.hostnameVerification=true
      -Dzookeeper.ssl.trustStore.password=itssecret
    YARN_RESOURCEMANAGER_OPTS: >
      -DOFFjavax.net.debug=all
  entrypoint: "/opt/hadoop/bin/yarn"
  command: [ "resourcemanager" ]
  ports:
    - "8088:8088"
    - "8090:8090"
  volumes:
    - ./hd_conf:/hd_conf
    - rm1-data:/yarn-root
    - ./pki/rm1:/mnt/pki-node
    - ./pki/shared:/mnt/pki-shared
    - ./pki/rm1:/mnt/pki-server
Great, so now you should be able to start everything up and reach the YARN HTTP UI on 8088. Our configuration has two ResourceManagers, rm1 and rm2. They will use Zookeeper to elect a leader, and the one that is not the leader will issue an HTTP redirect to the other.
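You can also confirm which ResourceManager won the election with rmadmin. The rm1/rm2 ids are the logical ones from yarn-site.xml, and the binary path matches the entrypoint in the compose file:
# Run from inside either RM container; one should report active, the other standby
/opt/hadoop/bin/yarn rmadmin -getServiceState rm1
/opt/hadoop/bin/yarn rmadmin -getServiceState rm2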
Let's look at the remaining transport security checklist.
- ZK <-> ZK: Done
- Yarn -> ZK: Done
- User -> RM: ToDo
- RM <-> NM: ToDo
The users of YARN are those who submit applications and those who look at the UI. Remember, this blog is not addressing authorization, thus the work remaining is to ensure SSL/TLS for the communication channels and PKI.
The settings below sit in core-site.xml. This can be challenging in cases where you wish to share the configuration across multiple hosts. We tackle this by using different volume mounts in Docker, but this is an implementation detail.
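For example, every node can mount its own key material at the same in-container path, so the shared SSL config never has to change. A NodeManager's mounts might look like this (the ./pki/nm1 paths are hypothetical, simply mirroring the rm1 naming above):
nm1:
  ...
  volumes:
    - ./hd_conf:/hd_conf
    - ./pki/nm1:/mnt/pki-node
    - ./pki/nm1:/mnt/pki-server
    - ./pki/shared:/mnt/pki-shared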
We could ask for two-way SSL, where even the client must present a certificate you trust. Not for now:
<property>
  <name>hadoop.ssl.require.client.cert</name>
  <value>false</value>
</property>
To achieve "two-way trust" you want the CommonName of the cert to match the host making the connection. This isn't always easy and can involve adding one or more SANs (Subject Alternative Names) to the certs.
<property>
  <name>hadoop.ssl.hostname.verifier</name>
  <value>DEFAULT</value>
</property>
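If you are unsure whether a SAN actually made it into a keystore, keytool will print it; the keystore path and password here are the ones we configure for the server below:
keytool -list -v -keystore /mnt/pki-server/server.jks -storepass ssshhh | grep -A 2 "SubjectAlternativeName"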
The important bits are here:
<property>
  <name>hadoop.ssl.server.conf</name>
  <value>ssl-server.xml</value>
</property>
<property>
  <name>hadoop.ssl.client.conf</name>
  <value>ssl-client.xml</value>
</property>
These properties delegate to other files.
<property>
  <name>ssl.server.keystore.type</name>
  <value>jks</value>
</property>
<property>
  <name>ssl.server.keystore.location</name>
  <value>/mnt/pki-server/server.jks</value>
</property>
<property>
  <name>ssl.server.keystore.password</name>
  <value>ssshhh</value>
</property>
<property>
  <name>ssl.server.truststore.type</name>
  <value>jks</value>
</property>
<property>
  <name>ssl.server.truststore.location</name>
  <value>/mnt/pki-shared/myTruststore.jks</value>
</property>
<property>
  <name>ssl.server.truststore.password</name>
  <value>itssecret</value>
</property>
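ssl-client.xml mirrors this on the client side; here is a minimal sketch pointing at the same shared truststore:
<property>
  <name>ssl.client.truststore.type</name>
  <value>jks</value>
</property>
<property>
  <name>ssl.client.truststore.location</name>
  <value>/mnt/pki-shared/myTruststore.jks</value>
</property>
<property>
  <name>ssl.client.truststore.password</name>
  <value>itssecret</value>
</property>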
Because core-site.xml (and the ssl-*.xml files it points to) is shared, both the ResourceManager and the NodeManager will use these settings. Now we switch back to yarn-site.xml. The setting yarn.http.policy is somewhat of an uber setting: in any place YARN supports HTTPS/SSL, it typically disables the unsecured HTTP port and enables the HTTPS port.
<property>
  <name>yarn.resourcemanager.application-https.policy</name>
  <value>LENIENT</value>
  <!-- move to strict -->
</property>
<property>
  <name>yarn.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
The services will move to new ports, but should serve the same content:
rm1:
  ...
  ports:
    - "8088:8088"
    - "8090:8090"
nm1:
  ...
  ports:
    - "8042:8042"
    - "8044:8044"
There you have it! I have to say, I think that is a pretty sweet setup. You get scale-out computing, and even when running locally the security knobs are on, so it feels prod-like.
Next up we will get into the authorization half of security, which helps us control who can do what.

