Does HDFS Use ZooKeeper?
If you are new to Hadoop, you might not be familiar with ZooKeeper. You have probably heard of it but aren’t quite sure why it is needed or what it is used for. You might be wondering specifically, “Does HDFS use ZooKeeper?”
Answer: Yes, in newer versions. HDFS uses ZooKeeper, but doesn’t need it unless you are running a high-availability cluster.
Older versions of Hadoop, including 1.x, do not use ZooKeeper. Starting with version 2.0, Hadoop uses ZooKeeper for high availability: it coordinates automatic failover between the active and standby NameNodes. You don’t need ZooKeeper if you aren’t running that kind of cluster. You can set up a single-node or non-HA installation of Hadoop (for example, in a lab environment), and in that case it is entirely possible to use HDFS without ZooKeeper.
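To make "HDFS uses ZooKeeper" a little more concrete, here is a minimal sketch that generates the two properties which, as far as we know, switch on automatic NameNode failover in Hadoop 2.x and later: dfs.ha.automatic-failover.enabled (hdfs-site.xml) and ha.zookeeper.quorum (core-site.xml). The host names are placeholders, the build_config helper is hypothetical, and a real HA setup needs further settings (nameservice, NameNode IDs, shared edits storage) that are left out here.

```python
import xml.etree.ElementTree as ET


def build_config(properties):
    """Render a dict of Hadoop properties as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder ZooKeeper ensemble (assumption); replace with your own hosts.
ZK_QUORUM = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"

# Belongs in hdfs-site.xml: allow automatic failover between NameNodes.
hdfs_site = build_config({"dfs.ha.automatic-failover.enabled": "true"})

# Belongs in core-site.xml: where the failover controllers find ZooKeeper.
core_site = build_config({"ha.zookeeper.quorum": ZK_QUORUM})

print(hdfs_site)
print(core_site)
```

If neither property is set, HDFS never contacts ZooKeeper, which is why a lab install runs fine without it.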
Why does Hadoop need ZooKeeper?
Hadoop itself doesn’t necessarily need ZooKeeper unless you are running a high-availability cluster, where it is used to coordinate failover. Other components in the Hadoop ecosystem, such as HBase, do need ZooKeeper, so you will need it if you plan on running HBase. ZooKeeper is relatively easy to set up, but it does take a bit of admin work.
ZooKeeper vs YARN
YARN and ZooKeeper do different things, so comparing them is like comparing apples and oranges: both are fruit, but they are not the same. In fact, YARN itself uses ZooKeeper for ResourceManager high availability.
YARN handles resource allocation and job scheduling for a cluster of nodes. It accepts job requests and manages priorities, fairness, scheduling strategies, resource constraints, and rules.
ZooKeeper is a cluster in its own right. It provides distributed synchronization and naming, can store small pieces of information, and is often used for leader election. Although ZooKeeper is itself a cluster, other services use it to coordinate their own clusters.
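Leader election is easier to picture with a toy model. The sketch below imitates the common ZooKeeper recipe in which each candidate creates an ephemeral sequential node and the lowest sequence number leads. It is an in-memory stand-in only: ToyElection and its methods are hypothetical names, not the ZooKeeper client API, and Hadoop's own failover controller may differ in detail.

```python
class ToyElection:
    """In-memory stand-in for an election znode; NOT the ZooKeeper client API."""

    def __init__(self):
        self._next_seq = 0
        self._members = {}  # sequence number -> candidate name

    def join(self, name):
        """Mimic creating an ephemeral sequential node; return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = name
        return seq

    def leave(self, seq):
        """Mimic a session ending, which removes the candidate's ephemeral node."""
        self._members.pop(seq, None)

    def leader(self):
        """The candidate holding the lowest sequence number leads."""
        if not self._members:
            return None
        return self._members[min(self._members)]


election = ToyElection()
nn1 = election.join("namenode-1")
nn2 = election.join("namenode-2")
first_leader = election.leader()   # namenode-1 joined first, so it leads

election.leave(nn1)                # namenode-1 crashes or loses its session
second_leader = election.leader()  # namenode-2 takes over
print(first_leader, second_leader)
```

The point of the ephemeral node is that nobody has to announce a failure: when the leader's session ends, its node disappears and the next candidate in line becomes leader.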