Apache ZooKeeper Tutorial
Welcome to our Apache ZooKeeper tutorial. We cover the key ZooKeeper concepts along with installation and configuration. This is a great place to get started and should get you up to speed quickly.
A cluster of 2F+1 servers can tolerate the failure of F servers and still maintain a quorum.
Terms and Concepts
- ensemble - a Zookeeper cluster
- quorum - minimum number of replicated servers needed to vote
- znode - data register
- each server is either a follower or the leader
- ephemeral node - exists only as long as the session that created it
- a client can connect to any host; if disconnected, it will reconnect to a different host
- data is kept in memory
- ordered - every transaction is stamped with a sequence number ( good for synchronization primitives )
- transactions are written to a log on disk before the in-memory database is updated
- for performance - dedicated transaction log disk, lots of RAM and heap
- self healing - a restarted server rejoins once a majority is reachable; clusters usually have an odd number of servers
- Sequential Consistency - Updates from a client will be applied in the order that they were sent.
- Atomicity - Updates either succeed or fail. No partial results.
- Single System Image - A client will see the same view of the service regardless of the server that it connects to.
- Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
- Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound.
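The majority math behind quorums can be sketched with plain shell arithmetic ( no server needed ): an ensemble of 2F+1 servers keeps a quorum while F servers are down.

```shell
# Quorum = smallest majority of the ensemble; an ensemble of N = 2F + 1
# servers tolerates F failures. Odd sizes are preferred: 6 servers
# tolerate no more failures than 5.
for n in 1 3 5 7; do
  quorum=$(( n / 2 + 1 ))        # smallest majority
  tolerated=$(( n - quorum ))    # failures survivable while keeping quorum
  echo "servers=$n quorum=$quorum tolerated_failures=$tolerated"
done
```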
Apache ZooKeeper Setup and Installation
- install a JDK
- set the Java heap size ( avoid swapping )
- unpack the ZooKeeper distribution
conf/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
Manual run:
java -cp zookeeper.jar:lib/*.jar:conf \
org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg
Control Script - Start/Stop:
bin/zkServer.sh start
bin/zkServer.sh stop
bin/zkServer.sh status
Optional - sets up the data dirs if they aren’t configured to be created automatically:
bin/zkServer-initialize.sh
Replicated Apache ZooKeeper Cluster
If running multiple instances on one host:
- different config file for each instance
- different datadir
- different ports ( all three: client, quorum, leader election )
For each instance add a ‘myid’ file in its data dir. This file holds the instance’s id, which must be a number between 1 and 255. For example, you might have three files that look like this:
- /var/lib/zookeeper1/myid
- /var/lib/zookeeper2/myid
- /var/lib/zookeeper3/myid
Each host in the ensemble needs to know about every other host.
conf/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
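initLimit and syncLimit in the config above are measured in ticks, not milliseconds; a quick sketch of the real timeouts they imply:

```shell
# Timeouts in zoo.cfg are multiples of tickTime ( milliseconds ).
tickTime=2000
initLimit=5    # ticks a follower has to connect and sync with the leader
syncLimit=2    # ticks a follower may lag behind before it is dropped
init_ms=$(( initLimit * tickTime ))
sync_ms=$(( syncLimit * tickTime ))
echo "init timeout: ${init_ms} ms, sync timeout: ${sync_ms} ms"
```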
Config Details
Ports:
TCP 2888 | quorum port, followers connect to leader |
TCP 3888 | leader election port |
TCP 2181 | client port |
Test Client
Java Test Client
Run it using the zkCli.sh script:
bin/zkCli.sh -server 127.0.0.1:2181
OR run the JAR directly:
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/\
slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.16.jar:conf:src/java/lib/jline-2.11.jar \
org.apache.zookeeper.ZooKeeperMain -server 127.0.0.1:2181
C Test Client
Build and run the C test client:
make cli_st # build singlethreaded
make cli_mt # build multithreaded
cli_mt 127.0.0.1:2181
LD_LIBRARY_PATH=. cli_mt 127.0.0.1:2181 # if the ZooKeeper client library is in the current dir
ZK Shell
Use the test client described above to connect and use the ZK Shell.
ZK Shell - Example 1:
help
create /zk_test my_data # create znode that holds "my_data"
get /zk_test # check data for this znode
set /zk_test junk # set znode value
delete /zk_test # delete znode
ZK Shell - Example 2:
create /node_a 123 # assign '123' to /node_a
get /node_a # show it
set /node_a 321 # assign new value '321'
get /node_a # show it again
ZooKeeper Commands ( The Four Letter Words )
Simple four letter commands can be sent to the client port (2181):
echo ruok | nc 127.0.0.1 2181
Here is a listing of the four letter client port commands:
conf | New in 3.3.0: Print details about serving configuration. |
cons | New in 3.3.0: List full connection/session details for all clients connected to this server. |
crst | New in 3.3.0: Reset connection/session statistics for all connections. |
dump | Lists the outstanding sessions and ephemeral nodes. This only works on the leader. |
envi | Print details about serving environment |
ruok | Tests if server is running in a non-error state. May or may not have joined the quorum. |
srst | Reset server statistics. |
srvr | New in 3.3.0: Lists full details for the server. |
stat | Lists brief details for the server and connected clients. |
wchs | New in 3.3.0: Lists brief information on watches for the server. |
wchc | New in 3.3.0: Lists detailed information on watches for the server, by session. WARNING - could be expensive ( impact performance ) |
wchp | New in 3.3.0: Lists detailed information on watches for the server, by path. WARNING - could be expensive ( impact performance ) |
mntr | New in 3.4.0: Outputs a list of variables that could be used for monitoring the health of the cluster. |
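The output of these commands is line oriented, so it is easy to script against. A sketch that pulls one field out of an mntr response; the sample below is illustrative rather than captured from a live server, and a live check would pipe from nc instead:

```shell
# mntr returns tab separated key/value lines; this sample is illustrative.
sample=$(printf 'zk_version\t3.4.6\nzk_server_state\tleader\nzk_znode_count\t4\n')
# A live check would use:  echo mntr | nc 127.0.0.1 2181
state=$(printf '%s\n' "$sample" | awk -F'\t' '$1 == "zk_server_state" { print $2 }')
echo "server state: $state"
```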
Admin Server
This is an embedded Jetty server that exposes the four letter ZooKeeper commands over HTTP. It was introduced in version 3.5.0.
You can access commands like this: http://localhost:8080/commands/stat
You can see available commands listed here: http://localhost:8080/commands
The admin server is enabled by default; disable it by setting zookeeper.admin.enableServer to false.
The following options are available to configure the admin server:
- zookeeper.admin.enableServer
- zookeeper.admin.serverPort
- zookeeper.admin.commandURL
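These are Java system properties, so one way to change them is through SERVER_JVMFLAGS, which zkServer.sh picks up from the environment. A sketch ( port 9090 is just an illustrative choice ):

```shell
# Move the admin server off the default port 8080 ( 9090 is illustrative ).
SERVER_JVMFLAGS="-Dzookeeper.admin.serverPort=9090"
export SERVER_JVMFLAGS
# bin/zkServer.sh start   # commented out: requires an installed ZooKeeper
echo "$SERVER_JVMFLAGS"
```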
Simple ZooKeeper API
ZooKeeper has a simple API. These commands are available:
create | creates a node at a location in the tree |
delete | deletes a node |
exists | tests if a node exists at a location |
get data | reads the data from a node |
set data | writes data to a node |
get children | retrieves a list of children of a node |
sync | waits for data to be propagated |
Maintenance, Monitoring, More Configs
Some dirs to be aware of:
- dataDir - data snapshots, myid file, transaction logs ( default )
- dataLogDir - different dir for transaction logs ( for low latency )
Clear old snapshots and log files ( retain ‘count’ backups ):
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar\
:lib/log4j-1.2.16.jar:conf org.apache.zookeeper.server.PurgeTxnLog \
<dataDir> <snapDir> -n <count>
Some snapshot management settings you might want to use:
- autopurge.snapRetainCount - number of recent snapshots and transaction logs to retain
- autopurge.purgeInterval - purge interval in hours ( 0 disables auto purging )
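For example, a zoo.cfg fragment that retains the three most recent snapshots and purges once a day ( the values are illustrative ):

```
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
```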
Logging:
- conf/log4j.properties - log rolling can be set up here; this file needs to be in the bin dir or on the classpath
Tips
- monitor with JMX and with command port ( 4 letter words )
- if db is corrupt, delete files in these dirs: datadir/version-2 and datalogdir/version-2
- recovery: combine latest snapshot with transaction log
Observers
A node can be set up as an observer. Here are some of the features of observers.
- same as a normal member but doesn’t vote ( fewer voters keeps voting / writing fast as you add hosts )
- only hears the results of votes, not the agreement protocol that leads up to them
- clients may connect to them and send read and write requests ( writes are forwarded to the leader )
- helps with scaling, and cluster availability isn’t harmed when observers fail
- voting instances in other datacenters could create false positives; observers there avoid that risk
To configure a node as an observer, first add this line only to the observer’s config file:
peerType=observer
Then add an entry like this in every config file:
server.1=localhost:2181:3181:observer
Cluster Setup Example - Single Host with ActiveMQ
We’re setting this up under the assumption that it will be run on a single host. If run on separate hosts you wouldn’t need to number the config files or create unique paths for the dataDir.
Unpack Zookeeper and ActiveMQ:
tar xvfz zookeeper-3.4.6.tar.gz
tar xvfz apache-activemq-5.11.1-bin.tar.gz
Edit each config file:
vi conf/zoo_instance1.cfg
vi conf/zoo_instance2.cfg
vi conf/zoo_instance3.cfg
conf/zoo_instance1.cfg
tickTime=2000
dataDir=/var/tmp/zookeeper1/
clientPort=2181
initLimit=5
syncLimit=2
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883
conf/zoo_instance2.cfg
tickTime=2000
dataDir=/var/tmp/zookeeper2/
clientPort=2182
initLimit=5
syncLimit=2
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883
conf/zoo_instance3.cfg
tickTime=2000
dataDir=/var/tmp/zookeeper3/
clientPort=2183
initLimit=5
syncLimit=2
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883
Setup directories and ID files:
mkdir /var/tmp/zookeeper1
mkdir /var/tmp/zookeeper2
mkdir /var/tmp/zookeeper3
echo 1 > /var/tmp/zookeeper1/myid
echo 2 > /var/tmp/zookeeper2/myid
echo 3 > /var/tmp/zookeeper3/myid
Start up the servers:
bin/zkServer.sh start conf/zoo_instance1.cfg
bin/zkServer.sh start conf/zoo_instance2.cfg
bin/zkServer.sh start conf/zoo_instance3.cfg
Check their status:
echo; echo stat | nc 127.0.0.1 2181; echo;echo
echo; echo stat | nc 127.0.0.1 2182; echo;echo
echo; echo stat | nc 127.0.0.1 2183; echo;echo
ActiveMQ Setup
WARNING - LevelDB is deprecated. We’re keeping this example for reference, but don’t use replicated LevelDB for a new ActiveMQ setup.
Fire up an initial instance to test it:
cd apache-activemq-5.11.1
chmod 755 bin/activemq
bin/activemq start
bin/activemq stop
Configure it:
<replicatedLevelDB
directory="activemq-data"
replicas="3"
bind="tcp://0.0.0.0:0"
zkAddress="zoo1.example.org:2181,zoo2.example.org:2181,zoo3.example.org:2181"
zkPassword="password"
zkPath="/activemq/leveldb-stores"
hostname="broker1.example.org"
/>
Create instances:
bin/activemq create instance1
bin/activemq create instance2
bin/activemq create instance3
Configure web ports:
vi instance1/conf/jetty.xml
<property name="port" value="8161"/>
vi instance2/conf/jetty.xml
<property name="port" value="8162"/>
vi instance3/conf/jetty.xml
<property name="port" value="8163"/>
Start up the instances:
cd instance1/
bin/instance1 start
cd ../instance2/
bin/instance2 start
cd ../instance3/
bin/instance3 start
References
NOTE - Version 3.6.0 is current as of March 6, 2020.