Apache ZooKeeper

Terms and Concepts

  • ensemble - a ZooKeeper cluster
  • quorum - the minimum number of servers that must agree for an update to commit ( a strict majority of the ensemble )
  • znode - a data register; a node in ZooKeeper's hierarchical namespace
  • servers are either followers or the leader ( observers, covered later, are a third non-voting role )


  • ephemeral nodes last only as long as the session of the client that created them
  • a client can connect to any host; if disconnected, it will reconnect to a different host
  • the data tree is held in memory, so reads are fast
  • ordered - every transaction is stamped with a sequence number, which makes ZooKeeper good for building synchronization primitives
  • writes are logged to disk before they are applied to the in-memory tree
  • performance: use a dedicated disk for the transaction log, plenty of RAM, and a large enough heap to avoid swapping
  • self-healing - a failed server rejoins the ensemble when it is restarted
  • needs a majority of servers up to operate, so clusters usually have an odd number of servers ( e.g. a 5-node ensemble tolerates 2 failures; a 6-node ensemble tolerates no more )


  • Sequential Consistency - Updates from a client will be applied in the order that they were sent.
  • Atomicity - Updates either succeed or fail. No partial results.
  • Single System Image - A client will see the same view of the service regardless of the server that it connects to.
  • Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
  • Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound.

Setup

  • get a JDK
  • set the heap size ( example below )
  • unpack ZooKeeper
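One common way to set the heap: zkEnv.sh sources conf/java.env if that file exists, so JVM flags can go there ( the 2g value is only an example ):

echo 'export JVMFLAGS="-Xmx2g"' > conf/java.env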
conf/zoo.cfg:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

Manual run:
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.16.jar:conf org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg

Control script - start/stop/status:
bin/zkServer.sh start
bin/zkServer.sh stop
bin/zkServer.sh status

Usually not needed - sets up the data dirs if they aren't configured to be created automatically:
bin/zkServer-initialize.sh


Note: the ZooKeeper documentation includes an example of monitoring via JMX.



Replicated ZooKeeper

If running multiple instances on one host:
  • different config file for each instance
  • different datadir
  • different ports ( all three: client, quorum, and leader-election )
Add this for each instance:
/var/lib/zookeeper/myid # this file holds the server id for this instance; put it in dataDir
# the id is an integer between 1 and 255
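For example, on the host that runs server 1 ( the same pattern appears in the single-host cluster example later in these notes ):

echo 1 > /var/lib/zookeeper/myid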
conf/zoo.cfg:
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

Config Details

server.id=host:port:port # every host in the ensemble must be listed in every server's config

tickTime # the basic time unit, in milliseconds
initLimit # ticks a follower may take to connect and sync to the leader
syncLimit # ticks a follower may fall behind the leader before being dropped

- first TCP port 2888 # quorum port, followers connect to the leader here
- second TCP port 3888 # leader-election port


Test Client

Java test client:
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.16.jar:conf:src/java/lib/jline-2.11.jar \
    org.apache.zookeeper.ZooKeeperMain -server 127.0.0.1:2181
OR
bin/zkCli.sh -server 127.0.0.1:2181
C test client:
make cli_st # build single-threaded
make cli_mt # build multithreaded
cli_mt 127.0.0.1:2181

LD_LIBRARY_PATH=. cli_mt 127.0.0.1:2181 # if the shared libraries are in the current directory

ZK Shell

- connect with the client above
help
create /zk_test my_data # create znode that holds "my_data"
get /zk_test # check data for this znode
set /zk_test junk # set znode value
delete /zk_test # delete znode
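A couple of other everyday commands ( run them while /zk_test still exists; ls and stat are part of the standard zkCli command set ):

ls / # list the top-level znodes, e.g. [zookeeper, zk_test]
stat /zk_test # show the znode's metadata ( version, timestamps, number of children )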

ZooKeeper Commands ( The Four Letter Words )

echo ruok | nc 127.0.0.1 2181 # a healthy server answers "imok"
  • conf New in 3.3.0: Print details about serving configuration.
  • cons New in 3.3.0: List full connection/session details for all clients connected to this server.
  • crst New in 3.3.0: Reset connection/session statistics for all connections.
  • dump Lists the outstanding sessions and ephemeral nodes. This only works on the leader.
  • envi Print details about serving environment
  • ruok Tests if server is running in a non-error state. May or may not have joined the quorum.
  • srst Reset server statistics.
  • srvr New in 3.3.0: Lists full details for the server.
  • stat Lists brief details for the server and connected clients.
  • wchs New in 3.3.0: Lists brief information on watches for the server.
  • wchc New in 3.3.0: Lists detailed information on watches for the server, by session. WARNING - could be expensive (impact performance)
  • wchp New in 3.3.0: Lists detailed information on watches for the server, by path. WARNING - could be expensive (impact performance)
  • mntr New in 3.4.0: Outputs a list of variables that could be used for monitoring the health of the cluster.
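For scripted monitoring, mntr is the handiest of these; its output is key/value pairs ( fields include zk_version, zk_server_state, and zk_znode_count ):

echo mntr | nc 127.0.0.1 2181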

AdminServer

  • embedded Jetty server that exposes the four-letter ZooKeeper commands over HTTP
  • New in 3.5.0 ( alpha )
  • enabled by default; disable it by setting zookeeper.admin.enableServer=false
  • issue commands like this: http://localhost:8080/commands/stat
  • the available commands are listed at: http://localhost:8080/commands
  • options: zookeeper.admin.enableServer, zookeeper.admin.serverPort, zookeeper.admin.commandURL
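The same thing from a shell, using the URL shown above:

curl http://localhost:8080/commands/stat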

Simple API

  • create # creates a node at a location in the tree
  • delete # deletes a node
  • exists # tests whether a node exists at a location
  • get data # reads the data from a node
  • set data # writes data to a node
  • get children # retrieves the list of children of a node
  • sync # waits for data to be propagated ( see the Java sketch below )
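A minimal sketch of these operations with the Java client; the class and method names are from the real API, while the connect string, timeout, and znode path are placeholder values:

import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkApiDemo {
    public static void main(String[] args) throws Exception {
        // block until the session is actually established
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 3000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // create: a persistent znode holding "my_data"
        zk.create("/zk_test", "my_data".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        Stat stat = zk.exists("/zk_test", false);          // exists: null if absent
        byte[] data = zk.getData("/zk_test", false, stat); // get data
        zk.setData("/zk_test", "junk".getBytes(), stat.getVersion()); // set data
        List<String> children = zk.getChildren("/", false); // get children
        System.out.println(new String(data) + " " + children);

        zk.delete("/zk_test", -1); // delete; -1 matches any version
        zk.close();
    }
}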

Maintenance, Monitoring, more configs

dataDir # snapshots, the myid file, and ( by default ) the transaction logs
dataLogDir # put the transaction logs in a different dir / on a dedicated device for low latency




Clear old snapshots and transaction logs ( retaining the 'count' most recent ):
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.16.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>


autopurge.snapRetainCount # how many recent snapshots and transaction logs to keep when auto-purging ( example below )
autopurge.purgeInterval # purge interval in hours; 0 disables auto purge ( example below )
conf/log4j.properties # log rolling can be set up here; this file needs to be in the bin dir or on the classpath
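A typical auto-purge stanza in conf/zoo.cfg ( the values are illustrative; 3 is the minimum retain count ):

autopurge.snapRetainCount=3
autopurge.purgeInterval=1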


Tips

  • monitor with JMX and with the command port ( the four-letter words; JMX example below )
  • if the db is corrupt, delete the files under dataDir/version-2 and dataLogDir/version-2
  • recovery: the server rebuilds its state by replaying the transaction log on top of the latest snapshot
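One way to expose JMX remotely when starting a server - zkServer.sh passes JVMFLAGS through to the JVM; these are standard JVM options, and the port and disabled auth here are just for illustration:

JVMFLAGS="-Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" bin/zkServer.sh start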

Observers

  • same as a normal member but doesn't vote ( fewer voters makes voting, and therefore writes, faster )
  • they only hear the results of votes, not the agreement protocol that leads up to them
  • clients may connect to them and send read and write requests to them ( writes are forwarded on to the leader )
  • helps with scaling, and cluster availability isn't harmed when they fail
  • note: voting instances in other datacenters could create false positives in failure detection, so run them as observers
peerType=observer # add this only in the observer's own config file
server.1=localhost:2181:3181:observer # tag the observer's entry like this in every config file

Cluster Setup Example - Single Host with ActiveMQ

Unpack:

tar xvfz zookeeper-3.4.6.tar.gz
tar xvfz apache-activemq-5.11.1-bin.tar.gz
Edit each config file:

vi conf/zoo_instance1.cfg
vi conf/zoo_instance2.cfg
vi conf/zoo_instance3.cfg


conf/zoo_instance1.cfg
tickTime=2000
dataDir=/var/tmp/zookeeper1/
clientPort=2181
initLimit=5
syncLimit=2
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883


conf/zoo_instance2.cfg
tickTime=2000
dataDir=/var/tmp/zookeeper2/
clientPort=2182
initLimit=5
syncLimit=2
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883


conf/zoo_instance3.cfg
tickTime=2000
dataDir=/var/tmp/zookeeper3/
clientPort=2183
initLimit=5
syncLimit=2
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883


Setup directories and ID files:
mkdir /var/tmp/zookeeper1
mkdir /var/tmp/zookeeper2
mkdir /var/tmp/zookeeper3
echo 1 > /var/tmp/zookeeper1/myid
echo 2 > /var/tmp/zookeeper2/myid
echo 3 > /var/tmp/zookeeper3/myid



Start up the servers:

bin/zkServer.sh start conf/zoo_instance1.cfg
bin/zkServer.sh start conf/zoo_instance2.cfg
bin/zkServer.sh start conf/zoo_instance3.cfg



Check their status:

echo stat | nc 127.0.0.1 2181
echo stat | nc 127.0.0.1 2182
echo stat | nc 127.0.0.1 2183
# one server should report "Mode: leader", the other two "Mode: follower"

ActiveMQ Setup

WARNING - LevelDB is deprecated; this example is kept for reference, but don't use it to set up ActiveMQ



Fire up an initial instance to test it:

cd apache-activemq-5.11.1
chmod 755 bin/activemq
bin/activemq start
bin/activemq stop



Configure it in conf/activemq.xml, inside the <persistenceAdapter> element ( hostname should be each broker's own name ):

 <replicatedLevelDB
    directory="activemq-data"
    replicas="3"
    bind="tcp://0.0.0.0:0"
    zkAddress="zoo1.example.org:2181,zoo2.example.org:2181,zoo3.example.org:2181"
    zkPassword="password"
    zkPath="/activemq/leveldb-stores"
    hostname="broker1.example.org"
    />



Create instances:

bin/activemq create instance1
bin/activemq create instance2
bin/activemq create instance3



Configure web ports:

vi instance1/conf/jetty.xml
<property name="port" value="8161"/>
vi instance2/conf/jetty.xml
<property name="port" value="8162"/>
vi instance3/conf/jetty.xml
<property name="port" value="8163"/>



Start up the instances:

cd instance1/
bin/instance1 start
cd ../instance2/
bin/instance2 start
cd ../instance3/
bin/instance3 start