Cluster Setup (Replication)

CDAP replication relies on the cluster administrator setting up replication on HBase, HDFS, and Kafka.

  • It is assumed that CDAP is only running on the master cluster.

  • It is assumed that you have not started CDAP before any of these steps.

HBase

  • Install the relevant cdap-hbase-compat package on all HBase nodes of your cluster in order to use the replication status coprocessors. Note that due to HBase limitations, these coprocessors cannot be used on HBase 0.96 or 0.98.

    Available "compat" packages are:

    • cdap-hbase-compat-1.0

    • cdap-hbase-compat-1.0-cdh

    • cdap-hbase-compat-1.0-cdh5.5.0

    • cdap-hbase-compat-1.1

    • cdap-hbase-compat-1.2-cdh5.7.0

  • Modify hbase-site.xml on all HBase nodes to enable HBase replication, and to use the CDAP replication status coprocessors:

    <property>
      <name>hbase.replication</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.coprocessor.regionserver.classes</name>
      <value>io.cdap.cdap.data2.replication.LastReplicateTimeObserver</value>
    </property>
    <property>
      <name>hbase.coprocessor.wal.classes</name>
      <value>io.cdap.cdap.data2.replication.LastWriteTimeObserver</value>
    </property>

  • Modify hbase-env.sh on all HBase nodes to include the HBase coprocessor in the classpath:

    export HBASE_CLASSPATH="$HBASE_CLASSPATH:/<cdap-home>/<hbase-compat-version>/coprocessor/*"
    # <cdap-home> will vary depending on your distribution and installation
    #
    # <hbase-compat-version> is the HBase package compatible with the distribution

  • Restart HBase master and regionservers.

  • Enable replication from master to slave:

    master_hbase_shell> add_peer '[slave-name]', '[slave-zookeeper-quorum]:/[slave-zk-node]'
    # For example:
    master_hbase_shell> add_peer 'slave', 'slave.example.com:2181:/hbase'

  • Enable replication from slave to master:

  • Confirm that HBase replication is working:
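
As a sketch of the last two steps, the session below assumes that the reverse direction mirrors the master-to-slave command, and it uses a hypothetical test table named repltest with a single column family f. The bracketed peer name and ZooKeeper address are placeholders, and the table, row, and column names are illustrative only.

```shell
# Reverse direction, run in the HBase shell on the slave cluster
# (assumed to mirror the master-to-slave command):
slave_hbase_shell> add_peer '[master-name]', '[master-zookeeper-quorum]:/[master-zk-node]'

# Hypothetical check: create the same table on both clusters, with
# replication enabled on its column family.
master_hbase_shell> create 'repltest', {NAME => 'f', REPLICATION_SCOPE => 1}
slave_hbase_shell> create 'repltest', {NAME => 'f', REPLICATION_SCOPE => 1}

# Write one row on each side.
master_hbase_shell> put 'repltest', 'masterrow', 'f:v1', 'v1'
slave_hbase_shell> put 'repltest', 'slaverow', 'f:v1', 'v1'

# After a short delay, both scans should list both rows.
master_hbase_shell> scan 'repltest'
slave_hbase_shell> scan 'repltest'
```

If either scan shows only the locally written row, replication in the corresponding direction is not yet working.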

HDFS

Set up HDFS replication using the solution provided by your distribution. HDFS has no built-in replication between clusters; it is usually approximated by scheduling regular distcp jobs.
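
As one illustration, a periodic distcp job can be scheduled with cron. This is a minimal sketch, not a distribution-specific recipe: the NameNode addresses, the /cdap path, and the hourly schedule are all assumptions to replace with your own values.

```shell
# Hypothetical crontab entry on a host with Hadoop client access to both clusters.
# Runs at the top of every hour; -update copies only files that are missing
# on the target or differ from the source.
0 * * * * hadoop distcp -update hdfs://master-nn.example.com:8020/cdap hdfs://slave-nn.example.com:8020/cdap
```

The interval between runs bounds how far the slave copy can lag behind the master, so choose it according to how much data loss is acceptable on failover.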

Kafka

Set up replication for the Kafka brokers you are using. Kafka MirrorMaker is the most common solution. See "Mirroring data between clusters" and "Kafka mirroring (MirrorMaker)" for additional information.
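
A minimal sketch of a MirrorMaker invocation, assuming the legacy MirrorMaker tool that ships with Kafka and that it is run from the Kafka installation directory. The two properties file names and the catch-all topic pattern are placeholders, not values prescribed by CDAP.

```shell
# consumer.properties (hypothetical file) describes how to consume from the
# master cluster; producer.properties (hypothetical file) describes how to
# produce to the slave cluster.
# --whitelist takes a regular expression of topics to mirror; '.*' mirrors all.
bin/kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist '.*'
```

In practice the whitelist would usually be narrowed to the topics that CDAP uses.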

Created in 2020 by Google Inc.