...

Terminology: Definition
CDAP Active/Standby
  • CDAP is only running in one cluster (active cluster)
    • CDAP in all other clusters shouldn't be running (standby cluster)
    • User applications can only be running in the active cluster
  • Data is being replicated via means outside of CDAP control
    • HBase replication
    • HDFS copy
    • Kafka mirror-maker
  • Data is available on all clusters
    • Data can be read on Standby clusters outside of CDAP
  • High level failover steps
    1. Stop all running apps in active cluster
    2. Stop CDAP in active cluster
    3. Wait for all data replication to settle
      1. Check replication status via tools
    4. Pick a standby cluster to be the next active cluster and start CDAP
    5. Start applications on the new active cluster
  • Strictly speaking, CDAP is not aware of the replication at all
    • Since CDAP is not aware of the replication, all data needs to be replicated.
    • Otherwise inconsistency could occur when restarting CDAP on a different cluster than the previously active one.
  • Already doable in CDAP 3.5
  • CDAP 4.1 added extra API and tools to assist
    • API for externalizing HBase table creation
    • Tools for checking replication status
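The high-level failover steps listed above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `Cluster`, `fail_over`, and the `replication_settled` callback are hypothetical names, not CDAP APIs, and the real check would use whatever replication tooling is in place.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for one cluster; not a CDAP API."""
    name: str
    cdap_running: bool = False
    running_apps: set = field(default_factory=set)


def fail_over(active, standby, replication_settled, max_checks=10):
    """Move CDAP and its applications from `active` to `standby`.

    `replication_settled` is a hypothetical zero-argument callable that
    returns True once HBase/HDFS/Kafka replication has caught up.
    """
    # Remember which apps were running so they can be restarted later.
    apps = set(active.running_apps)

    # 1. Stop all running apps in the active cluster.
    active.running_apps.clear()
    # 2. Stop CDAP in the active cluster.
    active.cdap_running = False

    # 3. Wait for all data replication to settle (bounded polling).
    for _ in range(max_checks):
        if replication_settled():
            break
    else:
        raise RuntimeError("replication did not settle; standby not started")

    # 4. Start CDAP on the chosen standby cluster.
    standby.cdap_running = True
    # 5. Start the applications on the new active cluster.
    standby.running_apps |= apps
    return standby
```

The sketch refuses to start CDAP on the standby cluster until replication has settled, which reflects the point above that inconsistency could occur if CDAP restarts on a cluster that has not received all data.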
Hot/Cold replication: Same as CDAP Active/Standby
Active/Passive replication: Same as CDAP Active/Standby
CDAP Active/Active
  • CDAP is running in all clusters
  • CDAP is aware of the state replication between all CDAP instances
  • Relies on external means for data replication
    • CDAP system tables in HBase
    • Kafka topics for log collection
  • User data replication is outside of CDAP control
    • Depends on what user applications use
      • E.g. HBase replication, HDFS copy and Kafka mirror-maker
  • It is still active/standby from the application point of view
    • User can declare which cluster is the active one at the namespace level
    • User can declare which namespaces need replication and which do not
    • User can change the active cluster for a namespace
      • To switch the active cluster, the following will happen:
        1. Stop all running applications in that namespace in the active cluster
        2. Pick another cluster as the new active cluster for the namespace involved
        3. Start applications in that namespace again in the new active cluster
      • CDAP will provide an easy switch to perform the three steps described above on behalf of the user
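The per-namespace switch described above could look roughly like the following. This is a minimal sketch, assuming a hypothetical `NamespaceRouting` registry that tracks the active cluster and running applications per namespace; none of these names come from CDAP.

```python
class NamespaceRouting:
    """Hypothetical registry of the active cluster per namespace."""

    def __init__(self, clusters):
        self.clusters = set(clusters)
        self.active = {}   # namespace -> name of its active cluster
        self.running = {}  # (namespace, cluster) -> set of running apps

    def declare_active(self, namespace, cluster):
        """Declare which cluster is active for a namespace."""
        if cluster not in self.clusters:
            raise ValueError(f"unknown cluster: {cluster}")
        self.active[namespace] = cluster

    def start_app(self, namespace, cluster, app):
        """Applications may only run in the namespace's active cluster."""
        if self.active.get(namespace) != cluster:
            raise ValueError(f"{cluster} is not active for {namespace}")
        self.running.setdefault((namespace, cluster), set()).add(app)

    def switch_active(self, namespace, new_cluster):
        """Perform the three switch steps on behalf of the user."""
        if new_cluster not in self.clusters:
            raise ValueError(f"unknown cluster: {new_cluster}")
        old_cluster = self.active[namespace]
        # 1. Stop all running applications in the current active cluster.
        apps = self.running.pop((namespace, old_cluster), set())
        # 2. Pick the new active cluster for the namespace.
        self.active[namespace] = new_cluster
        # 3. Start the same applications in the new active cluster.
        self.running[(namespace, new_cluster)] = set(apps)
        return apps
```

Only the namespace being switched is affected; other namespaces keep their own active cluster, which matches the namespace-level declaration described above.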
Hot/Hot replication: Same as CDAP Active/Active
CDAP Active/Active with Application Master/Slaves
  • Includes everything described in CDAP Active/Active
  • User data is still being replicated via means outside of CDAP control
  • For a namespace
    • The active cluster for that namespace is the "master"
      • Writes only happen in the master cluster
    • All other clusters are the "slave" clusters
      • Receive user data updates from the "master" cluster via replication
      • Can start applications in that namespace for "read-only" operation
        • The same application that is already running in the "master" cluster cannot be started on any "slave" clusters
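The start rules for "master" and "slave" clusters above can be expressed as a small predicate. This is a sketch under assumptions: the function name, its parameters, and the idea of a `read_only` flag on an application start are hypothetical and only illustrate the rules as stated.

```python
def can_start_app(cluster, master, app, read_only, master_running_apps):
    """Return True if `app` may be started on `cluster` for a namespace.

    `master` is the namespace's active ("master") cluster and
    `master_running_apps` is the set of apps already running there.
    """
    if cluster == master:
        # Writes only happen in the master cluster, so any app may start here.
        return True
    if not read_only:
        # "Slave" clusters only allow read-only operation.
        return False
    # An application already running in the master cannot also run on a slave.
    return app not in master_running_apps
```

For example, with `dc1` as the master running `etl`, a read-only `report` application may start on `dc2`, while `etl` itself may not.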

...