Make Transaction Pruning work on a replicated cluster
Description
In 4.1, CDAP supports hot-cold replication. CDAP relies on HBase replication (which uses WAL replication) to replicate transactional data. In case of a failover, the transaction snapshots from the master are copied over to the slave, and the slave CDAP is then started.
However, this causes issues when transaction pruning is enabled. Invalid data gets removed during compaction, and this removal is not written to the WAL, so it is not replicated and each cluster has to run compactions independently. The latest transaction snapshot, on the other hand, is copied over during the failover. This means the slave cluster can still contain invalid data whose invalid ids have already been pruned from the latest transaction snapshot, which leads to invalid data becoming visible on the slave cluster after the failover.
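The failure sequence described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the `Cluster` class, `simulate_failover` function, and the transaction ids 100 and 101 are hypothetical and do not correspond to any CDAP, Tephra, or HBase API. Each cell is keyed by the id of the transaction that wrote it, and a cell is treated as visible when its id is not in the invalid set.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for one cluster: stored cells plus the invalid tx ids."""
    cells: dict = field(default_factory=dict)   # tx id -> value
    invalid: set = field(default_factory=set)   # invalid transaction ids

    def compact(self):
        """Drop cells written by invalid transactions; return the ids removed."""
        removed = {tx for tx in self.cells if tx in self.invalid}
        for tx in removed:
            del self.cells[tx]
        return removed

    def visible(self):
        """Cells a reader sees: everything not written by an invalid tx."""
        return {tx: v for tx, v in self.cells.items() if tx not in self.invalid}


def simulate_failover():
    master, slave = Cluster(), Cluster()

    # Writes reach the slave through WAL replication.
    for tx, value in [(100, "good"), (101, "bad")]:
        master.cells[tx] = value
        slave.cells[tx] = value

    # Transaction 101 becomes invalid on the master.
    master.invalid.add(101)

    # Master compaction removes the invalid cell. The removal is not in the
    # WAL, so the slave keeps its copy of the cell.
    removed = master.compact()

    # Pruning drops the now-unneeded id from the master's invalid set.
    master.invalid -= removed

    # Failover: the latest (already pruned) snapshot is copied to the slave.
    slave.invalid = set(master.invalid)

    return slave.visible()
```

In this model, `simulate_failover()` returns both cells, including the one written by transaction 101: the slave never compacted that cell away, and the copied snapshot no longer marks 101 as invalid.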