- Improve performance of a single Tephra Transaction Server
- Make Tephra Transaction Server scale horizontally
- Make Tephra Transaction Server Highly Available (HA) with isolation
- Improve operational aspects of the Transaction Server
- Improve performance of Workflow scheduling to schedule 1000s of jobs/second
- Transaction Invalid List Management
- Tephra Single Server Performance Improvements
- Isolation and Scalability of Tephra Transaction Server
- Improving scheduling performance of Workflow system
- System should automatically handle pruning of the transaction invalid list
- Reduce the operational complexity of the manual steps needed to prune the invalid transaction list
- Applied during major and stripe compaction
- Metrics around the current invalid list size
- Tool to inspect and report progress on pruning
- A single Tephra server should be able to support up to ~10K transactions/second
- Support read-only and hierarchical conflict detection
- Run multiple instances of Tephra Transaction Server in active-active mode, in a single DC or across multiple DCs
- Isolation at namespace level
Currently, the invalid list keeps growing over time. If it is not pruned periodically using the manual process (which is tricky, time-consuming, and hard to operationalize), the performance of the transaction server degrades. Removing the manual process and making list pruning automatic would reduce operational complexity and also help improve transaction performance. This will be implemented as a hook into major compaction, meaning that invalid list pruning would be triggered during major compaction.
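The pruning idea can be illustrated with a minimal sketch. It assumes (this is not stated above) that each region records, after a major compaction, the highest transaction ID for which it has already removed invalid data, and that the invalid list is safe to prune up to the smallest such value across all regions. The function names and the region-to-bound map are hypothetical and are not part of the Tephra API.

```python
from typing import Dict, Iterable, List


def compute_prune_upper_bound(region_bounds: Dict[str, int]) -> int:
    """Return the highest transaction ID that every region has compacted past.

    region_bounds maps a (hypothetical) region name to the bound that region
    recorded during its last major compaction.
    """
    if not region_bounds:
        return -1  # no region has reported yet, so nothing can be pruned
    return min(region_bounds.values())


def prune_invalid_list(invalid: Iterable[int], upper_bound: int) -> List[int]:
    """Drop invalid transaction IDs at or below the bound; keep the rest sorted."""
    return sorted(tx for tx in invalid if tx > upper_bound)
```

For example, with bounds `{"r1": 120, "r2": 95, "r3": 200}` the usable bound is 95, so an invalid list of `[10, 95, 96, 150]` shrinks to `[96, 150]`. Taking the minimum is the conservative choice: a single region that has not compacted recently holds back pruning for everyone, which is why visibility into lagging regions matters.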
A tool and user interface that show the progress of pruning while it is running, the impact of pruning on the invalid list, and any regions that are behind and are preventing pruning of the invalid transaction list.
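As a rough illustration of what such a tool might report, the sketch below summarizes the invalid list size before and after pruning and lists the regions that are behind. It relies on the same assumption of a per-region prune bound, treats a region as "behind" when its bound is lower than the newest invalid transaction ID, and uses hypothetical names throughout (`PruneReport`, `build_prune_report`).

```python
from typing import Dict, List, NamedTuple


class PruneReport(NamedTuple):
    size_before: int            # invalid list size before pruning
    size_after: int             # invalid list size after pruning
    lagging_regions: List[str]  # regions still holding back some entries


def build_prune_report(region_bounds: Dict[str, int],
                       invalid: List[int]) -> PruneReport:
    """Summarize the effect of pruning and which regions are behind."""
    if not region_bounds or not invalid:
        # Nothing reported or nothing to prune: the list is unchanged.
        return PruneReport(len(invalid), len(invalid), [])
    bound = min(region_bounds.values())
    remaining = [tx for tx in invalid if tx > bound]
    newest_invalid = max(invalid)
    # A region is "behind" if its bound is below the newest invalid ID;
    # list the furthest-behind regions first.
    lagging = sorted(
        (r for r, b in region_bounds.items() if b < newest_invalid),
        key=lambda r: region_bounds[r],
    )
    return PruneReport(len(invalid), len(remaining), lagging)
```

With bounds `{"r1": 120, "r2": 95, "r3": 200}` and an invalid list of `[10, 95, 96, 150]`, the report shows the list going from 4 entries to 2, with `r2` and then `r1` listed as the regions that are behind.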
This work focuses on improving the performance of a single transaction server. As part of it, we will improve locking granularity during conflict detection, support read-only transactions, improve group commit efficiency, add hierarchical conflict detection, transmit only the latest snapshot of the invalid list, and more.
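To show why read-only transactions are cheap to support, here is a minimal sketch of optimistic conflict detection with a read-only fast path. It is a simplified, single-threaded model and not Tephra's implementation: the `ConflictDetector` class, the string change keys, and the integer pointers are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Set


class ConflictDetector:
    """Toy optimistic conflict detector keyed by commit order."""

    def __init__(self) -> None:
        # commit pointer -> change set committed at that pointer
        self._committed: Dict[int, FrozenSet[str]] = {}
        self._next_pointer = 1

    def start(self) -> int:
        """Return the read pointer: the latest commit visible to a new transaction."""
        return self._next_pointer - 1

    def try_commit(self, read_pointer: int, changes: Set[str]) -> bool:
        """Commit unless a later commit touched one of the same keys."""
        if not changes:
            # Read-only fast path: nothing was written, so nothing can
            # conflict and no change set needs to be recorded.
            return True
        for pointer, committed in self._committed.items():
            if pointer > read_pointer and committed & changes:
                return False  # overlapping write committed after we started
        self._committed[self._next_pointer] = frozenset(changes)
        self._next_pointer += 1
        return True
```

If two transactions start together and both write `row1`, the first commit succeeds and the second is rejected, while a transaction with an empty change set always commits without scanning or storing anything. A real server would also need to expire old change sets and guard this state with locks, which is where locking granularity comes in.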
Instead of using a single transaction server across all namespaces, we would like to run multiple transaction server instances that provide isolation between namespaces. For example, one instance could be responsible for a given namespace, and that responsibility could be rotated among the different transaction server instances.
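One possible way to map namespaces to instances is sketched below. It assumes a simple stable-hash assignment over the currently live instances. The text above does not prescribe any particular assignment scheme, and the function and instance names are hypothetical.

```python
import hashlib
from typing import List


def owner_for_namespace(namespace: str, instances: List[str]) -> str:
    """Pick the transaction server instance responsible for a namespace."""
    if not instances:
        raise ValueError("no transaction server instances available")
    # Sort so every caller sees the same order regardless of discovery order.
    ordered = sorted(instances)
    # Use a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]
```

Every caller that sees the same set of live instances computes the same owner, and when an instance leaves the set the namespace is reassigned to one of the remaining instances on the next lookup. Plain modulo hashing reshuffles many namespaces whenever membership changes, so a real deployment would more likely use consistent hashing or an explicit, coordinated assignment.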
Run multiple instances of the transaction server in active-active mode to shed load or for disaster recovery. This will also tie in with many of the stories for replication.
This will provide the capability to prune the invalid list during stripe compaction or minor compaction, depending on how much flexibility there is to do so.