Today, we check program run status before updating it. We enforce that the STARTING state must exist before any transition can happen. Once we get to an end state, we enforce that the state cannot change.
However, consider a situation where a program is started but is killed or dies before it can write its starting state to TMS. We will eventually get a 'killed' event, but the state will not be stored because we can't find any existing run record.
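To make the scenario concrete, here is a minimal sketch of the two checks described above and how they drop the 'killed' event. This is not the actual CDAP code: `RunStateSketch`, `RunStore`, `apply`, and the simplified `Status` enum are hypothetical names used only for illustration, and the in-memory map stands in for the real run record store.

```java
import java.util.HashMap;
import java.util.Map;

public class RunStateSketch {

    // Simplified, hypothetical set of run states (the real system may have more).
    enum Status {
        STARTING, RUNNING, COMPLETED, FAILED, KILLED;

        boolean isEndState() {
            return this == COMPLETED || this == FAILED || this == KILLED;
        }
    }

    // Hypothetical in-memory stand-in for the run record store.
    static class RunStore {
        private final Map<String, Status> records = new HashMap<>();

        // Returns true if the new status was stored, false if it was ignored.
        boolean apply(String runId, Status next) {
            Status current = records.get(runId);
            if (current == null) {
                // Check 1: without an existing record, only STARTING is accepted.
                if (next != Status.STARTING) {
                    return false;
                }
                records.put(runId, next);
                return true;
            }
            // Check 2: once in an end state, the status can no longer change.
            if (current.isEndState()) {
                return false;
            }
            // Simplification: any other transition from a non-end state is accepted.
            records.put(runId, next);
            return true;
        }

        Status get(String runId) {
            return records.get(runId);
        }
    }

    public static void main(String[] args) {
        RunStore store = new RunStore();

        // Normal run: STARTING is written first, so later events are stored.
        store.apply("run-1", Status.STARTING);
        store.apply("run-1", Status.RUNNING);
        System.out.println("run-1 KILLED stored: " + store.apply("run-1", Status.KILLED));

        // Problem run: the program died before STARTING was written,
        // so the KILLED event finds no record and is ignored.
        System.out.println("run-2 KILLED stored: " + store.apply("run-2", Status.KILLED));
        System.out.println("run-2 record: " + store.get("run-2"));
    }
}
```

In this sketch the second run never gets a record at all, which is the case this issue is concerned with.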
I don't think that 'starting' check should be done at all.
I think we currently block on successfully writing the STARTING state to TMS before the start sequence can proceed? If that’s the case, we should fix it, as we rely on the ordering of events in TMS for the state transition to be correct.
This was opened due to errors in the CDAP master log about ignored 'killed' events. Originally I thought it was because 'STARTING' was somehow skipped, but it turns out those errors were due to CDAP-13218, so closing this.