Change instances logic for Flow is not risk free

Description

The logic for changing the number of instances of a Flowlet is risky in multiple ways:

1. The change can only take place once all containers are up and running, but the record in MDS is updated before that. If the change then fails (for whatever reason) and the Flow gets restarted, the queue state may be corrupted because the MDS record and the queue config table are inconsistent.
2. A race condition and state corruption can occur if a second change request arrives while the first one is still being processed.
3. The whole logic is driven from app-fabric, so if app-fabric goes down during the change (which can take a while, since enough containers must be acquired from YARN), the Flow can be left in an undefined state.
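
One possible way to avoid the first two risks is sketched below. This is only an illustration, not the actual app-fabric code: `FlowletInstanceChanger`, `resize_runtime`, and `InstanceChangeError` are hypothetical names, and a plain in-memory field stands in for the MDS record. The sketch rejects a second request while one is in flight and updates the recorded instance count only after the runtime change has succeeded. It does not address the third risk (the coordinator itself going down mid-change).

```python
import threading


class InstanceChangeError(Exception):
    """Raised when the runtime could not be resized (hypothetical)."""


class FlowletInstanceChanger:
    """Hypothetical coordinator for instance changes; not CDAP's implementation."""

    def __init__(self, initial_instances, resize_runtime):
        self._lock = threading.Lock()
        # Stands in for the instance count stored in MDS.
        self._recorded = initial_instances
        # Assumed callable(old, new) that raises if the change fails,
        # e.g. because not enough containers could be acquired.
        self._resize_runtime = resize_runtime

    @property
    def recorded_instances(self):
        return self._recorded

    def change_instances(self, new_count):
        if new_count < 1:
            raise ValueError("instance count must be at least 1")
        # Risk 2: refuse a second request while the first is in flight.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("another instance change is in progress")
        try:
            old_count = self._recorded
            try:
                self._resize_runtime(old_count, new_count)
            except Exception as exc:
                # Risk 1: the record is left untouched when the change fails.
                raise InstanceChangeError(
                    f"change from {old_count} to {new_count} failed"
                ) from exc
            # Commit the record only after the runtime change succeeded.
            self._recorded = new_count
        finally:
            self._lock.release()
```

With this ordering, a failed change leaves the recorded count equal to what the running Flow actually uses, so a restart would not see a mismatch between the record and the queue configuration.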

Release Notes

None

Activity

Terence Yim April 25, 2019 at 8:18 PM

Flow is removed as of 6.0

Todd Greenstein August 8, 2016 at 8:35 PM

Moved to 4.1, per discussion with

Won't Fix

Details

Created March 31, 2015 at 6:00 PM
Updated April 25, 2019 at 8:18 PM
Resolved April 25, 2019 at 8:18 PM