Scheduler max concurrent runs constraint is not honored

Description

I have a pipeline scheduled to run every minute with a max concurrent runs of 1. Each run of the pipeline takes about 5-6 minutes.

Because of the max concurrent runs constraint, I expect to see only one active run of the pipeline at any given point in time. However, the pipeline is executed multiple times concurrently. I have attached the app fabric logs that show this behavior.

The issue is likely due to a race condition between the check for active runs and the starting of the program.
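To make the suspected race concrete, here is a minimal sketch of a check-then-start sequence next to an atomic variant. It is not the scheduler's actual code: `RunRegistry`, `check`, `start`, and `try_start` are hypothetical names, and the bad ordering is replayed by hand so the result is deterministic.

```python
import threading


class RunRegistry:
    """Hypothetical in-memory count of active runs for one pipeline."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.active = 0
        self._lock = threading.Lock()

    def check(self):
        # Step 1 of the racy path: is there room for another run?
        return self.active < self.max_concurrent

    def start(self):
        # Step 2 of the racy path: record the run as started.
        self.active += 1

    def try_start(self):
        # Check and start as one atomic step, so no trigger can slip in between.
        with self._lock:
            if self.active >= self.max_concurrent:
                return False
            self.active += 1
            return True


def racy_interleaving(max_concurrent=1):
    """Replay the bad ordering by hand: both triggers check before either starts."""
    registry = RunRegistry(max_concurrent)
    a_ok = registry.check()
    b_ok = registry.check()
    if a_ok:
        registry.start()
    if b_ok:
        registry.start()
    return registry.active


def atomic_triggers(triggers=8, max_concurrent=1):
    """Fire several triggers from real threads through the atomic path."""
    registry = RunRegistry(max_concurrent)
    threads = [threading.Thread(target=registry.try_start) for _ in range(triggers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return registry.active
```

With a limit of 1, `racy_interleaving()` ends with 2 active runs, while `atomic_triggers()` stays at 1. An in-process lock is only a stand-in here; a scheduler whose check and launch span several services would presumably need a transaction or a similar mechanism instead.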

 

Release Notes

None

Attachments

1

Activity

Yaojie Feng, March 28, 2019 at 9:42 PM

After some investigation: the UI already sets waitUntilMet to false, so the behavior is already abort on failure. The job queue table is also empty while the pipeline is running, which means the trigger actually gets deleted when the constraint is not met. I still need to investigate why a pipeline can start multiple runs.

Sreevatsan Raman, March 28, 2019 at 5:45 PM

There are more issues than this. Any failed schedule keeps getting retried for up to a day. This can potentially cause a lot of retry attempts and fill up the Job Queue.

 

As a quick fix, the default behavior could be abort on failure, which is one of the options for schedules that is exposed in programmatic APIs already.
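As a rough illustration of why the default matters, the toy simulation below compares the two behaviors for a schedule that fires every minute while each run takes 6 minutes. It is only a sketch under assumptions: `simulate`, its parameters, and the queue model are hypothetical, and the one-day `job_timeout` is taken from the "up to a day" figure above rather than from the actual configuration.

```python
def simulate(minutes, run_minutes, wait_until_met, job_timeout=24 * 60):
    """Return (runs started, jobs left in the queue) after `minutes` minutes.

    The schedule fires once per minute and at most one run is active at a time.
    """
    queue = []            # creation minute of each job waiting on the constraint
    running_until = None  # minute at which the current run finishes
    started = 0
    for now in range(minutes):
        if running_until is not None and now >= running_until:
            running_until = None
        # Drop jobs that have waited longer than the timeout.
        queue = [t for t in queue if now - t < job_timeout]
        queue.append(now)  # the schedule fires and creates a job
        if running_until is None:
            queue.pop(0)   # constraint met: the oldest job starts
            running_until = now + run_minutes
            started += 1
        elif not wait_until_met:
            queue.pop()    # abort: discard the job that was just created
    return started, len(queue)
```

For one hour, `simulate(60, 6, wait_until_met=False)` returns `(10, 0)` and `simulate(60, 6, wait_until_met=True)` returns `(10, 50)`: the same number of runs start either way, but waiting leaves 50 jobs queued after the first hour.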

Andreas Neumann, March 28, 2019 at 5:42 PM

I think there may be two issues here:

  • The current code does not treat as active every state that should count as active.

  • Even if it did, because program lifecycle status is updated asynchronously (through TMS), there can always be a race where a program is starting but its status update is still in TMS and not yet visible.

Details

Created March 25, 2019 at 9:30 PM
Updated December 7, 2020 at 6:47 PM