Handle container delegation token update failures

Description

We have seen delegation tokens on containers expire after an exception occurred while the tokens were being updated. This needs more investigation, but the likely cause is the master becoming a follower and therefore not updating the delegation tokens at the scheduled time.

The other case to handle is an exception thrown while fetching the delegation tokens from each server. We need a way to retry the token update if the delegation tokens cannot be acquired for any reason in a given update run.
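
A minimal sketch of the retry behavior asked for above, not the actual CDAP implementation. The class name `TokenUpdateRetry`, the method `fetchWithRetry`, and its parameters are hypothetical; the fetch itself is passed in as a plain `Callable` so that no real token API is assumed.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only.
final class TokenUpdateRetry {

  private TokenUpdateRetry() { }

  /**
   * Calls the fetcher until it succeeds or maxAttempts is reached,
   * sleeping retryDelayMillis between failed attempts.
   * Rethrows the last failure if every attempt fails.
   */
  static <T> T fetchWithRetry(Callable<T> fetcher, int maxAttempts,
                              long retryDelayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return fetcher.call();
      } catch (InterruptedException e) {
        // Do not retry when the thread is asked to stop.
        Thread.currentThread().interrupt();
        throw e;
      } catch (Exception e) {
        lastFailure = e;
        if (attempt < maxAttempts) {
          Thread.sleep(retryDelayMillis);
        }
      }
    }
    throw lastFailure;
  }
}
```

The retries only help if the update starts early enough that all attempts finish before the current tokens expire, so the attempt count and delay would need to fit inside the update margin.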

Release Notes

None

Activity

Ali Anwar, December 15, 2016 at 3:17 AM

There are still other failure scenarios that need handling, but the case that Andreas described is fixed.
Moving the other handling to 4.2.

Ali Anwar, December 14, 2016 at 8:20 PM

Increasing the delegation token update margin to 1 hour instead of 5 minutes:
https://github.com/caskdata/cdap/pull/7354

Andreas Neumann, November 18, 2016 at 11:03 PM

For 4.0, we can change it so that it renews an hour before expiration. That will deal with the case where the master fails over at that time, and by the time the secondary master is active, the token has expired. With an hour that should not happen.

Poorna Chandra, November 3, 2016 at 9:59 PM

We update delegation tokens 5 mins before they are set to expire. It would be good to have a bigger margin so that we can handle delegation token update failures.
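
As a rough illustration of the margin described here, the sketch below computes how long to wait before the next update for a given expiration time and margin. The class `TokenUpdateSchedule` and the method `delayUntilUpdateMillis` are hypothetical names and are not taken from the CDAP code base.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only.
final class TokenUpdateSchedule {

  private TokenUpdateSchedule() { }

  /**
   * Returns how many milliseconds to wait before updating a token that
   * expires at expirationMillis, so that the update starts marginMillis
   * before expiration. Returns 0 if that point has already passed.
   */
  static long delayUntilUpdateMillis(long nowMillis, long expirationMillis,
                                     long marginMillis) {
    return Math.max(0L, expirationMillis - marginMillis - nowMillis);
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Assumed token lifetime of 24 hours, chosen only for this example.
    long expiration = now + TimeUnit.HOURS.toMillis(24);
    long fiveMinutes = TimeUnit.MINUTES.toMillis(5);
    long oneHour = TimeUnit.HOURS.toMillis(1);
    System.out.println(delayUntilUpdateMillis(now, expiration, fiveMinutes));
    System.out.println(delayUntilUpdateMillis(now, expiration, oneHour));
  }
}
```

A larger margin leaves more time for a failed update to be retried, or for another master to take over, before the existing tokens expire.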

Sreevatsan Raman, August 16, 2016 at 7:34 PM

Moving this to 4.0 based on

Created August 11, 2016 at 5:50 AM
Updated July 7, 2020 at 6:01 PM