With impersonation enabled, there is a failure to stop YARN applications after a certain duration

Description

With impersonation enabled, stopping or killing a program fails if the program was started more than X hours earlier, where X is the Kerberos ticket lifetime.
The reason is that the YarnClient we use to launch the YARN application is cached and later reused to stop or kill it, so by the time of the stop request the credentials it was created with may have expired.
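
As a rough illustration of the cause described above, the following toy model uses only the Java standard library. It is not CDAP or Hadoop code: `Ticket`, `Client`, `stopWithCachedClient`, and `stopWithFreshClient` are hypothetical names, and the assumption that a client stays bound to the ticket it was created with is a simplification made for the sketch.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy model of the failure mode. All names are hypothetical, not CDAP or Hadoop APIs. */
final class CachedClientSketch {

    /** Stands in for a Kerberos ticket with a fixed lifetime. */
    static final class Ticket {
        private final Instant expiresAt;

        Ticket(Instant issuedAt, Duration lifetime) {
            this.expiresAt = issuedAt.plus(lifetime);
        }

        boolean isValidAt(Instant now) {
            return now.isBefore(expiresAt);
        }
    }

    /** Stands in for a client that keeps using the ticket it was created with. */
    static final class Client {
        private final Ticket ticket;

        Client(Ticket ticket) {
            this.ticket = ticket;
        }

        /** Returns true if the kill request would be accepted at the given time. */
        boolean kill(Instant now) {
            return ticket.isValidAt(now);
        }
    }

    /** Problematic pattern: the client created at launch time is reused at stop time. */
    static boolean stopWithCachedClient(Client launchClient, Instant now) {
        return launchClient.kill(now);
    }

    /** Alternative pattern: obtain a new ticket and a new client for each stop request. */
    static boolean stopWithFreshClient(Duration lifetime, Instant now) {
        Client fresh = new Client(new Ticket(now, lifetime));
        return fresh.kill(now);
    }

    public static void main(String[] args) {
        Duration lifetime = Duration.ofMinutes(5);
        Instant launch = Instant.parse("2016-10-08T00:00:00Z");
        Client cached = new Client(new Ticket(launch, lifetime));

        Instant early = launch.plus(Duration.ofMinutes(1));
        Instant late = launch.plus(Duration.ofMinutes(6));

        System.out.println("cached client, within lifetime: " + stopWithCachedClient(cached, early));
        System.out.println("cached client, after lifetime:  " + stopWithCachedClient(cached, late));
        System.out.println("fresh client, after lifetime:   " + stopWithFreshClient(lifetime, late));
    }
}
```

In the model, only the reused client fails once the ticket lifetime has passed, which mirrors the reported behavior. In the real system the fresh client would additionally have to be created under impersonation of the program's owner.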

Release Notes

Avoid the caching of YarnClient in order to fix a problem that occurred in namespaces with impersonation configured.

Activity

Ali Anwar
October 8, 2016, 12:58 AM
Edited

This can be reproduced by:
1. Set the principal's maxlife to a short duration (5 minutes). Restart CDAP to make sure the change is picked up.
2. Start a program (I used a Flow).
3. Wait more than 5 minutes, then stop the flow. The exception and stack trace will appear in the master logs.

Ali Anwar
October 11, 2016, 12:21 AM
Edited

PR that fixes this by no longer caching the YarnClient and by performing impersonation upon program stop/kill:
https://github.com/caskdata/cdap/pull/6926

Edit: pull/6926 was closed in favor of https://github.com/caskdata/cdap/pull/6931

Ali Anwar
October 11, 2016, 4:51 AM

When CDAP Master restarts, it creates a ProgramController for each running program. However, it does not do this under impersonation, so the YarnClient used in each ProgramController has the UGI of the CDAP system principal.

If there is a YARN application that the CDAP system principal does not have VIEW access to, this YarnClient returns an ApplicationReport whose getHost method returns "NA" and whose getRpcPort method returns -1, although the application status is still correct (likely a gap in YARN security):
https://github.com/apache/hadoop/blob/branch-2.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java#L460-L528

Because of that, the URL that we create is malformed:
https://github.com/caskdata/cdap/blob/f80013ec41991eccc765290dea96ee1ba5dc1c83/cdap-app-fabric/src/main/java/org/apache/twill/yarn/YarnTwillController.java#L142

This results in the following logs periodically appearing for each such app. The fix would be to perform impersonation while creating these YarnClients.
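
To make the malformed URL concrete, here is a small standard-library sketch. It is not the code in YarnTwillController: the class name, the method names, and the `http://host:port/resources` format are assumptions made for illustration. The comment above quotes "NA" as the placeholder host; the sketch also treats "N/A" as a placeholder, which is likewise an assumption.

```java
import java.net.URI;
import java.util.Optional;
import java.util.Set;

/** Illustration only: hypothetical names, and the URL format is an assumption. */
final class ReportEndpointSketch {

    // Placeholder host values treated as "no endpoint available" (assumed set).
    private static final Set<String> UNAVAILABLE_HOSTS = Set.of("NA", "N/A");

    /** Builds the URL string without any validation, as a naive caller might. */
    static String naiveUrl(String host, int port) {
        return String.format("http://%s:%d/resources", host, port);
    }

    /** Returns a URI only when the report carries a usable host and port. */
    static Optional<URI> resourcesUri(String host, int port) {
        boolean hostMissing = host == null || host.isEmpty() || UNAVAILABLE_HOSTS.contains(host);
        boolean portInvalid = port <= 0 || port > 65535;
        if (hostMissing || portInvalid) {
            return Optional.empty();
        }
        return Optional.of(URI.create(naiveUrl(host, port)));
    }

    public static void main(String[] args) {
        // A report obtained without VIEW access yields placeholder values.
        System.out.println(naiveUrl("NA", -1));
        System.out.println(resourcesUri("NA", -1).isPresent());

        // A report obtained with sufficient access yields a real endpoint.
        System.out.println(resourcesUri("node1.example.com", 8042).isPresent());
    }
}
```

A guard like this would only suppress the bad URL. The fix proposed above, creating the YarnClients under impersonation, addresses the cause, since the report would then contain the real host and port.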

Fixed

Assignee

Ali Anwar

Reporter

Ali Anwar

Labels

Docs Impact

None

UX Impact

None

Components

Fix versions

Affects versions

Priority

Blocker