Remote Hadoop provisioner to support Kerberized clusters

Description

The Remote Hadoop provisioner should support running batch and realtime pipelines on Kerberized Hadoop clusters.

Release Notes

Added support for Kerberos-enabled Hadoop clusters in the Remote Hadoop Provisioner.

Activity

Terence Yim
December 23, 2020, 9:33 PM

6.3

Terence Yim
December 21, 2020, 6:55 PM

6.4

Albert Shau
October 9, 2019, 6:07 PM

Alternatively, we could add Kerberos-specific logic to CDAP's remote Twill classes (https://github.com/cdapio/cdap/blob/release/6.1/cdap-app-fabric/src/main/java/io/cdap/cdap/internal/app/runtime/distributed/remote/RemoteExecutionTwillPreparer.java and related classes), and add some way for provisioners to specify the principal and password for a run.

Albert Shau
October 9, 2019, 6:02 PM

This requires running kinit before submitting the YARN job, similar to how somebody would run a job manually. I don't think there is a quick way to do this.

Most of the work would be in expanding the provisioner's responsibilities to include actually launching the job, since right now there is no hook that would let the provisioner run kinit. We have already discussed doing this, as we want to submit Dataproc jobs through the Dataproc APIs instead of requiring SSH access to a cluster node. But this involves some nontrivial design and refactoring.
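
For illustration only, the manual sequence mentioned above (kinit first, then submit the YARN job) could be sketched roughly as follows. This is not how CDAP or the provisioner implements it. The function names, the principal, the keytab path, and the jar name are all hypothetical placeholders, and the sketch assumes a keytab-based kinit (the `-k` and `-t` flags of MIT Kerberos) rather than a password so that it can run without a prompt.

```python
import subprocess
from typing import Callable, List, Sequence


def build_kinit_command(principal: str, keytab_path: str) -> List[str]:
    """Return the argv for a non-interactive, keytab-based kinit."""
    # -k: authenticate with a keytab; -t: path to that keytab file.
    return ["kinit", "-k", "-t", keytab_path, principal]


def run_with_kerberos(
    principal: str,
    keytab_path: str,
    submit_command: Sequence[str],
    runner: Callable = subprocess.run,
) -> None:
    """Obtain a Kerberos ticket, then run the job submission command.

    `runner` is injectable so the sequence can be exercised without a
    real KDC or cluster; by default it is subprocess.run.
    """
    # check=True makes subprocess.run raise CalledProcessError on a
    # non-zero exit, so the job is never submitted if kinit fails.
    runner(build_kinit_command(principal, keytab_path), check=True)
    runner(list(submit_command), check=True)


if __name__ == "__main__":
    # Hypothetical values; replace with a real principal, keytab, and job.
    print(build_kinit_command("cdap@EXAMPLE.COM", "/path/to/cdap.keytab"))
```

The point of the sketch is the ordering constraint: something must run kinit in the same environment that submits the job, which is the hook the provisioner currently lacks.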

Fixed

Assignee

Terence Yim

Reporter

Sreevatsan Raman