Remote Hadoop provisioner should support Kerberized clusters
Description
The Remote Hadoop provisioner should support running both batch and realtime pipelines on Kerberized Hadoop clusters.
Release Notes
Added support for Kerberized Hadoop clusters to the Remote Hadoop provisioner.
Activity
6.3
6.4
Alternatively, we could add Kerberos-specific logic to CDAP's remote Twill classes (https://github.com/cdapio/cdap/blob/release/6.1/cdap-app-fabric/src/main/java/io/cdap/cdap/internal/app/runtime/distributed/remote/RemoteExecutionTwillPreparer.java and related classes), and add a way for provisioners to specify the Kerberos principal and password for a run.
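A minimal sketch of what "a way for provisioners to specify the principal and password for a run" might look like. `KerberosCredentials` and `KerberosCredentialProvider` are hypothetical names, not existing CDAP or Twill APIs; the idea is that the remote Twill launch path would ask the provisioner for credentials per run and get nothing back for non-Kerberized clusters.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical per-run Kerberos credentials; not an existing CDAP class.
final class KerberosCredentials {
    private final String principal;
    private final char[] password;

    KerberosCredentials(String principal, char[] password) {
        Objects.requireNonNull(principal, "principal");
        Objects.requireNonNull(password, "password");
        // Require an explicit realm, e.g. "user@EXAMPLE.COM".
        int at = principal.lastIndexOf('@');
        if (at <= 0 || at == principal.length() - 1) {
            throw new IllegalArgumentException("principal must be of the form name@REALM");
        }
        this.principal = principal;
        // Defensive copy so the caller can clear its own array.
        this.password = password.clone();
    }

    String getPrincipal() { return principal; }

    /** Returns a copy of the password; callers should clear it after use. */
    char[] getPassword() { return password.clone(); }

    /** Realm part of the principal, e.g. "EXAMPLE.COM" for "user@EXAMPLE.COM". */
    String getRealm() { return principal.substring(principal.lastIndexOf('@') + 1); }

    // Never include the password in logs.
    @Override
    public String toString() { return "KerberosCredentials{principal=" + principal + "}"; }
}

// Hypothetical provisioner hook: empty means the cluster is not Kerberized.
interface KerberosCredentialProvider {
    Optional<KerberosCredentials> getCredentials(String programRunId);
}
```

Returning `Optional` keeps the non-Kerberized path unchanged: the launcher only authenticates when credentials are present.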
This requires running kinit before submitting the YARN job, just as a user would when launching a job manually. I don't think there is a quick way to do this.
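A rough sketch of the "kinit before submitting" step done from Java, assuming MIT Kerberos `kinit` is on the `PATH` of the machine doing the submission. It uses a keytab (`kinit -k -t`) rather than the password mentioned above, because feeding a password to `kinit` non-interactively varies between Kerberos implementations; the class name `KinitRunner`, the keytab path, and the per-run ticket cache location are all assumptions for illustration. The submit process is then pointed at the same cache through `KRB5CCNAME`, which MIT Kerberos clients honor.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper; not part of CDAP.
final class KinitRunner {
    /** Builds: kinit -k -t <keytab> -c FILE:<ccache> <principal> */
    static List<String> kinitCommand(String principal, Path keytab, Path ccache) {
        return List.of("kinit", "-k", "-t", keytab.toString(), "-c", "FILE:" + ccache, principal);
    }

    /** Runs kinit and fails if it exits with a non-zero status. */
    static void kinit(String principal, Path keytab, Path ccache)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(kinitCommand(principal, keytab, ccache));
        pb.inheritIO(); // surface kinit errors in the caller's output
        int exit = pb.start().waitFor();
        if (exit != 0) {
            throw new IOException("kinit failed with exit code " + exit);
        }
    }

    /** Points a later submit process (e.g. the YARN client) at the same ticket cache. */
    static ProcessBuilder withTicketCache(ProcessBuilder submit, Path ccache) {
        submit.environment().put("KRB5CCNAME", "FILE:" + ccache);
        return submit;
    }
}
```

Using a separate ticket cache per run avoids one run's ticket overwriting another's when several runs are submitted from the same host.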
Most of the work would be in expanding the provisioner's responsibilities to include actually launching the job, since there is currently no hook that would let the provisioner run kinit. We have already discussed doing this, because we want to submit Dataproc jobs through the Dataproc APIs instead of requiring SSH access to a cluster node. However, this involves some non-trivial design and refactoring.
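To make the proposed responsibility shift concrete, here is a minimal sketch of a launch hook owned by the provisioner. `JobLauncher`, `LaunchOrchestrator`, and `RecordingLauncher` are hypothetical names, not existing CDAP interfaces. The point is the ordering: the provisioner gets a `prepare` step (where a Kerberized Hadoop provisioner could run kinit) before its own `submit` step (which could use SSH or a cloud job API such as Dataproc's).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: the provisioner, not the SSH-based remote Twill path, owns job launch.
interface JobLauncher {
    /** Runs before submission, e.g. kinit on a Kerberized cluster. No-op by default. */
    default void prepare(String runId) {}

    /** Submits the job and returns an application id. */
    String submit(String runId);
}

final class LaunchOrchestrator {
    // Always prepare (authenticate) first; an exception from prepare aborts the launch.
    static String launch(JobLauncher launcher, String runId) {
        launcher.prepare(runId);
        return launcher.submit(runId);
    }
}

// Toy launcher that records call order, standing in for a Kerberized provisioner.
final class RecordingLauncher implements JobLauncher {
    final List<String> calls = new ArrayList<>();

    @Override
    public void prepare(String runId) { calls.add("kinit:" + runId); }

    @Override
    public String submit(String runId) {
        calls.add("submit:" + runId);
        return "app-" + runId;
    }
}
```

A provisioner that needs no authentication (for example, one submitting through a cloud API with its own credentials) would only implement `submit` and inherit the no-op `prepare`.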