Impersonation

Impersonation allows users to run programs and access datasets and other resources as a pre-configured user (a principal). Currently, CDAP supports configuring impersonation at the namespace level and at the application level; application-level configuration takes precedence over namespace-level configuration.

Namespace-level impersonation means that every namespace has a single principal: all programs in that namespace run as that principal, and all resources are accessed as that principal.

Application-level impersonation means that every application has a single principal: all programs in that application run as that principal, and all resources are accessed as that principal. Any datasets created by the application are owned by that principal.

Datasets created outside of an application can be created with their own owner principal. Otherwise, they are owned by the principal defined for the namespace in which they are created.

Requirements

To utilize this feature, Kerberos must be enabled on the cluster and configured in cdap-site.xml, using the parameter kerberos.auth.enabled.

To configure a namespace to have impersonation, specify the Kerberos principal and keytabURI in the namespace configuration. The keytab file (the "keytab") must be readable by the CDAP user and can be on either the local file system of the CDAP Master or on HDFS. If the keytab is on HDFS, prefix the path with hdfs://. If CDAP Master is HA-enabled, and the local file system is used, the keytab must be on all local file systems used with the CDAP Master instances.

If these are not specified, the principal and keytab of the CDAP Master user will be used instead. These are defined by the properties cdap.master.kerberos.principal and cdap.master.kerberos.keytab respectively in the cdap-site.xml file.
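The precedence described so far (application-level over namespace-level, with the CDAP Master principal as the fallback) can be sketched as a small lookup. This is only an illustration of the rule, not CDAP's implementation; the function name and the example principals are hypothetical.

```python
from typing import Optional


def effective_principal(
    app_principal: Optional[str],
    namespace_principal: Optional[str],
    master_principal: str,
) -> str:
    """Return the principal a program would run as (illustrative sketch only).

    Order of precedence, as described in the text:
      1. the application-level principal, if one is configured
      2. the namespace-level principal, if one is configured
      3. the CDAP Master principal (cdap.master.kerberos.principal)
    """
    if app_principal:
        return app_principal
    if namespace_principal:
        return namespace_principal
    return master_principal


# Hypothetical principals, used only for demonstration.
print(effective_principal(None, "ns_user@EXAMPLE.COM", "cdap@EXAMPLE.COM"))
```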

The configured Kerberos principal must have been granted permissions for the operations that will occur in that namespace. For instance, if a custom HBase namespace is configured, the configured principal must have privileges to create tables within that HBase namespace. If no custom HBase namespace is specified, the configured principal must have privileges to create namespaces.

Because of this, it is simplest to specify a custom mapping for root.directory and hbase.namespace when using impersonation so that the privileges granted to the configured principal can be kept to a minimum.
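As a rough sketch, a namespace configuration with impersonation and custom mappings might be assembled as shown here. The property names keytabURI, root.directory, and hbase.namespace come from the text above; the "principal" key, the enclosing "config" object, and all values are assumptions for illustration. Check them against the namespace API of your CDAP version before use.

```python
import json
from typing import Optional


def namespace_config(
    principal: str,
    keytab_uri: str,
    root_directory: Optional[str] = None,
    hbase_namespace: Optional[str] = None,
) -> dict:
    """Build a namespace configuration dictionary (assumed layout)."""
    config = {
        "principal": principal,    # assumed key name for the Kerberos principal
        "keytabURI": keytab_uri,   # local path, or an hdfs:// path if on HDFS
    }
    # Custom mappings help keep the principal's required privileges minimal.
    if root_directory is not None:
        config["root.directory"] = root_directory
    if hbase_namespace is not None:
        config["hbase.namespace"] = hbase_namespace
    return {"config": config}  # the "config" wrapper is an assumption


# Hypothetical values, used only for demonstration.
payload = namespace_config(
    principal="ns_user@EXAMPLE.COM",
    keytab_uri="/etc/security/keytabs/ns_user.keytab",
    root_directory="/data/ns1",
    hbase_namespace="ns1_hbase",
)
print(json.dumps(payload, indent=2))
```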

HDFS Permissions

In the case of impersonation, every user who can be impersonated will need access to their corresponding HDFS /user/<username> directory. The commands for this are described in the installation section packages.

Note that you can use the HDFS command hdfs groups [username ...] to confirm that the groups are set correctly, and that external security services such as LDAP are configured correctly.

Application-level Impersonation

To use application-level impersonation in CDAP, where applications and datasets have their own owners and the operations performed in CDAP impersonate their respective owners, CDAP needs to have access to the owner principals and their associated keytabs.

For access to users' keytabs, CDAP uses these conventions:

  • All keytabs must be present on the local filesystem of nodes on which the CDAP Master is running.

  • These keytabs must be present under a path in one of these formats, and the cdap system user must have read access to all of the keytabs:

    /<dir-1>/<dir-2>/${name}.keytab
    /<dir-1>/<dir-2>/${name}/${name}.keytab
  • The above path is provided to CDAP as a configuration parameter in the cdap-site.xml file, such as:

    <property>
      <name>security.keytab.path</name>
      <value>/etc/security/keytabs/${name}.keytab</value>
    </property>

    where CDAP replaces ${name} with the short user name of the Kerberos principal it is impersonating.

    Note: You will need to restart CDAP for this configuration change to take effect.
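The ${name} substitution can be illustrated with a minimal sketch. It assumes the short user name is simply the primary component of the principal (the text before the first "/" or "@"); clusters with custom Kerberos name-mapping rules may derive it differently. The function names are hypothetical, and this is not CDAP's own code.

```python
def short_user_name(principal: str) -> str:
    """Return the primary component of a Kerberos principal.

    Assumes the default mapping: for "alice/host.example.com@EXAMPLE.COM"
    the short name is "alice". Custom name-mapping rules are not handled.
    """
    primary = principal.split("@", 1)[0].split("/", 1)[0]
    if not primary:
        raise ValueError(f"cannot derive a short user name from {principal!r}")
    return primary


def resolve_keytab_path(template: str, principal: str) -> str:
    """Replace every ${name} placeholder in the configured keytab path."""
    return template.replace("${name}", short_user_name(principal))


# Hypothetical principal, used only for demonstration.
print(resolve_keytab_path(
    "/etc/security/keytabs/${name}.keytab",
    "alice/host.example.com@EXAMPLE.COM",
))
```

Because the replacement applies to every occurrence of the placeholder, the same helper also covers the second path format, where ${name} appears in both the directory and the file name.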

The owner principal of an entity is provided either when the entity is created using the CDAP CLI or the Microservices, or when an application creates it.

CDAP Authorization

Impersonation works with CDAP Authorization: if authorization is enabled, it will be enforced. For details, see the sections on enabling authorization in CDAP and managing privileges.

Limitations

The configured HDFS delegation token timeout must be larger than the log saver's maximum file lifetime (log.saver.max.file.lifetime.ms), which has a value of six hours (21600000 milliseconds).
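The arithmetic behind this limit is easy to check: six hours is 6 × 60 × 60 × 1000 = 21600000 milliseconds. The sketch below compares a candidate token timeout against that value. The helper names are hypothetical, and how the delegation token timeout is actually configured on your cluster is outside the scope of this example.

```python
# Value of log.saver.max.file.lifetime.ms quoted in the text (six hours).
LOG_SAVER_MAX_FILE_LIFETIME_MS = 21_600_000


def hours_to_ms(hours: float) -> int:
    """Convert hours to milliseconds."""
    return int(hours * 60 * 60 * 1000)


def token_timeout_is_sufficient(
    token_timeout_ms: int,
    max_file_lifetime_ms: int = LOG_SAVER_MAX_FILE_LIFETIME_MS,
) -> bool:
    """True only if the timeout is strictly larger than the file lifetime."""
    return token_timeout_ms > max_file_lifetime_ms


print(hours_to_ms(6))                                 # 21600000
print(token_timeout_is_sufficient(hours_to_ms(6)))    # False: equal, not larger
print(token_timeout_is_sufficient(hours_to_ms(24)))   # True
```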

Created in 2020 by Google Inc.