Security Runbook

Overview

This is a runbook for the security features in CDAP 3.5. It contains instructions for setting up tests for the various scenarios described in Security-Impersonation-Namespace Mapping Scenarios.

Namespace Mapping

  • Use the namespace creation REST API to create a namespace with custom mapped storage provider namespaces.
  • Namespaces in the underlying storage providers that are specified in the above API must be managed by users. CDAP does not create or delete them.
  • Any combination of the HDFS directory, HBase namespace, and Hive database can be mapped to custom entities whose lifecycle is managed by the user. These entities must exist before the attempt to create the CDAP namespace.
  • To create a namespace with custom mapping:
    curl -v -X PUT hostname:port/v3/namespaces/<ns-name> -d '{"name" : "<ns-name>", "description" : "", "config" : { "root.directory" : "/some/dir/on/hdfs", "hbase.namespace" : "custns", "hive.database" : "custhive" }}'
  • Take care not to map more than one CDAP namespace to the same HDFS, HBase, or Hive entities. A check to prevent this will be added soon.
  • A namespace mapping can be set only while the namespace is being created. Once the namespace exists, a mapping can neither be changed nor added.
  • The custom mapped entities should allow the 'cdap' user (when impersonation is not enabled) to perform admin, read, write, and execute operations on them.
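
Until CDAP itself checks for shared mappings, a planned set of namespaces can be checked before any request is sent. The following is a minimal sketch in Python, not part of CDAP: the function name find_shared_mappings and the sample namespaces ns1 and ns2 are hypothetical, and only the config keys are taken from the curl example above.

```python
from collections import defaultdict

# Config keys taken from the namespace creation request shown above.
MAPPING_KEYS = ("root.directory", "hbase.namespace", "hive.database")


def find_shared_mappings(namespaces):
    """Return {(key, value): [namespace names]} for every storage entity
    that more than one planned namespace maps to.

    `namespaces` is a list of request bodies shaped like the curl example:
    {"name": ..., "description": ..., "config": {...}}.
    """
    owners = defaultdict(list)
    for ns in namespaces:
        config = ns.get("config", {})
        for key in MAPPING_KEYS:
            value = config.get(key)
            if value:  # keys that are not set have no custom mapping to compare
                owners[(key, value)].append(ns["name"])
    return {entity: names for entity, names in owners.items() if len(names) > 1}


# Hypothetical planned namespaces; both map to the HBase namespace "custns".
planned = [
    {"name": "ns1", "description": "", "config": {
        "root.directory": "/some/dir/on/hdfs", "hbase.namespace": "custns"}},
    {"name": "ns2", "description": "", "config": {
        "root.directory": "/other/dir", "hbase.namespace": "custns"}},
]
conflicts = find_shared_mappings(planned)
print(conflicts)
```

The comparison is a plain string match, so two spellings of the same HDFS path (for example with and without a trailing slash) would not be reported as a conflict.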

Impersonation

  • Impersonation allows CDAP to launch CDAP programs and Hive queries as a user configured at the namespace level. The user principal and keytabURI can be provided when creating the namespace.
  • The keytab file should be placed in HDFS or on the local filesystem of the master node and, at a minimum, the 'cdap' user should be able to read it.
  • When impersonation is used, all operations related to that namespace are performed as that user: namespace creation (including custom mapping), program runs, Hive queries, HBase table read/write/create/delete, and HDFS directory create/read/write.
  • Since the HDFS root directory is created by default under the /cdap/namespaces directory, impersonation will not work unless i) custom mapping for HDFS is set and RWX permissions are set for that user on that directory, or ii) the user has RWX permissions for the /cdap/namespaces directory. Option i) is recommended!
  • If authorization is also enabled, when programs (such as flows or MapReduce) try to create, read, or write datasets, the authorization permissions of that user are checked to make sure the user has sufficient privileges for those operations. This also applies when datasets in other namespaces are accessed (the impersonated user should have sufficient privileges to access datasets in the other namespace).

Setting up a namespace, with impersonation, hbase-mapping and file system-mapping

Note: Hive impersonation is not yet complete. All Hive (Explore) operations are still done as the cdap user.

To set up a namespace with impersonation configured, an HBase namespace mapping, and a file system mapping, make an HTTP request to create the namespace as below.
Note that the configured "root.directory" must be an existing directory (on HDFS, if running distributed CDAP) and "hbase.namespace" must also be a pre-existing HBase namespace.

CDAP REST
PUT <HostAndPort>/v3/namespaces/<namespace-id>
with the following body. Make sure to replace the attributes within <>, such as <principal>.
{"name":"<namespace-id>","description":"<namespace description>","config":{"principal":"<principal>","keytabURI":"<path-to-keytab-file>", "root.directory":"<file-system-dir>", "hbase.namespace":"<hbase-namespace>"}}


As an example (using curl):

CDAP REST
curl -v -X PUT <HostAndPort>/v3/namespaces/foo -d '{"name":"foo","description":"My foo namespace","config":{"principal":"<PRINCIPAL>","keytabURI":"<KEYTAB_TOKEN>", "root.directory":"/tmp/foo", "hbase.namespace":"foo_ns"}}' -H "Authorization: Bearer <TOKEN>"


Now, operations within that namespace should be performed as the configured principal, using the configured keytab.
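
For setups where the request is issued from a script rather than from curl, the same PUT can be assembled with the Python standard library. This is a sketch under assumptions: the function name build_create_namespace_request is hypothetical, and the host, port, principal, keytab path, and token in the example call are made-up placeholders to be replaced with real values. The URL path and body fields are the ones from the request above.

```python
import json
import urllib.request


def build_create_namespace_request(base_url, namespace_id, principal,
                                   keytab_uri, root_directory,
                                   hbase_namespace, token=None,
                                   description=""):
    """Build (but do not send) the PUT request that creates the namespace."""
    body = {
        "name": namespace_id,
        "description": description,
        "config": {
            "principal": principal,
            "keytabURI": keytab_uri,
            "root.directory": root_directory,
            "hbase.namespace": hbase_namespace,
        },
    }
    request = urllib.request.Request(
        url=f"{base_url}/v3/namespaces/{namespace_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )
    if token:
        # Same header as the -H option in the curl example.
        request.add_header("Authorization", f"Bearer {token}")
    return request


# All values below are hypothetical placeholders, not CDAP defaults.
req = build_create_namespace_request(
    base_url="http://cdap-host.example.com:11015",
    namespace_id="foo",
    principal="alice@EXAMPLE.COM",
    keytab_uri="/etc/security/keytabs/alice.keytab",
    root_directory="/tmp/foo",
    hbase_namespace="foo_ns",
    token="example-token",
    description="My foo namespace",
)
print(req.get_method(), req.full_url)
```

The request object can then be sent with urllib.request.urlopen(req). Building it separately from sending makes it possible to inspect the URL and body first.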

Authorization

  • Authorization allows an admin to restrict the operations that a user can perform in CDAP, i.e., explicit permissions need to be granted to users to perform ADMIN, READ, WRITE operations on CDAP entities.
  • Authorization without impersonation has limited use, since programs, when started, run as the 'cdap' user, and thus any authorization check for dataset access is done for the 'cdap' user even though a different user started the program. That is, only operations performed outside of CDAP programs are authorization-enforced as the user who issued them.
  • Apache Sentry can be integrated as the ACL management and enforcement tool for CDAP entities.
  • Note: Stream authorization is still a work in progress, so authorization won't be enforced for any operations on streams.

Enabling Authorization

  • Set security.enabled and security.authorization.enabled to true.
  • To install Sentry, set up a CDAP cluster using Cloudera Manager, and add the Sentry Service to it.
  • Follow the instructions to set up CDAP with Apache Sentry as the authorization backend.
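
Before testing authorization, it can help to confirm that both flags are actually set. The sketch below assumes the flags live in a Hadoop-style XML configuration file (such as cdap-site.xml) made of property elements with name and value children. The function name flags_not_enabled and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two properties named in the first step above.
REQUIRED_FLAGS = ("security.enabled", "security.authorization.enabled")


def flags_not_enabled(site_xml_text):
    """Return the required properties that are absent or not set to true."""
    root = ET.fromstring(site_xml_text.strip())
    values = {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }
    return [name for name in REQUIRED_FLAGS
            if values.get(name, "").lower() != "true"]


# Hypothetical configuration in which authorization is still disabled.
sample = """
<configuration>
  <property>
    <name>security.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>security.authorization.enabled</name>
    <value>false</value>
  </property>
</configuration>
"""
print(flags_not_enabled(sample))
```

An empty list means both flags are set to true in the text that was checked. It says nothing about whether the running services have picked up the change.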

Secure Key Management 

 

Created in 2020 by Google Inc.