Secure Impersonation Specification

 

Introduction 

This specification covers configuring impersonation at the application level.

Goals

  • Application Impersonation: As part of CDAP-6131 (Ability for CDAP to run programs as a particular user; in progress), we implemented impersonation for programs and data operations, but it could only be configured at the namespace level. We need the ability to configure it at the application level, so that programs can run as different users without having to manage a separate namespace for each app.
  • Entity Ownership: Per CDAP-8065 (Entity Ownership: Entities in CDAP should have an owner and a group; open), entities created by an application should be owned by the application owner.
  • Explore Impersonation (Stretch Goal): As part of CDAP-6587 (Impersonate users when performing Hive operations; resolved), we implemented impersonation in Hive so that Explore queries impersonate the namespace user, if one was provided. For better security, we would like to run Explore queries as the user who submits them.

 

Scenarios

Scenario 1: App Creation

Alice is a human user who would like to create an app from an artifact. Alice has ADMIN access on the CDAP instance. She specifies a Kerberos user, Louis, as the owner by providing Louis's principal. After the app has been created, she would like the following to be true:
  1. If CDAP authorization is enabled, Louis should get all privileges (READ/WRITE/EXECUTE) on all entities created by the deployed application.
  2. Louis should own all streams and datasets created by the deployed app; that is, he will own the underlying HDFS files and HBase tables.
  3. All programs should run with Louis's credentials (e.g. his Kerberos ticket). That is, if another user, Bob, who has sufficient privileges to run a program (EXECUTE on the program and READ on the namespace, if CDAP authorization is turned on), starts the program, the program should still run as Louis.
  4. Additionally, during app creation, Alice can specify a group name. When the app is deployed, CDAP will change the group of the HDFS files and/or HBase/Hive tables so that members of the specified group have read access.
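Item 3 means a program's runtime identity is decided by the app's configured owner, not by whoever starts it. A minimal sketch of that decision, assuming the application-level owner takes precedence over the existing namespace-level principal and that programs fall back to a `cdap` system user when neither is set (both are assumptions; `effective_program_user` is a hypothetical name):

```python
from typing import Optional

# Assumption: the user programs run as when no owner is configured anywhere.
CDAP_SYSTEM_USER = "cdap"


def effective_program_user(requester: str,
                           app_owner: Optional[str],
                           namespace_principal: Optional[str]) -> str:
    """Return the principal a program run is launched as.

    `requester` (e.g. Bob) is only relevant to the authorization check that
    happens before this point; it never becomes the runtime identity.
    """
    if app_owner:               # application-level owner (this spec)
        return app_owner
    if namespace_principal:     # existing namespace-level impersonation
        return namespace_principal
    return CDAP_SYSTEM_USER     # assumed fallback: no impersonation


# Bob starts a program of an app owned by Louis: it runs as Louis.
print(effective_program_user("bob", "louis@EXAMPLE.COM", "ns-user@EXAMPLE.COM"))
# louis@EXAMPLE.COM
```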

Scenario 2: Dataset Creation/Maintenance

  1. Alice is a human user. She would like to create a dataset without deploying an application, and during creation she wants to specify an owner who will own the dataset, i.e. its HDFS files/HBase tables. She specifies a headless user, Louis, whose account she has access to, as the owner.
  2. Alice would like to perform dataset maintenance operations (truncate, delete, update) from the REST APIs, CLI, or UI, and she would like these operations to be performed as the dataset owner, Louis.
  3. Another user, Bob, who has sufficient privileges to administer the dataset, can also perform the maintenance operations; these operations will likewise be performed as the dataset owner, Louis.
  4. Additionally, during dataset creation, Alice can specify a group name. When the dataset is created, CDAP will change the group of the HDFS files and/or HBase/Hive tables so that members of the specified group have read access.
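Items 2 and 3 amount to: look up the dataset's owner, perform the operation under that identity, and record who asked. The sketch below illustrates this with an in-memory registry. `DatasetRegistry`, `run_as_owner`, and `truncate` are hypothetical names, the caller's privileges are assumed to have been checked already, and the keytab login and HDFS/HBase calls are reduced to a placeholder string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DatasetRegistry:
    """Hypothetical in-memory stand-in for the ownership metadata store."""
    owners: Dict[str, str] = field(default_factory=dict)
    # Each audit entry is (caller, impersonated owner, operation, dataset).
    audit_log: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def run_as_owner(self, caller: str, dataset: str, op_name: str,
                     op: Callable[[str, str], str]) -> str:
        """Run a maintenance operation under the dataset owner's identity.

        `op` receives (run_as_user, dataset); a real system would log in
        with that user's keytab before touching storage.
        """
        owner = self.owners[dataset]  # KeyError if the dataset is unknown
        self.audit_log.append((caller, owner, op_name, dataset))
        return op(owner, dataset)


def truncate(run_as: str, dataset: str) -> str:
    # Placeholder for the storage call made as `run_as`.
    return f"truncated {dataset} as {run_as}"


registry = DatasetRegistry(owners={"purchases": "louis"})
print(registry.run_as_owner("alice", "purchases", "truncate", truncate))
print(registry.run_as_owner("bob", "purchases", "truncate", truncate))
# Both calls run as louis; the audit log still shows who asked.
print(registry.audit_log)
```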

Scenario 3: Access Control

  1. Jules is a human user who does not have CDAP credentials and wants to run a Hive query outside of CDAP. Her access to the data can be controlled by group permissions.
  2. Mary is a headless user who owns a CDAP program that reads from a dataset owned by Louis. An admin adds Mary to the group for the dataset. The program owned by Mary can now read the dataset.
  3. (Stretch) Eve is a human user who has both LDAP and Kerberos credentials. She logs into the CDAP UI with her LDAP credentials and submits a query, providing her Kerberos TGT along with it. The query should be run as her Kerberos principal.
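Items 1 and 2 rely on ordinary group permissions on the underlying storage. For HDFS files, the check follows the POSIX-style order: owner bits, then group bits, then "other" bits. A minimal sketch of that evaluation, ignoring superusers and ACLs; the group membership map and the `analysts` group are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, Set

# Permission bits within one octal digit, as in POSIX/HDFS modes.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class StoragePermissions:
    owner: str
    group: str
    mode: int  # e.g. 0o750 = owner rwx, group r-x, other ---


def can_access(user: str, perms: StoragePermissions,
               members: Dict[str, Set[str]], wanted: int) -> bool:
    """Pick the owner, group, or other bits (first match wins) and test them."""
    if user == perms.owner:
        bits = (perms.mode >> 6) & 0o7
    elif user in members.get(perms.group, set()):
        bits = (perms.mode >> 3) & 0o7
    else:
        bits = perms.mode & 0o7
    return (bits & wanted) == wanted


dataset_files = StoragePermissions(owner="louis", group="analysts", mode=0o750)
members = {"analysts": {"jules"}}
print(can_access("mary", dataset_files, members, READ))  # False: not in the group
members["analysts"].add("mary")  # an admin adds Mary to the dataset's group
print(can_access("mary", dataset_files, members, READ))  # True
```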

 

Design

  • Impersonation is done using keytabs. All keytabs are accessible by the cdap user on all master nodes.
  • Users to be impersonated must be set up outside of CDAP.
  • For mapping user principals to keytabs, we will use the following conventions:

    • All keytabs are present on the local filesystem of the host on which CDAP Master is running.
    • The keytabs are located under a path in one of the following formats, and the cdap user has read access to all of them:
      1. /<dir1>/<dir2>/${name}.keytab
      2. /<dir1>/<dir2>/${name}/${name}.keytab
    • This path is provided to CDAP as a configuration parameter in cdap-security.xml:

      <property>
        <name>security.keytab.dir</name>
        <value>/<dir1>/<dir2>/${name}.keytab</value>
      </property>
  • User principal to keytab mapping is managed separately.
  • When CDAP authorization is enabled, configuring an app for impersonation requires ADMIN privilege on the CDAP instance.
  • Without CDAP authorization, any user will be able to impersonate any other user.
  • After an app is deployed, any user with sufficient privileges on its programs can start and stop them and view their status, metrics, logs, etc. All such actions will be performed as the owner of the app, regardless of which user initiates them.
  • Explore queries will not be impersonated the same way. They will be run as the Kerberos principal provided by the user submitting the query.
  • The user submitting the query will specify a Kerberos TGT, which can be obtained by running kinit with the user's Kerberos credentials. By default, the ticket cache is located in /tmp with the name "krb5cc_<uid of the user>"; the location can be changed by setting KRB5CCNAME.
  • Audit log will show which logged-in user impersonated whom to run a query. 
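To illustrate the keytab path convention, the sketch below expands the configured template for a given principal. It assumes `${name}` is replaced by the principal's short name (the part before any `/` or `@`), and `/etc/security/keytabs` is only an example directory; neither is fixed by this spec. A real implementation would also verify that the resulting file exists and is readable by the cdap user, e.g. with `os.access(path, os.R_OK)`.

```python
from string import Template


def short_name(principal: str) -> str:
    """Assumed mapping: 'louis/host.example.com@EXAMPLE.COM' -> 'louis'."""
    return principal.split("@", 1)[0].split("/", 1)[0]


def keytab_path(template: str, principal: str) -> str:
    """Expand the ${name} placeholder of the configured keytab path."""
    # Template.substitute raises KeyError for any placeholder other than name.
    return Template(template).substitute(name=short_name(principal))


# Hypothetical directory; use the value configured in cdap-security.xml.
print(keytab_path("/etc/security/keytabs/${name}.keytab",
                  "louis/host.example.com@EXAMPLE.COM"))
# /etc/security/keytabs/louis.keytab
print(keytab_path("/etc/security/keytabs/${name}/${name}.keytab",
                  "louis@EXAMPLE.COM"))
# /etc/security/keytabs/louis/louis.keytab
```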

For detailed design, please see Secure Impersonation - Security 4.1
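The Explore design expects the user to supply the ticket cache created by kinit. A minimal sketch of locating that cache, following the convention in the design notes: use KRB5CCNAME if it is set, otherwise /tmp/krb5cc_<uid>. Stripping a `FILE:` prefix and passing other cache types through unchanged is an assumption about how a client would hand the cache to CDAP; `credential_cache_path` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def credential_cache_path(env: Optional[Mapping[str, str]] = None,
                          uid: Optional[int] = None) -> str:
    """Return the Kerberos ticket cache location for the current user."""
    env = os.environ if env is None else env
    ccname = env.get("KRB5CCNAME")
    if ccname:
        # An explicit setting wins; strip the "FILE:" type prefix if present.
        # Other cache types (e.g. "KEYRING:...") are returned unchanged.
        return ccname[len("FILE:"):] if ccname.startswith("FILE:") else ccname
    if uid is None:
        uid = os.getuid()  # POSIX only
    return f"/tmp/krb5cc_{uid}"


print(credential_cache_path({}, 1000))  # /tmp/krb5cc_1000
```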

 

API changes

New REST APIs

Entity Ownership:

  • GET /v3/namespaces/<namespace-id>/apps/<app-id>
    Description: Returns the application details, including the owner principal as a field.
    Response codes: 200 on success; 404 if the specified app does not exist; 500 on any internal error.
    Response:

      {
        "key1": "value1",
        "key2": "value2",
        "owner": "owner-principal"
      }

  • GET /v3/namespaces/<namespace-id>/streams/<stream-id>
    Description: Returns the stream properties, including the owner principal as a field.
    Response codes: 200 on success; 404 if the specified stream does not exist; 500 on any internal error.
    Response:

      {
        "ttl": 9223372036854775,
        "format": {
          "name": "text",
          "schema": {
            "type": "record",
            "name": "stringBody",
            "fields": [
              { "name": "body", "type": "string" }
            ]
          },
          "settings": {}
        },
        "notification.threshold.mb": 1024,
        "description": "Web access logs",
        "owner": "owner-principal"
      }

  • GET /v3/namespaces/<namespace-id>/datasets/<dataset-name>/properties
    Description: Returns the dataset properties, including the owner principal as a field.
    Response codes: 200 on success; 404 if the specified dataset does not exist; 500 on any internal error.
    Response:

      {
        "key1": "value1",
        "key2": "value2",
        "owner": "owner-principal"
      }
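A client can read the owner of an app, stream, or dataset by calling these endpoints and taking the "owner" field from the JSON response. The sketch below uses only the Python standard library; the router address in the usage note is an assumption, and mapping a 404 to `LookupError` is one possible convention rather than part of the API.

```python
import json
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import quote

# Path templates from the endpoint list above.
OWNER_PATHS = {
    "app": "/v3/namespaces/{ns}/apps/{name}",
    "stream": "/v3/namespaces/{ns}/streams/{name}",
    "dataset": "/v3/namespaces/{ns}/datasets/{name}/properties",
}


def owner_url(base_url: str, kind: str, namespace: str, name: str) -> str:
    """Build the GET URL for an entity; ids are percent-encoded."""
    path = OWNER_PATHS[kind].format(ns=quote(namespace, safe=""),
                                    name=quote(name, safe=""))
    return base_url.rstrip("/") + path


def extract_owner(response_body: str) -> Optional[str]:
    """Return the "owner" field of a 200 response, or None if it is absent."""
    return json.loads(response_body).get("owner")


def fetch_owner(url: str) -> Optional[str]:
    """GET the entity and return its owner; a 404 becomes LookupError."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return extract_owner(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the entity does not exist
            raise LookupError(f"no such entity: {url}") from err
        raise  # 500 or anything else: surface the error unchanged
```

For example, with an assumed router at http://localhost:11015, `owner_url("http://localhost:11015", "dataset", "default", "purchases")` yields http://localhost:11015/v3/namespaces/default/datasets/purchases/properties, which can then be passed to `fetch_owner`.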

 

Entity Creation:

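Create APIs for streams, datasets, and applications will take two additional JSON properties:

  1. "owner.principal": the principal of the owner, as a string.
  2. "allowed.group": the name of the group whose members should get read access.

For example:

    {
      "owner.principal": "user-principal",
      "allowed.group": "groupname"
    }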
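Because the two properties are plain JSON keys, a client only needs to merge them into the body it already sends. A minimal sketch, assuming the keys sit at the top level of the request object (the exact nesting for each create API is not fixed here) and that "allowed.group" may be omitted; `with_ownership` is a hypothetical helper:

```python
import json
from typing import Any, Dict, Optional


def with_ownership(properties: Dict[str, Any], owner_principal: str,
                   allowed_group: Optional[str] = None) -> str:
    """Return a create-request body with the ownership properties added."""
    body = dict(properties)  # copy so the caller's dict is not mutated
    body["owner.principal"] = owner_principal
    if allowed_group is not None:
        body["allowed.group"] = allowed_group
    return json.dumps(body, sort_keys=True)


print(with_ownership({"description": "Web access logs"},
                     "louis@EXAMPLE.COM", "analysts"))
# {"allowed.group": "analysts", "description": "Web access logs", "owner.principal": "louis@EXAMPLE.COM"}
```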

CLI Impact or Changes

  • (optional) Create CLI for the above REST APIs

UI Impact or Changes

  • Provide a way for the user to specify Kerberos credentials while launching an Explore query.
  • (optional) Create UI for the above REST APIs

Security Impact 

Authorization will need to be implemented on the new REST APIs (which manage the impersonation metadata, the users, and their credentials). Authorization will also need to be added when this metadata is accessed programmatically (such as when launching programs or performing dataset operations involving impersonation).
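Per the design notes, setting an owner requires ADMIN on the CDAP instance when CDAP authorization is enabled, and is unchecked otherwise. A minimal sketch of that rule; the privilege triples and the `instance:cdap` entity id are hypothetical stand-ins for CDAP's authorization model, not its API:

```python
from typing import Set, Tuple

# Hypothetical privilege model: a grant is (principal, privilege, entity id).
Grant = Tuple[str, str, str]
INSTANCE = "instance:cdap"  # hypothetical id for the CDAP instance entity


def may_configure_owner(user: str, grants: Set[Grant],
                        authorization_enabled: bool) -> bool:
    """Return True if `user` may set an owner principal on an entity."""
    if not authorization_enabled:
        # Without CDAP authorization nothing is checked, so any user can
        # impersonate any other user (as the design notes warn).
        return True
    return (user, "ADMIN", INSTANCE) in grants


grants = {("alice", "ADMIN", INSTANCE)}
print(may_configure_owner("alice", grants, True))  # True
print(may_configure_owner("bob", grants, True))    # False
```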

Impact on Infrastructure Outages 

This will rely on HBase for storing metadata (similar to how we store other application metadata). Without HBase (and the dataset service), this will not work.

Releases

Release 4.1.0

Future work

  • Support ACLs for the HDFS files and HBase tables that are created when new CDAP entities are created.
  • Push down ACLs to the storage providers.
  • Support changing entity ownership

 

Created in 2020 by Google Inc.