Authorization - CDAP 3.4

Goals

  1. Authorize a subset of operations on CDAP entities using Apache Sentry

  2. Make the authorization system pluggable. Support the following two systems to begin with:

    1. Sentry based

    2. CDAP Dataset based
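The pluggability goal above can be sketched as a small interface with interchangeable backends. This is only an illustrative sketch: the names Authorizer, InMemoryAuthorizer, BACKENDS and create_authorizer are hypothetical and are not taken from the CDAP or Sentry codebases. The in-memory class merely stands in for a dataset-backed store, and a Sentry-backed implementation is omitted.

```python
from abc import ABC, abstractmethod


class Authorizer(ABC):
    """Hypothetical extension point: every backend implements the same calls."""

    @abstractmethod
    def grant(self, entity, principal, action):
        """Record that principal may perform action on entity."""

    @abstractmethod
    def revoke(self, entity, principal, action):
        """Remove a previously granted privilege, if present."""

    @abstractmethod
    def is_authorized(self, entity, principal, action):
        """Return True if principal may perform action on entity."""


class InMemoryAuthorizer(Authorizer):
    """Stand-in for a dataset-backed store: privileges kept in a set of tuples."""

    def __init__(self):
        self._privileges = set()

    def grant(self, entity, principal, action):
        self._privileges.add((entity, principal, action))

    def revoke(self, entity, principal, action):
        self._privileges.discard((entity, principal, action))

    def is_authorized(self, entity, principal, action):
        return (entity, principal, action) in self._privileges


# Registry keyed by a configuration value (assumed names). A "sentry" entry
# would map to a Sentry-backed implementation, which is not shown here.
BACKENDS = {"dataset": InMemoryAuthorizer}


def create_authorizer(kind):
    """Instantiate the backend selected by configuration."""
    try:
        return BACKENDS[kind]()
    except KeyError:
        raise ValueError(f"unknown authorization backend: {kind}") from None
```

Callers depend only on the Authorizer interface, so switching backends is a configuration change rather than a code change.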

Checklist

User stories documented (Rohit/Bhooshan)
User stories reviewed (Nitin)
Design documented (Rohit/Bhooshan)
Design reviewed (Andreas)
Feature merged (Rohit/Bhooshan)
Examples and guides (Rohit)
Integration tests (Bhooshan) 
Documentation for feature (Rohit/Bhooshan)
Blog post 

User Stories

  • As a CDAP system, I should be able to integrate with Apache Sentry for fine-grained role-based access controls of select CDAP operations 

  • As a CDAP admin, I should be able to easily configure Sentry to work with CDAP on different types of clusters (e.g., CDH or CM clusters).

  • As a CDAP admin, I should be able to create/update/delete roles in Apache Sentry

  • As a CDAP admin, I should be able to add users/groups to roles in Apache Sentry

  • As a CDAP admin, I should be able to turn authorization on/off easily for the entire CDAP instance

  • As a CDAP system, I should be able to authorize the following requests

    • Namespace create/update/delete

    • Application deployment

    • Program start/stop

    • Stream read/write (not implemented in 3.4)

    These operations are a subset chosen to represent the various kinds of operations allowed in CDAP.
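The role-related stories above (create and delete roles, add groups to roles, then authorize a request) can be illustrated with a toy role store. This is a sketch under stated assumptions, not Sentry's API: the class RoleStore and all of its method names are hypothetical, entities are plain strings, and only groups (not individual users) are attached to roles.

```python
class RoleStore:
    """Toy role-based access control store (hypothetical, not Sentry's API)."""

    def __init__(self):
        self.role_privileges = {}  # role name -> set of (entity, action)
        self.group_roles = {}      # group name -> set of role names

    def create_role(self, role):
        self.role_privileges.setdefault(role, set())

    def delete_role(self, role):
        # Drop the role and detach it from every group that had it.
        self.role_privileges.pop(role, None)
        for roles in self.group_roles.values():
            roles.discard(role)

    def grant_privilege(self, role, entity, action):
        # Raises KeyError if the role was never created.
        self.role_privileges[role].add((entity, action))

    def add_group_to_role(self, group, role):
        if role not in self.role_privileges:
            raise KeyError(role)
        self.group_roles.setdefault(group, set()).add(role)

    def is_authorized(self, groups, entity, action):
        """True if any role held by any of the caller's groups has the privilege."""
        for group in groups:
            for role in self.group_roles.get(group, ()):
                if (entity, action) in self.role_privileges.get(role, ()):
                    return True
        return False
```

For example, after creating a role "deployers", granting it WRITE on "namespace:ns1" and adding the group "engineering" to it, a request from a member of "engineering" to deploy into ns1 would pass the check, while a member of an unrelated group would be rejected.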

Scenarios

Scenario #1

  • D-Rock is an IT admin extraordinaire who has just been tasked with adding authorization for access to entities in CDAP on the cluster he manages.

  • D-Rock is already familiar with Apache Sentry, since he has used it for authorization in other projects like Apache HDFS, Apache Hive, Apache Sqoop, etc. 

  • He would rather not learn a new authorization system. He would instead prefer that Apache Sentry be used to provide Role Based Access Control to CDAP entities as well.

  • As part of this, he would also like a streamlined installation and configuration experience with Apache Sentry and CDAP, including detailed instructions.

Scenario #2

  • D-Rock manages a variety of CDAP clusters in dev/smoke/qa/staging environments along with the prod environment.

  • For these environments, he would like to be able to turn authorization on/off easily with a switch for the CDAP instance, depending on the need at a given time.
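One way to picture the on/off switch is a thin enforcement wrapper that consults a single configuration flag before delegating to the real check. This is a minimal sketch: the class AuthorizationEnforcer is hypothetical, and the property name security.authorization.enabled is an assumption made for illustration, since this document does not name the setting.

```python
class AuthorizationEnforcer:
    """Wraps an authorization check behind an instance-wide switch (sketch)."""

    # Assumed property name; the document does not specify the actual key.
    ENABLED_KEY = "security.authorization.enabled"

    def __init__(self, config, check):
        # config: dict of string settings; check: callable(entity, principal, action) -> bool
        self.enabled = str(config.get(self.ENABLED_KEY, "false")).lower() == "true"
        self._check = check

    def enforce(self, entity, principal, action):
        """Raise PermissionError if authorization is on and the check fails."""
        if not self.enabled:
            return  # authorization is switched off: every request passes
        if not self._check(entity, principal, action):
            raise PermissionError(
                f"{principal} is not authorized to {action} {entity}"
            )
```

Because the switch is evaluated in one place, a dev or QA instance can run with the flag off while prod runs with it on, without any change to the calling code.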

Scenario #3

  • Ideally, D-Rock would like to be able to authorize all operations on all entities in CDAP. 

  • However, this can be rolled out in phases. In the initial phase, he would like to control who can:

    • Create/update/delete a namespace

      • Only users with WRITE permission on the CDAP instance should be able to perform this operation.

      • A property in sentry-site.xml determines the set of users who have admin permission on the CDAP instance. These admins can later grant permissions to other users.

    • Deploy an application in a namespace

      • Only users with WRITE permission on the namespace should be able to perform this operation

      • Once the application is deployed, the user who deployed it becomes the ADMIN of the application.

    • Start/stop a program

      • Only users with READ permission on the namespace and application, and EXECUTE permission on the program should be able to perform this operation

      • Only users with ADMIN permission on the program can set preference for the program

      • Only users with WRITE permission can provide runtime args

    • Read/write to a stream

      • Only users with READ privilege on the namespace and READ permission on the stream should be able to read from the stream

      • Only users with READ privilege on the namespace and WRITE permission on the stream should be able to write to the stream

      • Note: We have decided not to handle views separately. A user has the same permissions on all views of a stream as on the stream itself.
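The rules in this scenario combine privileges on several levels of the entity hierarchy. The sketch below expresses three of them (deploy an application, start a program, read or write a stream) as plain functions. All names here (Privileges, deploy_application, can_start_program, can_read_stream, can_write_stream) are hypothetical, and entities are represented as simple strings for illustration only.

```python
class Privileges:
    """Minimal privilege store: a set of (user, entity, action) tuples."""

    def __init__(self):
        self._granted = set()

    def grant(self, user, entity, action):
        self._granted.add((user, entity, action))

    def has(self, user, entity, action):
        return (user, entity, action) in self._granted


def deploy_application(privs, user, namespace, app):
    """Deploy requires WRITE on the namespace; the deployer becomes ADMIN of the app."""
    if not privs.has(user, namespace, "WRITE"):
        raise PermissionError(f"{user} needs WRITE on {namespace}")
    privs.grant(user, app, "ADMIN")


def can_start_program(privs, user, namespace, app, program):
    """Start/stop requires READ on namespace and application, EXECUTE on the program."""
    return (
        privs.has(user, namespace, "READ")
        and privs.has(user, app, "READ")
        and privs.has(user, program, "EXECUTE")
    )


def can_read_stream(privs, user, namespace, stream):
    """Reading requires READ on the namespace and READ on the stream."""
    return privs.has(user, namespace, "READ") and privs.has(user, stream, "READ")


def can_write_stream(privs, user, namespace, stream):
    """Writing requires READ on the namespace and WRITE on the stream."""
    return privs.has(user, namespace, "READ") and privs.has(user, stream, "WRITE")
```

Note that in this sketch a missing privilege at any level denies the request: EXECUTE on a program alone is not enough to start it without READ on its namespace and application.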

Entities, Operations and Privileges

Entity       Operation                       Required Privileges   Resultant Privileges
-----------  ------------------------------  --------------------  --------------------
Namespace    create                          ADMIN (Instance)      ADMIN (Namespace)
             update                          ADMIN (Namespace)
             list                            READ (Instance)
             get                             READ (Namespace)
             delete                          ADMIN (Namespace)
             set preference                  WRITE (Namespace)
             get preference                  READ (Namespace)
             search                          READ (Namespace)

Artifact     add                             WRITE (Namespace)     ADMIN (Artifact)
             delete                          ADMIN (Artifact)
             get                             READ (Artifact)
             list                            READ (Namespace)
             write property                  ADMIN (Artifact)
             delete property                 ADMIN (Artifact)
             get property                    READ (Artifact)
             refresh                         WRITE (Instance)
             write metadata                  ADMIN (Artifact)
             read metadata                   READ (Artifact)

Application  deploy                          WRITE (Namespace)     ADMIN (Application)
             get                             READ (Application)
             list                            READ (Namespace)
             update                          ADMIN (Application)
             delete                          ADMIN (Application)
             set preference                  WRITE (Application)
             get preference                  READ (Application)
             add metadata                    ADMIN (Application)
             get metadata                    READ (Application)

Programs     start/stop/debug                EXECUTE (Program)
             set instances                   ADMIN (Program)
             list                            READ (Namespace)
             set runtime args                EXECUTE (Program)
             get runtime args                READ (Program)
             get instances                   READ (Program)
             set preference                  ADMIN (Program)
             get preference                  READ (Program)
             get status                      READ (Program)
             get history                     READ (Program)
             add metadata                    ADMIN (Program)
             get metadata                    READ (Program)
             emit logs                       WRITE (Program)
             view logs                       READ (Program)
             emit metrics                    WRITE (Program)
             view metrics                    READ (Program)

Streams      create                          WRITE (Namespace)     ADMIN (Stream)
             update properties               ADMIN (Stream)
             delete                          ADMIN (Stream)
             truncate                        ADMIN (Stream)
             enqueue / asyncEnqueue / batch  WRITE (Stream)
             get                             READ (Stream)
             list                            READ (Namespace)
             read events                     READ (Stream)
             set preferences                 ADMIN (Stream)
             get preferences                 READ (Stream)
             add metadata                    ADMIN (Stream)
             get metadata                    READ (Stream)
             view lineage                    READ (Stream)
             emit metrics                    WRITE (Stream)
             view metrics                    READ (Stream)

Datasets     list                            READ (Namespace)
             get                             READ (Dataset)
             create                          WRITE (Namespace)     ADMIN (Dataset)
             update                          ADMIN (Dataset)
             drop
