Authorization 4.3

Goals

  • Make CDAP authorization policy consistent across all entity types.

  • Allow admins to set granular privileges on entities. 

  • Integrate Ranger with CDAP authorization.

  • Improve the Sentry data model to fix existing issues seen in customer environments.

  • Allow admins to use existing roles/groups for authorization.

User Stories 

Scenario 1

Overview

  • Privileges are managed at the entity level

  • App level impersonation

  • Dataset is owned by the application owner

  • Cross namespace dataset access allowed

Details

  • admin1 creates a CDAP namespace etl with principal etl-owner

  • admin2 deploys an app feed1 with principal feed1-owner in namespace etl

  • During app feed1 configure, dataset gold is created with owner principal feed1-owner

  • ops1 starts a workflow in app feed1, which runs as principal feed1-owner

  • During the workflow run, principal feed1-owner reads/writes to dataset gold

  • ops1 can list logs and metrics for workflow in app feed1

  • ops2 can list all apps/programs in namespace etl and view all their logs and metrics

  • ops2 can list all the datasets in namespace etl and view their properties

  • ops2 cannot read any datasets in namespace etl
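The split between the user who starts the workflow and the principal the workflow runs as can be sketched as follows. This is a minimal illustration, not CDAP code: the `Policy` class, the entity strings, and the workflow name `workflow` are assumptions made for the example. The EXECUTE requirement for starting a program follows the proposed Programs matrix.

```python
# Hypothetical entity names; the source does not name the workflow.
WORKFLOW = "program:etl.feed1.workflow"
GOLD = "dataset:etl.gold"


class Policy:
    """Toy privilege store holding (principal, action, entity) triples."""

    def __init__(self):
        self._grants = set()

    def grant(self, principal, action, entity):
        self._grants.add((principal, action, entity))

    def check(self, principal, action, entity):
        if (principal, action, entity) not in self._grants:
            raise PermissionError(f"{principal} lacks {action} on {entity}")


def start_workflow(policy, requester, owner_principal):
    """The start request is authorized as the requesting user."""
    policy.check(requester, "EXECUTE", WORKFLOW)
    # The run itself executes as the application's owner principal.
    return owner_principal


def write_gold(policy, run_as):
    """Data access by the running program is authorized as the impersonated principal."""
    policy.check(run_as, "WRITE", GOLD)
    return True


policy = Policy()
policy.grant("ops1", "EXECUTE", WORKFLOW)
policy.grant("feed1-owner", "READ", GOLD)
policy.grant("feed1-owner", "WRITE", GOLD)

run_as = start_workflow(policy, "ops1", "feed1-owner")
write_gold(policy, run_as)  # allowed: checked as feed1-owner, not as ops1
```

In this sketch, ops1 can start the workflow but `write_gold(policy, "ops1")` raises `PermissionError`, because ops1 holds no privilege on the dataset itself.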

Scenario 2

Overview

  • Privileges are managed at the namespace level

  • Namespace level impersonation

  • Dataset is owned by the namespace owner

  • Cross namespace dataset access allowed

Details

  • admin1 creates a group etl-group in LDAP

  • admin1 creates namespaces in HDFS, HBase and Hive called etl

  • admin1 grants all privileges on the above namespaces to group etl-group

  • admin1 creates a CDAP namespace etl with principal etl-owner, using the namespaces from HDFS, HBase and Hive. (Open question: does etl-owner belong to etl-group?)

  • admin1 grants all privileges on the CDAP namespace etl, and all entities under it to group etl-group

  • etl-user1 belonging to group etl-group deploys app feed1 in namespace etl

  • During app feed1 configure, dataset gold is created with owner principal etl-owner

  • etl-user2, belonging to group etl-group, starts a workflow in app feed1, which runs as principal etl-owner

  • During the workflow run, principal etl-owner reads/writes to dataset gold

  • etl-user3, belonging to group etl-group, can list logs and metrics for the workflow in app feed1

  • analyst1 belonging to group analyst-group is given privilege read on namespace etl and all entities under it, using which analyst1 can read dataset gold
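Group-based, namespace-level grants such as the ones in this scenario can be sketched as below. This is only an illustration under stated assumptions: the entity strings, the `GROUPS` table standing in for LDAP, and the `*:etl.*` wildcard syntax are all hypothetical, and a real authorization provider may express "all entities under a namespace" quite differently.

```python
from fnmatch import fnmatchcase

ALL_ACTIONS = ("READ", "WRITE", "EXECUTE", "ADMIN")

# Group membership as it might be resolved from LDAP (hypothetical data).
GROUPS = {
    "etl-user1": {"etl-group"},
    "analyst1": {"analyst-group"},
}

# Hypothetical pattern syntax: "namespace:etl" is the namespace itself,
# "*:etl.*" stands for every entity under it.
NAMESPACE_PATTERNS = ("namespace:etl", "*:etl.*")

GRANTS = [("etl-group", action, pattern)
          for action in ALL_ACTIONS for pattern in NAMESPACE_PATTERNS]
GRANTS += [("analyst-group", "READ", pattern) for pattern in NAMESPACE_PATTERNS]


def is_allowed(user, action, entity, grants=GRANTS, groups=GROUPS):
    """A user is allowed if the user or any of the user's groups holds a matching grant."""
    principals = {user} | groups.get(user, set())
    return any(
        principal in principals
        and granted == action
        and fnmatchcase(entity, pattern)
        for principal, granted, pattern in grants
    )


print(is_allowed("analyst1", "READ", "dataset:etl.gold"))
print(is_allowed("analyst1", "WRITE", "dataset:etl.gold"))
```

Here analyst1 can read dataset gold through analyst-group but cannot write to it, while members of etl-group hold every action on the namespace and the entities under it.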

Scenario 3

Overview

  • Privileges are managed at the namespace level

  • No impersonation

  • All data is owned by CDAP

  • All programs run as CDAP

  • Cross namespace dataset access is allowed

Details

  • admin1 creates a group etl-group in LDAP

  • admin1 creates namespaces in HDFS, HBase and Hive called etl

  • admin1 grants all privileges on the above namespaces to principal cdap

  • admin1 creates a CDAP namespace etl using the namespaces from HDFS, HBase and Hive.

  • admin1 grants all privileges on the CDAP namespace etl, and all entities under it to group etl-group

  • etl-user1 belonging to group etl-group deploys app feed1 in namespace etl

  • During app feed1 configure, dataset gold is created with owner principal cdap

  • etl-user2, belonging to group etl-group, starts a workflow in app feed1, which runs as principal cdap

  • During the workflow run, principal cdap reads/writes to dataset gold

  • etl-user3, belonging to group etl-group, can list logs and metrics for the workflow in app feed1

  • etl-user3, belonging to group etl-group, can also read from dataset gold

  • analyst1 belonging to group analyst-group is given privilege to read from dataset gold

Design

CDAP Authorization Policy

Existing CDAP Authorization Policy

The existing CDAP Authorization policy has the following limitations:

  • Granular privileges

    • Cannot grant a privilege to a user to read only one dataset or one stream in a namespace.

    • Cannot grant a privilege to a user to deploy/create an application/artifact/dataset/stream without granting WRITE on the namespace.

    • Cannot grant a privilege to a user to start/stop a program without granting READ on the namespace.

  • Visibility

    • A user who has a privilege on a program cannot see the program in the UI or CLI unless the user also has a privilege on the namespace.

  • Inconsistency

    • To write to a dataset, a user needs WRITE on the dataset, but to write to a stream, a user needs both WRITE on the stream and READ on the namespace.

    • ADMIN on an entity allows the user to delete the entity, but does not allow the user to create it.

    • Dataset read needs namespace READ, but dataset write does not need namespace WRITE.

  • Redundancy

    • Dataset READ and stream READ are redundant because they need namespace READ permission to be useful, and once a user has namespace READ the user can read all datasets and streams in the namespace.

    • List and View operations are equivalent but are listed separately in documentation.
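The redundancy and inconsistency in the dataset rules can be checked mechanically. The sketch below encodes the existing Read and Write rules for datasets as they appear in the existing authorization matrix; the function names are made up for illustration.

```python
def existing_dataset_read(dataset_read: bool, namespace_read: bool) -> bool:
    """Existing rule: (READ on dataset and READ on namespace) | READ on namespace."""
    return (dataset_read and namespace_read) or namespace_read


def existing_dataset_write(dataset_write: bool, namespace_write: bool) -> bool:
    """Existing rule: WRITE on dataset | WRITE on namespace."""
    return dataset_write or namespace_write


def ignores_dataset_grant(rule) -> bool:
    """True if the rule's outcome never depends on the dataset-level privilege."""
    return all(rule(True, ns) == rule(False, ns) for ns in (False, True))


if __name__ == "__main__":
    print("read ignores dataset-level grant:", ignores_dataset_grant(existing_dataset_read))
    print("write ignores dataset-level grant:", ignores_dataset_grant(existing_dataset_write))
```

Under these rules the dataset-level READ never changes the outcome of a read, whereas a dataset-level WRITE alone is enough to write, which is the asymmetry the list above describes.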

Overview of the Proposed Authorization Policy

The proposed CDAP Authorization policy can be defined by the following principles:

  1. Access

    • Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity. 

    • Access is not enforced in a hierarchical manner in CDAP.

    • Privileges in the authorization provider can be set up in a hierarchical manner, for instance by using wildcard privileges. (Open question: how will this work in Sentry?)

  2. Visibility

    • Visibility defines whether an entity is visible to a user or not.

    • If a user has any privilege on an entity, it is visible to the user.

    • Visibility is hierarchical and flows bottom-up, i.e., if a user has any privilege on a program, then the user can see the application that contains the program and the namespace that contains the application.

  3. Grant

    • Grant is defined as the action of giving a privilege on an entity to a user.

    • None of READ, WRITE, EXECUTE, or ADMIN as defined in CDAP allows granting of privileges.

    • Only the administrator of the authorization provider can grant privileges on any entity. CDAP will not auto-grant privileges to creators.

  4. Impersonation

    • Impersonation is defined as the ability to:

      • deploy applications whose programs will execute as another user

      • create a namespace/dataset/stream with an owner principal

      • run an Explore query in an impersonated namespace

    • alice needs ADMIN privilege on principal bob to deploy an application that can impersonate bob.

      • All operations on the application/program entities are authorized as principal alice.

      • All operations performed by the running program/query are authorized as principal bob.

        • This includes running the configure method and creating datasets from the application.
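The difference between access (exact match, not hierarchical) and visibility (bottom-up) can be sketched as follows. This is an illustrative model only: entities are represented as tuples of path segments, and the segment names, including the program name `workflow`, are assumptions.

```python
# Entities as paths from the namespace down (hypothetical naming).
NAMESPACE = ("namespace:etl",)
APP = NAMESPACE + ("application:feed1",)
PROGRAM = APP + ("program:workflow",)
OTHER_NAMESPACE = ("namespace:other",)


def has_access(grants, user, action, entity):
    """Access is not hierarchical: only a grant on this exact entity counts."""
    return (user, action, entity) in grants


def is_visible(grants, user, entity):
    """An entity is visible if the user holds any privilege on it or on a descendant."""
    depth = len(entity)
    return any(
        granted_user == user and granted_entity[:depth] == entity
        for granted_user, _action, granted_entity in grants
    )


grants = {("alice", "EXECUTE", PROGRAM)}

print(is_visible(grants, "alice", NAMESPACE))         # privilege on a descendant
print(has_access(grants, "alice", "EXECUTE", APP))    # no grant on the app itself
```

With a single EXECUTE grant on the program, alice can see the program, its application, and its namespace, but has no access to the application or the namespace themselves.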

Decouple entity existence from privilege

In addition, CDAP will now support creating privileges for entities that do not exist yet. This allows admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed. The user can then deploy only that specific application, without having any other access to the namespace.
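A minimal sketch of this decoupling, assuming a toy in-memory `Namespace` class (hypothetical, not a CDAP API): granting does not check that the entity exists, and deploying checks only for ADMIN on the named application.

```python
class Namespace:
    """Toy namespace that tracks grants and deployed applications separately."""

    def __init__(self, name):
        self.name = name
        self.grants = set()        # (user, action, application name)
        self.applications = set()  # names of deployed applications

    def grant(self, user, action, app_name):
        # No existence check: privileges may refer to applications not yet deployed.
        self.grants.add((user, action, app_name))

    def deploy(self, user, app_name):
        # Deploying requires ADMIN on this specific application only.
        if (user, "ADMIN", app_name) not in self.grants:
            raise PermissionError(f"{user} lacks ADMIN on {app_name}")
        self.applications.add(app_name)


etl = Namespace("etl")
etl.grant("alice", "ADMIN", "feed1")   # feed1 does not exist yet
etl.deploy("alice", "feed1")           # allowed
```

In this model, `etl.deploy("alice", "feed2")` raises `PermissionError`, since the pre-created privilege covers feed1 only.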

Changes to the authorization matrix

Instance

ADMIN on an instance allows a user to create namespaces in the instance. No other operations are defined as of now. The instance is also not part of the privilege hierarchy.

Note: The privileges marked in bold are new in 4.3.

Namespaces

Privileges required per operation (existing vs. proposed):

  • Create
    • Existing: ADMIN (on the CDAP instance)
    • Proposed: ADMIN

  • Update
    • Existing: ADMIN (on the namespace)
    • Proposed: (not specified)

  • Delete
    • Existing: ADMIN (on the namespace)
    • Proposed: ADMIN on the namespace, and all entities in the namespace

  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
    • Proposed: Any privilege on the namespace or any of its descendants.

  • Get Namespace Meta
    • Existing: (not specified)
    • Proposed: Any privilege on the namespace or any of its descendants.
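The proposed rule for deleting a namespace (ADMIN on the namespace and on all entities in it) can be sketched as a simple check. The function and the entity strings are hypothetical and only illustrate the rule as stated in the matrix.

```python
def can_delete_namespace(grants, user, namespace, entities):
    """Proposed rule: ADMIN on the namespace and on every entity in it."""
    required = [namespace, *entities]
    return all((user, "ADMIN", entity) in grants for entity in required)


# Hypothetical entity identifiers for illustration.
NS = "namespace:etl"
APP = "application:etl.feed1"
GOLD = "dataset:etl.gold"

grants = {
    ("admin1", "ADMIN", NS),
    ("admin1", "ADMIN", APP),
}

print(can_delete_namespace(grants, "admin1", NS, [APP]))
print(can_delete_namespace(grants, "admin1", NS, [APP, GOLD]))
```

In the second call the delete is refused because admin1 holds no ADMIN privilege on dataset gold, even though admin1 has ADMIN on the namespace.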

 

Artifacts

Privileges required per operation (existing vs. proposed):

  • Add
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN

  • Add a property
    • Existing: ADMIN (on namespace) | ADMIN (on artifact)
    • Proposed: ADMIN

  • Remove a property
    • Existing: ADMIN (on namespace) | ADMIN (on artifact)
    • Proposed: ADMIN

  • Use to deploy an app
    • Existing: (not specified)
    • Proposed: ADMIN | READ | WRITE | EXECUTE

  • Delete
    • Existing: ADMIN (on namespace) | ADMIN (on artifact)
    • Proposed: ADMIN

  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) | Any of READ, WRITE, EXECUTE, or ADMIN (on artifact)
    • Proposed: Any privilege on the artifact

  • Get artifact info/summary/detail
    • Existing: (not specified)
    • Proposed: ADMIN | READ | WRITE | EXECUTE

 

Applications

Privileges required per operation (existing vs. proposed):

  • Add
    • Existing: WRITE (on the namespace) and READ (on the artifact if deployed from an artifact)
    • Proposed: ADMIN (also see artifact privileges and principal privileges)

  • Delete
    • Existing: ADMIN (on the application) | ADMIN (on the namespace)
    • Proposed: ADMIN

  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) | Any of READ, WRITE, EXECUTE, or ADMIN (on application)
    • Proposed: Any privilege on the application or any of its descendants.

  • Get application detail
    • Existing: (not specified)
    • Proposed: Any privilege on the application or any of its descendants.

 

Programs

Privileges required per operation (existing vs. proposed):

  • Start, Stop, or Debug
    • Existing: (EXECUTE (on the program) | EXECUTE (on the application) | EXECUTE (on the namespace)) & READ (on the namespace)
    • Proposed: EXECUTE

  • Set instances
    • Existing: ADMIN (on the namespace) | ADMIN (on the application) | ADMIN (on the program)
    • Proposed: ADMIN

  • Set runtime arguments
    • Existing: ADMIN (on the namespace) | ADMIN (on the application) | ADMIN (on the program)
    • Proposed: ADMIN

  • Retrieve runtime arguments
    • Existing: READ (on the namespace) | READ (on the application) | READ (on the program)
    • Proposed: READ | EXECUTE | ADMIN

  • Retrieve status
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
    • Proposed: (not specified)

  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
    • Proposed: (not specified)

  • Get program specification
    • Existing: (not specified)
    • Proposed: READ | WRITE | EXECUTE | ADMIN

  • Resume/Suspend schedule
    • Existing: (not specified)
    • Proposed: EXECUTE

 

Datasets

Privileges required per operation (existing vs. proposed):

  • Create
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN

  • Read
    • Existing: (READ (on the dataset) and READ (on the namespace)) | READ (on the namespace)
    • Proposed: READ

  • Retrieving properties
    • Existing: Not Documented
    • Proposed: Any of READ, WRITE, ADMIN, or EXECUTE

  • Write
    • Existing: WRITE (on the dataset) | WRITE (on the namespace)
    • Proposed: WRITE

  • Update
    • Existing: (ADMIN (on the dataset) and READ (on the namespace)) | (ADMIN (on the namespace) and READ (on the namespace))
    • Proposed: ADMIN

  • Upgrade
    • Existing: ADMIN (on the dataset) | ADMIN (on the namespace)
    • Proposed: ADMIN

  • Truncate
    • Existing: ADMIN (on the dataset) | ADMIN (on the namespace)
    • Proposed: ADMIN

  • Drop
    • Existing: ADMIN (on the dataset) | ADMIN (on the namespace)
    • Proposed: ADMIN

  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
    • Proposed: (not specified)

  • Get dataset meta
    • Existing: (not specified)
    • Proposed: READ | WRITE | EXECUTE | ADMIN
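The proposed dataset privileges can be read as a lookup from operation to the set of privileges that satisfy it. The sketch below is one way to model that; the operation keys and the `authorized` helper are hypothetical names, and View/List is left out because its proposed entry is blank in the matrix.

```python
ANY = frozenset({"READ", "WRITE", "EXECUTE", "ADMIN"})

# Proposed privileges on the dataset itself; any one privilege in the set suffices.
PROPOSED_DATASET = {
    "create": frozenset({"ADMIN"}),
    "read": frozenset({"READ"}),
    "retrieve_properties": ANY,
    "write": frozenset({"WRITE"}),
    "update": frozenset({"ADMIN"}),
    "upgrade": frozenset({"ADMIN"}),
    "truncate": frozenset({"ADMIN"}),
    "drop": frozenset({"ADMIN"}),
    "get_meta": ANY,
}


def authorized(held, operation):
    """True if any privilege the user holds on the dataset satisfies the operation."""
    return bool(set(held) & PROPOSED_DATASET[operation])


# A user holding only EXECUTE can view properties but cannot read the data.
print(authorized({"EXECUTE"}, "retrieve_properties"))
print(authorized({"EXECUTE"}, "read"))
```

This matches the case in Scenario 1 where ops2 can view dataset properties without being able to read the datasets, assuming ops2 holds some privilege other than READ on them.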

 

Dataset Modules

Privileges required per operation (existing vs. proposed):

  • Deploy
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN

Delete

Created in 2020 by Google Inc.