Authorization 4.3
- 1 Goals
- 2 User Stories
- 2.1 Scenario 1
- 2.2 Scenario 2
- 2.3 Scenario 3
- 3 Design
- 4 CLI Impact or Changes
- 5 Test Scenarios
- 6 Future work
Goals
Make CDAP authorization policy consistent across all entity types.
Allow admins to set granular privileges on entities.
Integrate Ranger with CDAP authorization.
Improve the Sentry data model to fix existing issues seen in customer environments.
Allow admins to use existing roles/groups for authorization.
User Stories
Scenario 1
Overview
Privileges are managed at the entity level
App level impersonation
Dataset is owned by the application owner
Cross namespace dataset access allowed
Details
admin1 creates a CDAP namespace etl with principal etl-owner
admin2 deploys an app feed1 with principal feed1-owner in namespace etl
During app feed1 configure, dataset gold is created with owner principal feed1-owner
ops1 starts workflow in app feed1, that runs as principal feed1-owner
During the workflow run, principal feed1-owner reads/writes to dataset gold
ops1 can list logs and metrics for workflow in app feed1
ops2 can list all apps/programs in namespace etl and view all their logs and metrics
ops2 can list all the datasets in namespace etl and view their properties
ops2 cannot read any datasets in namespace etl
Scenario 2
Overview
Privileges are managed at the namespace level
Namespace level impersonation
Dataset is owned by the namespace owner
Cross namespace dataset access allowed
Details
admin1 creates a group etl-group in LDAP
admin1 creates namespaces in HDFS, HBase and Hive called etl
admin1 grants all privileges on the above namespaces to group etl-group
admin1 creates a CDAP namespace etl with principal etl-owner using the namespaces from HDFS, HBase and Hive. Does etl-owner belong to etl-group?
admin1 grants all privileges on the CDAP namespace etl, and all entities under it, to group etl-group
etl-user1 belonging to group etl-group deploys app feed1 in namespace etl
During app feed1 configure, dataset gold is created with owner principal etl-owner
etl-user2 belonging to group etl-group starts workflow in app feed1, that runs as principal etl-owner
During the workflow run, principal etl-owner reads/writes to dataset gold
etl-user3 belonging to group etl-group can list logs and metrics for workflow in app feed1
analyst1 belonging to group analyst-group is given privilege read on namespace etl and all entities under it, using which analyst1 can read dataset gold
Scenario 3
Overview
Privileges are managed at the namespace level
No impersonation
All data is owned by CDAP
All programs run as CDAP
Cross namespace dataset access is allowed
Details
admin1 creates a group etl-group in LDAP
admin1 creates namespaces in HDFS, HBase and Hive called etl
admin1 grants all privileges on the above namespaces to principal cdap
admin1 creates a CDAP namespace etl using the namespaces from HDFS, HBase and Hive.
admin1 grants all privileges on the CDAP namespace etl, and all entities under it, to group etl-group
etl-user1 belonging to group etl-group deploys app feed1 in namespace etl
During app feed1 configure, dataset gold is created with owner principal cdap
etl-user2 belonging to group etl-group starts workflow in app feed1, that runs as principal cdap
During the workflow run, principal cdap reads/writes to dataset gold
etl-user3 belonging to group etl-group can list logs and metrics for workflow in app feed1
etl-user3 belonging to group etl-group can also read from dataset gold
analyst1 belonging to group analyst-group is given privilege to read from dataset gold
Design
CDAP Authorization Policy
Existing CDAP Authorization Policy
The existing CDAP Authorization policy has the following limitations:
Granular privileges
Cannot grant a privilege to a user to read only one dataset or one stream in a namespace.
Cannot grant a privilege to a user to deploy/create an application/artifact/dataset/stream without granting WRITE on the namespace.
Cannot grant a privilege to a user to start/stop a program without granting READ on the namespace.
Visibility
A user who has a privilege on a program cannot see the program in the UI or CLI if the user does not have any privilege on the namespace.
Inconsistency
To write to a dataset, a user needs WRITE privilege on the dataset, but to write to a stream, a user needs both WRITE on the stream and READ on the namespace.
ADMIN on an entity allows the user to delete the entity, but does not allow the user to create it.
Dataset read needs namespace READ, but dataset write does not need namespace WRITE.
Redundancy
Dataset READ and stream READ are redundant because they need namespace READ permission to be useful, and once a user has namespace READ the user can read all datasets and streams in the namespace.
List and View operations are equivalent but are listed separately in documentation.
Overview of the Proposed Authorization Policy
The proposed CDAP Authorization policy can be defined by the following principles:
Access
Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity.
Access is not enforced in a hierarchical manner in CDAP.
Privileges in the authorization provider can be set up in a hierarchical manner (for instance, by using wildcard privileges; how will this work in Sentry?).
Visibility
Visibility defines whether an entity is visible to a user or not.
If a user has any privilege on an entity, it is visible to the user.
Visibility is hierarchical and flows bottom-up, i.e., if a user has any privilege on a program, then the user will be able to see the application that contains the program and the namespace that contains the application.
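The difference between non-hierarchical access and bottom-up visibility can be sketched as follows. This is only an illustrative model, not CDAP code: the in-memory privilege map, the path-tuple representation of entities, the program name workflow1, and the function names has_access and is_visible are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity model: a path tuple, e.g. ("etl",) for a namespace,
# ("etl", "feed1") for an application, ("etl", "feed1", "workflow1") for a program.
Entity = Tuple[str, ...]

# Hypothetical in-memory privilege store: (principal, entity) -> set of actions.
Privileges = Dict[Tuple[str, Entity], Set[str]]


def has_access(privs: Privileges, principal: str, entity: Entity, action: str) -> bool:
    """Access is checked on the exact entity only; parents are not consulted."""
    return action in privs.get((principal, entity), set())


def is_visible(privs: Privileges, principal: str, entity: Entity) -> bool:
    """An entity is visible if the principal holds any privilege on it
    or on any entity nested under it (visibility flows bottom-up)."""
    for (who, target), actions in privs.items():
        if who == principal and actions and target[:len(entity)] == entity:
            return True
    return False


# ops1 holds a single privilege, on one program.
privs: Privileges = {("ops1", ("etl", "feed1", "workflow1")): {"EXECUTE"}}

print(is_visible(privs, "ops1", ("etl",)))             # namespace is visible
print(has_access(privs, "ops1", ("etl",), "EXECUTE"))  # but no access is implied on it
```

In this model, a privilege on the program makes the enclosing application and namespace visible to ops1, while granting no action on either of them.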
Grant
Grant is defined as the action of giving a privilege on an entity to a user.
None of READ, WRITE, EXECUTE, or ADMIN as defined in CDAP will allow granting of privileges.
Only the administrator of the authorization provider can grant privileges on any entity. CDAP will not auto-grant privileges to creators.
Impersonation
Impersonation is defined as the ability to:
deploy applications whose programs will execute as another user
create a namespace/dataset/stream with an owner principal
run an explore query in an impersonated namespace
For example, alice needs ADMIN privilege on principal bob to deploy an application that can impersonate bob.
All operations that happen on the application/program entities are authorized using principal alice.
All operations done by the running program/query are authorized as principal bob.
This includes running the configure method and creating datasets from the application.
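The split between the deploying principal and the impersonated principal can be sketched roughly as below. The string entity names (such as principal:bob and dataset:etl.gold), the privilege map, and the helper names are hypothetical and chosen only for this illustration; they do not reflect CDAP's actual API.

```python
from typing import Dict, Set, Tuple

# Hypothetical privilege store: (principal, entity name) -> set of actions.
Privileges = Dict[Tuple[str, str], Set[str]]


def allowed(privs: Privileges, principal: str, entity: str, action: str) -> bool:
    """Exact-match privilege check on a single entity."""
    return action in privs.get((principal, entity), set())


def can_impersonate(privs: Privileges, deployer: str, owner: str) -> bool:
    """The deployer needs ADMIN on the principal that the programs will run as."""
    return allowed(privs, deployer, f"principal:{owner}", "ADMIN")


def authorizing_principal(caller: str, owner: str, in_program: bool) -> str:
    """Operations on application/program entities are authorized as the caller;
    operations performed by the running program are authorized as the owner."""
    return owner if in_program else caller


privs: Privileges = {
    ("alice", "principal:bob"): {"ADMIN"},
    ("bob", "dataset:etl.gold"): {"READ", "WRITE"},
}

# alice may deploy an app that impersonates bob; the running program writes as bob.
runtime_user = authorizing_principal("alice", "bob", in_program=True)
print(can_impersonate(privs, "alice", "bob"), runtime_user)
```

Under these assumptions, alice herself cannot write to the dataset, but a program she deployed to run as bob can, because the runtime check uses bob's privileges.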
Decouple entity existence from privilege
In addition, CDAP will now support creating privileges for entities that are yet to be created. This will allow admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed. This will allow the user to deploy only this specific application without having any other access to the namespace.
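A minimal sketch of this decoupling is shown below, assuming a privilege store that does not validate entity existence at grant time. The PrivilegeStore class, its methods, and the entity naming scheme are invented for illustration and are not part of CDAP.

```python
from typing import Dict, Set, Tuple


class PrivilegeStore:
    """Hypothetical store in which grants do not depend on entity existence."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Set[str]] = {}
        self._deployed: Set[str] = set()

    def grant(self, principal: str, entity: str, action: str) -> None:
        # No existence check: the entity may be created later.
        self._grants.setdefault((principal, entity), set()).add(action)

    def deploy_app(self, principal: str, app: str) -> None:
        # Deploying requires ADMIN on the (not yet existing) application itself,
        # not WRITE on the enclosing namespace.
        if "ADMIN" not in self._grants.get((principal, app), set()):
            raise PermissionError(f"{principal} lacks ADMIN on {app}")
        self._deployed.add(app)

    def exists(self, app: str) -> bool:
        return app in self._deployed


store = PrivilegeStore()
store.grant("alice", "application:etl.feed1", "ADMIN")  # granted before deployment
store.deploy_app("alice", "application:etl.feed1")      # allowed
print(store.exists("application:etl.feed1"))
```

With this arrangement, alice can deploy feed1 and nothing else in the namespace; an attempt to deploy any other application raises PermissionError.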
Changes to the authorization matrix
Instance
ADMIN on an Instance allows a user to create Namespaces in the instance. No other operations are defined as of now. Also, Instance is not part of the privilege hierarchy.
Note: The privileges marked in bold are the new ones that will be added in 4.3.
Namespaces
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
|---|---|---|
Create | ADMIN (on the CDAP instance) | ADMIN |
Update | ADMIN (on the namespace) | |
Delete | ADMIN (on the namespace) | ADMIN on the namespace, and all entities in the namespace |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | Any privilege on the namespace or any of its descendants. |
Get Namespace Meta | | Any privilege on the namespace or any of its descendants. |
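The proposed rule for deleting a namespace (ADMIN on the namespace and on all entities in it) could be checked along the following lines. This is a sketch under assumed data structures: entities are modelled as path tuples, and the privilege map, the list of existing entities, and the function name can_delete_namespace are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Entity = Tuple[str, ...]  # hypothetical path model, e.g. ("etl",) or ("etl", "feed1")
Privileges = Dict[Tuple[str, Entity], Set[str]]


def can_delete_namespace(privs: Privileges, principal: str,
                         namespace: Entity, existing: Iterable[Entity]) -> bool:
    """Require ADMIN on the namespace and on every existing entity inside it."""
    inside = [e for e in existing
              if len(e) > len(namespace) and e[:len(namespace)] == namespace]
    targets = [namespace] + inside
    return all("ADMIN" in privs.get((principal, t), set()) for t in targets)


existing = [("etl", "feed1"), ("etl", "gold"), ("other", "app")]
privs: Privileges = {
    ("admin1", ("etl",)): {"ADMIN"},
    ("admin1", ("etl", "feed1")): {"ADMIN"},
    ("admin1", ("etl", "gold")): {"ADMIN"},
    ("admin2", ("etl",)): {"ADMIN"},  # ADMIN on the namespace only
}

print(can_delete_namespace(privs, "admin1", ("etl",), existing))
print(can_delete_namespace(privs, "admin2", ("etl",), existing))
```

Because access is not hierarchical, admin2's ADMIN on the namespace alone is not enough in this model; each contained entity is checked individually.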
Artifacts
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
|---|---|---|
Add | WRITE (on the namespace) | ADMIN |
Add a property | ADMIN (on namespace) or ADMIN (on artifact) | ADMIN |
Remove a property | ADMIN (on namespace) or ADMIN (on artifact) | ADMIN |
Use to deploy an app | | ADMIN or READ or WRITE or EXECUTE |
Delete | ADMIN (on namespace) or ADMIN (on artifact) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) or any of READ, WRITE, EXECUTE, or ADMIN (on artifact) | Any privilege on the artifact |
Get artifact info/summary/detail | | ADMIN or READ or WRITE or EXECUTE |
Applications
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
|---|---|---|
Add | WRITE (on the namespace) and READ (on the artifact if deployed from an artifact) | ADMIN *Also see artifact privileges and principal privileges |
Delete | ADMIN (on the application) or ADMIN (on the namespace) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) or any of READ, WRITE, EXECUTE, or ADMIN (on application) | Any privilege on the application or any of its descendants. |
Get application detail | | Any privilege on the application or any of its descendants. |
Programs
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
|---|---|---|
Start, Stop, or Debug | (EXECUTE (on the program) or EXECUTE (on the application) or EXECUTE (on the namespace)) and READ (on the namespace) | EXECUTE |
Set instances | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) | ADMIN |
Set runtime arguments | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) | ADMIN |
Retrieve runtime arguments | READ (on the namespace) or READ (on the application) or READ (on the program) | READ or EXECUTE or ADMIN |
Retrieve status | Any of READ, WRITE, EXECUTE, or ADMIN | |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get program specification | | READ or WRITE or EXECUTE or ADMIN |
Resume/Suspend schedule | | EXECUTE |
Datasets
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
|---|---|---|
Create | WRITE (on the namespace) | ADMIN |
Read | (READ (on the dataset) and READ (on the namespace)) or READ (on the namespace) | READ |
Retrieving properties | Not Documented | Any of READ, WRITE, ADMIN, or EXECUTE |
Write | WRITE (on the dataset) or WRITE (on the namespace) | WRITE |
Update | (ADMIN (on the dataset) and READ (on the namespace)) or (ADMIN (on the namespace) and READ (on the namespace)) | ADMIN |
Upgrade | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
Truncate | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
Drop | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get dataset meta | | READ or WRITE or EXECUTE or ADMIN |
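To make the change to dataset Read concrete, the existing and proposed checks can be compared side by side. The sketch below assumes the existing rule reads as "(READ on the dataset and READ on the namespace) or READ on the namespace"; the privilege map, entity names, and function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical privilege store: (principal, entity name) -> set of actions.
Privileges = Dict[Tuple[str, str], Set[str]]


def can_read_existing(privs: Privileges, principal: str,
                      namespace: str, dataset: str) -> bool:
    """Existing policy: dataset READ is only useful together with namespace READ,
    and namespace READ alone is already sufficient."""
    ns_read = "READ" in privs.get((principal, namespace), set())
    ds_read = "READ" in privs.get((principal, dataset), set())
    return (ds_read and ns_read) or ns_read


def can_read_proposed(privs: Privileges, principal: str, dataset: str) -> bool:
    """Proposed policy: READ on the dataset itself, nothing inherited."""
    return "READ" in privs.get((principal, dataset), set())


privs: Privileges = {
    ("analyst1", "dataset:etl.gold"): {"READ"},  # dataset-level grant only
    ("ops2", "namespace:etl"): {"READ"},         # namespace-level grant only
}

for user in ("analyst1", "ops2"):
    print(user,
          can_read_existing(privs, user, "namespace:etl", "dataset:etl.gold"),
          can_read_proposed(privs, user, "dataset:etl.gold"))
```

Under the existing rule the dataset-level grant has no effect on its own, which is the redundancy described earlier; under the proposed rule it is the only grant that matters.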
Dataset Modules
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
|---|---|---|
Deploy | WRITE (on the namespace) | ADMIN |
Delete |