Checklist
Overview
Details
admin1 creates a CDAP namespace etl with principal etl-owner.
admin2 deploys an app feed1 with principal feed1-owner in namespace etl.
In feed1's configure step, dataset gold is created with owner principal feed1-owner.
ops1 starts the workflow in app feed1, which runs as principal feed1-owner.
feed1-owner reads/writes to dataset gold.
ops1 can list logs and metrics for the workflow in app feed1.
ops2 can list all apps/programs in namespace etl and view all their logs and metrics.
ops2 can list all the datasets in namespace etl and view their properties.
ops2 cannot read any datasets in namespace etl.
Overview
Details
admin1 creates a group etl-group in LDAP.
admin1 creates namespaces in HDFS, HBase and Hive called etl.
admin1 grants all privileges on the above namespaces to group etl-group.
admin1 creates a CDAP namespace etl with principal etl-owner, using the namespaces from HDFS, HBase and Hive. (Open question: does etl-owner belong to etl-group?)
admin1 grants all privileges on the CDAP namespace etl, and all entities under it, to group etl-group.
etl-user1, belonging to group etl-group, deploys app feed1 in namespace etl.
In feed1's configure step, dataset gold is created with owner principal etl-owner.
etl-user2, belonging to group etl-group, starts the workflow in app feed1, which runs as principal etl-owner.
etl-owner reads/writes to dataset gold.
etl-user3, belonging to group etl-group, can list logs and metrics for the workflow in app feed1.
analyst1, belonging to group analyst-group, is given privilege read on namespace etl and all entities under it, using which analyst1 can read dataset gold.
Overview
Details
admin1 creates a group etl-group in LDAP.
admin1 creates namespaces in HDFS, HBase and Hive called etl.
admin1 grants all privileges on the above namespaces to principal cdap.
admin1 creates a CDAP namespace etl using the namespaces from HDFS, HBase and Hive.
admin1 grants all privileges on the CDAP namespace etl, and all entities under it, to group etl-group.
etl-user1, belonging to group etl-group, deploys app feed1 in namespace etl.
In feed1's configure step, dataset gold is created with owner principal cdap.
etl-user2, belonging to group etl-group, starts the workflow in app feed1, which runs as principal cdap.
cdap reads/writes to dataset gold.
etl-user3, belonging to group etl-group, can list logs and metrics for the workflow in app feed1.
etl-user3, belonging to group etl-group, can also read from dataset gold.
analyst1, belonging to group analyst-group, is given privilege to read from dataset gold.
The existing CDAP Authorization policy has the following limitations:
Granular privileges
The proposed CDAP Authorization policy can be defined by the following principles:
Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity.
Access is not enforced in a hierarchical manner in CDAP.
Privileges in the authorization provider can be set up in a hierarchical manner, for instance by using wildcard privileges. (Open question: how will this work in Sentry?)
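The two principles above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory provider (`ToyProvider`), string entity names such as `dataset:etl.gold`, and shell-style wildcards; none of these are the CDAP or Sentry API. The check only ever looks at the entity being accessed, so any hierarchy has to be expressed in the provider's grants.

```python
from fnmatch import fnmatchcase


class ToyProvider:
    """Hypothetical authorization provider; not the CDAP or Sentry API."""

    def __init__(self):
        self._grants = set()  # (principal, entity pattern, action)

    def grant(self, principal, entity_pattern, action):
        self._grants.add((principal, entity_pattern, action))

    def is_authorized(self, principal, entity, action):
        # Only the entity itself is checked; a grant on the parent
        # namespace is never consulted implicitly.
        return any(
            p == principal and a == action and fnmatchcase(entity, pattern)
            for p, pattern, a in self._grants
        )


provider = ToyProvider()
provider.grant("ops2", "namespace:etl", "READ")
# Hierarchy expressed in the provider through a wildcard (assumed syntax).
provider.grant("analyst1", "dataset:etl.*", "READ")

print(provider.is_authorized("ops2", "dataset:etl.gold", "READ"))      # False
print(provider.is_authorized("analyst1", "dataset:etl.gold", "READ"))  # True
```

In this sketch, READ on the namespace does not let ops2 read dataset gold, while the wildcard grant does let analyst1 read it.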
Decouple entity existence from privilege
In addition, CDAP will now support creating privileges for entities that are yet to be created. This will allow admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed. The user can then deploy only that specific application, without having any other access to the namespace.
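A minimal sketch of this decoupling follows, assuming a hypothetical `ToyNamespace` class that keeps grants separately from the set of deployed applications. The class, its methods, and the entity naming are illustrative assumptions, not CDAP code.

```python
class AuthorizationError(Exception):
    """Raised when a principal lacks the required privilege."""


class ToyNamespace:
    """Hypothetical namespace that stores grants independently of entities."""

    def __init__(self, name):
        self.name = name
        self.apps = set()    # applications that actually exist
        self.grants = set()  # (principal, entity, action)

    def grant(self, principal, entity, action):
        # Deliberately no existence check: the entity may not exist yet.
        self.grants.add((principal, entity, action))

    def deploy_app(self, principal, app):
        entity = f"application:{self.name}.{app}"
        if (principal, entity, "ADMIN") not in self.grants:
            raise AuthorizationError(f"{principal} lacks ADMIN on {entity}")
        self.apps.add(app)


etl = ToyNamespace("etl")
# Grant ADMIN on feed1 before feed1 is deployed.
etl.grant("etl-user1", "application:etl.feed1", "ADMIN")
etl.deploy_app("etl-user1", "feed1")  # allowed
```

Under these assumptions, the same user trying to deploy any other application in the namespace gets an `AuthorizationError`, because the grant names one application only.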
Instance
ADMIN on an Instance allows a user to create Namespaces in the instance. No other operations are defined as of now. Also, Instance is not part of the privilege hierarchy.
Note: The privileges marked in bold are new and will be added in 4.3.
Namespaces
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | ADMIN (on the CDAP instance) | ADMIN |
Update | ADMIN (on the namespace) | |
Delete | ADMIN (on the namespace) | ADMIN on the namespace, and all entities in the namespace |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | Any privilege on the namespace or any of its descendants. |
Get Namespace Meta | Any privilege on the namespace or any of its descendants. |
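The "any privilege on the namespace or any of its descendants" rule can be sketched as a prefix check. The sketch assumes entities are represented as path tuples such as `("etl", "feed1", "workflow")`; the representation and the `is_visible` helper are hypothetical, not part of CDAP.

```python
def is_visible(principal, entity, grants):
    """Return True if principal holds any privilege on entity or a descendant.

    entity and the granted entities are path tuples, e.g. ("etl",) for a
    namespace and ("etl", "feed1") for an application in it (assumed layout).
    """
    return any(
        p == principal and granted[:len(entity)] == entity
        for p, granted, _action in grants
    )


grants = {
    ("ops1", ("etl", "feed1", "workflow"), "EXECUTE"),
    ("analyst1", ("analytics",), "READ"),
}

print(is_visible("ops1", ("etl",), grants))      # True
print(is_visible("analyst1", ("etl",), grants))  # False
```

Here ops1 can see namespace etl only because of a privilege on a program inside it, while analyst1 holds nothing under etl and does not see it.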
Artifacts
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Add | WRITE (on the namespace) | ADMIN |
Add a property | ADMIN (on namespace) or ADMIN (on artifact) | ADMIN |
Remove a property | ADMIN (on namespace) or ADMIN (on artifact) | ADMIN |
Use to deploy an app | ADMIN, READ, WRITE, or EXECUTE | |
Delete | ADMIN (on namespace) or ADMIN (on artifact) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN (on namespace), or any of READ, WRITE, EXECUTE, or ADMIN (on artifact) | Any privilege on the artifact |
Get artifact info/summary/detail | ADMIN, READ, WRITE, or EXECUTE |
Applications
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Add | WRITE (on the namespace) and READ (on the artifact if deployed from an artifact) | ADMIN *Also see artifact privileges and principal privileges |
Delete | ADMIN (on the application) or ADMIN (on the namespace) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN (on namespace), or any of READ, WRITE, EXECUTE, or ADMIN (on application) | Any privilege on the application or any of its descendants. |
Get application detail | Any privilege on the application or any of its descendants. |
Programs
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Start, Stop, or Debug | (EXECUTE (on the program) or EXECUTE (on the application) or EXECUTE (on the namespace)) and READ (on the namespace) | EXECUTE |
Set instances | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) | ADMIN |
Set runtime arguments | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) | ADMIN |
Retrieve runtime arguments | READ (on the namespace) or READ (on the application) or READ (on the program) | READ, EXECUTE, or ADMIN |
Retrieve status | Any of READ, WRITE, EXECUTE, or ADMIN | |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get program specification | READ, WRITE, EXECUTE, or ADMIN | |
Resume/Suspend schedule | EXECUTE |
Datasets
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | ADMIN |
Read | (READ (on the dataset) and READ (on the namespace)) or READ (on the namespace) | READ |
Retrieving properties | Not Documented | Any of READ, WRITE, ADMIN, or EXECUTE |
Write | WRITE (on the dataset) or WRITE (on the namespace) | WRITE |
Update | (ADMIN (on the dataset) and READ (on the namespace)) or (ADMIN (on the namespace) and READ (on the namespace)) | ADMIN |
Upgrade | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
Truncate | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
Drop | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get dataset meta | READ, WRITE, EXECUTE, or ADMIN |
Dataset Modules
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Deploy | WRITE (on the namespace) | ADMIN |
Delete | ADMIN (on the dataset module) or ADMIN (on the namespace) | ADMIN |
Delete-all in the namespace | ADMIN (on the namespace) | ADMIN on all dataset modules in the namespace |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get module meta | READ, WRITE, EXECUTE, or ADMIN |
Dataset Types
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get dataset type meta | READ, WRITE, EXECUTE, or ADMIN |
Secure Keys
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | ADMIN |
Delete | ADMIN (on the key) or ADMIN (on the namespace) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Read | Not Documented | READ (on the key) |
Streams
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | ADMIN |
Retrieving events | READ (on the stream) and READ (on the namespace) | READ |
Retrieving properties | Any of READ, WRITE, ADMIN, or EXECUTE | |
Sending events to a stream (sync, async, or batch) | (WRITE (on the stream) and READ (on the namespace)) or (WRITE (on the namespace) and READ (on the namespace)) | WRITE |
Drop | ADMIN (on stream) or ADMIN (on namespace) | ADMIN |
Drop-all in the namespace | ADMIN (on the namespace) or ADMIN (on the stream) | ADMIN on all the streams in the namespace |
Update | ADMIN (on the namespace) or ADMIN (on the stream) | ADMIN |
Truncate | ADMIN (on the namespace) or ADMIN (on the stream) | ADMIN |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get stream property | READ, WRITE, EXECUTE, or ADMIN |
Principal
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Deploy an app to impersonate a principal | ADMIN | |
Create a namespace with owner principal | ADMIN | |
Create a dataset with owner principal | ADMIN | |
Create a stream with owner principal | ADMIN |
CDAP allows privileges to be defined using entities and users. Sentry only allows privileges to be defined using roles and groups. CDAP is not aware of roles and groups, so every grant made on an entity and a user has to be translated into a grant on roles and groups.
For this translation, CDAP does the following
In addition, revoking all privileges on an entity is expensive, since it involves listing all privileges for all users. This is because Sentry does not have an API to list all privileges for an entity.
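The cost described above can be sketched with a toy store that, like the situation described for Sentry, can only list privileges per role. `RoleBasedStore` and `revoke_all` are hypothetical names for illustration and do not correspond to Sentry or CDAP classes.

```python
from collections import defaultdict


class RoleBasedStore:
    """Toy provider that can list privileges per role, but not per entity."""

    def __init__(self):
        self.by_role = defaultdict(set)  # role -> {(entity, action)}
        self.list_calls = 0              # counts calls to list_privileges

    def grant(self, role, entity, action):
        self.by_role[role].add((entity, action))

    def list_privileges(self, role):
        self.list_calls += 1
        return set(self.by_role[role])

    def revoke(self, role, entity, action):
        self.by_role[role].discard((entity, action))


def revoke_all(store, roles, entity):
    """Revoke every privilege on entity; needs one listing call per role."""
    revoked = 0
    for role in roles:
        for ent, action in store.list_privileges(role):
            if ent == entity:
                store.revoke(role, ent, action)
                revoked += 1
    return revoked


store = RoleBasedStore()
roles = ["role-a", "role-b", "role-c"]
store.grant("role-a", "dataset:etl.gold", "READ")
store.grant("role-b", "dataset:etl.silver", "WRITE")

removed = revoke_all(store, roles, "dataset:etl.gold")
print(removed, store.list_calls)  # 1 3
```

Only one privilege is removed, yet every role has to be listed. The number of listing calls grows with the number of roles (users), not with the number of privileges on the entity.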
Allow admins to use existing roles and groups in Sentry for authorization in CDAP. This means CDAP will not grant/revoke any privileges for entities. (note: this is a stretch goal for 4.3)
However in cases where an admin wants CDAP to grant privileges we propose the following model:
Investigate the new Sentry API (listPrivilegesbyAuthorizable) to list all privileges for a given entity, so that we can avoid listing all privileges for all users during an entity deletion.
The above changes will be backward compatible with existing privileges.
We have observed that as the number of entities in CDAP grows, CDAP startup time increases due to authorization (more than 20 minutes in some cases). During startup, CDAP revokes and grants privileges on all system entities. Revoking all privileges on an entity is expensive since it requires listing all privileges for all users.
Note: The underlying systems are still required to have appropriate permissions for cdap.
Currently, CDAP always grants/revokes privileges on entity creation/deletion. Although this is a convenient feature, it does not work well in enterprise environments. Many enterprises prefer to manage privileges in a centralized authorization provider (like Sentry or Ranger), which allows them to use existing roles/groups to manage privileges across all systems.
Please see Ranger Integration Design Document
Test ID | Test Description | Expected Results |
---|---|---|