Checklist
Implement configuring impersonation at the application level. Enable impersonation in Explore queries.
Application Impersonation: As a part of, we implemented impersonation for programs and data operations, but this could only be configured at the namespace level. We need the ability to configure this at the application level, so that we can run programs as different users, without having to manage additional namespaces for each additional user.
Explore Impersonation: As a part of, we implemented impersonation in Hive for Explore queries to impersonate the namespace user if one was provided. For better security, we would like to run Explore queries as the user who submits them.
Entity ownership: Entities created by applications should be owned by the application owner. Access permissions to those entities could be given to other users at create time or at a later time.
Currently, whenever we need to perform a data operation or launch a program in YARN, we look up the namespace that the entity exists in and impersonate the principal mapped to that namespace. If there is no mapping, we perform actions as the current user (the cdap system user). Now, we will need to maintain a mapping from entities such as applications, streams, and datasets to their owner principals.
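The lookup described above could be sketched roughly as follows. This is a minimal in-memory illustration, not the actual implementation: the class name `PrincipalResolver`, its methods, and the lookup order (entity owner first, then the namespace mapping, then the system user) are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: decides which principal to impersonate for an entity.
final class PrincipalResolver {
    // Assumed name of the system user; used when no mapping exists.
    static final String SYSTEM_USER = "cdap";

    private final Map<String, String> entityOwners = new ConcurrentHashMap<>();
    private final Map<String, String> namespaceOwners = new ConcurrentHashMap<>();

    void setEntityOwner(String namespace, String entity, String principal) {
        entityOwners.put(namespace + "/" + entity, principal);
    }

    void setNamespaceOwner(String namespace, String principal) {
        namespaceOwners.put(namespace, principal);
    }

    // Assumed order: entity owner, then namespace principal, then system user.
    String resolve(String namespace, String entity) {
        String owner = entityOwners.get(namespace + "/" + entity);
        if (owner != null) {
            return owner;
        }
        return namespaceOwners.getOrDefault(namespace, SYSTEM_USER);
    }
}
```

With a namespace principal `accountadmin` and a stream owned by `account1`, `resolve` would return `account1` for that stream, `accountadmin` for other entities in the namespace, and the system user for namespaces with no mapping.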
The ownership information for entities will be stored in an "owner.meta" table. The table will store the mapping from each entity to its owner's Kerberos principal (as a string). This information, along with the permissions on the entity, will be pushed down to the storage provider and will be used to control access (future work).
This will introduce an additional step during entity creation. An entry will need to be made to the owner.meta table.
The table will not be used to store ACLs for this release as that will be handled by the storage provider but in future releases, we can expand this to manage the ACLs. This feature will be useful for storage providers that don't support ACLs. It will also be useful in providing a layer of abstraction over authorization backends like Apache Sentry and Apache Ranger.
Note: If an entity exists with an associated owner and another user tries to create the same entity, the operation will fail. If the entity creation was triggered by some other operation, that complete operation will fail too. For example, Alice has deployed an app in CDAP which created a dataset called 'employee'. If Bob now tries to deploy another app which creates the same 'employee' dataset, the app deployment will fail. If Bob wants to read the 'employee' dataset from his app, he should get the dataset dynamically in his program. He will then be able to read the dataset if the conditions of Scenario 3.2 are met.
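The conflict rule in the note above can be illustrated with a small in-memory sketch. The class `OwnerRegistry` and its methods are hypothetical names for this example; the real check would run against the owner.meta table. We also assume here that re-creating an entity as the same owner is not treated as a conflict.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "creation fails if owned by someone else" rule.
final class OwnerRegistry {
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    // Records the owner at creation time. Throws if a different principal
    // already owns the entity, so the enclosing operation (for example an
    // app deployment) can fail as a whole.
    void registerOnCreate(String entityId, String principal) {
        String existing = owners.putIfAbsent(entityId, principal);
        if (existing != null && !existing.equals(principal)) {
            throw new IllegalStateException(
                "Entity " + entityId + " is already owned by " + existing);
        }
    }

    // Returns the recorded owner, or null if none is recorded.
    String ownerOf(String entityId) {
        return owners.get(entityId);
    }
}
```

In the example above, Alice's deployment registers `alice` as the owner of the 'employee' dataset, and Bob's later attempt to create the same dataset throws, which would abort his deployment.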
Rows in owner.meta will be of the format
The row key will be constructed from the entity id and will capture the Entity hierarchy. e.g. for a stream it will be constructed using the namespace and stream id.
rowkey: {<created from entity id>}, column {'c'}, and the owner's principal as the value
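As a rough illustration of the row format, the sketch below builds a row key for a stream from its namespace and stream id. The `stream` prefix, the `:` separator, and the class name `OwnerRowKey` are assumptions for this example; the actual key encoding is not specified here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an owner.meta row for a stream.
final class OwnerRowKey {
    // Single column used to hold the owner's principal.
    static final String COLUMN = "c";
    // Assumed separator between the parts of the entity hierarchy.
    private static final String SEPARATOR = ":";

    // Builds the row key from the entity hierarchy: namespace, then stream id.
    static String forStream(String namespace, String streamId) {
        requireValid(namespace);
        requireValid(streamId);
        return "stream" + SEPARATOR + namespace + SEPARATOR + streamId;
    }

    // Row keys are stored as bytes in the underlying table.
    static byte[] toBytes(String rowKey) {
        return rowKey.getBytes(StandardCharsets.UTF_8);
    }

    // Rejects parts that would make the key ambiguous.
    private static void requireValid(String part) {
        if (part == null || part.isEmpty() || part.contains(SEPARATOR)) {
            throw new IllegalArgumentException("Invalid entity id part: " + part);
        }
    }
}
```

Under these assumptions, the stream `st1` in namespace `account` maps to the row key `stream:account:st1`, with the owner's principal stored under column `c`.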
To allow headless users access to the system, other authorized users need to impersonate them. To allow this impersonation we set the following convention:
<property>
  <name>keytab.path</name>
  <value>/dir1/dir2/${name}/${name}.keytab</value>
</property>
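A minimal sketch of how the keytab.path convention could be expanded for a given principal is shown below. We assume here that `${name}` stands for the short name of the principal (the part before any `/` or `@`); the class `KeytabPathResolver` is a hypothetical name for this example.

```java
// Hypothetical sketch: expands the keytab.path template for a principal.
final class KeytabPathResolver {
    // Replaces every ${name} placeholder with the principal's short name.
    static String resolve(String template, String principal) {
        return template.replace("${name}", shortName(principal));
    }

    // Assumed rule: the short name is everything before the first '/' or '@'.
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        int at = principal.indexOf('@');
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }
}
```

For the template above and the principal `alice@EXAMPLE.COM`, this would yield `/dir1/dir2/alice/alice.keytab`.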
The permissions assigned for entities will need to be pushed down to storage providers so that access outside the system will have the same restrictions. Both HBase and HDFS support ACLs and they will be used to assign finer grained permissions to the underlying tables or files.
The directory structure will be as follows. CDAP will own the parent directories for the namespace. The directories will be group writable, and everyone who has app deployment privileges will be part of that group so that they can create subdirectories. For any cleanup, for example when the namespace is being deleted, the system user will impersonate the subdirectory owners to do the deletion. With this impersonation in place, the system user will not need access permissions on user directories.
The groups for the directories will be specified while the entry is being created and once the directory is created the system will do a chgrp to change it to the provided group.
e.g.
drwxrwxr-x - cdap supergroup 0 2017-01-16 04:39 /cdap/namespaces/
To be able to create a namespace the user will need to be a part of the "supergroup".
A group can also be specified in cdap-security.xml with the property "namespace.creators". If a group is specified for this property, CDAP will change the group of /cdap/namespaces to the specified group, allowing users in that group to create namespaces.
The namespace directory will be owned by the namespace owner
During the creation of a namespace, a group can be specified. This group will have write and execute permissions on the namespace directory, allowing the users of this group to deploy applications in the namespace. Note: This will require a change to our existing namespace creation API.
drwxrwxr-x - accountadmin accountgroup 0 2017-01-16 04:39 /cdap/namespaces/account
To be able to create anything under that namespace the user will have to be a part of the "accountgroup"
Stream:
drwxr-xr-x - account1 accountgroup 0 2017-01-17 02:41 /cdap/namespaces/account/streams/st1
All the directories will be owned by the headless users, whose keytabs need to be present so that they can be impersonated. Additionally, during the creation of an app, stream, or dataset, the user can specify a group, and CDAP will change the group of the associated files on HDFS and tables in HBase and Hive so that the given group has read access.
For Explore impersonation we won't be using keytabs. A human user will log in using their credentials, and to run Explore queries they will have to provide a Kerberos username and a password. The system will authenticate with the KDC on behalf of the user and use the TGT to create a UGI for the user through the static method
static UserGroupInformation getUGIFromTicketCache(java.lang.String ticketCache, java.lang.String user)
This UGI will then be used to impersonate the queries.
The RemoteUGIProvider provides methods that are called when a UGI is needed to impersonate a user. During the call to RemoteUGIProvider#createUGI, the Kerberos TGT can be obtained from the master through a REST API (/impersonation/credentials).
class ImpersonationInfo currently contains a principal and their keytab. This will change to include the path to the ticket cache for the user.
The explore window shows up when the user clicks on the explore icon on any explorable entity. If kerberos is enabled in the cluster then a modal window will show up the first time the explore icon is clicked. Through this window, the user can provide the Kerberos principal that the explore query should run as and the TGT for that principal.
The UI forwards the principal and the TGT to the router which forwards it to CDAP master. Both these routes support SSL. Once master has the TGT it can be serialized to HDFS with permissions set to 600.
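The sketch below illustrates writing a ticket with owner-only (600) permissions. It uses the local file system through `java.nio.file` as a stand-in for HDFS, so it only demonstrates the permission handling and requires a POSIX file system; the class `TicketFileWriter` and the `.tgt` file name are assumptions for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical sketch: stores a serialized ticket readable only by its owner.
final class TicketFileWriter {
    // rw------- is the symbolic form of mode 600.
    private static final FileAttribute<Set<PosixFilePermission>> OWNER_ONLY =
        PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rw-------"));

    // Creates the file with 600 permissions before any ticket bytes are written,
    // so the content is never readable by other users. Fails if the file exists.
    static Path write(Path dir, String shortName, byte[] ticket) {
        try {
            Files.createDirectories(dir);
            Path file = Files.createFile(dir.resolve(shortName + ".tgt"), OWNER_ONLY);
            Files.write(file, ticket);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the permissions of a file in symbolic form, e.g. "rw-------".
    static String permissionsOf(Path file) {
        try {
            return PosixFilePermissions.toString(Files.getPosixFilePermissions(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Setting the permissions at creation time, rather than changing them after the write, avoids a window in which the ticket is readable by other users.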
Explore container can then use the TGT on HDFS to create a UserGroupInformation object and use that to impersonate the principal for running the query. The UGI once created will be cached.
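The caching step could look roughly like the sketch below. To keep the example free of Hadoop dependencies, the cached value is a generic type `T` standing in for `UserGroupInformation`, and the loader function stands in for the code that builds a UGI from the TGT; the class name `PrincipalCache` is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: caches one value (for example a UGI) per principal.
final class PrincipalCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> loader;

    // The loader is invoked only on a cache miss.
    PrincipalCache(Function<String, T> loader) {
        this.loader = loader;
    }

    // Returns the cached value, creating it on first use.
    T get(String principal) {
        return cache.computeIfAbsent(principal, loader);
    }

    // Drops the entry so that the next get() rebuilds it.
    void invalidate(String principal) {
        cache.remove(principal);
    }
}
```

Because a Kerberos TGT has a limited lifetime, a real cache would likely also need to invalidate or refresh entries when the ticket expires; that policy is left out of this sketch.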
The user would need to do a kinit before they would be able to launch an Explore query from the CLI. The CLI would then pick up the TGT, and the rest of the flow is the same as for the UI.
For running Explore queries through the REST APIs the user will need to provide the TGT and the principal along with the query.
None
New internal APIs:
Impersonation Store: Stores the user keytab information
public class ImpersonationStore {
  public void addImpersonationInfo(final ImpersonationInfo impersonationInfo) throws IOException { }
  public ImpersonationInfo getImpersonationInfo(final String principal) throws IOException, ImpersonationInfoNotFound { }
  // idempotent
  public void delete(final String principal) throws IOException { }
}
Permission Store: Stores the entity ownership information.
public class PermissionStore {
  public void addOwner(final EntityId entityId, final String principal) throws IOException { }
  public ImpersonationInfo getOwner(final EntityId entityId) throws IOException, NotFoundException { }
  // idempotent
  public void deleteOwner(final EntityId entityId) throws IOException { }
}
public final class ImpersonationInfo {
  private final String principal;
  private final String keytabURI;
}
Potential new external APIs (TBD):
Allowing group and permissions for FileSets/Streams/(other?)
Please see Secure Impersonation Specification#EntityOwnership
We need a Remote implementation of OwnerAdmin so that a program container or CDAP service container which performs requests under impersonation (as the namespace, app, dataset, or stream owner) can look up owner information internally if needed.
For example, an Explore query on a stream is handled by ExploreQueryExecutorHttpHandler. The handler impersonates the namespace owner. When the query actually runs, it might need to look up other CDAP resources (for example, the stream configuration). This call itself performs impersonation by doing a doAs for the resource involved (in this case, the stream). The Impersonator, which is responsible for providing the UGI to be impersonated for this call, tries to look up owner information for the resource and fails, because it tries to access the owner.meta table, which is a system table and cannot be accessed under user impersonation.
This requires adding a Remote implementation of OwnerAdmin which program containers and CDAP service containers can use to get the owner information. We will also need to add a handler in cdap-app-fabric which will serve the requests from the remote client. Since this handler will reside inside CDAP master, it can query the owner store through the owner admin, because it will be running as the cdap user.
We will expose the following endpoints: (Note: Currently, we only support owner for namespace, app, artifact, stream, dataset)
Path | Method | Request Body | Response Code | Response |
---|---|---|---|---|
Adding Owner | | | | |
/v1/owner/ | POST | | 200 - On success; 409 - If owner information for the entity already exists; 500 - Any internal errors | |
Deleting Owner | | | | |
/v1/owner/ | DELETE | | 200 - On success; 500 - Any internal errors | |
Getting Owner | | | | |
/v1/owner/ | GET | | 200 - On success; 500 - Any internal errors | |
Getting Impersonation Information | | | | |
/v1/owner/impinfo | GET | | 200 - On success; 500 - Any internal errors | |
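The response codes listed for the add and delete endpoints could map onto handler logic roughly as in the sketch below. This is an in-memory illustration only: the class `OwnerEndpointSketch`, its method names, and the use of string entity ids are assumptions, and the request and response bodies are not modeled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the status codes returned by the owner endpoints.
final class OwnerEndpointSketch {
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    // POST: 200 on success, 409 if owner information already exists,
    // 500 on any internal error (here, a null id or principal).
    int addOwner(String entityId, String principal) {
        try {
            return owners.putIfAbsent(entityId, principal) == null ? 200 : 409;
        } catch (RuntimeException e) {
            return 500;
        }
    }

    // DELETE: idempotent, so deleting a missing entry still returns 200.
    int deleteOwner(String entityId) {
        try {
            owners.remove(entityId);
            return 200;
        } catch (RuntimeException e) {
            return 500;
        }
    }
}
```

Note that, following the table, a second add for the same entity returns 409 even when the principal is unchanged.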
Please see: Secure Impersonation Specification#EntityCreation
We will need to implement authorization on the above REST APIs (which manage the impersonation metadata). Authorization will also need to be added when programmatically accessing this metadata (such as when launching the programs or performing dataset operations involving impersonation).
This will rely on HBase for storing metadata (similar to how we store other metadata for applications). Without HBase (and the dataset service), this will not work.
Test ID | Test Description | Expected Results |
---|---|---|
IMP100 | (default namespace) Deploy an application from an artifact, for principal X, and run a program. | The program should run as X. Datasets/streams should have their HDFS/HBase resources owned by X. |
IMP101 | (default namespace) Deploy another application from the same artifact, without specifying a principal, and run a program. | The program should run as the cdap system user. Datasets/streams should have their HDFS/HBase resources owned by the cdap system user. |
IMP102 | Run IMP100 and IMP101 in a custom namespace that doesn't have impersonation configured. | Expectations should be the same. |
IMP103 | Run IMP100 and IMP101 in a namespace that already has impersonation configured. | < Expected behavior TBD > |
IMP104 | ||
IMP105 | ||
IMP106 |