Security

The implementation of security in CDAP breaks down into a few general areas:

  1. External authentication - how we integrate with customer authentication mechanisms and provide a consistent set of credentials for client interactions
  2. Internal authentication - how CDAP processes mutually authenticate with each other in order to secure (and optionally encrypt) internal communication
  3. Authorization - the security model used to intercept all operations and verify that the triggering principal has appropriate privileges to initiate the action
  4. Audit - a verifiable information trail that can be used to validate security of the overall system
  5. Data encryption - encrypting sensitive data and only allowing authorized principals to access the data

External Authentication

The key integration point for security in CDAP deployments will be our interactions with external authentication mechanisms (LDAP, Active Directory, etc.) and how we bridge those externally obtained and validated credentials into a consistent set of credentials that all CDAP components can independently verify.

Key goals for this portion of the implementation include:

  • Ability to integrate with as many customer-specific authentication mechanisms as possible (broadest possible reach)
  • Consistent handling of authenticated client credentials by CDAP processes (limit implementation complexity)
  • Seamless integration for REST clients (this is the primary mode of interaction for external clients)
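
To make the idea of a consistent, independently verifiable credential concrete, the following is a minimal sketch of a signed access token. It assumes that an external system (LDAP, Active Directory, etc.) has already validated the user, and that every CDAP component shares a signing key. The names SIGNING_KEY, issue_token, and verify_token, as well as the token layout, are hypothetical and are not taken from the CDAP code base.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical shared signing key. A real deployment would need secure
# key distribution and rotation, which this sketch does not cover.
SIGNING_KEY = b"example-shared-secret"


def issue_token(username, ttl_seconds, now=None):
    """Issue a signed token once an external mechanism has validated the user."""
    issued = int(now if now is not None else time.time())
    payload = json.dumps(
        {"user": username, "issued": issued, "expires": issued + ttl_seconds},
        sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    # Assumed layout: base64url(payload) + "." + base64url(signature)
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(signature).decode("ascii")
    )


def verify_token(token, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload was altered or signed with a different key
    claims = json.loads(payload)
    current = now if now is not None else time.time()
    if current >= claims["expires"]:
        return None  # token has expired
    return claims
```

Because verification needs only the key and the token itself, any component holding the key can check a credential without calling back to the external authentication system.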

Internal Authentication

As a distributed system, a CDAP cluster contains many independent processes that need to communicate with each other.  Because this communication is internal to CDAP itself, we are free to use whatever authentication technology best matches our needs.

Key goals for the implementation of internal authentication include:

  • Mutual authentication of the remote actors in RPC connections (prevent impersonation)
  • Support for long running processes
  • Optional ability to encrypt / protect data in transit
  • Bonus: integrate with Hadoop and HBase authentication mechanisms (otherwise that integration will have to be done separately)
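
As an illustration of the mutual authentication goal, the sketch below simulates a challenge-response handshake in which each side proves knowledge of a shared key without sending it. This is only one possible technique, shown under the assumption of a pre-shared key. A real deployment would more likely rely on an established mechanism such as Kerberos with SASL, which Hadoop and HBase already use. The function names prove and mutual_handshake are hypothetical.

```python
import hashlib
import hmac
import secrets


def prove(shared_key, role, own_nonce, peer_nonce):
    """Compute a proof that `role` holds the key, bound to both nonces."""
    # Including the role prevents a proof from being reflected back to its sender.
    message = role + b"|" + own_nonce + b"|" + peer_nonce
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def mutual_handshake(client_key, server_key):
    """Simulate both sides in-process; True only if each side verifies the other."""
    client_nonce = secrets.token_bytes(16)  # sent client -> server
    server_nonce = secrets.token_bytes(16)  # sent server -> client with its proof

    # The server proves itself first, so the client never answers an impostor.
    server_proof = prove(server_key, b"server", server_nonce, client_nonce)
    expected_server = prove(client_key, b"server", server_nonce, client_nonce)
    if not hmac.compare_digest(server_proof, expected_server):
        return False

    # The client then proves itself to the server.
    client_proof = prove(client_key, b"client", client_nonce, server_nonce)
    expected_client = prove(server_key, b"client", client_nonce, server_nonce)
    return hmac.compare_digest(client_proof, expected_client)
```

Fresh nonces on every handshake mean a recorded proof cannot be replayed later, which matters for long running processes that reconnect many times.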

Authorization

Once we can reliably determine the identity of a client initiating an action in CDAP, we can start differentiating the types of access allowed to each identity.  This allows us to protect sensitive data from unauthorized users, and to protect cluster operations and integrity from abuse or misuse.

Key goals for authorization are:

  • Isolation of data and operations from users unless access has been explicitly granted
  • Ability to restrict access along natural lines of operation in CDAP
  • Push down of access control to lower layers, where possible, in order to prevent circumventing controls by external means
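
The first goal, isolation unless access has been explicitly granted, amounts to a deny-by-default check on every operation. The sketch below shows one way this could look. The Authorizer class, the UnauthorizedError exception, and the entity naming (for example "dataset:purchases") are hypothetical and serve only to illustrate the model.

```python
class UnauthorizedError(Exception):
    """Raised when a principal attempts an action it has not been granted."""


class Authorizer:
    def __init__(self):
        # Maps (principal, entity) to the set of actions explicitly granted.
        self._grants = {}

    def grant(self, principal, entity, action):
        self._grants.setdefault((principal, entity), set()).add(action)

    def revoke(self, principal, entity, action):
        self._grants.get((principal, entity), set()).discard(action)

    def is_allowed(self, principal, entity, action):
        # Deny by default: anything not explicitly granted is refused.
        return action in self._grants.get((principal, entity), set())

    def enforce(self, principal, entity, action):
        """Intercept an operation and refuse it unless a grant exists."""
        if not self.is_allowed(principal, entity, action):
            raise UnauthorizedError(f"{principal} may not {action} {entity}")
```

A caller would invoke enforce before performing the operation itself, for example authorizer.enforce("alice", "dataset:purchases", "read").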

Audit

To verify that all aspects of the security system are functioning as intended, and to allow tracing of unintended/unauthorized access, all security components must maintain an audit log of key security decision points.
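
One way to make an audit log verifiable is to chain entries by hash, so that altering or removing an earlier entry invalidates every later one. The sketch below illustrates that technique. It is an assumption about how such a log could be built, not a description of an existing CDAP component, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, record):
    # Serialize deterministically so the same record always hashes the same way.
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append_entry(log, principal, action, entity, allowed, timestamp):
    """Append one security decision to the log, chained to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    record = {
        "principal": principal,
        "action": action,
        "entity": entity,
        "allowed": allowed,
        "timestamp": timestamp,
    }
    log.append(
        {"record": record, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, record)}
    )


def verify_log(log):
    """Return True if no entry has been altered, reordered, or removed mid-chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only detects truncation of the most recent entries if the latest hash is also recorded somewhere outside the log.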

Data Encryption

Users may store sensitive data in datasets, configuration, etc. CDAP must be able to store this data in an encrypted form and only allow certain principals to access the data.

Documentation

These changes will affect our current documentation:

  • REST clients will need to obtain an access token before making regular REST calls. This extra step means security is not fully transparent to clients, but because we are targeting a subset of OAuth 2.0, developers should be able to leverage standard tools. This step will need to be added to the existing documentation

  • Examples should be created showing how to obtain and use tokens, probably as part of the REST docs
  • Examples should show how to turn this on in a CDAP cluster, both existing and new, perhaps in the install docs and elsewhere (a Security doc?)
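
As a starting point for such examples, the sketch below shows the two-step client flow: obtain a token, then attach it to a regular REST call. It assumes the token endpoint accepts HTTP Basic credentials and returns a standard OAuth 2.0 JSON response with access_token and token_type fields. The URLs and function names are hypothetical placeholders, and the requests are only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_token_request(token_url, username, password):
    """Build the request that exchanges user credentials for an access token."""
    # Assumption: the token endpoint accepts HTTP Basic authentication.
    credentials = f"{username}:{password}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(token_url, headers={"Authorization": "Basic " + basic})


def parse_token_response(body):
    """Extract the access token from an OAuth 2.0 style JSON response body."""
    fields = json.loads(body)
    if fields.get("token_type", "").lower() != "bearer":
        raise ValueError("unsupported token type")
    return fields["access_token"]


def build_api_request(api_url, access_token):
    """Build a regular REST call that presents the token as a bearer credential."""
    return urllib.request.Request(
        api_url, headers={"Authorization": "Bearer " + access_token}
    )
```

Either request can be sent with urllib.request.urlopen(request). The token would typically be cached and reused until it expires, so the extra step is paid once per session and not once per call.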

Created in 2020 by Google Inc.