Azure AD based login support on CDAP UI

CDAP users should be able to authenticate and authorize themselves using Azure AD.

What needs to be implemented is external OAuth2 based authentication in CDAP using Azure AD.

We want to add the option of using Azure AD as CDAP's IAM. Many vendors have their own Azure AD tenant, and users of such vendors are added as guest users in the tenant AD. LDAP authentication does not work for guest users because passwords are not managed with guest accounts, so in this case CDAP has to support OAuth2 for authentication.

The following items need to be done as part of this feature:
1. Azure AD Integration for user authentication and access to Azure services.
2. Use Azure Graph API for Group information in Azure AD to extract Users to Group mappings.


Flow overview

Our implicit flow consists of:

  • A standard OAuth 2.0 implicit flow sequence followed by the front-end web app, and calls to the back-end API.
  • Specific operations and controls to run on the back-end, outside the scope of the OAuth 2.0 standard but necessary for end-to-end security and to get all the necessary user information.


The following diagram describes the different steps to use the OAuth 2.0 implicit flow to:

  • Authenticate the user on the web application.
  • Call backend APIs using the issued OAuth access token.
  • Get user information from Azure AD.


The implicit flow is used for client applications running in a web browser (single-page applications, js, ...).


  1. The user has downloaded the application code in their browser and runs it. At some point, the user has to authenticate. To start the authentication, a specifically formatted URL must be opened in the browser (new window, iFrame, ...).
  2. The Microsoft authentication form is displayed and the user has to go through the following steps:
    1. Enter their e-mail address as the login and submit.
    2. Azure AD redirects the user to their organization's identity provider (if the organization uses Azure AD), or to the authentication form for Microsoft personal accounts.
    3. The user enters their password.
    4. If activated, and if the user context and attributes match pre-defined conditions, a second authentication factor can be required (a code sent by SMS or voice call, or the Microsoft Authenticator mobile app generating an OTP code or sending a push notification).
  3. After successful authentication, an access token is returned to the browser in a URL fragment of a redirection URL.
  4. The client application can call its backend APIs using the access token.
    1. A good practice here is to call a "userinfo" API on the backend which returns all the user information required by the front-end client app.
  5. The access token is a JWT with a set of claims and a cryptographic signature. Step 6 includes JWT validation by Azure AD directly, but before that the backend application has to check additional information which Azure AD does not check at the moment.
  6. The backend application must call Microsoft Graph API to get additional user information. For this, a new access token specific to Graph API access is required. The backend application has to call an Azure AD API to exchange the first access token from the front-end client app to another access token for Graph API.
    1. This second access token allows the backend application to use Graph API on behalf of the authenticated user.
    2. If the token exchange is successful, the first JWT is considered valid from Azure AD's point of view (i.e. validity time, signature).
  7. The backend application calls the Graph API with the second access token. The operations allowed here for the backend application depend on the use case. The backend application can:
    1. For the simple authentication use case:
      1. Get information about the authenticated user (standard attributes, extended attributes, groups).
    2. For identity administration use cases (if required):
      1. Get information for other users, get groups memberships.
      2. Create an account, invite a partner account, update account attributes.
      3. Update group memberships.
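
The browser-facing and token-exchange steps above (1, 3 and 6) can be sketched as follows. This is a minimal illustration, not CDAP code: it assumes the Microsoft identity platform v2.0 endpoints under login.microsoftonline.com and assumes that the "exchange" in step 6 is the OAuth 2.0 on-behalf-of grant. The function names and all tenant, client and redirect values are hypothetical. No network call is made; the functions only build and parse the messages.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: Microsoft identity platform v2.0 authority.
AUTHORITY = "https://login.microsoftonline.com"


def build_authorize_url(tenant, client_id, redirect_uri, scope, state):
    """Step 1: URL the browser opens to start the implicit flow."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "token",      # implicit flow: token returned directly
        "redirect_uri": redirect_uri,
        "response_mode": "fragment",   # token comes back in the URL fragment
        "scope": scope,
        "state": state,                # echoed back, used to detect CSRF
    })
    return f"{AUTHORITY}/{tenant}/oauth2/v2.0/authorize?{query}"


def parse_redirect_fragment(redirect_url, expected_state):
    """Step 3: read the access token from the fragment of the redirect URL."""
    fragment = urlsplit(redirect_url).fragment
    params = {key: values[0] for key, values in parse_qs(fragment).items()}
    if "error" in params:
        raise ValueError(f"authentication failed: {params['error']}")
    if params.get("state") != expected_state:
        raise ValueError("state mismatch")
    return params


def build_obo_request(tenant, client_id, client_secret, user_access_token):
    """Step 6: token endpoint URL and form body for the token exchange."""
    url = f"{AUTHORITY}/{tenant}/oauth2/v2.0/token"
    form = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "client_id": client_id,
        "client_secret": client_secret,
        "assertion": user_access_token,  # the first token, from the front-end
        "scope": "https://graph.microsoft.com/.default",
        "requested_token_use": "on_behalf_of",
    }
    return url, form
```

The backend would POST the form from build_obo_request as application/x-www-form-urlencoded data; a successful response carries the second access token used for the Graph API in step 7.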


After these steps, the backend application :

  • Has the knowledge of who is authenticated, with associated groups (permissions).
  • Can take authorization decisions based on this information at the API level, and at the data level (i.e. return only data in the authorized user scope for the same operation).
  • Can store in a cache the tokens and user information to avoid replaying the full sequence at each user request.
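
The caching point above can be illustrated with a small in-memory cache whose entries expire, so that a cached result is never used past the lifetime of the token it came from. This is only a sketch under assumptions: the class name ExpiringCache is hypothetical, and a real deployment would also need size limits and thread safety.

```python
import time


class ExpiringCache:
    """In-memory cache whose entries expire after a per-entry lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # key -> (value, absolute expiry time)

    def put(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the full sequence is replayed.
            del self._entries[key]
            return None
        return value
```

A natural choice is to key the cache on the incoming access token (or a hash of it) and to use the token's expires_in value as ttl_seconds.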


User Stories

  1. Users should be able to authenticate using their Azure AD credentials.
  2. Users should be able to authenticate themselves using Azure AD login screen.
    1. This is required for 2FA.
  3. Users should be able to access cloud services (e.g. ADLS) in a CDAP pipeline using a service token obtained by the system using the OAuth token.
    1. When the service token expires, the CDAP pipeline should stop working.
  4. Users should be able to access cloud services (e.g. ADLS) through the data prep service using a service token obtained by the system using the OAuth token.
  5. Users should be able to access CDAP service based on their group defined in Azure AD.
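
As an illustration of story 5, the sketch below extracts group names from a Graph API group-membership response and maps them to CDAP roles. It assumes the response has the usual Graph collection shape (a "value" array of objects carrying "@odata.type" and "displayName"). The group names, role names and the GROUP_TO_ROLES mapping are hypothetical and only stand in for whatever mapping CDAP ends up using.

```python
def group_names(member_of_response):
    """Collect display names of the groups in a Graph membership response."""
    return {
        entry["displayName"]
        for entry in member_of_response.get("value", [])
        if entry.get("@odata.type") == "#microsoft.graph.group"
        and entry.get("displayName")
    }


# Hypothetical mapping from Azure AD group names to CDAP roles.
GROUP_TO_ROLES = {
    "cdap-admins": {"admin", "developer"},
    "cdap-developers": {"developer"},
}


def roles_for(groups, mapping=GROUP_TO_ROLES):
    """Union of the roles granted by each group; unknown groups grant none."""
    roles = set()
    for group in groups:
        roles |= mapping.get(group, set())
    return roles
```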


Current Authentication architecture: 

https://docs.cask.co/cdap/5.1.2/en/developer-manual/security/client-authentication.html


Proposed Architecture


In this approach, we try to make minimal changes to CDAP. The idea is to authenticate the user with Azure AD and then generate a CDAP token for processing.

Changes required:

  • "Login with Azure AD" option on the login page.
  • Change in the /login API to send the Azure AD access token (along with token_type, expires_in, scope, state, session_state) to the /token API.
  • AzureADAuthenticationHandler (custom implementation of AbstractAuthenticationHandler) for validating the Azure AD access token.

  • Token Manager Service: stores the user to Azure AD token mapping and is also responsible for Azure AD token refresh.
  • TokenManagerClient (a client library) for retrieving the Azure AD token from the Token Manager Service using the CDAP token.
    • Token Manager Service and TokenManagerClient are required for services which need access to cloud services.
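
The responsibilities of the Token Manager Service listed above can be sketched as a small in-memory class: it keeps the Azure AD token associated with each CDAP token and refreshes it shortly before expiry. This is not the proposed implementation. The class and method names, the record layout and the refresh_fn callable (which would wrap the actual refresh call to Azure AD) are all hypothetical, and persistence and concurrency are left out.

```python
import time


class TokenManager:
    """Maps CDAP tokens to Azure AD tokens and refreshes them when needed."""

    def __init__(self, refresh_fn, clock=time.time, skew_seconds=60):
        self._refresh_fn = refresh_fn  # hypothetical: refresh_token -> token dict
        self._clock = clock
        self._skew = skew_seconds      # refresh this long before actual expiry
        self._by_cdap_token = {}

    def register(self, cdap_token, azure_token):
        """azure_token: dict with access_token, refresh_token, expires_in."""
        self._by_cdap_token[cdap_token] = self._stamp(azure_token)

    def azure_access_token(self, cdap_token):
        """What a TokenManagerClient would call; KeyError if token unknown."""
        record = self._by_cdap_token[cdap_token]
        if self._clock() >= record["expires_at"] - self._skew:
            refreshed = self._refresh_fn(record["refresh_token"])
            # Keep old fields (e.g. refresh_token) the refresh did not return.
            record = self._stamp({**record, **refreshed})
            self._by_cdap_token[cdap_token] = record
        return record["access_token"]

    def _stamp(self, token):
        """Copy the token and add its absolute expiry time."""
        return {**token, "expires_at": self._clock() + int(token["expires_in"])}
```

Keying the store on the CDAP token matches the lookup described for TokenManagerClient; callers never see the refresh token, only a currently valid access token.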



Created in 2020 by Google Inc.