...
When running CDAP on top of a secure Hadoop cluster (using Kerberos authentication), the CDAP processes need to obtain Kerberos credentials in order to authenticate with Hadoop, HBase, ZooKeeper, and (optionally) Hive. In this case, the setting for hdfs.user in cdap-site.xml is ignored, and the CDAP processes are identified by the default authenticated Kerberos principal.
To configure CDAP for Kerberos authentication:
Create a Kerberos principal for the user running CDAP. The principal name should be in the form username/hostname@REALM, with a separate principal for each host where a CDAP service will run. This prevents simultaneous login attempts from multiple hosts from being mistaken for a replay attack by the Kerberos KDC.
Generate a keytab file for each CDAP Master Kerberos principal, and place the file as /etc/security/keytabs/cdap.keytab on the corresponding CDAP Master host. The file should be readable only by the user running the CDAP Master service.
Edit /etc/cdap/conf/cdap-site.xml on each host running a CDAP service, substituting the Kerberos primary (user) for <cdap-principal> and your Kerberos authentication realm for EXAMPLE.COM, when adding these two properties. The keytab value must be the path where the keytab file was actually placed on that host:
<property>
<name>cdap.master.kerberos.keytab</name>
<value>/etc/security/keytabs/cdap.service.keytab</value>
</property>
<property>
<name>cdap.master.kerberos.principal</name>
<value><cdap-principal>/_HOST@EXAMPLE.COM</value>
</property>
The <cdap-principal> is shown in the commands that follow as cdap; however, you are free to use a different appropriate name.
The /cdap directory needs to be owned by the <cdap-principal>; you can set that by running the following command as the hdfs user (change the ownership in the command from cdap to whatever is the <cdap-principal>):
$ |su_hdfs| && hadoop fs -mkdir -p /cdap && hadoop fs -chown cdap /cdap
When running on a secure HBase cluster, as the hbase user, issue the command:
$ echo "grant 'cdap', 'RWCA'" | hbase shell
When CDAP Master is started, it will log in using the configured keytab file and principal.
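The principal and keytab steps above can be sketched as follows. This is a minimal sketch, not a prescribed procedure: it assumes an MIT Kerberos KDC administered with kadmin.local, the host names are hypothetical, and the script only prints the commands so they can be reviewed before being run on the KDC.

```shell
#!/bin/sh
# Sketch only: prints kadmin commands for review; it does not run them.
# Assumptions: MIT Kerberos (kadmin.local on the KDC); host names are hypothetical.
PRIMARY="cdap"        # the <cdap-principal> primary
REALM="EXAMPLE.COM"   # your Kerberos realm

# Build a principal name in the form username/hostname@REALM.
cdap_principal() {
  printf '%s/%s@%s\n' "$PRIMARY" "$1" "$REALM"
}

# Print the commands that create one host's principal and export its keytab
# to a per-host file (ktadd appends to an existing keytab file).
print_principal_commands() {
  principal=$(cdap_principal "$1")
  echo "kadmin.local -q \"addprinc -randkey $principal\""
  echo "kadmin.local -q \"ktadd -k cdap.$1.keytab $principal\""
}

# Hypothetical CDAP hosts; replace with your own.
for host in cdap-master1.example.com cdap-master2.example.com; do
  print_principal_commands "$host"
done
```

Each exported file would then be copied to /etc/security/keytabs/cdap.keytab on its own host and restricted to the user running CDAP Master (for example, with chmod 400). Running kinit -kt with that keytab and principal, followed by klist, is one way to confirm that the keytab works before starting CDAP Master.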
YARN for secure Hadoop: the <cdap-principal> user must be able to launch YARN containers, either by adding it to the YARN allowed.system.users whitelist (preferred) or by adjusting the YARN min.user.id to include the <cdap-principal> user.
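One way to check the YARN requirement is to inspect the container executor configuration on a NodeManager host. The following is a minimal sketch, assuming the two settings are stored as key=value lines in a container-executor.cfg file; the file path in the usage note is hypothetical, and the sketch ignores any other checks YARN applies (such as banned.users).

```shell
#!/bin/sh
# Sketch: report whether a user passes the allowed.system.users or
# min.user.id check in a key=value configuration file.

# Print the value of key $1 from cfg file $2 (empty if the key is absent).
cfg_value() {
  sed -n "s/^$1=//p" "$2" | head -n 1
}

# Succeed if user $1 with numeric uid $2 may launch containers per file $3.
can_launch_containers() {
  allowed=$(cfg_value allowed.system.users "$3")
  min_uid=$(cfg_value min.user.id "$3")
  case ",$allowed," in
    *",$1,"*) return 0 ;;   # user is on the whitelist (preferred option)
  esac
  # Otherwise the uid must be at least min.user.id, when that key is set.
  [ -n "$min_uid" ] && [ "$2" -ge "$min_uid" ]
}
```

For example, can_launch_containers cdap "$(id -u cdap)" /etc/hadoop/conf/container-executor.cfg succeeds when the cdap user satisfies either condition.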
To configure the CDAP Explore Service for secure Hadoop:
a. To allow CDAP to act as a Hive client, it must be given proxyuser permissions and allowed from all hosts. For example, set the following properties in the configuration file core-site.xml, where cdap is a system group of which the cdap user is a member:
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>cdap,hadoop,hive</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
b. To execute Hive queries on a secure cluster, the cluster must be running the MapReduce JobHistoryServer service. Consult your distribution documentation on the proper configuration of this service.
c. To execute Hive queries on a secure cluster using the CDAP Explore Service, the Hive MetaStore service must be configured for Kerberos authentication. Consult your distribution documentation on the proper configuration of the Hive MetaStore service.
With all these properties set, the CDAP Explore Service will run on secure Hadoop clusters.
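The group requirement in step a can be checked with a small helper. This is a minimal sketch, assuming the proxyuser groups value is a comma-separated list without spaces (or the wildcard *); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check that a user's groups intersect a proxyuser groups value
# such as "cdap,hadoop,hive".

# Succeed if any group in the space-separated list $2 appears in the
# comma-separated value $1; a value of "*" allows every group.
groups_allowed() {
  [ "$1" = "*" ] && return 0
  for g in $2; do
    case ",$1," in
      *",$g,"*) return 0 ;;
    esac
  done
  return 1
}
```

On a cluster node, the inputs could be taken from hdfs getconf -confKey hadoop.proxyuser.hive.groups and id -Gn cdap, for example: groups_allowed "$(hdfs getconf -confKey hadoop.proxyuser.hive.groups)" "$(id -Gn cdap)".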
Enabling CDAP HA
In addition to having a cluster architecture that supports HA (high availability), the following configuration steps must be completed:
...
Install the cdap-security package (the CDAP Authentication Server) on different nodes.
Start the cdap-security service on each node.
Note that when an unauthenticated request is made in a secure HA setup, a list of all running authentication endpoints will be returned in the body of the response.
Hive Execution Engines
...