Installing CDAP on Kubernetes
- Terence Yim
- Arjan Bal
- Vinisha Shah
CDAP installation on Kubernetes was introduced in CDAP 6.2.3.
This document describes how to install CDAP on a Kubernetes cluster.
Dependencies
This section describes the infrastructure and software dependencies for operating CDAP in Kubernetes.
Kubernetes cluster
CDAP supports using Kubernetes (k8s) as the distributed resource manager. When CDAP is deployed to a k8s cluster, it spawns multiple Deployments and StatefulSets for running various CDAP services. The following diagram shows each of the CDAP services in the Kubernetes cluster:
The CDAP operator is responsible for deploying and managing all the CDAP services inside the cluster. It can also manage multiple CDAP instances within the same k8s cluster. In that case, it is recommended to deploy each instance to a different namespace for better isolation.
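As a minimal sketch of the one-namespace-per-instance layout, the manifest below creates a namespace and a CDAPMaster custom resource inside it. The group `cdap.cdap.io` and kind `CDAPMaster` come from the operator's CRD in the Deploy CDAP Operator section. The namespace and resource names (`cdap-team-a`, `cdap`), the image placeholder, and the `v1alpha1` version in `apiVersion` are assumptions for illustration. The spec is deliberately partial.

```yaml
# Hypothetical namespace dedicated to one CDAP instance.
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-team-a
---
# CDAPMaster resource reconciled by the CDAP operator.
# ASSUMPTION: v1alpha1 is the served version; check the installed CRD.
apiVersion: cdap.cdap.io/v1alpha1
kind: CDAPMaster
metadata:
  name: cdap
  namespace: cdap-team-a
spec:
  # Placeholder; substitute the CDAP image you actually deploy.
  image: "<cdap-image>"
  # Partial spec: per-service sections and other settings are omitted.
```

A second instance would repeat the same pair of documents with a different namespace name.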
Limitations
Currently, CDAP supports only one replica (pod) per service, except for the Preview Runner. Failure resiliency relies on Kubernetes restarting a pod when it fails. For pods created by StatefulSets, the infrastructure must also allow the persistent volumes to be re-mounted to the replacement pod, which may be scheduled on a different machine.
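Because a replacement StatefulSet pod may land on another node, its volume should come from a StorageClass whose volumes can be detached and re-attached elsewhere. The CRD exposes `storageClassName` and `storageSize` for each service. The fragment below is a sketch that assumes `logs` and `messaging` are among the StatefulSet-backed services (this section does not list which ones are). The StorageClass name `network-attached` and the sizes are hypothetical.

```yaml
# Fragment of a CDAPMaster spec; apiVersion, kind, and metadata omitted.
spec:
  logs:
    # Hypothetical StorageClass whose volumes can be re-mounted on another node.
    storageClassName: "network-attached"
    # The CRD types storageSize as a string, so the quantity is quoted.
    storageSize: "200Gi"
  messaging:
    storageClassName: "network-attached"
    storageSize: "100Gi"
```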
Another limitation of operating CDAP in Kubernetes is that it does not support the native compute profile. This means all user programs run outside the Kubernetes cluster and require a Hadoop cluster for execution.
PostgreSQL database
CDAP needs shared storage for its own metadata, such as deployed artifacts and applications, run histories, preferences, and lineage information. Currently, CDAP supports both PostgreSQL and HBase as the metadata store. When running CDAP in Kubernetes, we recommend PostgreSQL.
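The CDAPMaster `config` field is a string map whose entries go into cdap-site.xml, so the metadata store is selected there. In the sketch below, the property names `data.storage.implementation` and `data.storage.sql.jdbc.connection.url` are assumptions based on CDAP's configuration naming. Check them against the `cdap-default.xml` of your CDAP version. The JDBC URL is hypothetical, and database credentials are not shown.

```yaml
# Fragment of a CDAPMaster spec; values are quoted because config is a string map.
spec:
  config:
    # ASSUMED property name and value: select PostgreSQL as the metadata store.
    data.storage.implementation: "postgresql"
    # ASSUMED property name; hypothetical host, port, and database.
    data.storage.sql.jdbc.connection.url: "jdbc:postgresql://postgres.example.internal:5432/cdap"
```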
Elasticsearch
CDAP supports metadata search, backed by either Elasticsearch or HBase. In a Kubernetes environment, Elasticsearch is recommended. You can either configure CDAP to use an existing Elasticsearch cluster or run Elasticsearch in Kubernetes using the Elasticsearch Operator.
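Pointing CDAP at Elasticsearch is likewise a matter of `config` entries. This is a sketch only. The property names `metadata.storage.implementation` and `metadata.elasticsearch.cluster.hosts` and the value `elastic` are assumptions to verify against your CDAP version's `cdap-default.xml`. The host is hypothetical.

```yaml
# Fragment of a CDAPMaster spec.
spec:
  config:
    # ASSUMED property name and value: back metadata search with Elasticsearch.
    metadata.storage.implementation: "elastic"
    # ASSUMED property name; hypothetical endpoint reachable from the cluster.
    metadata.elasticsearch.cluster.hosts: "elasticsearch.example.internal:9200"
```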
Hadoop Compatible File System (HCFS)
CDAP stores artifacts and runtime information through the HDFS API, so any HCFS implementation is supported.
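The CDAPMaster spec has a `locationURI` field, described in the CRD as "an URI specifying an object storage for CDAP", which is where the file system location is given. The sketch below uses a hypothetical HDFS NameNode address. Another HCFS scheme should presumably work as well, provided the matching file system implementation is available to CDAP (an assumption).

```yaml
# Fragment of a CDAPMaster spec.
spec:
  # Hypothetical HDFS NameNode address.
  locationURI: "hdfs://namenode.example.internal:8020"
```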
Installation
This section describes the steps to deploy CDAP on Kubernetes.
Prerequisites
An operational Kubernetes cluster.
For a production deployment, 64 GB of memory and 20 available virtual CPUs are recommended.
For better security, the Kubernetes cluster should have RBAC enabled.
kubectl configured to connect to the Kubernetes cluster.
A PostgreSQL database that is reachable from the Kubernetes cluster.
An Elasticsearch instance that is reachable from the Kubernetes cluster.
See the Appendix for how to set up an Elasticsearch instance inside the Kubernetes cluster.
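How the cluster's capacity is divided among CDAP services can be set per service through the `resources` field of the CDAPMaster spec, which takes Kubernetes-style `requests` and `limits`. The quantities in this sketch are illustrative, not sizing guidance. The CRD types each quantity as a string, so they are quoted.

```yaml
# Fragment of a CDAPMaster spec; quantities are illustrative only.
spec:
  appFabric:
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
      limits:
        memory: "4Gi"
  logs:
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
```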
Deploy CDAP Operator
CDAP provides an operator for easy deployment and management of CDAP in Kubernetes. Apply the following YAML to create all the resources needed to run the operator in the Kubernetes cluster, inside the cdap-system namespace.
# Create operator namespace
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-system
  labels:
    name: cdap-system
    control-plane: cdap-operator
---
# Create operator service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cdap-operator
  namespace: cdap-system
  labels:
    control-plane: cdap-operator
---
# Source cdap-operator/config/rbac/cdapmaster_editor_role.yaml
# permissions to edit cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-editor-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
  - patch
  - update
---
# Source cdap-operator/config/rbac/cdapmaster_viewer_role.yaml
# permissions to view cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-viewer-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
---
# Source cdap-operator/config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: cdap-operator-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
  - patch
  - update
---
# Source cdap-operator/config/rbac/role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cdap-operator-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cdap-operator-role
subjects:
- kind: ServiceAccount
  name: cdap-operator
  namespace: cdap-system
---
# Source cdap-operator/config/crd/bases/cdap.cdap.io_cdapmasters.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.2.4
  creationTimestamp: null
  name: cdapmasters.cdap.cdap.io
spec:
  group: cdap.cdap.io
  names:
    kind: CDAPMaster
    listKind: CDAPMasterList
    plural: cdapmasters
    singular: cdapmaster
  scope: Namespaced
  validation:
    openAPIV3Schema:
      description: CDAPMaster is the Schema for the cdapmasters API
      properties:
        apiVersion:
          description: 'APIVersion defines the versioned schema of this representation
            of an object. Servers should convert recognized schemas to the latest
            internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
          type: string
        kind:
          description: 'Kind is a string value representing the REST resource this
            object represents. Servers may infer this from the endpoint the client
            submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
          type: string
        metadata:
          type: object
        spec:
          description: "CDAPMasterSpec defines the desired state of CDAPMaster \n
            Important notes: * The field name of each service MUST match the constant
            values of ServiceName in constants.go as reflection is used to find
            field value. * For services that are optional (i.e. may or may not be
            required for CDAP to be operational), their service specification fields
            are pointers. By default, these optional services are disabled. Set to
            non-nil to enable them."
          properties:
            appFabric:
              description: AppFabric is specification for the CDAP app-fabric service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must
                                  be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key
                                  must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the
                    pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the
                    pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute
                        resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute
                        resources required. If Requests is omitted for a container,
                        it defaults to Limits if that is explicitly specified, otherwise
                        to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in
                    the node.k8s.io group, which should be used to run pods for this
                    service. If no RuntimeClass resource matches the named class,
                    pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for
                    the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for
                    the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume
                    size used by the service.
                  type: string
              type: object
            config:
              additionalProperties:
                type: string
              description: Config is a set of configurations that goes into cdap-site.xml.
              type: object
            configMapVolumes:
              additionalProperties:
                type: string
              description: ConfigMapVolumes defines a map from ConfigMap names to
                volume mount path. Key is the configmap object name. Value is the
                mount path. This adds ConfigMap data to the directory specified by
                the volume mount path.
              type: object
            image:
              description: Image is the docker image name for the CDAP backend.
              type: string
            imagePullPolicy:
              description: ImagePullPolicy is the policy for pulling docker images
                on Pod creation.
              type: string
            locationURI:
              description: LocationURI is an URI specifying an object storage for
                CDAP.
              type: string
            logLevels:
              additionalProperties:
                type: string
              description: LogLevels is a set of logger name to log level settings.
              type: object
            logs:
              description: Logs is specification for the CDAP logging service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must
                                  be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key
                                  must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the
                    pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the
                    pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute
                        resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute
                        resources required. If Requests is omitted for a container,
                        it defaults to Limits if that is explicitly specified, otherwise
                        to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in
                    the node.k8s.io group, which should be used to run pods for this
                    service. If no RuntimeClass resource matches the named class,
                    pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for
                    the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for
                    the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume
                    size used by the service.
                  type: string
              type: object
            messaging:
              description: Messaging is specification for the CDAP messaging service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string