Installing CDAP on Kubernetes

CDAP installation on Kubernetes was introduced in CDAP 6.2.3.

This document describes how to install CDAP on a Kubernetes cluster.

Dependencies

This section describes the infrastructure and software dependencies for operating CDAP in Kubernetes.

Kubernetes cluster

CDAP supports using Kubernetes (k8s) as the distributed resource manager. When CDAP is deployed to a k8s cluster, it spawns multiple Deployments and StatefulSets for running various CDAP services. The following diagram shows each of the CDAP services in the Kubernetes cluster:

The CDAP operator is responsible for deploying and managing all the CDAP services inside the cluster. It can also manage multiple CDAP instances within the same k8s cluster. If multiple CDAP instances are deployed to the same k8s cluster, deploy them to different namespaces for better isolation.
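As a minimal sketch of the namespace-per-instance layout, the following manifest creates two namespaces. The names cdap-dev and cdap-prod are hypothetical; each CDAP instance would then be created in its own namespace.

```yaml
# Hypothetical namespaces, one per CDAP instance.
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-prod
```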

Limitations

Currently, CDAP supports running only one replica (pod) per service, except for the Preview Runner. Failure resiliency relies on k8s restarting a pod when it fails. For pods created by StatefulSets, CDAP relies on the infrastructure to re-mount the persistent volumes to the new pod, which could be scheduled on a different machine.
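Because a restarted pod must be able to re-mount its volume, the storage class chosen for a service matters. The fragment below is a sketch that uses the storageClassName and storageSize fields from the CDAPMaster schema in this document. The API version v1alpha1, the instance name, the storage class "standard", the size, and the assumption that the logs service is backed by a StatefulSet are illustrative and should be checked against your operator version and cluster.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed CRD version
kind: CDAPMaster
metadata:
  name: cdap-example                # hypothetical instance name
  namespace: cdap-example           # hypothetical namespace
spec:
  logs:
    # Pick a storage class whose volumes can be re-attached
    # when the pod is rescheduled to another node.
    storageClassName: standard      # hypothetical storage class
    storageSize: 100Gi              # hypothetical size
```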

Another limitation of operating CDAP in Kubernetes is that it does not support the native compute profile. This means that all user program executions are external to the Kubernetes cluster and require a Hadoop cluster.

PostgreSQL database

CDAP needs shared storage for its own metadata, such as deployed artifacts and applications, run histories, preferences, lineage information, and more. Currently, CDAP supports both PostgreSQL and HBase as the metadata store. When running CDAP in Kubernetes, we recommend using PostgreSQL.
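A sketch of how PostgreSQL might be selected through the config map of a CDAPMaster resource, whose entries go into cdap-site.xml. The property names are recalled from CDAP's configuration reference and are assumptions here, as are the API version, host, and database name; verify them against the cdap-default.xml of your CDAP version. Credentials are intentionally omitted.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed CRD version
kind: CDAPMaster
metadata:
  name: cdap-example                # hypothetical instance name
spec:
  config:
    # Assumed property names; confirm against cdap-default.xml.
    data.storage.implementation: postgresql
    data.storage.sql.jdbc.driver.name: org.postgresql.Driver
    # Hypothetical host and database.
    data.storage.sql.jdbc.connection.url: jdbc:postgresql://postgres.example.com:5432/cdap
```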

Elasticsearch

CDAP supports metadata search, backed by either Elasticsearch or HBase. In a Kubernetes environment, Elasticsearch is recommended. You can either configure CDAP to use an existing Elasticsearch cluster or run Elasticsearch in Kubernetes by using the Elasticsearch Operator.
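Pointing CDAP at an Elasticsearch cluster could look like the fragment below. This is a sketch only: the two property names are recalled from CDAP's configuration reference and should be treated as assumptions, and the API version and host name are hypothetical.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed CRD version
kind: CDAPMaster
metadata:
  name: cdap-example                # hypothetical instance name
spec:
  config:
    # Assumed property names; confirm against cdap-default.xml.
    metadata.storage.implementation: elastic
    # Hypothetical Elasticsearch service reachable from the cluster.
    metadata.elasticsearch.cluster.hosts: elasticsearch.example.com
```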

Hadoop Compatible File System (HCFS)

CDAP stores artifacts and runtime information through the HDFS API. Any HCFS implementation is supported.
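The CDAPMaster schema in this document exposes a locationURI field for this storage. A minimal sketch, assuming an HDFS NameNode at a hypothetical address and the v1alpha1 API version; other HCFS implementations would use their own URI scheme.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed CRD version
kind: CDAPMaster
metadata:
  name: cdap-example                # hypothetical instance name
spec:
  # Hypothetical HDFS location; any HCFS-compatible URI can be used.
  locationURI: hdfs://namenode.example.com:8020/cdap
```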

Installation

This section describes the steps to deploy CDAP on Kubernetes.

Prerequisites

  • An operational Kubernetes cluster.

    • For production deployments, 64 GB of memory and 20 available virtual CPUs are recommended.

    • For better security, the Kubernetes cluster should have RBAC enabled.

    • kubectl should be set up to connect to the Kubernetes cluster.

  • A PostgreSQL database that is reachable from the Kubernetes cluster.

  • An Elasticsearch instance that is reachable from the Kubernetes cluster.

    • Refer to the Appendix section on how to set up an Elasticsearch instance inside the Kubernetes cluster.

Deploy CDAP Operator

CDAP provides an operator for easy deployment and management of CDAP in Kubernetes. Deploy the following YAML to create all the resources needed to run the operator in the Kubernetes cluster, inside the cdap-system namespace.

 

# Create operator namespace
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-system
  labels:
    name: cdap-system
    control-plane: cdap-operator
---
# Create operator service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cdap-operator
  namespace: cdap-system
  labels:
    control-plane: cdap-operator
---
# Source cdap-operator/config/rbac/cdapmaster_editor_role.yaml
# permissions to do edit cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-editor-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
  - patch
  - update
---
# Source cdap-operator/config/rbac/cdapmaster_viewer_role.yaml
# permissions to do viewer cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-viewer-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
---
# Source cdap-operator/config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: cdap-operator-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
  - patch
  - update
---
# Source cdap-operator/config/rbac/role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cdap-operator-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cdap-operator-role
subjects:
- kind: ServiceAccount
  name: cdap-operator
  namespace: cdap-system
---
# Source cdap-operator/config/crd/bases/cdap.cdap.io_cdapmasters.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.2.4
  creationTimestamp: null
  name: cdapmasters.cdap.cdap.io
spec:
  group: cdap.cdap.io
  names:
    kind: CDAPMaster
    listKind: CDAPMasterList
    plural: cdapmasters
    singular: cdapmaster
  scope: Namespaced
  validation:
    openAPIV3Schema:
      description: CDAPMaster is the Schema for the cdapmasters API
      properties:
        apiVersion:
          description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
          type: string
        kind:
          description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
          type: string
        metadata:
          type: object
        spec:
          description: "CDAPMasterSpec defines the desired state of CDAPMaster \n Important notes: * The field name of each service MUST match the constant values of ServiceName in constants.go as reflection is used to find field value. * For services that are optional (i.e. may or may not be required for CDAP to be operational), their service specification fields are pointers. By default, these optional services are disabled. Set to non-nil to enable them."
          properties:
            appFabric:
              description: AppFabric is specification for the CDAP app-fabric service.
              properties:
                env:
                  description: Env is a list of environment variables for the master service container.
                  items:
                    description: EnvVar represents an environment variable present in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value. Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes, optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume size used by the service.
                  type: string
              type: object
            config:
              additionalProperties:
                type: string
              description: Config is a set of configurations that goes into cdap-site.xml.
              type: object
            configMapVolumes:
              additionalProperties:
                type: string
              description: ConfigMapVolumes defines a map from ConfigMap names to volume mount path. Key is the configmap object name. Value is the mount path. This adds ConfigMap data to the directory specified by the volume mount path.
              type: object
            image:
              description: Image is the docker image name for the CDAP backend.
              type: string
            imagePullPolicy:
              description: ImagePullPolicy is the policy for pulling docker images on Pod creation.
              type: string
            locationURI:
              description: LocationURI is an URI specifying an object storage for CDAP.
              type: string
            logLevels:
              additionalProperties:
                type: string
              description: LogLevels is a set of logger name to log level settings.
              type: object
            logs:
              description: Logs is specification for the CDAP logging service.
              properties:
                env:
                  description: Env is a list of environment variables for the master service container.
                  items:
                    description: EnvVar represents an environment variable present in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value. Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes, optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume size used by the service.
                  type: string
              type: object
            messaging:
              description: Messaging is specification for the CDAP messaging service.
              properties:
                env:
                  description: Env is a list of environment variables for the master service container.
                  items:
                    description: EnvVar represents an environment variable present in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value. Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes, optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed resources, defaults to "1"
                                type: string