Installing CDAP on Kubernetes

CDAP installation on Kubernetes was introduced in CDAP 6.2.3.

This document describes how to install CDAP on a Kubernetes cluster.

Dependencies

This section describes the infrastructure and software dependencies for operating CDAP in Kubernetes.

Kubernetes cluster

CDAP supports using Kubernetes (k8s) as the distributed resource manager. When CDAP is deployed to a k8s cluster, it spawns multiple Deployments and StatefulSets for running various CDAP services. The following diagram shows each of the CDAP services in the Kubernetes cluster:

The CDAP operator deploys and manages all the CDAP services inside the cluster. It can also manage multiple CDAP instances within the same k8s cluster. If multiple CDAP instances are deployed to the same k8s cluster, deploy them to different namespaces for better isolation.
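
As a rough sketch of the namespace recommendation, two CDAP instances could each get their own namespace. The namespace names `cdap-dev` and `cdap-prod` are hypothetical; only the `Namespace` kind itself is standard Kubernetes.

```yaml
# Hypothetical namespaces, one per CDAP instance.
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-prod
```

Because the CDAPMaster custom resource is namespace-scoped, each instance would then be created in its own namespace.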

Limitations

Currently, CDAP supports running only one replica (pod) per service, except for the Preview Runner. Failure resiliency relies on k8s restarting a pod when it fails. For pods created by StatefulSets, CDAP relies on the infrastructure to re-mount the persistent volumes to the new pod, which could be scheduled on a different machine.
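
To illustrate the persistent volume dependency, the fragment below sketches how a storage class and volume size might be set per service. The `storageClassName` and `storageSize` fields appear in the CDAPMaster schema later in this document; the API version `v1alpha1`, the resource name, the namespace, and all values are assumptions for illustration. This is a fragment, not a complete manifest, and other fields the resource may require (such as `image`) are omitted.

```yaml
# Fragment only; apiVersion, names, and values are assumptions.
apiVersion: cdap.cdap.io/v1alpha1
kind: CDAPMaster
metadata:
  name: example-cdap        # hypothetical instance name
  namespace: cdap-dev       # hypothetical namespace
spec:
  logs:
    storageClassName: standard   # hypothetical StorageClass in your cluster
    storageSize: 100Gi           # hypothetical size
  messaging:
    storageClassName: standard
    storageSize: 50Gi
```

The StorageClass chosen should provision volumes that can be attached from any node where the pod might be rescheduled.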

Another limitation of operating CDAP in Kubernetes is that it does not support the native compute profile. This means all user program executions are external to the Kubernetes cluster and require a Hadoop cluster.

PostgreSQL database

CDAP needs shared storage for its own metadata, such as deployed artifacts and applications, run histories, preferences, and lineage information. Currently, CDAP supports both PostgreSQL and HBase as the metadata store. When running CDAP in Kubernetes, we recommend using PostgreSQL.
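
A minimal sketch of pointing CDAP at PostgreSQL through the `config` map of a CDAPMaster resource (which, per the schema later in this document, feeds cdap-site.xml). The property names below are recalled from CDAP's configuration and should be verified against the cdap-default.xml of your release; the API version `v1alpha1`, the host, and the database name are assumptions.

```yaml
# Fragment only; property names should be verified, values are hypothetical.
apiVersion: cdap.cdap.io/v1alpha1
kind: CDAPMaster
metadata:
  name: example-cdap   # hypothetical instance name
spec:
  config:
    data.storage.implementation: postgresql
    data.storage.sql.jdbc.driver.name: org.postgresql.Driver
    # Hypothetical host and database name.
    data.storage.sql.jdbc.connection.url: jdbc:postgresql://postgres.example.internal:5432/cdap
```

Database credentials are deliberately left out of this fragment; they should not be placed in plain configuration.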

Elasticsearch

CDAP supports metadata search, backed by either Elasticsearch or HBase. In a Kubernetes environment, Elasticsearch is recommended. You can either configure CDAP to use an existing Elasticsearch cluster or run Elasticsearch in Kubernetes by using the Elasticsearch Operator.
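
For the in-cluster option, the following is a sketch of a single-node Elasticsearch resource, assuming the operator in question is Elastic Cloud on Kubernetes (ECK) and that it is already installed. The resource name, namespace, and version are placeholders; check which Elasticsearch versions your CDAP release supports before choosing one.

```yaml
# Assumes the ECK operator and its CRDs are already installed.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cdap-metadata      # hypothetical name
  namespace: cdap-dev      # hypothetical namespace
spec:
  version: 7.17.0          # placeholder; use a version your CDAP release supports
  nodeSets:
  - name: default
    count: 1               # single node, suitable for trials only
    config:
      # Avoids needing to raise vm.max_map_count on the host.
      node.store.allow_mmap: false
```

A single node is convenient for evaluation, but it provides no redundancy for the metadata search index.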

Hadoop Compatible File System (HCFS)

CDAP stores artifacts and runtime information through the HDFS API. Any HCFS implementation is supported.
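
As a sketch, the file system location is supplied through the `locationURI` field of the CDAPMaster resource (described in the schema later in this document as a URI specifying storage for CDAP). The API version `v1alpha1` and the URI below are assumptions; substitute the scheme and path of your own HCFS implementation.

```yaml
# Fragment only; the URI is hypothetical.
apiVersion: cdap.cdap.io/v1alpha1
kind: CDAPMaster
metadata:
  name: example-cdap   # hypothetical instance name
spec:
  locationURI: hdfs://namenode.example.internal:8020/cdap
```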

Installation

This section describes the steps to deploy CDAP on Kubernetes.

Prerequisites

  • An operational Kubernetes cluster.

    • For production deployments, 64 GB of memory and 20 available virtual CPUs are recommended.

    • For better security, the Kubernetes cluster should have RBAC enabled.

    • kubectl should be set up to connect to the Kubernetes cluster.

  • A PostgreSQL database that is reachable from the Kubernetes cluster.

  • An Elasticsearch instance that is reachable from the Kubernetes cluster.

    • Refer to the Appendix section for how to set up an Elasticsearch instance inside the Kubernetes cluster.

Deploy CDAP Operator

CDAP provides an operator for easy deployment and management of CDAP in Kubernetes. Deploy the following YAML to create all the resources needed to run the operator in the Kubernetes cluster, inside the cdap-system namespace.

 

# Create operator namespace
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-system
  labels:
    name: cdap-system
    control-plane: cdap-operator
---
# Create operator service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cdap-operator
  namespace: cdap-system
  labels:
    control-plane: cdap-operator
---
# Source cdap-operator/config/rbac/cdapmaster_editor_role.yaml
# permissions to do edit cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-editor-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
  - patch
  - update
---
# Source cdap-operator/config/rbac/cdapmaster_viewer_role.yaml
# permissions to do viewer cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-viewer-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
---
# Source cdap-operator/config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: cdap-operator-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
- cdap.cdap.io resources: - cdapmasters/status verbs: - get - patch - update --- # Source cdap-operator/config/rbac/role_binding.yaml apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: cdap-operator-rolebinding roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cdap-operator-role subjects: - kind: ServiceAccount name: cdap-operator namespace: cdap-system --- # Source cdap-operator/config/crd/bases/cdap.cdap.io_cdapmasters.yaml apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.2.4 creationTimestamp: null name: cdapmasters.cdap.cdap.io spec: group: cdap.cdap.io names: kind: CDAPMaster listKind: CDAPMasterList plural: cdapmasters singular: cdapmaster scope: Namespaced validation: openAPIV3Schema: description: CDAPMaster is the Schema for the cdapmasters API properties: apiVersion: description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' type: string kind: description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' type: string metadata: type: object spec: description: "CDAPMasterSpec defines the desired state of CDAPMaster \n Important notes: * The field name of each service MUST match the constant values of ServiceName in constants.go as reflection is used to find field value. * For services that are optional (i.e. may or may not be required for CDAP to be operational), their service specification fields are pointers. 
By default, these optional services are disabled. Set to non-nil to enable them." properties: appFabric: description: AppFabric is specification for the CDAP app-fabric service. properties: env: description: Env is a list of environment variables for the master service container. items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. 
type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. 
If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string storageClassName: description: StorageClassName is the name of the StorageClass for the persistent volume used by the service. type: string storageSize: description: StorageSize is specification for the persistent volume size used by the service. type: string type: object config: additionalProperties: type: string description: Config is a set of configurations that goes into cdap-site.xml. type: object configMapVolumes: additionalProperties: type: string description: ConfigMapVolumes defines a map from ConfigMap names to volume mount path. Key is the configmap object name. Value is the mount path. This adds ConfigMap data to the directory specified by the volume mount path. type: object image: description: Image is the docker image name for the CDAP backend. type: string imagePullPolicy: description: ImagePullPolicy is the policy for pulling docker images on Pod creation. type: string locationURI: description: LocationURI is an URI specifying an object storage for CDAP. type: string logLevels: additionalProperties: type: string description: LogLevels is a set of logger name to log level settings. type: object logs: description: Logs is specification for the CDAP logging service. properties: env: description: Env is a list of environment variables for the master service container. 
items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' 
properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. 
More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string storageClassName: description: StorageClassName is the name of the StorageClass for the persistent volume used by the service. type: string storageSize: description: StorageSize is specification for the persistent volume size used by the service. type: string type: object messaging: description: Messaging is specification for the CDAP messaging service. properties: env: description: Env is a list of environment variables for the master service container. items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. 
apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. 
type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string storageClassName: description: StorageClassName is the name of the StorageClass for the persistent volume used by the service. type: string storageSize: description: StorageSize is specification for the persistent volume size used by the service. type: string type: object metadata: description: Metadata is specification for the CDAP metadata service. properties: env: description: Env is a list of environment variables for the master service container. items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. 
type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' 
properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. 
More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string storageClassName: description: StorageClassName is the name of the StorageClass for the persistent volume used by the service. type: string storageSize: description: StorageSize is specification for the persistent volume size used by the service. type: string type: object metrics: description: Metrics is specification for the CDAP metrics service. properties: env: description: Env is a list of environment variables for the master service container. items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. 
apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. 
type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string storageClassName: description: StorageClassName is the name of the StorageClass for the persistent volume used by the service. type: string storageSize: description: StorageSize is specification for the persistent volume size used by the service. type: string type: object preview: description: Preview is specification for the CDAP preview service. properties: env: description: Env is a list of environment variables for the master service container. items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. 
type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' 
properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. 
More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string storageClassName: description: StorageClassName is the name of the StorageClass for the persistent volume used by the service. type: string storageSize: description: StorageSize is specification for the persistent volume size used by the service. type: string type: object router: description: Router is specification for the CDAP router service. properties: env: description: Env is a list of environment variables for the master service container. items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. 
apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. 
type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string replicas: description: Replicas is number of replicas for the service. format: int32 type: integer resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string servicePort: description: ServicePort is the port number for the service. format: int32 type: integer serviceType: description: ServiceType is the service type in kubernetes, default is NodePort. type: string type: object runtime: description: 'Runtime is specification for the CDAP runtime service. This is an optional service and may not be required for CDAP to be operational. To disable this service: either omit or set the field to nil To enable this service: set it to a pointer to a RuntimeSpec struct (can be an empty struct)' properties: env: description: Env is a list of environment variables for the master service container. 
items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' 
properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. 
More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string storageClassName: description: StorageClassName is the name of the StorageClass for the persistent volume used by the service. type: string storageSize: description: StorageSize is specification for the persistent volume size used by the service. type: string type: object securitySecret: description: SecuritySecret is secret that contains security related configurations for CDAP. type: string serviceAccountName: description: ServiceAccountName is the service account for all the service pods. type: string systemappconfigs: additionalProperties: type: string description: SystemAppConfigs specifies configs used by CDAP to run system apps dynamically. Each entry is of format <filename, json app config> which will create a separate system config file with entry value as file content. type: object userInterface: description: UserInterface is specification for the CDAP UI service. properties: env: description: Env is a list of environment variables for the master service container. items: description: EnvVar represents an environment variable present in a Container. properties: name: description: Name of the environment variable. Must be a C_IDENTIFIER. type: string value: description: 'Variable references $(VAR_NAME) are expanded using the previous defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). 
Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".' type: string valueFrom: description: Source for the environment variable's value. Cannot be used if value is not empty. properties: configMapKeyRef: description: Selects a key of a ConfigMap. properties: key: description: The key to select. type: string name: description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the ConfigMap or its key must be defined type: boolean required: - key type: object fieldRef: description: 'Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels, metadata.annotations, spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.' properties: apiVersion: description: Version of the schema the FieldPath is written in terms of, defaults to "v1". type: string fieldPath: description: Path of the field to select in the specified API version. type: string required: - fieldPath type: object resourceFieldRef: description: 'Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.' properties: containerName: description: 'Container name: required for volumes, optional for env vars' type: string divisor: description: Specifies the output format of the exposed resources, defaults to "1" type: string resource: description: 'Required: resource to select' type: string required: - resource type: object secretKeyRef: description: Selects a key of a secret in the pod's namespace properties: key: description: The key of the secret to select from. Must be a valid secret key. type: string name: description: 'Name of the referent. 
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?' type: string optional: description: Specify whether the Secret or its key must be defined type: boolean required: - key type: object type: object required: - name type: object type: array metadata: description: Metadata for the service. type: object nodeSelector: additionalProperties: type: string description: NodeSelector is a selector which must be true for the pod to fit on a node. type: object priorityClassName: description: PriorityClassName is to specify the priority of the pods for this service. type: string replicas: description: Replicas is number of replicas for the service. format: int32 type: integer resources: description: Resources are Compute resources required by the service. properties: limits: additionalProperties: type: string description: 'Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object requests: additionalProperties: type: string description: 'Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' type: object type: object runtimeClassName: description: RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run pods for this service. If no RuntimeClass resource matches the named class, pods will not be running. type: string serviceAccountName: description: ServiceAccountName overrides the service account for the service pods. type: string servicePort: description: ServicePort is the port number for the service. 
format: int32 type: integer serviceType: description: ServiceType is the service type in kubernetes, default is NodePort. type: string type: object userInterfaceImage: description: UserInterfaceImage is the docker image name for the CDAP UI. type: string required: - locationURI type: object status: description: CDAPMasterStatus defines the observed state of CDAPMaster properties: components: description: Object status array for all matching objects items: description: ObjectStatus is a generic status holder for objects properties: group: description: Object group type: string kind: description: Kind of object type: string link: description: Link to object type: string name: description: Name of object type: string pdb: description: PDB status properties: currenthealthy: description: currentHealthy format: int32 type: integer desiredhealthy: description: desiredHealthy format: int32 type: integer required: - currenthealthy - desiredhealthy type: object status: description: 'Status. Values: InProgress, Ready, Unknown' type: string sts: description: StatefulSet status properties: currentcount: description: CurrentReplicas defines the no of MySQL instances that are created format: int32 type: integer progress: description: 'progress is a fuzzy indicator. Interpret as a percentage (0-100) eg: for statefulsets, progress = 100*readyreplicas/replicas' format: int32 type: integer readycount: description: ReadyReplicas defines the no of MySQL instances that are ready format: int32 type: integer replicas: description: Replicas defines the no of MySQL instances desired format: int32 type: integer required: - currentcount - progress - readycount - replicas type: object type: object type: array conditions: description: Conditions represents the latest state of the object items: description: Condition describes the state of an object at a certain point. properties: lastTransitionTime: description: Last time the condition transitioned from one status to another. 
format: date-time type: string lastUpdateTime: description: Last time the condition was probed format: date-time type: string message: description: A human readable message indicating details about the transition. type: string reason: description: The reason for the condition's last transition. type: string status: description: Status of the condition, one of True, False, Unknown. type: string type: description: Type of condition. type: string required: - status - type type: object type: array downgradeStartTimeMillis: description: DowngradeStartTimeMillis is the start time in milliseconds of the downgrade process format: int64 type: integer imageToUse: description: ImageToUse is the Docker image of CDAP backend the operator uses to deploy. type: string observedGeneration: description: ObservedGeneration is the most recent generation observed. It corresponds to the Object's generation, which is updated on mutation by the API Server. format: int64 type: integer upgradeStartTimeMillis: description: UpgradeStartTimeMillis is the start time in milliseconds of the upgrade process format: int64 type: integer userInterfaceImageToUse: description: UserInterfaceImageToUse is the Docker image of CDAP UI the operator uses to deploy. type: string type: object type: object version: v1alpha1 versions: - name: v1alpha1 served: true storage: true status: acceptedNames: kind: "" plural: "" conditions: [] storedVersions: []
---
# StatefulSet for running the cdap controller
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cdap-controller
  namespace: cdap-system
  labels:
    control-plane: cdap-operator
spec:
  selector:
    matchLabels:
      control-plane: cdap-operator
  serviceName: cdap-operator-service
  template:
    metadata:
      labels:
        control-plane: cdap-operator
    spec:
      serviceAccountName: cdap-operator
      containers:
      - command:
        - /manager
        image: gcr.io/cdapio/cdap-controller:latest
        name: manager
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 20Mi
      terminationGracePeriodSeconds: 10

Create RBAC Roles and RoleBinding

CDAP interacts with Kubernetes for configuration, service discovery, and workload management. Deploying the following YAML file creates a service account called cdap, the RBAC Role it requires, and a RoleBinding that grants the Role to that service account.

 

# Create cdap service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cdap
---
# Create cdap role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cdap-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - create
  - get
  - list
  - watch
  - delete
  - deletecollection
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  verbs:
  - deletecollection
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - replicasets
  verbs:
  - get
  - list
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
---
# Create cdap RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cdap-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cdap-role
subjects:
- kind: ServiceAccount
  name: cdap

Prepare the secret token for CDAP

Set up a secret in Kubernetes to provide the cdap-security.xml file to CDAP. This file contains the PostgreSQL and Elasticsearch credentials. The following commands assume that the database username and password are in the environment variables DB_USER and DB_PASS, respectively, and that the Elasticsearch username and password are in ES_USER and ES_PASS.

# Create the content of the cdap-security.xml
export CDAP_SECURITY=$(cat << EOF | base64 | tr -d '\n'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>data.storage.sql.jdbc.username</name>
    <value>${DB_USER}</value>
  </property>
  <property>
    <name>data.storage.sql.jdbc.password</name>
    <value>${DB_PASS}</value>
  </property>
  <property>
    <name>metadata.elasticsearch.credentials.username</name>
    <value>${ES_USER}</value>
  </property>
  <property>
    <name>metadata.elasticsearch.credentials.password</name>
    <value>${ES_PASS}</value>
  </property>
</configuration>
EOF
)

# Create the secret
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: cdap-security
type: Opaque
data:
  cdap-security.xml: $CDAP_SECURITY
EOF
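The heredoc above substitutes the credentials into the XML verbatim, so a password containing &, <, or > would produce a malformed cdap-security.xml. The following is a minimal sketch of one way to guard against that, assuming bash and sed are available. The xml_escape helper and the *_XML variable names are hypothetical and not part of CDAP.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CDAP): escape the characters that are
# special in XML text content so a credential can be embedded safely.
xml_escape() {
  # '&' is replaced first so the entities added afterwards are not re-escaped.
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Escape the values before building cdap-security.xml.
DB_PASS_XML=$(xml_escape "${DB_PASS:-}")
ES_PASS_XML=$(xml_escape "${ES_PASS:-}")
```

With this in place, ${DB_PASS_XML} and ${ES_PASS_XML} would be used in the heredoc instead of the raw variables.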

Deploy CDAP

You are now ready to deploy CDAP into the Kubernetes cluster. The following YAML provides a simple example. Replace the locationURI with a location on an HCFS-compatible file system (e.g. HDFS, Google Cloud Storage, or Amazon S3). Also configure data.storage.sql.jdbc.connection.url to point to a PostgreSQL database. Refer to cdap-default.xml for an explanation of the configurations.
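A minimal sketch of such a resource, written as a shell function that prints the manifest. The field names locationURI, securitySecret, serviceAccountName, and config come from the CRD and the sections above. The API group cdap.cdap.io, the instance name test, and the placeholder bucket and JDBC URL are assumptions, so check them against the CRD deployed in your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: prints a minimal CDAPMaster manifest to stdout.
# Assumptions: the API group "cdap.cdap.io" and the instance name "test".
render_cdap_master() {
  local location_uri="$1" jdbc_url="$2"
  cat << EOF
apiVersion: cdap.cdap.io/v1alpha1
kind: CDAPMaster
metadata:
  name: test
spec:
  # Required by the CRD: a location on an HCFS-compatible file system.
  locationURI: "${location_uri}"
  # Secret that carries cdap-security.xml.
  securitySecret: cdap-security
  # Service account bound to the cdap Role.
  serviceAccountName: cdap
  config:
    data.storage.sql.jdbc.connection.url: "${jdbc_url}"
EOF
}

# Usage (both values are hypothetical placeholders):
# render_cdap_master "gs://my-cdap-bucket" "jdbc:postgresql://10.0.0.5:5432/cdap" | kubectl apply -f -
```

Printing the manifest rather than applying it directly makes it possible to review the result before it reaches the cluster.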

 

You can also configure each of the CDAP services with different CPU, memory, storage, and environment variable settings. The following is a simple example that shows how to change the memory and disk size for the appFabric service.
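A sketch of what such an override might look like, assuming the appFabric entry accepts the same resources and storageSize fields that the CRD defines for the other services. The function name and the sample sizes are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: prints a spec fragment that overrides memory and disk size
# for the appFabric service. Merge it into the spec of the CDAPMaster resource.
render_appfabric_overrides() {
  local memory="$1" storage="$2"
  cat << EOF
spec:
  appFabric:
    resources:
      requests:
        memory: ${memory}
    storageSize: ${storage}
EOF
}

# Usage (sizes are illustrative):
# render_appfabric_overrides 4096Mi 200Gi
```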

Refer to the Custom Resource Definition (CRD) for all the supported settings.

You can verify that CDAP is running correctly by listing the pods in the Kubernetes cluster.
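As a sketch, the check can be narrowed to the pods that still need attention. The helper below assumes the default kubectl table output, in which the third column is the pod status. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name and status of every pod that is not in the Running state.
# Prints nothing when all pods are Running.
pods_not_running() {
  local namespace="${1:-default}"
  # With --no-headers the columns are: NAME READY STATUS RESTARTS AGE.
  kubectl get pods --namespace "$namespace" --no-headers \
    | awk '$3 != "Running" { print $1, $3 }'
}

# Usage:
# pods_not_running default
```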

After CDAP is fully up and running, both the UI and the REST API can be accessed through the user-interface and router services exposed by CDAP.

For quick testing, you can use kubectl port-forward to provide access to the CDAP service. For example, you can expose the user interface and then access it through localhost:11011 from the browser.
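A minimal sketch of that command. The service name pattern cdap-<instance>-userinterface is an assumption, so confirm the actual name with kubectl get services first.

```shell
#!/usr/bin/env bash
# Forward a local port to the CDAP UI service.
# Assumption: the UI service is named cdap-<instance>-userinterface.
forward_cdap_ui() {
  local instance="${1:-test}" port="${2:-11011}"
  kubectl port-forward "svc/cdap-${instance}-userinterface" "${port}:${port}"
}

# Usage: blocks until interrupted, then open http://localhost:11011
# forward_cdap_ui test 11011
```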

For production use cases, it is better to expose the CDAP services through a load balancer. Consult with your Kubernetes provider for how to deploy a load balancer.
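The CRD exposes a serviceType field on the router and userInterface services, with NodePort as the default. As a sketch, setting it to LoadBalancer could look like the fragment below. Whether this provisions an external load balancer, and how, depends on your Kubernetes provider. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: prints a spec fragment that switches the router and UI
# services from the default NodePort type to LoadBalancer.
render_loadbalancer_overrides() {
  cat << 'EOF'
spec:
  router:
    serviceType: LoadBalancer
  userInterface:
    serviceType: LoadBalancer
EOF
}

# Usage: merge the printed fragment into the spec of the CDAPMaster resource.
# render_loadbalancer_overrides
```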

Enable Authentication Service

To enable the Authentication Service in a Kubernetes environment and provide Perimeter Security, additional configuration is needed in the CDAP YAML file.

  1. Set the following configurations in the "config:" section of the CDAP YAML file.

  2. Add configurations for the authentication handler, based on Configuring Managed Authentication, under the "config:" section.

  3. Use the CDAP docker image to generate an "auth.key" file.

  4. Create a k8s secret from the "auth.key" file.

  5. Add the secret to the CDAP YAML file, to map the secret into the CDAP pods, by adding a "secretVolumes" section (at the same level as other options, like "config").
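Step 4 can be sketched with kubectl create secret generic. The secret name cdap-auth-key and the function name are hypothetical. The key file is assumed to have been generated already in step 3.

```shell
#!/usr/bin/env bash
# Create a Kubernetes secret that holds the auth.key file.
# Fails early if the key file is missing.
create_auth_key_secret() {
  local secret_name="${1:-cdap-auth-key}" key_file="${2:-auth.key}"
  if [ ! -f "$key_file" ]; then
    echo "key file not found: $key_file" >&2
    return 1
  fi
  # The entry in the secret is always named auth.key, whatever the local path.
  kubectl create secret generic "$secret_name" --from-file="auth.key=${key_file}"
}

# Usage:
# create_auth_key_secret cdap-auth-key ./auth.key
```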

Now, you can start CDAP with security enabled, without needing Zookeeper.

Running CDAP Programs

Starting in CDAP 6.7.0, you can run CDAP programs on Kubernetes using Spark.

Note: MapReduce and Spark Streaming engines are not supported.

Before running CDAP programs on Kubernetes, create the following service account and role binding, which Spark requires.

  • Create service account

kubectl create serviceaccount spark

  • Create role binding

kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
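To confirm that the binding took effect, one option is kubectl auth can-i with impersonation. This is a sketch. It assumes the spark service account lives in the default namespace, as in the commands above, and that your own user is allowed to impersonate service accounts. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Ask the API server whether the spark service account may create pods.
# kubectl prints "yes" or "no".
spark_can_create_pods() {
  local namespace="${1:-default}"
  kubectl auth can-i create pods \
    --as="system:serviceaccount:${namespace}:spark" \
    --namespace "$namespace"
}

# Usage:
# spark_can_create_pods default
```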

Verify by running a pipeline

Run a pipeline using CDAP UI:

Limitations

coming soon

Appendix

This section describes how to create the resources required for the CDAP installation using Google Cloud Platform.

Preparation

We will be using the standard bash shell and gcloud command line tool to perform the setup. Install Google Cloud SDK before you proceed.

Set up the following environment variables for using the gcloud command:
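A sketch of what such a setup might look like. Every value below is a hypothetical placeholder, and the variable names are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical values: replace with your own project, zone, and names.
export PROJECT="my-gcp-project"
export ZONE="us-west1-a"
# Derive the region by stripping the zone suffix (us-west1-a -> us-west1).
export REGION="${ZONE%-*}"
export CLUSTER_NAME="cdap-cluster"
export SQL_INSTANCE="cdap-postgres"

# Make the project the default for later gcloud commands (uncomment to apply).
# gcloud config set project "$PROJECT"
```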

Kubernetes

Create a Google Kubernetes Engine (GKE) cluster to serve as the Kubernetes cluster. Make sure the GKE API is enabled before executing the following commands.
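A minimal sketch using gcloud container clusters create, followed by get-credentials so that kubectl points at the new cluster. The function name and the default node count are hypothetical. Machine type and other sizing flags are left at their defaults.

```shell
#!/usr/bin/env bash
# Create a zonal GKE cluster and fetch its credentials for kubectl.
create_gke_cluster() {
  local name="$1" zone="$2" nodes="${3:-3}"
  gcloud container clusters create "$name" --zone "$zone" --num-nodes "$nodes" &&
    gcloud container clusters get-credentials "$name" --zone "$zone"
}

# Usage (names are placeholders):
# create_gke_cluster cdap-cluster us-west1-a 3
```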

PostgreSQL Database

Create a Google Cloud SQL instance to serve as the PostgreSQL database. Make sure the Cloud SQL API is enabled before executing the following commands:
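A sketch of the instance creation with gcloud sql instances create, followed by setting the password of the default postgres user. The PostgreSQL version, the machine size, and the function name are illustrative assumptions, not values prescribed by CDAP.

```shell
#!/usr/bin/env bash
# Create a Cloud SQL PostgreSQL instance and set the postgres user's password.
# Version and machine size are illustrative; adjust to your needs.
create_cloudsql_postgres() {
  local instance="$1" region="$2" password="$3"
  gcloud sql instances create "$instance" \
    --database-version=POSTGRES_11 --cpu=2 --memory=7680MiB --region="$region" &&
    gcloud sql users set-password postgres --instance="$instance" --password="$password"
}

# Usage (values are placeholders):
# create_cloudsql_postgres cdap-postgres us-west1 "$DB_PASS"
```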

Elasticsearch in Kubernetes

We are using the Elasticsearch Operator to operate an Elasticsearch instance inside the Kubernetes cluster. You can deploy the following YAML to create all the necessary resources to have the operator running in the Kubernetes cluster, inside the elastic-system namespace:

 

After deploying the Elasticsearch operator, you can deploy the following custom resource to start an Elasticsearch instance inside the Kubernetes cluster.
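A minimal sketch of such a custom resource, modeled on the single-node quickstart from the Elasticsearch operator (ECK) documentation. The instance name cdap-es, the Elasticsearch version, and the function name are hypothetical. The API version assumes ECK 1.0 or later.

```shell
#!/usr/bin/env bash
# Sketch only: prints a single-node Elasticsearch custom resource.
render_elasticsearch() {
  local name="${1:-cdap-es}" version="${2:-7.6.2}"
  cat << EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: ${name}
spec:
  version: ${version}
  nodeSets:
  - name: default
    count: 1
    config:
      # Avoids requiring a vm.max_map_count change on the nodes.
      node.store.allow_mmap: false
EOF
}

# Usage:
# render_elasticsearch cdap-es 7.6.2 | kubectl apply -f -
```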

You can validate that the Elasticsearch instance is up and running correctly by checking that an Elasticsearch pod is in the Running state.

After the Elasticsearch instance is running, you need to get the default user password from the secret created by the operator. This password is needed in the cdap-security.xml file for CDAP to authenticate itself to Elasticsearch.
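As a sketch, the password can be read with kubectl and decoded with base64. This assumes the operator's convention of storing the password for the default elastic user in a secret named <instance>-es-elastic-user, under the key elastic. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the password of the default "elastic" user for an Elasticsearch
# instance managed by the operator.
get_elastic_password() {
  local instance="$1" namespace="${2:-default}"
  kubectl get secret "${instance}-es-elastic-user" --namespace "$namespace" \
    -o jsonpath='{.data.elastic}' | base64 --decode
}

# Usage: populate the variables expected when building cdap-security.xml.
# export ES_USER=elastic
# export ES_PASS=$(get_elastic_password cdap-es)
```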

 

Created in 2020 by Google Inc.