Installing CDAP on Kubernetes
Terence Yim
Arjan Bal
Vinisha Shah
CDAP installation on Kubernetes was introduced in CDAP 6.2.3.
This document describes how to install CDAP on a Kubernetes cluster.
Dependencies
This section describes the infrastructure and software dependencies for operating CDAP in Kubernetes.
Kubernetes cluster
CDAP supports using Kubernetes (k8s) as the distributed resource manager. When CDAP is deployed to a k8s cluster, it spawns multiple Deployments and StatefulSets for running various CDAP services. The following diagram shows each of the CDAP services in the Kubernetes cluster:
The CDAP operator is responsible for deploying and managing all the CDAP services inside the cluster. The CDAP operator also supports managing multiple CDAP instances within the same k8s cluster. If multiple CDAP instances are deployed to the same k8s cluster, it is recommended to deploy each one to a different namespace for better isolation.
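As a minimal sketch of that recommendation, the manifest below places two CDAP instances in separate namespaces. The namespace names, instance names, image reference, and location URIs are hypothetical placeholders. The `cdap.cdap.io/v1alpha1` API version is an assumption, because the CRD excerpt in this document does not show the served version. Only the `image` and `locationURI` fields from the CDAPMaster schema are shown; a real deployment needs the other settings described in this document.

```yaml
# Two isolated CDAP instances, one per namespace (all names are hypothetical).
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-team-a
---
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-team-b
---
apiVersion: cdap.cdap.io/v1alpha1   # assumed API version
kind: CDAPMaster
metadata:
  name: cdap
  namespace: cdap-team-a
spec:
  image: example.registry/cdap:6.2.3                      # hypothetical image reference
  locationURI: hdfs://namenode.example:8020/cdap-team-a   # hypothetical location
---
apiVersion: cdap.cdap.io/v1alpha1   # assumed API version
kind: CDAPMaster
metadata:
  name: cdap
  namespace: cdap-team-b
spec:
  image: example.registry/cdap:6.2.3                      # hypothetical image reference
  locationURI: hdfs://namenode.example:8020/cdap-team-b   # hypothetical location
```

Because the CRD declares `scope: Namespaced`, both resources can use the name `cdap` without colliding.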
Limitations
Currently, CDAP supports running only one replica (pod) per service, except for the Preview Runner. Failure resiliency relies on Kubernetes restarting a pod when it fails. For pods created by StatefulSets, it also relies on the infrastructure being able to re-mount the persistent volumes to the new pod, which may be scheduled on a different machine.
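A minimal sketch of choosing storage that satisfies this requirement, assuming your cluster has a StorageClass (hypothetically named `regional-ssd`) whose volumes can be re-attached to a pod scheduled on a different node. The per-service `storageClassName` and `storageSize` fields come from the CDAPMaster schema; the choice of the `messaging` service, the size, the namespace, and the `cdap.cdap.io/v1alpha1` API version are illustrative assumptions.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed API version
kind: CDAPMaster
metadata:
  name: cdap
  namespace: cdap                   # hypothetical namespace
spec:
  messaging:
    # Use a StorageClass whose volumes can be re-mounted on another node,
    # so a rescheduled pod can reattach its persistent volume.
    storageClassName: regional-ssd  # hypothetical StorageClass name
    storageSize: 50Gi               # illustrative size
```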
Another limitation of operating CDAP in Kubernetes is that it does not support the native compute profile. As a result, all user programs run outside the Kubernetes cluster and require a Hadoop cluster to execute.
PostgreSQL database
CDAP needs shared storage for its own metadata, such as deployed artifacts and applications, run histories, preferences, and lineage information. CDAP currently supports both PostgreSQL and HBase as the metadata store; when running CDAP in Kubernetes, we recommend PostgreSQL.
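A minimal sketch of pointing CDAP at PostgreSQL through the CDAPMaster `config` field, whose entries the CRD describes as going into cdap-site.xml. The property names `data.storage.implementation` and `data.storage.sql.jdbc.connection.url`, the host and database names, the namespace, and the `cdap.cdap.io/v1alpha1` API version are assumptions; check the property names against the cdap-default.xml of your CDAP release. Database credentials are deliberately omitted.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed API version
kind: CDAPMaster
metadata:
  name: cdap
  namespace: cdap                   # hypothetical namespace
spec:
  config:
    # Assumed property names; verify against cdap-default.xml for your release.
    data.storage.implementation: postgresql
    data.storage.sql.jdbc.connection.url: jdbc:postgresql://postgres.example.internal:5432/cdap  # hypothetical host/database
```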
Elasticsearch
CDAP supports metadata search, backed by either Elasticsearch or HBase. In a Kubernetes environment, Elasticsearch is recommended. You can either configure CDAP to use an existing Elasticsearch cluster or run Elasticsearch in Kubernetes using the Elasticsearch Operator.
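A minimal sketch of selecting Elasticsearch for metadata search through the same `config` map. The property names `metadata.storage.implementation` and `metadata.elasticsearch.cluster.hosts`, the endpoint, the namespace, and the `cdap.cdap.io/v1alpha1` API version are assumptions to verify against your CDAP release.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed API version
kind: CDAPMaster
metadata:
  name: cdap
  namespace: cdap                   # hypothetical namespace
spec:
  config:
    # Assumed property names; verify against cdap-default.xml for your release.
    metadata.storage.implementation: elastic
    metadata.elasticsearch.cluster.hosts: elasticsearch.example.internal:9200  # hypothetical endpoint
```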
Hadoop Compatible File System (HCFS)
CDAP stores artifacts and runtime information through the HDFS API, so any HCFS implementation is supported.
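The CDAPMaster CRD exposes this storage location through its `locationURI` field, described as a URI specifying an object storage for CDAP. A minimal sketch, assuming an HDFS NameNode at a hypothetical address; using another HCFS scheme presumably works the same way if the matching file system client is available in the CDAP image, which is an assumption to verify. The namespace and the `cdap.cdap.io/v1alpha1` API version are also assumptions.

```yaml
apiVersion: cdap.cdap.io/v1alpha1   # assumed API version
kind: CDAPMaster
metadata:
  name: cdap
  namespace: cdap                   # hypothetical namespace
spec:
  # Root location for CDAP artifacts and runtime data on an HCFS.
  locationURI: hdfs://namenode.example.internal:8020  # hypothetical NameNode address
```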
Installation
This section describes the steps to deploy CDAP on Kubernetes.
Prerequisites
An operational Kubernetes cluster.
For production deployments, 64 GB of memory and 20 available virtual CPUs are recommended.
For better security, the Kubernetes cluster should have RBAC enabled.
kubectl configured to connect to the Kubernetes cluster.
A PostgreSQL database that is reachable from the Kubernetes cluster.
An Elasticsearch instance that is reachable from the Kubernetes cluster.
See the Appendix for how to set up an Elasticsearch instance inside the Kubernetes cluster.
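One way to confirm that the PostgreSQL and Elasticsearch prerequisites are reachable is to run short-lived Pods inside the cluster. This is a sketch, not part of the CDAP tooling: the host names, ports, Pod names, and image tags are hypothetical, and it assumes the cluster can pull the public `postgres` and `curlimages/curl` images and that Elasticsearch accepts plain HTTP without authentication (a cluster with TLS or authentication enabled needs the corresponding curl options).

```yaml
# Throwaway Pods that test network reachability from inside the cluster.
# Host names and ports are hypothetical; replace them with your own endpoints.
apiVersion: v1
kind: Pod
metadata:
  name: check-postgres
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: pg-isready
    image: postgres:13
    # pg_isready exits 0 when the server is accepting connections.
    command: ["pg_isready", "-h", "postgres.example.internal", "-p", "5432"]
---
apiVersion: v1
kind: Pod
metadata:
  name: check-elasticsearch
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: curl
    image: curlimages/curl:8.5.0
    # --fail makes curl exit non-zero when the server returns an HTTP error.
    command: ["curl", "--fail", "-s", "http://elasticsearch.example.internal:9200/_cluster/health"]
```

After applying it with `kubectl apply -f`, `kubectl get pods` shows `Completed` for a successful check and `Error` for a failed one; `kubectl logs check-elasticsearch` prints the cluster health response. Delete the Pods when you are done.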
Deploy CDAP Operator
CDAP provides a CDAP operator for easy deployment and management of CDAP in Kubernetes. Apply the following YAML to create all the resources needed to run the operator in the Kubernetes cluster, inside the cdap-system namespace.
# Create operator namespace
apiVersion: v1
kind: Namespace
metadata:
  name: cdap-system
  labels:
    name: cdap-system
    control-plane: cdap-operator
---
# Create operator service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cdap-operator
  namespace: cdap-system
  labels:
    control-plane: cdap-operator
---
# Source cdap-operator/config/rbac/cdapmaster_editor_role.yaml
# permissions to edit cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-editor-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
  - patch
  - update
---
# Source cdap-operator/config/rbac/cdapmaster_viewer_role.yaml
# permissions to view cdapmasters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdapmaster-viewer-role
rules:
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
---
# Source cdap-operator/config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: cdap-operator-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - cdap.cdap.io
  resources:
  - cdapmasters/status
  verbs:
  - get
  - patch
  - update
---
# Source cdap-operator/config/rbac/role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cdap-operator-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cdap-operator-role
subjects:
- kind: ServiceAccount
  name: cdap-operator
  namespace: cdap-system
---
# Source cdap-operator/config/crd/bases/cdap.cdap.io_cdapmasters.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.2.4
  creationTimestamp: null
  name: cdapmasters.cdap.cdap.io
spec:
  group: cdap.cdap.io
  names:
    kind: CDAPMaster
    listKind: CDAPMasterList
    plural: cdapmasters
    singular: cdapmaster
  scope: Namespaced
  validation:
    openAPIV3Schema:
      description: CDAPMaster is the Schema for the cdapmasters API
      properties:
        apiVersion:
          description: 'APIVersion defines the versioned schema of this representation
            of an object. Servers should convert recognized schemas to the latest
            internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
          type: string
        kind:
          description: 'Kind is a string value representing the REST resource this
            object represents. Servers may infer this from the endpoint the client
            submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
          type: string
        metadata:
          type: object
        spec:
          description: "CDAPMasterSpec defines the desired state of CDAPMaster \n
            Important notes: * The field name of each service MUST match the constant
            values of ServiceName in constants.go as reflection is used to find
            field value. * For services that are optional (i.e. may or may not be
            required for CDAP to be operational), their service specification fields
            are pointers. By default, these optional services are disabled. Set to
            non-nil to enable them."
          properties:
            appFabric:
              description: AppFabric is specification for the CDAP app-fabric service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must
                                  be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key
                                  must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the
                    pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the
                    pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute
                        resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute
                        resources required. If Requests is omitted for a container,
                        it defaults to Limits if that is explicitly specified, otherwise
                        to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in
                    the node.k8s.io group, which should be used to run pods for this
                    service. If no RuntimeClass resource matches the named class,
                    pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for
                    the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for
                    the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume
                    size used by the service.
                  type: string
              type: object
            config:
              additionalProperties:
                type: string
              description: Config is a set of configurations that goes into cdap-site.xml.
              type: object
            configMapVolumes:
              additionalProperties:
                type: string
              description: ConfigMapVolumes defines a map from ConfigMap names to
                volume mount path. Key is the configmap object name. Value is the
                mount path. This adds ConfigMap data to the directory specified by
                the volume mount path.
              type: object
            image:
              description: Image is the docker image name for the CDAP backend.
              type: string
            imagePullPolicy:
              description: ImagePullPolicy is the policy for pulling docker images
                on Pod creation.
              type: string
            locationURI:
              description: LocationURI is an URI specifying an object storage for
                CDAP.
              type: string
            logLevels:
              additionalProperties:
                type: string
              description: LogLevels is a set of logger name to log level settings.
              type: object
            logs:
              description: Logs is specification for the CDAP logging service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must
                                  be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key
                                  must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the
                    pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the
                    pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute
                        resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute
                        resources required. If Requests is omitted for a container,
                        it defaults to Limits if that is explicitly specified, otherwise
                        to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in
                    the node.k8s.io group, which should be used to run pods for this
                    service. If no RuntimeClass resource matches the named class,
                    pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for
                    the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for
                    the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume
                    size used by the service.
                  type: string
              type: object
            messaging:
              description: Messaging is specification for the CDAP messaging service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must
                                  be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key
                                  must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the
                    pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the
                    pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute
                        resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute
                        resources required. If Requests is omitted for a container,
                        it defaults to Limits if that is explicitly specified, otherwise
                        to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in
                    the node.k8s.io group, which should be used to run pods for this
                    service. If no RuntimeClass resource matches the named class,
                    pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for
                    the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for
                    the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume
                    size used by the service.
                  type: string
              type: object
            metadata:
              description: Metadata is specification for the CDAP metadata service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must
                                  be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key
                                  must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the
                    pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the
                    pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute
                        resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute
                        resources required. If Requests is omitted for a container,
                        it defaults to Limits if that is explicitly specified, otherwise
                        to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in
                    the node.k8s.io group, which should be used to run pods for this
                    service. If no RuntimeClass resource matches the named class,
                    pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for
                    the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for
                    the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume
                    size used by the service.
                  type: string
              type: object
            metrics:
              description: Metrics is specification for the CDAP metrics service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
                            properties:
                              key:
                                description: The key of the secret to select from. Must
                                  be a valid secret key.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the Secret or its key
                                  must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                        type: object
                    required:
                    - name
                    type: object
                  type: array
                metadata:
                  description: Metadata for the service.
                  type: object
                nodeSelector:
                  additionalProperties:
                    type: string
                  description: NodeSelector is a selector which must be true for the
                    pod to fit on a node.
                  type: object
                priorityClassName:
                  description: PriorityClassName is to specify the priority of the
                    pods for this service.
                  type: string
                resources:
                  description: Resources are Compute resources required by the service.
                  properties:
                    limits:
                      additionalProperties:
                        type: string
                      description: 'Limits describes the maximum amount of compute
                        resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                    requests:
                      additionalProperties:
                        type: string
                      description: 'Requests describes the minimum amount of compute
                        resources required. If Requests is omitted for a container,
                        it defaults to Limits if that is explicitly specified, otherwise
                        to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
                      type: object
                  type: object
                runtimeClassName:
                  description: RuntimeClassName refers to a RuntimeClass object in
                    the node.k8s.io group, which should be used to run pods for this
                    service. If no RuntimeClass resource matches the named class,
                    pods will not be running.
                  type: string
                serviceAccountName:
                  description: ServiceAccountName overrides the service account for
                    the service pods.
                  type: string
                storageClassName:
                  description: StorageClassName is the name of the StorageClass for
                    the persistent volume used by the service.
                  type: string
                storageSize:
                  description: StorageSize is specification for the persistent volume
                    size used by the service.
                  type: string
              type: object
            preview:
              description: Preview is specification for the CDAP preview service.
              properties:
                env:
                  description: Env is a list of environment variables for the master
                    service container.
                  items:
                    description: EnvVar represents an environment variable present
                      in a Container.
                    properties:
                      name:
                        description: Name of the environment variable. Must be a C_IDENTIFIER.
                        type: string
                      value:
                        description: 'Variable references $(VAR_NAME) are expanded
                          using the previous defined environment variables in the
                          container and any service environment variables. If a variable
                          cannot be resolved, the reference in the input string will
                          be unchanged. The $(VAR_NAME) syntax can be escaped with
                          a double $$, ie: $$(VAR_NAME). Escaped references will never
                          be expanded, regardless of whether the variable exists or
                          not. Defaults to "".'
                        type: string
                      valueFrom:
                        description: Source for the environment variable's value.
                          Cannot be used if value is not empty.
                        properties:
                          configMapKeyRef:
                            description: Selects a key of a ConfigMap.
                            properties:
                              key:
                                description: The key to select.
                                type: string
                              name:
                                description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                                  TODO: Add other useful fields. apiVersion, kind,
                                  uid?'
                                type: string
                              optional:
                                description: Specify whether the ConfigMap or its
                                  key must be defined
                                type: boolean
                            required:
                            - key
                            type: object
                          fieldRef:
                            description: 'Selects a field of the pod: supports metadata.name,
                              metadata.namespace, metadata.labels, metadata.annotations,
                              spec.nodeName, spec.serviceAccountName, status.hostIP,
                              status.podIP, status.podIPs.'
                            properties:
                              apiVersion:
                                description: Version of the schema the FieldPath is
                                  written in terms of, defaults to "v1".
                                type: string
                              fieldPath:
                                description: Path of the field to select in the specified
                                  API version.
                                type: string
                            required:
                            - fieldPath
                            type: object
                          resourceFieldRef:
                            description: 'Selects a resource of the container: only
                              resources limits and requests (limits.cpu, limits.memory,
                              limits.ephemeral-storage, requests.cpu, requests.memory
                              and requests.ephemeral-storage) are currently supported.'
                            properties:
                              containerName:
                                description: 'Container name: required for volumes,
                                  optional for env vars'
                                type: string
                              divisor:
                                description: Specifies the output format of the exposed
                                  resources, defaults to "1"
                                type: string
                              resource:
                                description: 'Required: resource to select'
                                type: string
                            required:
                            - resource
                            type: object
                          secretKeyRef:
                            description: Selects a key of a secret in the pod's namespace
properties:
key:
description: The key of the secret to select from. Must
be a valid secret key.
type: string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind,
uid?'
type: string
optional:
description: Specify whether the Secret or its key
must be defined
type: boolean
required:
- key
type: object
type: object
required:
- name
type: object
type: array
metadata:
description: Metadata for the service.
type: object
nodeSelector:
additionalProperties:
type: string
description: NodeSelector is a selector which must be true for the
pod to fit on a node.
type: object
priorityClassName:
description: PriorityClassName is to specify the priority of the
pods for this service.
type: string
resources:
description: Resources are Compute resources required by the service.
properties:
limits:
additionalProperties:
type: string
description: 'Limits describes the maximum amount of compute
resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
requests:
additionalProperties:
type: string
description: 'Requests describes the minimum amount of compute
resources required. If Requests is omitted for a container,
it defaults to Limits if that is explicitly specified, otherwise
to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
type: object
runtimeClassName:
description: RuntimeClassName refers to a RuntimeClass object in
the node.k8s.io group, which should be used to run pods for this
service. If no RuntimeClass resource matches the named class,
pods will not be running.
type: string
serviceAccountName:
description: ServiceAccountName overrides the service account for
the service pods.
type: string
storageClassName:
description: StorageClassName is the name of the StorageClass for
the persistent volume used by the service.
type: string
storageSize:
description: StorageSize is specification for the persistent volume
size used by the service.
type: string
type: object
router:
description: Router is specification for the CDAP router service.
properties:
env:
description: Env is a list of environment variables for the master
service container.
items:
description: EnvVar represents an environment variable present
in a Container.
properties:
name:
description: Name of the environment variable. Must be a C_IDENTIFIER.
type: string
value:
description: 'Variable references $(VAR_NAME) are expanded
using the previous defined environment variables in the
container and any service environment variables. If a variable
cannot be resolved, the reference in the input string will
be unchanged. The $(VAR_NAME) syntax can be escaped with
a double $$, ie: $$(VAR_NAME). Escaped references will never
be expanded, regardless of whether the variable exists or
not. Defaults to "".'
type: string
valueFrom:
description: Source for the environment variable's value.
Cannot be used if value is not empty.
properties:
configMapKeyRef:
description: Selects a key of a ConfigMap.
properties:
key:
description: The key to select.
type: string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind,
uid?'
type: string
optional:
description: Specify whether the ConfigMap or its
key must be defined
type: boolean
required:
- key
type: object
fieldRef:
description: 'Selects a field of the pod: supports metadata.name,
metadata.namespace, metadata.labels, metadata.annotations,
spec.nodeName, spec.serviceAccountName, status.hostIP,
status.podIP, status.podIPs.'
properties:
apiVersion:
description: Version of the schema the FieldPath is
written in terms of, defaults to "v1".
type: string
fieldPath:
description: Path of the field to select in the specified
API version.
type: string
required:
- fieldPath
type: object
resourceFieldRef:
description: 'Selects a resource of the container: only
resources limits and requests (limits.cpu, limits.memory,
limits.ephemeral-storage, requests.cpu, requests.memory
and requests.ephemeral-storage) are currently supported.'
properties:
containerName:
description: 'Container name: required for volumes,
optional for env vars'
type: string
divisor:
description: Specifies the output format of the exposed
resources, defaults to "1"
type: string
resource:
description: 'Required: resource to select'
type: string
required:
- resource
type: object
secretKeyRef:
description: Selects a key of a secret in the pod's namespace
properties:
key:
description: The key of the secret to select from. Must
be a valid secret key.
type: string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind,
uid?'
type: string
optional:
description: Specify whether the Secret or its key
must be defined
type: boolean
required:
- key
type: object
type: object
required:
- name
type: object
type: array
metadata:
description: Metadata for the service.
type: object
nodeSelector:
additionalProperties:
type: string
description: NodeSelector is a selector which must be true for the
pod to fit on a node.
type: object
priorityClassName:
description: PriorityClassName is to specify the priority of the
pods for this service.
type: string
replicas:
description: Replicas is number of replicas for the service.
format: int32
type: integer
resources:
description: Resources are Compute resources required by the service.
properties:
limits:
additionalProperties:
type: string
description: 'Limits describes the maximum amount of compute
resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
requests:
additionalProperties:
type: string
description: 'Requests describes the minimum amount of compute
resources required. If Requests is omitted for a container,
it defaults to Limits if that is explicitly specified, otherwise
to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
type: object
runtimeClassName:
description: RuntimeClassName refers to a RuntimeClass object in
the node.k8s.io group, which should be used to run pods for this
service. If no RuntimeClass resource matches the named class,
pods will not be running.
type: string
serviceAccountName:
description: ServiceAccountName overrides the service account for
the service pods.
type: string
servicePort:
description: ServicePort is the port number for the service.
format: int32
type: integer
serviceType:
description: ServiceType is the service type in kubernetes, default
is NodePort.
type: string
type: object
runtime:
description: 'Runtime is specification for the CDAP runtime service.
This is an optional service and may not be required for CDAP to be
operational. To disable this service: either omit or set the field
to nil To enable this service: set it to a pointer to a RuntimeSpec
struct (can be an empty struct)'
properties:
env:
description: Env is a list of environment variables for the master
service container.
items:
description: EnvVar represents an environment variable present
in a Container.
properties:
name:
description: Name of the environment variable. Must be a C_IDENTIFIER.
type: string
value:
description: 'Variable references $(VAR_NAME) are expanded
using the previous defined environment variables in the
container and any service environment variables. If a variable
cannot be resolved, the reference in the input string will
be unchanged. The $(VAR_NAME) syntax can be escaped with
a double $$, ie: $$(VAR_NAME). Escaped references will never
be expanded, regardless of whether the variable exists or
not. Defaults to "".'
type: string
valueFrom:
description: Source for the environment variable's value.
Cannot be used if value is not empty.
properties:
configMapKeyRef:
description: Selects a key of a ConfigMap.
properties:
key:
description: The key to select.
type: string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind,
uid?'
type: string
optional:
description: Specify whether the ConfigMap or its
key must be defined
type: boolean
required:
- key
type: object
fieldRef:
description: 'Selects a field of the pod: supports metadata.name,
metadata.namespace, metadata.labels, metadata.annotations,
spec.nodeName, spec.serviceAccountName, status.hostIP,
status.podIP, status.podIPs.'
properties:
apiVersion:
description: Version of the schema the FieldPath is
written in terms of, defaults to "v1".
type: string
fieldPath:
description: Path of the field to select in the specified
API version.
type: string
required:
- fieldPath
type: object
resourceFieldRef:
description: 'Selects a resource of the container: only
resources limits and requests (limits.cpu, limits.memory,
limits.ephemeral-storage, requests.cpu, requests.memory
and requests.ephemeral-storage) are currently supported.'
properties:
containerName:
description: 'Container name: required for volumes,
optional for env vars'
type: string
divisor:
description: Specifies the output format of the exposed
resources, defaults to "1"
type: string
resource:
description: 'Required: resource to select'
type: string
required:
- resource
type: object
secretKeyRef:
description: Selects a key of a secret in the pod's namespace
properties:
key:
description: The key of the secret to select from. Must
be a valid secret key.
type: string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind,
uid?'
type: string
optional:
description: Specify whether the Secret or its key
must be defined
type: boolean
required:
- key
type: object
type: object
required:
- name
type: object
type: array
metadata:
description: Metadata for the service.
type: object
nodeSelector:
additionalProperties:
type: string
description: NodeSelector is a selector which must be true for the
pod to fit on a node.
type: object
priorityClassName:
description: PriorityClassName is to specify the priority of the
pods for this service.
type: string
resources:
description: Resources are Compute resources required by the service.
properties:
limits:
additionalProperties:
type: string
description: 'Limits describes the maximum amount of compute
resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
requests:
additionalProperties:
type: string
description: 'Requests describes the minimum amount of compute
resources required. If Requests is omitted for a container,
it defaults to Limits if that is explicitly specified, otherwise
to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
type: object
runtimeClassName:
description: RuntimeClassName refers to a RuntimeClass object in
the node.k8s.io group, which should be used to run pods for this
service. If no RuntimeClass resource matches the named class,
pods will not be running.
type: string
serviceAccountName:
description: ServiceAccountName overrides the service account for
the service pods.
type: string
storageClassName:
description: StorageClassName is the name of the StorageClass for
the persistent volume used by the service.
type: string
storageSize:
description: StorageSize is specification for the persistent volume
size used by the service.
type: string
type: object
securitySecret:
description: SecuritySecret is secret that contains security related
configurations for CDAP.
type: string
serviceAccountName:
description: ServiceAccountName is the service account for all the service
pods.
type: string
systemappconfigs:
additionalProperties:
type: string
description: SystemAppConfigs specifies configs used by CDAP to run
system apps dynamically. Each entry is of format <filename, json app
config> which will create a separate system config file with entry
value as file content.
type: object
userInterface:
description: UserInterface is specification for the CDAP UI service.
properties:
env:
description: Env is a list of environment variables for the master
service container.
items:
description: EnvVar represents an environment variable present
in a Container.
properties:
name:
description: Name of the environment variable. Must be a C_IDENTIFIER.
type: string
value:
description: 'Variable references $(VAR_NAME) are expanded
using the previous defined environment variables in the
container and any service environment variables. If a variable
cannot be resolved, the reference in the input string will
be unchanged. The $(VAR_NAME) syntax can be escaped with
a double $$, ie: $$(VAR_NAME). Escaped references will never
be expanded, regardless of whether the variable exists or
not. Defaults to "".'
type: string
valueFrom:
description: Source for the environment variable's value.
Cannot be used if value is not empty.
properties:
configMapKeyRef:
description: Selects a key of a ConfigMap.
properties:
key:
description: The key to select.
type: string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind,
uid?'
type: string
optional:
description: Specify whether the ConfigMap or its
key must be defined
type: boolean
required:
- key
type: object
fieldRef:
description: 'Selects a field of the pod: supports metadata.name,
metadata.namespace, metadata.labels, metadata.annotations,
spec.nodeName, spec.serviceAccountName, status.hostIP,
status.podIP, status.podIPs.'
properties:
apiVersion:
description: Version of the schema the FieldPath is
written in terms of, defaults to "v1".
type: string
fieldPath:
description: Path of the field to select in the specified
API version.
type: string
required:
- fieldPath
type: object
resourceFieldRef:
description: 'Selects a resource of the container: only
resources limits and requests (limits.cpu, limits.memory,
limits.ephemeral-storage, requests.cpu, requests.memory
and requests.ephemeral-storage) are currently supported.'
properties:
containerName:
description: 'Container name: required for volumes,
optional for env vars'
type: string
divisor:
description: Specifies the output format of the exposed
resources, defaults to "1"
type: string
resource:
description: 'Required: resource to select'
type: string
required:
- resource
type: object
secretKeyRef:
description: Selects a key of a secret in the pod's namespace
properties:
key:
description: The key of the secret to select from. Must
be a valid secret key.
type: string
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind,
uid?'
type: string
optional:
description: Specify whether the Secret or its key
must be defined
type: boolean
required:
- key
type: object
type: object
required:
- name
type: object
type: array
metadata:
description: Metadata for the service.
type: object
nodeSelector:
additionalProperties:
type: string
description: NodeSelector is a selector which must be true for the
pod to fit on a node.
type: object
priorityClassName:
description: PriorityClassName is to specify the priority of the
pods for this service.
type: string
replicas:
description: Replicas is number of replicas for the service.
format: int32
type: integer
resources:
description: Resources are Compute resources required by the service.
properties:
limits:
additionalProperties:
type: string
description: 'Limits describes the maximum amount of compute
resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
requests:
additionalProperties:
type: string
description: 'Requests describes the minimum amount of compute
resources required. If Requests is omitted for a container,
it defaults to Limits if that is explicitly specified, otherwise
to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
type: object
runtimeClassName:
description: RuntimeClassName refers to a RuntimeClass object in
the node.k8s.io group, which should be used to run pods for this
service. If no RuntimeClass resource matches the named class,
pods will not be running.
type: string
serviceAccountName:
description: ServiceAccountName overrides the service account for
the service pods.
type: string
servicePort:
description: ServicePort is the port number for the service.
format: int32
type: integer
serviceType:
description: ServiceType is the service type in kubernetes, default
is NodePort.
type: string
type: object
userInterfaceImage:
description: UserInterfaceImage is the docker image name for the CDAP
UI.
type: string
required:
- locationURI
type: object
status:
description: CDAPMasterStatus defines the observed state of CDAPMaster
properties:
components:
description: Object status array for all matching objects
items:
description: ObjectStatus is a generic status holder for objects
properties:
group:
description: Object group
type: string
kind:
description: Kind of object
type: string
link:
description: Link to object
type: string
name:
description: Name of object
type: string
pdb:
description: PDB status
properties:
currenthealthy:
description: currentHealthy
format: int32
type: integer
desiredhealthy:
description: desiredHealthy
format: int32
type: integer
required:
- currenthealthy
- desiredhealthy
type: object
status:
description: 'Status. Values: InProgress, Ready, Unknown'
type: string
sts:
description: StatefulSet status
properties:
currentcount:
description: CurrentReplicas defines the no of MySQL instances
that are created
format: int32
type: integer
progress:
description: 'progress is a fuzzy indicator. Interpret as
a percentage (0-100) eg: for statefulsets, progress = 100*readyreplicas/replicas'
format: int32
type: integer
readycount:
description: ReadyReplicas defines the no of MySQL instances
that are ready
format: int32
type: integer
replicas:
description: Replicas defines the no of MySQL instances desired
format: int32
type: integer
required:
- currentcount
- progress
- readycount
- replicas
type: object
type: object
type: array
conditions:
description: Conditions represents the latest state of the object
items:
description: Condition describes the state of an object at a certain
point.
properties:
lastTransitionTime:
description: Last time the condition transitioned from one status
to another.
format: date-time
type: string
lastUpdateTime:
description: Last time the condition was probed
format: date-time
type: string
message:
description: A human readable message indicating details about
the transition.
type: string
reason:
description: The reason for the condition's last transition.
type: string
status:
description: Status of the condition, one of True, False, Unknown.
type: string
type:
description: Type of condition.
type: string
required:
- status
- type
type: object
type: array
downgradeStartTimeMillis:
description: DowngradeStartTimeMillis is the start time in milliseconds
of the downgrade process
format: int64
type: integer
imageToUse:
description: ImageToUse is the Docker image of CDAP backend the operator
uses to deploy.
type: string
observedGeneration:
description: ObservedGeneration is the most recent generation observed.
It corresponds to the Object's generation, which is updated on mutation
by the API Server.
format: int64
type: integer
upgradeStartTimeMillis:
description: UpgradeStartTimeMillis is the start time in milliseconds
of the upgrade process
format: int64
type: integer
userInterfaceImageToUse:
description: UserInterfaceImageToUse is the Docker image of CDAP UI
the operator uses to deploy.
type: string
type: object
type: object
version: v1alpha1
versions:
- name: v1alpha1
served: true
storage: true
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
---
# StatefulSet for running the cdap controller
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cdap-controller
  namespace: cdap-system
  labels:
    control-plane: cdap-operator
spec:
  selector:
    matchLabels:
      control-plane: cdap-operator
  serviceName: cdap-operator-service
  template:
    metadata:
      labels:
        control-plane: cdap-operator
    spec:
      serviceAccountName: cdap-operator
      containers:
      - command:
        - /manager
        image: gcr.io/cdapio/cdap-controller:latest
        name: manager
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 20Mi
      terminationGracePeriodSeconds: 10
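Once the YAML above is saved to a file, you can deploy the operator and check it with kubectl. The sketch below does both. The file name `cdap-operator.yaml` and the five-minute timeout are assumptions. The `cdap-system` namespace and the `cdap-controller` StatefulSet name come from the manifest above.

```shell
# Sketch: deploy the CDAP operator and wait for its controller to be ready.
# "cdap-operator.yaml" is a hypothetical name for the saved manifest.
deploy_cdap_operator() {
  local manifest="${1:-cdap-operator.yaml}"
  # Registers the CDAPMaster CRD and starts the cdap-controller StatefulSet.
  kubectl apply -f "${manifest}" || return 1
  # Blocks until the controller pod is ready, or fails after 5 minutes.
  kubectl -n cdap-system rollout status statefulset/cdap-controller --timeout=300s
}

# Usage: deploy_cdap_operator cdap-operator.yaml
```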
Create RBAC Roles and RoleBinding
CDAP interacts with Kubernetes for configuration, service discovery, and workload management. Deploying the following YAML file creates a service account named cdap, along with the RBAC Role and RoleBinding that grant it the permissions CDAP needs.
# Create cdap service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cdap
---
# Create cdap role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cdap-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - create
  - get
  - list
  - watch
  - delete
  - deletecollection
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  verbs:
  - deletecollection
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - replicasets
  verbs:
  - get
  - list
  - update
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
---
# Create cdap RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cdap-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cdap-role
subjects:
- kind: ServiceAccount
  name: cdap
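The Role and RoleBinding are namespaced, so apply the manifest to the namespace where CDAP will run. The sketch below applies it and then spot-checks a few of the granted permissions with `kubectl auth can-i`. The file name `cdap-rbac.yaml` is an assumption. The checked verb/resource pairs are a sample taken from `cdap-role`, not an exhaustive test. Using `--as` requires impersonation rights, which cluster administrators normally have.

```shell
# Sketch: apply the RBAC manifest and spot-check the cdap service account.
# "cdap-rbac.yaml" is a hypothetical name for the saved manifest.
apply_cdap_rbac() {
  local namespace="$1" manifest="${2:-cdap-rbac.yaml}"
  kubectl -n "${namespace}" apply -f "${manifest}"
}

check_cdap_rbac() {
  local namespace="$1" pair verb resource
  local sa="system:serviceaccount:${namespace}:cdap"
  # A sample of permissions granted by cdap-role; can-i exits non-zero on "no".
  for pair in "create pods" "watch configmaps" "patch statefulsets.apps" "create jobs.batch"; do
    read -r verb resource <<< "${pair}"
    if ! kubectl -n "${namespace}" auth can-i "${verb}" "${resource}" --as="${sa}" --quiet; then
      echo "missing permission: ${verb} ${resource}" >&2
      return 1
    fi
  done
}

# Usage: apply_cdap_rbac cdap && check_cdap_rbac cdap
```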
Prepare the secret token for CDAP
CDAP reads the PostgreSQL and Elasticsearch passwords from a cdap-security.xml file, which is provided to CDAP through a Kubernetes secret. The following commands assume that the database username and password are in the environment variables DB_USER and DB_PASS, and that the Elasticsearch username and password are in ES_USER and ES_PASS.
# Create the content of the cdap-security.xml
export CDAP_SECURITY=$(cat << EOF | base64 | tr -d '\n'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>data.storage.sql.jdbc.username</name>
    <value>${DB_USER}</value>
  </property>
  <property>
    <name>data.storage.sql.jdbc.password</name>
    <value>${DB_PASS}</value>
  </property>
  <property>
    <name>metadata.elasticsearch.credentials.username</name>
    <value>${ES_USER}</value>
  </property>
  <property>
    <name>metadata.elasticsearch.credentials.password</name>
    <value>${ES_PASS}</value>
  </property>
</configuration>
EOF
)
# Create the secret
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: cdap-security
type: Opaque
data:
  cdap-security.xml: $CDAP_SECURITY
EOF
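Two optional guards can make this step safer. The first checks that all four variables are set before encoding, because an unset variable silently expands to an empty `<value>` in the heredoc. The second decodes the stored secret afterwards to confirm its content. The decoded output includes the passwords, so print it only where that is acceptable. Both function names are hypothetical.

```shell
# Returns non-zero (naming each missing variable) if any credential
# variable is unset or empty.
require_cdap_credentials() {
  local var missing=0
  for var in DB_USER DB_PASS ES_USER ES_PASS; do
    if [ -z "${!var}" ]; then
      echo "error: ${var} is not set" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Prints the decoded cdap-security.xml stored in the cluster.
# The backslash escapes the dot inside the key name for kubectl's JSONPath.
show_cdap_security() {
  kubectl get secret cdap-security -o jsonpath='{.data.cdap-security\.xml}' | base64 --decode
}
```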
Deploy CDAP
Finally, we are ready to deploy CDAP into the Kubernetes cluster. The following YAML provides a simple example. You will need to replace the locationURI with an HCFS-compatible file system (e.g. HDFS, Google Cloud Storage, or Amazon S3), and configure data.storage.sql.jdbc.connection.url to point to your PostgreSQL database. Refer to cdap-default.xml for an explanation of the configurations.
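A minimal CDAPMaster manifest might look like the following sketch. It is rendered by a shell function so that the placeholders can be filled in. `locationURI`, `securitySecret`, `serviceAccountName`, `userInterfaceImage`, and the JDBC URL key come from this document. The following are assumptions to verify against the CRD and cdap-default.xml:

- the API group `cdap.cdap.io`
- the `image` field and the image tags
- the `data.storage.implementation`, `metadata.storage.implementation`, and `metadata.elasticsearch.cluster.hosts` keys

```shell
# Sketch: render a minimal CDAPMaster manifest. Assumed (not confirmed here):
# API group cdap.cdap.io, the image field/tags, and the storage/metadata keys.
render_cdap_master() {
  local location_uri="$1" jdbc_url="$2" es_hosts="$3"
  cat << EOF
apiVersion: cdap.cdap.io/v1alpha1
kind: CDAPMaster
metadata:
  name: cdap
spec:
  image: gcr.io/cdapio/cdap:6.2.3
  userInterfaceImage: gcr.io/cdapio/cdap:6.2.3
  locationURI: "${location_uri}"
  securitySecret: cdap-security
  serviceAccountName: cdap
  config:
    data.storage.implementation: postgresql
    data.storage.sql.jdbc.connection.url: "${jdbc_url}"
    metadata.storage.implementation: elastic
    metadata.elasticsearch.cluster.hosts: "${es_hosts}"
EOF
}

# Usage (placeholder values):
# render_cdap_master gs://my-bucket/cdap \
#   "jdbc:postgresql://10.0.0.3:5432/cdap" "quickstart-es-http:9200" \
#   | kubectl apply -f -
```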
You can also configure each CDAP service with its own CPU, memory, storage, and environment variable settings. The following is a simple example that shows how to change the memory and disk size for the appFabric service.
Refer to the Custom Resource Definition (CRD) for all the supported settings.
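The sketch below applies such a change to an existing CDAPMaster with a JSON merge patch. Custom resources do not accept strategic merge patches, hence `--type merge`. It rests on three assumptions:

- the resource can be addressed as `cdapmaster`;
- the service is configured under an `appFabric` key;
- `appFabric` accepts the `resources` and `storageSize` fields that the CRD defines for the other services.

A StatefulSet's volume claim templates cannot be modified in place, so a new `storageSize` may only take effect for newly created volumes, depending on how the operator handles it. Setting it in the initial manifest avoids the question.

```shell
# Sketch: set appFabric memory (request = limit) and volume size via a merge patch.
patch_appfabric() {
  local name="$1" memory="$2" storage="$3" patch
  patch=$(printf '{"spec":{"appFabric":{"resources":{"requests":{"memory":"%s"},"limits":{"memory":"%s"}},"storageSize":"%s"}}}' \
    "${memory}" "${memory}" "${storage}")
  kubectl patch cdapmaster "${name}" --type merge -p "${patch}"
}

# Usage: patch_appfabric cdap 4Gi 200Gi
```

Setting the memory request equal to the limit gives the pod a predictable memory reservation. A lower request would allow denser packing at the risk of eviction under memory pressure.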
You can verify that CDAP is running correctly by listing the pods in the Kubernetes cluster.
Once CDAP is fully up and running, the UI and the REST API are accessible through the user-interface and router services that CDAP exposes.
For quick testing, you can use kubectl port-forward to provide access to the CDAP services. For example, you can forward the user interface service and then open localhost:11011 in a browser.
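The sketch below covers the checks and access steps above. It waits for every pod in the namespace to become Ready and then forwards the UI. The namespace, the service name (look it up with `kubectl -n <namespace> get services`), the remote port 11011, and the 10-minute timeout are assumptions. Run the wait before starting any programs, since completed job pods never report Ready.

```shell
# Sketch: list CDAP pods and block until all of them are Ready.
wait_for_cdap_pods() {
  local namespace="$1"
  kubectl -n "${namespace}" get pods
  kubectl -n "${namespace}" wait --for=condition=Ready pods --all --timeout=600s
}

# Sketch: forward localhost:11011 to the UI service; runs until Ctrl-C.
forward_cdap_ui() {
  local namespace="$1" service="$2" remote_port="${3:-11011}"
  kubectl -n "${namespace}" port-forward "service/${service}" "11011:${remote_port}"
}
```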
For production use cases, it is better to expose the CDAP services through a load balancer. Consult your Kubernetes provider's documentation on how to deploy one.
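The CRD defines a `serviceType` field for both the router and userInterface services, with NodePort as the default. One way to request a load balancer is therefore to set it to `LoadBalancer`, as in the sketch below. It assumes the resource can be addressed as `cdapmaster` and that the operator reconciles the change onto the existing Services. Whether your provider then provisions an external load balancer depends on the cluster.

```shell
# Sketch: switch the router and UI services from NodePort to LoadBalancer.
expose_with_load_balancer() {
  local name="$1"
  kubectl patch cdapmaster "${name}" --type merge \
    -p '{"spec":{"router":{"serviceType":"LoadBalancer"},"userInterface":{"serviceType":"LoadBalancer"}}}'
}

# Usage: expose_with_load_balancer cdap && kubectl get services
```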
Enable Authentication Service
Enabling the Authentication Service in a Kubernetes environment to provide perimeter security requires extra configuration in the CDAP YAML file.
Set the following configurations in the "config:" section of the CDAP YAML file.
Under the same "config:" section, add the configurations for the authentication handler described in Configuring Managed Authentication.
Use the CDAP Docker image to generate an "auth.key" file.
Create a Kubernetes secret from the "auth.key" file.
Map the secret into the CDAP pods by adding a "secretVolumes" option to the CDAP YAML file, at the same level as other options such as "config".
Now you can start CDAP with security enabled, without needing ZooKeeper.
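The secret-creation step might look like the sketch below. It assumes `auth.key` has already been generated with the CDAP Docker image. The secret name `cdap-auth-key` is hypothetical. The key inside the secret is set explicitly to `auth.key` so that the mounted file keeps that name.

```shell
# Sketch: store auth.key in a secret named cdap-auth-key (hypothetical name).
create_auth_key_secret() {
  local namespace="$1" key_file="${2:-auth.key}"
  # Refuse to create a secret from a missing or empty key file.
  if [ ! -s "${key_file}" ]; then
    echo "error: ${key_file} is missing or empty" >&2
    return 1
  fi
  kubectl -n "${namespace}" create secret generic cdap-auth-key \
    --from-file=auth.key="${key_file}"
}
```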
Running CDAP Programs
Starting in CDAP 6.7.0, you can run CDAP programs on Kubernetes using Spark.
Note: MapReduce and Spark Streaming engines are not supported.
Spark requires a service account and a role binding to run on Kubernetes, so create both before running CDAP programs.
Create service account
kubectl create serviceaccount spark
Create role binding
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
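To confirm the binding took effect, you can check that the `spark` service account is allowed to create pods, as in the sketch below. Spark's driver uses this account to launch executor pods. The function name is hypothetical, and the check assumes the `default` namespace used in the commands above.

```shell
# Sketch: exit status 0 means the spark service account may create pods.
check_spark_account() {
  local namespace="${1:-default}"
  kubectl auth can-i create pods \
    --as="system:serviceaccount:${namespace}:spark" -n "${namespace}" --quiet
}
```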
Verify by running a pipeline
Run a pipeline using the CDAP UI.
Limitations
coming soon
Appendix
This section describes how to create the resources required for the CDAP installation using Google Cloud Platform.
Preparation
We use the standard bash shell and the gcloud command-line tool to perform the setup. Install the Google Cloud SDK before you proceed.
Set up the following environment variables for using the gcloud command:
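A sketch of such a setup follows. The variable names and values are hypothetical examples, so replace them with your own. `gcloud config set` then makes the project and zone the defaults for later commands.

```shell
# Hypothetical example values; replace them with your own.
export PROJECT="my-gcp-project"
export REGION="us-central1"
export ZONE="us-central1-a"
export CLUSTER="cdap-cluster"
export SQL_INSTANCE="cdap-metadata"

# Makes the project and zone the defaults for later gcloud commands.
configure_gcloud() {
  gcloud config set project "${PROJECT}"
  gcloud config set compute/zone "${ZONE}"
}
```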
Kubernetes
Create a Google Kubernetes Engine (GKE) cluster to serve as the Kubernetes cluster. Make sure the GKE API is enabled before executing the following commands.
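The sketch below enables the API, creates a zonal cluster, and points kubectl at it. The node count and machine type are illustrative assumptions; size them for your workload.

```shell
# Sketch: create a GKE cluster and fetch kubectl credentials for it.
create_gke_cluster() {
  local cluster="$1" zone="$2"
  gcloud services enable container.googleapis.com || return 1
  gcloud container clusters create "${cluster}" --zone "${zone}" \
    --num-nodes 3 --machine-type n1-standard-4 || return 1
  # Writes a kubeconfig entry so kubectl talks to the new cluster.
  gcloud container clusters get-credentials "${cluster}" --zone "${zone}"
}

# Usage: create_gke_cluster "${CLUSTER}" "${ZONE}"
```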
PostgreSQL Database
Create a Google Cloud SQL instance to serve as the PostgreSQL database. Make sure the Cloud SQL API is enabled before executing the following commands:
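The sketch below creates the instance, a database, and a password for the built-in `postgres` user. The PostgreSQL version, machine tier, and database name `cdap` are assumptions. The resulting user and password are what DB_USER and DB_PASS hold when creating the cdap-security secret. Network connectivity from GKE to Cloud SQL (private IP or the Cloud SQL proxy) is not covered here.

```shell
# Sketch: provision a Cloud SQL for PostgreSQL instance for CDAP metadata.
create_cloud_sql() {
  local instance="$1" region="$2" password="$3"
  gcloud services enable sqladmin.googleapis.com || return 1
  gcloud sql instances create "${instance}" --database-version=POSTGRES_11 \
    --tier=db-custom-2-7680 --region="${region}" || return 1
  gcloud sql databases create cdap --instance="${instance}" || return 1
  gcloud sql users set-password postgres --instance="${instance}" --password="${password}"
}

# Usage: create_cloud_sql "${SQL_INSTANCE}" "${REGION}" "${DB_PASS}"
```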
Elasticsearch in Kubernetes
We are using the Elasticsearch Operator to operate an Elasticsearch instance inside the Kubernetes cluster. You can deploy the following YAML to create all the necessary resources to have the operator running in the Kubernetes cluster, inside the elastic-system namespace:
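The sketch below assumes Elastic Cloud on Kubernetes (ECK). Version 1.2.1 is only an example. Its `all-in-one.yaml` manifest creates the `elastic-system` namespace and an `elastic-operator` StatefulSet. Newer ECK releases publish the CRDs and the operator as separate files, so check the ECK documentation for the version you install.

```shell
# Sketch: install the ECK operator and wait for it to become ready.
install_eck() {
  local version="${1:-1.2.1}"
  kubectl apply -f "https://download.elastic.co/downloads/eck/${version}/all-in-one.yaml" || return 1
  kubectl -n elastic-system rollout status statefulset/elastic-operator --timeout=300s
}
```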
After deploying the Elasticsearch operator, you can deploy the following custom resource to start an Elasticsearch instance inside the Kubernetes cluster.
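The following sketch renders a single-node Elasticsearch resource in the style of the ECK quickstart. The name `quickstart` and version 7.9.2 are example values. Setting `node.store.allow_mmap: false` avoids having to raise the `vm.max_map_count` kernel setting on the nodes, at some cost in performance.

```shell
# Sketch: render a one-node Elasticsearch custom resource for ECK.
render_elasticsearch() {
  local name="${1:-quickstart}"
  cat << EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: ${name}
spec:
  version: 7.9.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
EOF
}

# Usage: render_elasticsearch quickstart | kubectl apply -f -
```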
You can validate that the Elasticsearch instance is up and running correctly by checking that its pod is in the Running state.
Once the Elasticsearch instance is running, retrieve the password of the default user from the secret created by the operator. CDAP needs this password in the cdap-security.xml file to authenticate itself to Elasticsearch.
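With ECK, both steps can be done with kubectl, as in the sketch below, which assumes an Elasticsearch resource named `quickstart`. ECK labels the pods with the cluster name. It stores the password of the default `elastic` user under the key `elastic` in a secret named `<name>-es-elastic-user`.

```shell
# Sketch: show the Elasticsearch resource and its pods.
show_es_pods() {
  local name="${1:-quickstart}"
  kubectl get elasticsearch "${name}"
  kubectl get pods --selector="elasticsearch.k8s.elastic.co/cluster-name=${name}"
}

# Sketch: print the elastic user's password, decoded from the secret.
es_password() {
  local name="${1:-quickstart}"
  kubectl get secret "${name}-es-elastic-user" -o go-template='{{.data.elastic | base64decode}}'
}

# Usage: export ES_USER=elastic ES_PASS="$(es_password quickstart)"
```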
Created in 2020 by Google Inc.