Running on Kubernetes
Introduction
This page outlines how we plan to orchestrate the execution of CDAP programs on Kubernetes.
Summary
Kubernetes runs workloads as containers, so a program must be packaged as a Docker image before it can be executed. The process of executing a program will therefore look like:
- Create a Docker image from the program jar and its dependencies.
- Upload this image to a local registry.
- Point Kubernetes to this image and execute the program.
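As a rough illustration of the first step, the sketch below renders Dockerfile text that copies a program jar and its dependency jars into an image. Everything specific in it is an assumption made for the example and not part of the plan above: the class name `DockerfileSketch`, the `/opt/program` layout, the base image name, and the main class are all hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: renders a Dockerfile for a program jar and its dependencies. */
public final class DockerfileSketch {
    private DockerfileSketch() {}

    /**
     * Builds Dockerfile text. Dependency jars are copied before the program jar so that
     * the dependency layers can be reused from Docker's build cache when only the
     * program jar changes.
     */
    public static String render(String baseImage, String programJar,
                                List<String> dependencyJars, String mainClass) {
        StringBuilder sb = new StringBuilder();
        sb.append("FROM ").append(baseImage).append('\n');
        for (String dep : dependencyJars) {
            // Assumed layout: all dependency jars go under /opt/program/lib/.
            sb.append("COPY ").append(dep).append(" /opt/program/lib/\n");
        }
        sb.append("COPY ").append(programJar).append(" /opt/program/program.jar\n");
        // Exec-form ENTRYPOINT; the JVM itself expands the lib/* classpath wildcard.
        sb.append("ENTRYPOINT [\"java\", \"-cp\", ")
          .append("\"/opt/program/program.jar:/opt/program/lib/*\", \"")
          .append(mainClass).append("\"]\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values only; none of these names come from CDAP.
        System.out.print(render("openjdk:8-jre", "wordcount.jar",
                Arrays.asList("lib/guava.jar"), "com.example.WordCountMain"));
    }
}
```

Putting the rarely changing dependencies in earlier layers is one way the "common stuff" could be shared across program images, but whether that is sufficient still needs to be measured.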
Creating the Docker image (options)
- Java Programmatic API around Docker client: https://github.com/docker-java/docker-java.
- Bazel - Java-based build system that can build Docker images.
See also: https://medium.com/bitnami-perspectives/building-docker-images-without-docker-c619061b13a9
See also: https://blog.bazel.build/2015/07/28/docker_build.html
- Construct a docker command string and leverage shell utilities from Java.
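The last option, shelling out from Java, might look roughly like the sketch below. It assumes the `docker` and `kubectl` binaries are on the `PATH` of the process doing the work; the class name `DockerCommands`, the pod name, and the image reference are hypothetical. Commands are built as argument lists, not as one concatenated string, so that no shell quoting is involved.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds and runs docker/kubectl command lines from Java. */
public final class DockerCommands {
    private DockerCommands() {}

    /** docker build -t IMAGE CONTEXT_DIR */
    public static List<String> build(String imageRef, String contextDir) {
        return Arrays.asList("docker", "build", "-t", imageRef, contextDir);
    }

    /** docker push IMAGE */
    public static List<String> push(String imageRef) {
        return Arrays.asList("docker", "push", imageRef);
    }

    /** kubectl run NAME --image=IMAGE --restart=Never (creates a single pod). */
    public static List<String> runOnKubernetes(String podName, String imageRef) {
        return Arrays.asList("kubectl", "run", podName,
                "--image=" + imageRef, "--restart=Never");
    }

    /** Runs a command, forwarding its output to this process, and returns the exit code. */
    public static int execute(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A caller would chain the three steps and stop at the first non-zero exit code, for example `execute(build(ref, dir))`, then `execute(push(ref))`, then `execute(runOnKubernetes(name, ref))`.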
Hosting the Docker image (options)
- Docker Registry - a stateless server-side application used for storing and distributing Docker images.
- Docker Hub - might be too heavyweight and reliant on external services for our use case.
- Quay (from CoreOS) - not free or open source, so not high on the list.
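If a self-hosted Docker Registry is chosen, images are addressed as `registry/repository:tag`, and the registry's HTTP API (V2) can list the tags of a repository at `/v2/<name>/tags/list`. The sketch below only builds those strings. The class name `RegistryRefs`, the `localhost:5000` address, and the use of plain `http` are assumptions for a local, unsecured test registry.

```java
import java.net.URI;

/** Hypothetical helper: builds image references and registry API URIs. */
public final class RegistryRefs {
    private RegistryRefs() {}

    /** Returns a fully qualified image reference, e.g. localhost:5000/wordcount:1.0. */
    public static String imageRef(String registry, String repository, String tag) {
        return registry + "/" + repository + ":" + tag;
    }

    /**
     * Returns the Registry HTTP API V2 URI that lists the tags of a repository.
     * Plain http is assumed here; a secured registry would use https.
     */
    public static URI tagsListUri(String registry, String repository) {
        return URI.create("http://" + registry + "/v2/" + repository + "/tags/list");
    }

    public static void main(String[] args) {
        System.out.println(imageRef("localhost:5000", "wordcount", "1.0"));
        System.out.println(tagsListUri("localhost:5000", "wordcount"));
    }
}
```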
Miscellaneous
- There is an experimental project which supports running Spark programs on Kubernetes: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes-cloud.html. "The feature set is currently limited and not well-tested. This should not be used in production environments."
- MapReduce on Kubernetes (kubemr) seems to be a project with very little usage: https://github.com/turbobytes/kubemr. "This is not robust code. Do not use in production."
- To get familiar with how Docker works:
TODO:
- Have some numbers around building a Docker image.
- How can Kubernetes be the runtime under the Twill API, instead of YARN? What are the issues with this integration? What in the Twill API can't be supported?
- Is there a programmatic API (or at least a RESTful one) around the Kubernetes command line?
- How can CDAP master talk to the Kubernetes master to get program status (or any of the Kubernetes interactions)?
- How long will a Docker image take to run a CDAP program - with and without a base image that has as much as possible of the common stuff?
- How can we leverage functionality in Kubernetes to avoid a dependency on ZooKeeper? Or should we just use etcd regardless of whether we're using Kubernetes or not?
- Do we need provisioner hooks? For instance, to kick off an instance of Docker Registry after provisioning a Kubernetes cluster?
- Research the relative difficulty of using YARN vs Kubernetes and ZooKeeper vs etcd.
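As a starting point for the API and program-status questions above: the Kubernetes API server is itself a REST API, a pod can be read at `/api/v1/namespaces/{namespace}/pods/{name}`, and its `status.phase` field is one of `Pending`, `Running`, `Succeeded`, `Failed`, or `Unknown`. The sketch below shows one possible mapping from pod phase to a program status. The `ProgramStatus` enum and the class name are hypothetical and are not CDAP's actual types; how the CDAP master would authenticate and issue the request is left open.

```java
/** Hypothetical sketch: maps a Kubernetes pod phase to a program status. */
public final class PodStatusSketch {
    private PodStatusSketch() {}

    /** Illustrative status values only, not CDAP's real program status type. */
    public enum ProgramStatus { STARTING, RUNNING, COMPLETED, FAILED, UNKNOWN }

    /** Returns the API server path for reading a single pod. */
    public static String podPath(String namespace, String podName) {
        return "/api/v1/namespaces/" + namespace + "/pods/" + podName;
    }

    /** Translates the pod's status.phase string; anything unrecognized maps to UNKNOWN. */
    public static ProgramStatus fromPodPhase(String phase) {
        if (phase == null) {
            return ProgramStatus.UNKNOWN;
        }
        switch (phase) {
            case "Pending":
                return ProgramStatus.STARTING;
            case "Running":
                return ProgramStatus.RUNNING;
            case "Succeeded":
                return ProgramStatus.COMPLETED;
            case "Failed":
                return ProgramStatus.FAILED;
            default:
                return ProgramStatus.UNKNOWN;
        }
    }
}
```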
Created in 2020 by Google Inc.