Introduction
This page outlines how we plan to orchestrate the execution of CDAP programs on Kubernetes.
Summary
Kubernetes runs workloads as containers, so a program must be packaged as a Docker image before Kubernetes can run it. Executing a program will therefore look like:
- Create a Docker image from the program jar and its dependencies.
- Upload this image to a local repository.
- Point Kubernetes to this image and execute the program.
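As a rough illustration of the three steps above, the sketch below expresses each one as a CLI invocation built in Java. It only constructs the argument lists and does not execute anything. The class name, image reference, pod name, and context directory are hypothetical, and using `kubectl run` for the last step is an assumption for illustration, not a decided approach.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: the three orchestration steps expressed as CLI argument lists. */
final class ProgramLaunchPlan {

  /** Step 1: build an image from a directory holding the Dockerfile, program jar, and dependencies. */
  static List<String> buildImage(String image, String contextDir) {
    return Arrays.asList("docker", "build", "-t", image, contextDir);
  }

  /** Step 2: upload the image to the registry named in the image reference. */
  static List<String> pushImage(String image) {
    return Arrays.asList("docker", "push", image);
  }

  /** Step 3: ask Kubernetes to run the image as a one-off pod (assumed approach). */
  static List<String> runOnKubernetes(String podName, String image) {
    return Arrays.asList("kubectl", "run", podName, "--image=" + image, "--restart=Never");
  }

  public static void main(String[] args) {
    // Hypothetical values, chosen only to make the output readable.
    String image = "localhost:5000/cdap-program:1.0";
    System.out.println(String.join(" ", buildImage(image, "/tmp/program-context")));
    System.out.println(String.join(" ", pushImage(image)));
    System.out.println(String.join(" ", runOnKubernetes("cdap-program", image)));
  }
}
```

Keeping the commands as argument lists rather than a single string avoids shell quoting problems if they are later handed to a process launcher.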
Creating the Docker image (options)
- Java Programmatic API around Docker client: https://github.com/docker-java/docker-java.
- Bazel - Java-based build system that can build Docker images.
  See also: https://medium.com/bitnami-perspectives/building-docker-images-without-docker-c619061b13a9
  See also: https://blog.bazel.build/2015/07/28/docker_build.html
- Construct a docker command string and leverage shell utilities from Java.
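The last option (invoking the docker command line from Java) could look roughly like the sketch below, which uses the JDK's `ProcessBuilder` to launch a command, capture its combined output, and return the exit code. The `CommandRunner` and `Result` names are hypothetical, and the example assumes the `docker` CLI is installed and on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Sketch only: run an external command from Java and collect its output. */
final class CommandRunner {

  /** Exit code and combined stdout/stderr of a finished process. */
  static final class Result {
    final int exitCode;
    final String output;

    Result(int exitCode, String output) {
      this.exitCode = exitCode;
      this.output = output;
    }
  }

  static Result run(List<String> command) throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(command);
    // Merge stderr into stdout so a single reader drains both streams.
    builder.redirectErrorStream(true);
    Process process = builder.start();

    StringBuilder output = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        output.append(line).append('\n');
      }
    }
    // The output has been fully read, so waiting here cannot block on a full pipe.
    return new Result(process.waitFor(), output.toString());
  }

  public static void main(String[] args) throws Exception {
    // Assumes the docker CLI is on the PATH; start() throws IOException otherwise.
    Result result = run(Arrays.asList("docker", "version"));
    System.out.println("exit code: " + result.exitCode);
    System.out.print(result.output);
  }
}
```

Passing the command as a list of arguments, instead of one string handed to a shell, sidesteps quoting and injection issues when image names or paths come from user input.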
Hosting the Docker image (options)
- Docker Registry - a stateless server-side application used for storing and distributing Docker images.
- Docker Hub - might be too heavyweight and reliant on external services for our use case.
- Quay (from CoreOS) - not free or open source, so not high on the list.
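Whichever host is chosen, an image is pushed to a specific registry by prefixing its name with the registry address. The sketch below shows that naming step for a self-hosted Docker Registry. The `localhost:5000` address (5000 being the port the registry listens on by default), the repository name, and the tag are assumptions for illustration, and the `ImageReference` class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: qualify an image name with a registry address before pushing. */
final class ImageReference {

  /** Builds a reference of the form registry/repository:tag. */
  static String of(String registry, String repository, String tag) {
    return registry + "/" + repository + ":" + tag;
  }

  /** Command that gives a locally built image its registry-qualified name. */
  static List<String> tagCommand(String localImage, String qualifiedImage) {
    return Arrays.asList("docker", "tag", localImage, qualifiedImage);
  }

  public static void main(String[] args) {
    // Hypothetical registry address, repository, and tag.
    String qualified = of("localhost:5000", "cdap-program", "1.0");
    System.out.println(qualified);
    System.out.println(String.join(" ", tagCommand("cdap-program:1.0", qualified)));
  }
}
```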
Miscellaneous
- There is an experimental project which supports running Spark programs on Kubernetes. "The feature set is currently limited and not well-tested. This should not be used in production environments." https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes-cloud.html
- MapReduce (MR) on Kubernetes seems to be a project with very little usage. "This is not robust code. Do not use in production." https://github.com/turbobytes/kubemr
- To get familiar with how Docker works: