A single rogue service can bring down the service discovery mechanism in a k8s environment.

Description

Currently, 'KubeDiscoveryService' uses a single watcher thread to set up k8s service endpoint watches. This thread maintains the set of services to watch (https://github.com/cdapio/cdap/blob/develop/cdap-kubernetes/src/main/java/io/cdap/cdap/k8s/discovery/KubeDiscoveryService.java#L366). Services can be added to this set dynamically through the discover method. The k8s watch for all services in this set is registered in a single Kubernetes API call.

Because the watch is registered in one call, any single service accidentally added to this set can cause the discovery mechanism to fail for the other, valid services.

In one case, it was observed that ReportGenerationApp was added to this service set. The complete name of the service in this set for ReportGenerationApp was 'cdap-<instance-name>-spk.system.ReportGenerationApp.ReportGenerationSpark'. This name is invalid from the Kubernetes perspective, because names of objects in Kubernetes need to be DNS RFC compliant (which, among other constraints, means at most 63 characters).
Once ReportGenerationApp was added to this set, registering the watch started failing for all the services in the set. This made valid services unreachable from the router.
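
The length problem described above could be caught before a name ever reaches the watch set. The following is a minimal sketch, not the existing CDAP code: the class 'ServiceNameValidator' and its 'isValid' method are hypothetical names. It assumes the service name is used as a Kubernetes label value (at most 63 characters, starting and ending with an alphanumeric character, with '-', '_' and '.' allowed in between). If the name is used as an object name instead, the stricter DNS label pattern would apply.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of CDAP: rejects names Kubernetes would refuse.
final class ServiceNameValidator {
  // Kubernetes limits DNS labels and label values to 63 characters.
  static final int MAX_LENGTH = 63;

  // Assumed label-value syntax: alphanumeric at both ends,
  // with '-', '_' and '.' allowed in between.
  private static final Pattern LABEL_VALUE =
      Pattern.compile("[A-Za-z0-9]([A-Za-z0-9_.-]*[A-Za-z0-9])?");

  private ServiceNameValidator() { }

  // Returns true only for non-empty names that satisfy both the length
  // limit and the character rules. Empty names are rejected here because
  // an empty service name is meaningless for discovery.
  static boolean isValid(String name) {
    return name != null
        && !name.isEmpty()
        && name.length() <= MAX_LENGTH
        && LABEL_VALUE.matcher(name).matches();
  }
}
```

With such a check in the discover path, a name like the ReportGenerationApp one above would be rejected as soon as the instance name pushes it past 63 characters, instead of being added to the set.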

We need to validate the names of services being added to the watcher's service set. It would be better still to register each service watch separately so that they cannot interfere with each other.
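
One way to keep watches from interfering with each other is to register them one at a time and contain each failure. The sketch below only illustrates that idea and makes no claim about how 'KubeDiscoveryService' should be restructured: 'IsolatedWatchRegistrar', 'registerEach' and the 'startWatch' callback are hypothetical, and the callback stands in for whatever Kubernetes API call creates a watch for one service.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, not part of CDAP: registers one watch per service.
final class IsolatedWatchRegistrar {
  private IsolatedWatchRegistrar() { }

  // Calls startWatch once per service. A failure for one service is
  // recorded and does not prevent the remaining services from being watched.
  // Returns the failures keyed by service name, in iteration order.
  static Map<String, RuntimeException> registerEach(
      Iterable<String> services, Consumer<String> startWatch) {
    Map<String, RuntimeException> failures = new LinkedHashMap<>();
    for (String service : services) {
      try {
        startWatch.accept(service);
      } catch (RuntimeException e) {
        failures.put(service, e);
      }
    }
    return failures;
  }
}
```

The caller can log or retry the returned failures. The trade-off is one API call per service instead of one call for the whole set.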

Release Notes

None

Assignee

Trishka Fernandes

Reporter

Sagar Kapare