As application architectures become more and more complex, they become harder to implement in Enterprise environments. Recently, I was working on setting up a Kubernetes cluster in an Enterprise environment and ran into several challenges that I believe will appear in other Enterprise environments as well. They are listed below, along with some resolutions:
- HTTP proxy to control internet traffic
A proxy server adds complexity to how the various Docker and Kubernetes services communicate with the outside world.
Docker requires its own HTTP proxy configuration before it can communicate through the proxy:
https://docs.docker.com/network/proxy/#use-environment-variables
https://docs.docker.com/config/daemon/systemd/#httphttps-proxy
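For the Docker daemon, the systemd approach in the second link comes down to a drop-in file for docker.service. That file sets HTTP_PROXY, HTTPS_PROXY and NO_PROXY, and the Docker documentation's example places it at /etc/systemd/system/docker.service.d/http-proxy.conf. The Python sketch below only renders the file's contents. The proxy URL and the NO_PROXY entries are hypothetical placeholders. CIDR entries such as 10.0.0.0/8 are understood by Go's proxy handling, but other clients may ignore them. After writing the file, the daemon still has to be reloaded with `systemctl daemon-reload` and `systemctl restart docker`.

```python
"""Render a systemd drop-in that passes proxy settings to the Docker daemon.

A sketch only: the proxy URL and NO_PROXY entries used below are
hypothetical placeholders, not values from a real environment.
"""

# Path used in the Docker documentation's systemd example.
DROP_IN_PATH = "/etc/systemd/system/docker.service.d/http-proxy.conf"


def render_docker_proxy_dropin(http_proxy: str, https_proxy: str, no_proxy: list[str]) -> str:
    """Return the text of a [Service] drop-in that sets the proxy variables."""
    lines = [
        "[Service]",
        f'Environment="HTTP_PROXY={http_proxy}"',
        f'Environment="HTTPS_PROXY={https_proxy}"',
        # Comma-separated hosts/networks that must NOT be sent to the proxy.
        f'Environment="NO_PROXY={",".join(no_proxy)}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_docker_proxy_dropin(
        http_proxy="http://proxy.example.com:3128",   # hypothetical proxy
        https_proxy="http://proxy.example.com:3128",  # hypothetical proxy
        no_proxy=["localhost", "127.0.0.1", "10.0.0.0/8", ".corp.example.com"],
    )
    print(f"# write to {DROP_IN_PATH}")
    print(text, end="")
```

Keeping NO_PROXY as a list in code makes it easier to reuse the same internal ranges for the Kubernetes components.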
Kubernetes components such as the apiserver, controller manager and scheduler also need the no_proxy environment variable configured, so that traffic to the internal network bypasses the proxy.
https://github.com/kubernetes/kubeadm/issues/324
A typical error that shows up in the system logs when internal traffic is sent through the proxy is:
{"log":"E0329 17:47:54.136036 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.0.0.7:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: Gateway Timeout\n","stream":"stderr","time":"2018-03-29T17:47:54.136283562Z"}
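This error appears when the controller manager's request to the apiserver at 10.0.0.7:6443 is handed to the proxy instead of going direct. A quick way to sanity-check a no_proxy value is to ask, for each internal address, whether it would bypass the proxy. The Python sketch below approximates the matching rules documented for Go's golang.org/x/net/http/httpproxy package, which Go programs such as the Kubernetes components rely on through net/http. It handles exact IPs, CIDR ranges, domain suffixes and `*`, and it ignores port-specific entries. It is an illustration of those rules, not a reimplementation of any client. The `bypasses_proxy` name and the sample values are hypothetical.

```python
import ipaddress


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` (without port) would skip the proxy under `no_proxy`.

    Approximates Go's httpproxy rules: '*' matches everything, an IP matches
    exactly, a CIDR matches addresses inside it, 'foo.com' matches foo.com and
    its subdomains, '.foo.com' matches subdomains only.
    """
    host = host.strip().lower()
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not an IP literal

    for raw in no_proxy.split(","):
        entry = raw.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return True
        if "/" in entry:
            try:
                if ip is not None and ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # malformed CIDR entry: skip it
            continue
        if ip is not None:
            try:
                if ip == ipaddress.ip_address(entry):
                    return True
            except ValueError:
                pass  # entry is a domain name, which cannot match an IP literal
            continue
        if entry.startswith("."):
            if host.endswith(entry):
                return True
        elif host == entry or host.endswith("." + entry):
            return True
    return False


if __name__ == "__main__":
    no_proxy = "localhost,127.0.0.1,.corp.example.com"  # hypothetical value
    print(bypasses_proxy("10.0.0.7", no_proxy))
    print(bypasses_proxy("10.0.0.7", no_proxy + ",10.0.0.0/24"))
```

With the first value, the apiserver address falls through to the proxy. Adding the node network as a CIDR entry makes it go direct.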
- Some standard implementations do not behave as expected
We initially configured the Kubernetes cluster with Flannel in both the local and the Enterprise environment, but the client wanted to use Weave networking. In our local environment, which has no proxy, we were able to set up the Weave network. When we implemented the same setup in the Enterprise environment, however, the DNS service was affected: the default DNS pod was unable to reach the apiserver's service network of 10.90.x.x, and its requests always timed out. This made pod communication even less stable.
There may well be a fix for Weave, but after a week of troubleshooting we had not found one, so we switched back to Flannel networking.
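We never found the root cause for Weave. One check that may be worth running early in this kind of investigation is whether the pod network, the service network and the Enterprise's existing subnets overlap, because an overlap can send traffic to the wrong place. This is offered as an assumption to rule out, not as the diagnosis. The Python sketch below uses the standard ipaddress module. 10.32.0.0/12 is Weave Net's default allocation range. The service and corporate ranges are hypothetical stand-ins, and `find_overlaps` is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR ranges that overlap each other."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]


if __name__ == "__main__":
    ranges = {
        "pods": "10.32.0.0/12",      # Weave Net default allocation range
        "services": "10.90.0.0/16",  # hypothetical service CIDR
        "corp-lan": "10.32.16.0/20", # hypothetical existing Enterprise subnet
    }
    for a, b in find_overlaps(ranges):
        print(f"{a} overlaps {b}")
```

Running the same check with the ranges actually used in each environment would show whether the local and Enterprise setups differ in this respect.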
- Custom Enterprise configurations
In an Enterprise environment, some security hardening is already applied to the servers, which can cause issues when deploying the cluster. For example, IPv6 had already been disabled as part of the Enterprise server configuration, but the kubeadm deployment expects IPv6 to be enabled so that it can disable it itself. If IPv6 is already disabled, the deployment fails.
Another condition concerned AppArmor, which had to be disabled. On some installations AppArmor is enabled by default, and it can prevent the Docker service from functioning properly.
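Because this hardening is applied before the cluster is built, it can help to check each node before running kubeadm. The Python sketch below reads two standard Linux kernel interfaces:

- /proc/sys/net/ipv6/conf/all/disable_ipv6, which contains 1 when IPv6 is disabled by sysctl. The directory is typically missing altogether when IPv6 is disabled at boot.
- /sys/module/apparmor/parameters/enabled, which contains Y or N.

The `root` parameter exists only so the function can be pointed at a test directory. The function names and report wording are hypothetical, and this is not part of kubeadm's own preflight checks.

```python
from pathlib import Path
from typing import Optional


def read_flag(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file does not exist."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def preflight(root: str = "/") -> dict[str, str]:
    """Report IPv6 and AppArmor state from the kernel's /proc and /sys files."""
    base = Path(root)
    report = {}

    ipv6 = read_flag(base / "proc/sys/net/ipv6/conf/all/disable_ipv6")
    if ipv6 is None:
        report["ipv6"] = "unavailable (possibly disabled at boot)"
    elif ipv6 == "1":
        report["ipv6"] = "disabled via sysctl"
    else:
        report["ipv6"] = "enabled"

    apparmor = read_flag(base / "sys/module/apparmor/parameters/enabled")
    if apparmor is None:
        report["apparmor"] = "not loaded"
    elif apparmor == "Y":
        report["apparmor"] = "enabled"
    else:
        report["apparmor"] = "disabled"
    return report


if __name__ == "__main__":
    for check, state in preflight().items():
        print(f"{check}: {state}")
```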
- Firewall tuning and port communications
Because of the complexity of the architecture and the many diverse services involved, a significant number of ports must be opened for internal service communication. Keeping a list of these ports, tracking them, and troubleshooting connectivity between services is always a challenge.
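One way to keep that list manageable is to hold it as data and probe it from each node. The Python sketch below encodes the control-plane TCP ports listed in the Kubernetes documentation for kubeadm clusters and tries a TCP connection to each:

- 6443 for the API server
- 2379-2380 for etcd
- 10250 for the kubelet
- 10257 and 10259 for the controller manager and scheduler on current releases (older releases used 10252 and 10251)

The `REQUIRED_PORTS` table and the `probe` and `check_node` helpers are hypothetical names. The NodePort range 30000-32767 is left out because it is too large to probe this way. UDP ports, such as Flannel's VXLAN port 8472, cannot be checked with a simple connect.

```python
import socket
from typing import Optional

# Hypothetical table of control-plane TCP ports, based on the kubeadm port
# list in the Kubernetes documentation. Adjust for your version and add-ons.
REQUIRED_PORTS = {
    "kube-apiserver": [6443],
    "etcd": [2379, 2380],
    "kubelet": [10250],
    "kube-controller-manager": [10257],  # 10252 on older releases
    "kube-scheduler": [10259],           # 10251 on older releases
}


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and unreachable networks.
        return False


def check_node(host: str, ports: Optional[dict[str, list[int]]] = None) -> list[str]:
    """Return 'service:port' labels that could not be reached on `host`."""
    ports = REQUIRED_PORTS if ports is None else ports
    return [
        f"{service}:{port}"
        for service, port_list in ports.items()
        for port in port_list
        if not probe(host, port)
    ]


if __name__ == "__main__":
    blocked = check_node("10.0.0.7")  # apiserver address from the log above
    print("unreachable:", ", ".join(blocked) or "none")
```

A failed probe means either that the firewall filters the port or that nothing is listening on it yet, so the check is most informative once the control plane is running.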