Troubleshooting Kubernetes

This page is primarily for the cloud.gov team. It's public so that you can learn from it. For help using cloud.gov, see the user docs.

Overview

Kubernetes is used to provide managed services to tenant applications via 18F/kubernetes-broker. We deploy both the Kubernetes BOSH release and the broker via the 18F/cg-deploy-kubernetes repository. Custom images can be found in the Kubernetes broker.

Responding to Kubernetes alerts

Alerts are generated whenever a pod’s status is not Running. Each alert contains the namespace, pod name, and pod status.

If the namespace is default and the pod name is a random string, then it is a managed service and likely only impacts a specific tenant application. If the namespace is kube-system or the pod name is human readable, then it is a platform component, which could impact all managed services.
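The triage rule above can be sketched as a small shell helper. This is only an illustration: `classify_alert` is a hypothetical name, and the test for a "random string" is an assumption modeled on pod names such as xc956b1d94dd64-master-0 (an `x` followed by hex characters). Adjust the pattern if your managed-service names look different.

```shell
# Hypothetical helper: classify an alert from its namespace and pod name.
# Prints "managed-service" or "platform".
classify_alert() {
  local namespace="$1" pod="$2"
  # Assumption: a "random" name starts with "x" followed by at least eight
  # hex characters, then a dash or the end of the name.
  if [ "$namespace" = "default" ] &&
     printf '%s\n' "$pod" | grep -Eq '^x[0-9a-f]{8,}(-|$)'; then
    echo "managed-service"
  else
    # kube-system, or any human-readable pod name.
    echo "platform"
  fi
}
```

For example, `classify_alert default xc956b1d94dd64-master-0` reports a managed service, while anything in kube-system is reported as a platform component.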

Log in to a Kubernetes master

From a jumpbox in the appropriate environment:

- ssh to a master
- sudo to root for all the kubectl commands
- update the PATH to find the Kubernetes binaries

bosh -d kubernetes ssh master/0
sudo bash
export PATH=$PATH:/var/vcap/packages/kubernetes/bin

Fixing non-running pods

Log in to a Kubernetes master, find the namespaces, and describe the pods:

kubectl get namespaces
kubectl --namespace :namespace describe pod :pod-name

where you provide values for :namespace and :pod-name. A pod name will be something like xc956b1d94dd64-master-0.

The Events section should indicate why the pod cannot be started. Resolve the underlying issue and the pod should transition into a Running state.
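If you do not yet know which pods need attention, one option is to filter the pod listing down to those that are not Running. The sketch below is a hypothetical helper (`filter_non_running` is not part of kubectl). It assumes the default column layout of `kubectl get pods --all-namespaces`, where the fourth column is STATUS.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on
# stdin and print "namespace pod-name status" for each pod whose STATUS
# column is not "Running". Assumes the default column order:
# NAMESPACE NAME READY STATUS RESTARTS AGE.
filter_non_running() {
  awk 'NR > 1 && $4 != "Running" { print $1, $2, $4 }'
}
```

On a master, this would be used as `kubectl get pods --all-namespaces | filter_non_running`. Each line it prints gives the namespace and pod name to pass to the describe command.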

For pods that are part of a controller-managed set, such as a StatefulSet, Deployment, or DaemonSet, you can force a pod restart by deleting the pod and letting Kubernetes recreate and reschedule it:

kubectl --namespace :namespace delete pod :pod-name
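Deleting a pod only restarts it if something owns it. As a precaution, a wrapper along these lines could check for an owner before deleting. `safe_delete_pod` is a hypothetical name, and the sketch assumes a Kubernetes version that records `metadata.ownerReferences` on pods (older clusters may not).

```shell
# Hypothetical wrapper: delete a pod only if a controller owns it, so that
# it will be recreated. Assumes pods carry metadata.ownerReferences.
safe_delete_pod() {
  local ns="$1" pod="$2" kind
  # Ask for the kind of every owner; prints nothing if there is none.
  kind="$(kubectl --namespace "$ns" get pod "$pod" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}')" || return 1
  if [ -z "$kind" ]; then
    echo "refusing to delete $ns/$pod: no owning controller found" >&2
    return 1
  fi
  echo "deleting $ns/$pod (owned by $kind)"
  kubectl --namespace "$ns" delete pod "$pod"
}
```

Call it with the same values you would give the delete command, for example `safe_delete_pod :namespace :pod-name`. Pods created by a Deployment normally report a ReplicaSet as their owner.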

Manually pulling an image from Docker

By default, Kubernetes does not pull Docker images that already exist on the node. When you update an existing image and tag, force a pull of the latest version by running the following command from a jumpbox:

bosh -d kubernetes ssh minion 'bash -c "/var/vcap/packages/docker/bin/docker --host unix:///var/vcap/sys/run/docker/docker.sock pull ${DOCKER_USER}/${IMAGE_NAME}:${DOCKER_TAG}"'
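Because the command above is wrapped in single quotes, the jumpbox shell does not expand `${DOCKER_USER}`, `${IMAGE_NAME}`, or `${DOCKER_TAG}` itself. The sketch below assumes these are placeholders to be filled in by hand. It uses a hypothetical helper, `build_pull_command`, which substitutes the values and prints the finished command for review rather than running it.

```shell
# Hypothetical helper: print the pull command with the three placeholders
# filled in. It does not run anything; review the output, then paste it.
build_pull_command() {
  local user="$1" image="$2" tag="$3"
  if [ -z "$user" ] || [ -z "$image" ] || [ -z "$tag" ]; then
    echo "usage: build_pull_command DOCKER_USER IMAGE_NAME DOCKER_TAG" >&2
    return 1
  fi
  # Paths copied from the command on this page.
  local docker=/var/vcap/packages/docker/bin/docker
  local sock=unix:///var/vcap/sys/run/docker/docker.sock
  printf "bosh -d kubernetes ssh minion 'bash -c \"%s --host %s pull %s/%s:%s\"'\n" \
    "$docker" "$sock" "$user" "$image" "$tag"
}
```

Printing instead of executing keeps the helper safe to try from any shell. The values are not escaped, so this is only suitable for simple image names and tags.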

Other useful Kubernetes kubectl commands

All of these assume you are logged into a Kubernetes master:

Get a list of pods

Pods are created to satisfy a deployment requirement.

kubectl get pods

Get a list of deployments

Deployments describe how to provision an application, including memory, disk, and services. For example, a WordPress deployment would need both a PHP pod and a MySQL pod:

kubectl get deployments

Get a list of replica sets

kubectl get rs

Get a list of services, ports, and IPs

kubectl get svc

Review pod logs

Pods output logs to STDOUT, and these are temporarily stored by Kubernetes for review. Pods should not contain their own logging mechanisms (i.e., Elasticsearch should not also run logrotate):

kubectl logs :pod-name
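When a pod is restarting repeatedly, the logs of the previous container are often more useful than those of the current one. A minimal sketch, assuming a kubectl that supports the `--tail` and `--previous` flags of `kubectl logs`; `recent_logs` is a hypothetical helper name:

```shell
# Hypothetical helper: show the last N log lines (default 100) from the
# current container and, if one exists, from the previous container.
recent_logs() {
  local pod="$1" lines="${2:-100}"
  echo "== current container =="
  kubectl logs --tail="$lines" "$pod"
  echo "== previous container (if any) =="
  kubectl logs --previous --tail="$lines" "$pod" ||
    echo "(no previous container logs)"
}
```

For example, `recent_logs :pod-name 50` prints the last 50 lines from each container.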

Show pod status

kubectl describe pod :pod-name

Get a shell in a particular pod

Sometimes you need to connect inside a pod:

kubectl exec -it :pod-name -- /bin/bash

Deleting a pod

Sometimes a pod is scheduled onto an instance where its EBS volume cannot be mounted. You can safely delete the pod, and it will be automatically rescheduled:

kubectl delete pod :pod-name