Kubernetes is an open-source orchestration platform for working with containers. At its core, it gives us the means to deploy applications, scale them easily, and monitor them. In this article, we will talk about the built-in monitoring capabilities of Kubernetes and include some demos for better understanding.
You may benefit from reading our Introduction to Kubernetes Monitoring which offers a more general primer on challenges and solutions for monitoring Kubernetes.
At the infrastructure level, a Kubernetes cluster is a set of physical or virtual machines, each acting in a specific role. The machines acting in the role of master function as the brain of the operations and are charged with orchestrating the management of all containers that run on all of the nodes.
Master components manage the life cycle of a pod:
Nodes are the worker machines in Kubernetes, managed by the master. Each node runs the components necessary to host pods:
From a logical perspective, a Kubernetes deployment is comprised of various components, each serving a specific purpose within the cluster:
Monitoring an application is absolutely essential if we want to anticipate problems and gain visibility into potential bottlenecks in a dev or production deployment.
To help monitor the cluster and the many moving parts that form a deployment, Kubernetes ships with some built-in monitoring capabilities:
In this article, we will be covering the first two built-in tools. A follow up article focusing on the remaining tools can be found here.
There are many Kubernetes metrics to monitor. As we’ve described the architecture in two separate ways (infrastructure and logical), we can do the same with monitoring and separate this into two main components: monitoring the cluster itself and monitoring the workloads running on it.
All clusters should monitor the underlying server components since problems at the server level will show up in the workloads. Some metrics to look for while monitoring node resources are CPU, disk, and network bandwidth. Having an overview of these metrics will let you know if it’s time to scale the cluster up or down (this is especially useful when using cloud providers where running cost is important).
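If you want a quick, command-line spot check of these node resources, kubectl can help here as well; kubectl describe node reports each node’s capacity, allocatable resources, and pressure conditions (such as MemoryPressure and DiskPressure). The node name below is a placeholder for one of the names returned by the first command:
kubectl get nodes
kubectl describe node <node-name>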
Metrics related to deployments and their pods should also be taken into consideration. Comparing the number of pods a deployment currently has against its desired state can be revealing. We can also look at health checks, container metrics, and, finally, application metrics.
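As a small sketch of what this looks like in practice, the commands below compare a deployment’s desired replica count with what is actually available; my-app is just a placeholder name for a deployment in your cluster:
kubectl get deployments
kubectl get deployment my-app -o jsonpath='{.spec.replicas} {.status.availableReplicas}'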
In the following sections, we will take each of the listed built-in monitoring features one-by-one to see how they can help us. The prerequisites for this exercise include the gcloud command-line tool, initialized and authenticated with:
gcloud init
gcloud auth login
To begin, start your Rancher instance. There is a very intuitive getting started guide for Rancher that you can follow for this purpose.
Use Rancher to set up and configure a Kubernetes cluster by following the how-to guide.
Note: please make sure the Kubernetes Dashboard is enabled and the Kubernetes version is v1.10.
As mentioned previously, in this guide we will be covering the first two built-in tools: the Kubernetes dashboard and cAdvisor. A follow up article that discusses probes and horizontal pod autoscalers can be found here.
The Kubernetes dashboard is a web-based Kubernetes user interface that we can use to troubleshoot applications and manage cluster resources.
Rancher, as seen above, helps us install the dashboard simply by checking a radio button. Let’s take a look now at how the dashboard can help us.
To access the dashboard, we need to proxy requests between our machine and the Kubernetes API server. Start a proxy server with kubectl by typing the following:
kubectl proxy &
The proxy server will start in the background, providing output that looks similar to this:
[1] 3190
Starting to serve on 127.0.0.1:8001
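Before opening the dashboard, you can optionally confirm that the proxy is working by hitting any API server endpoint through it; /version is a convenient one:
curl http://localhost:8001/version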
Now, to view the dashboard, navigate to the following address in the browser:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
We will then be prompted with the login page to enter the credentials:
Let’s take a look at how to create a user with admin permissions using the Service Account mechanism. We will use two YAML files.
One will create the Service Account:
cat ServiceAccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
The other will create the ClusterRoleBinding for our user:
cat ClusterRoleBinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system
Apply the two YAML files to create the objects they define:
kubectl apply -f ServiceAccount.yaml
kubectl apply -f ClusterRoleBinding.yaml
serviceaccount "admin-user" created clusterrolebinding.rbac.authorization.k8s.io "admin-user" created
Once our user is created and the correct permissions have been set, we need to find the token in order to log in:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
Name:         admin-user-token-lnnsn
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=admin-user
              kubernetes.io/service-account.uid=e34a9438-4e12-11e9-a57b-42010aa4009e
Type:  kubernetes.io/service-account-token
Data
====
ca.crt:     1119 bytes
namespace:  11 bytes
token:      COPY_THIS_STRING
Select “Token” at the Kubernetes dashboard credentials prompt and enter the value you retrieved above in the token field to authenticate.
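If you prefer to print only the token value itself rather than the full describe output, the same secret lookup can be combined with a jsonpath query; this is just a convenience variation of the command above:
kubectl -n kube-system get secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}') -o jsonpath='{.data.token}' | base64 --decode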
The Kubernetes Dashboard consists of a few main views:
Without any workloads running, the dashboard’s views will be mostly empty since there will be nothing deployed on top of Kubernetes. If you want to explore all of the views the dashboard has to offer, the best option is to deploy apps that use different workload types (stateful sets, deployments, replica sets, etc.). You can check out this article on deploying a Redis cluster for an example that deploys a Redis cluster (a stateful set with volume claims and configMaps) along with a testing app (a Kubernetes deployment), so the dashboard tabs will have some relevant info.
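If you only need something quick for the dashboard tabs to display and don’t want to set up the full Redis example, a throwaway deployment like the one below is enough (nginx-demo is an arbitrary name, and this assumes your kubectl version supports kubectl create deployment):
kubectl create deployment nginx-demo --image=nginx
kubectl scale deployment nginx-demo --replicas=3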
After provisioning some workloads, we can take down one node and then check the different tabs to see some updates:
kubectl delete pod redis-cluster-2
kubectl get pods
pod "redis-cluster-2" deleted NAME READY STATUS RESTARTS AGE hit-counter-app-9c5d54b99-xv5hj 1/1 Running 0 1h redis-cluster-0 1/1 Running 0 1h redis-cluster-1 1/1 Running 0 1h redis-cluster-2 0/1 Terminating 0 1h redis-cluster-3 1/1 Running 0 44s redis-cluster-4 1/1 Running 0 1h redis-cluster-5 1/1 Running 0 1h
cAdvisor is an open-source agent integrated into the kubelet binary that monitors resource usage and analyzes the performance of containers. It collects statistics about the CPU, memory, file, and network usage for all containers running on a given node (it does not operate at the pod level). In addition to core metrics, it also monitors events. Metrics can be accessed directly using commands like kubectl top, or they can be used by the scheduler to perform orchestration (for example, with autoscaling).
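As a quick illustration of accessing these metrics, the commands below print current CPU and memory usage per node and per pod; they assume the cluster’s metrics pipeline (Heapster in this Kubernetes version, or the metrics server in newer ones) is healthy:
kubectl top nodes
kubectl top pods --all-namespaces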
Note that cAdvisor doesn’t store metrics for long-term use, so if you want that functionality, you’ll need to look for a dedicated monitoring tool.
cAdvisor’s UI has been marked deprecated as of Kubernetes version 1.10 and the interface is scheduled to be completely removed in version 1.12. Rancher gives you the option to choose what version of Kubernetes to use for your clusters. When setting up the infrastructure for this demo, we configured the cluster to use version 1.10, so we should still have access to the cAdvisor UI.
To access the cAdvisor UI, we need to proxy requests between our machine and the Kubernetes API server. Start a local instance of the proxy server by typing:
kubectl proxy &
Next, find the name of your nodes:
kubectl get nodes
You can view the UI in your browser by navigating to the following address, replacing the node name with the identifier you found on the command line:
http://localhost:8001/api/v1/nodes/gke-c-plnf4-default-pool-5eb56043-23p5:4194/proxy/containers/
To confirm that kubelet is listening on port 4194, you can log into the node to get more information:
gcloud compute ssh admin@gke-c-plnf4-default-pool-5eb56043-23p5 --zone europe-west4-c
Welcome to Kubernetes v1.10.12-gke.7!

You can find documentation for Kubernetes at:
  http://docs.kubernetes.io/

The source for this release can be found at:
  /home/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
  https://storage.googleapis.com/kubernetes-release-gke/release/v1.10.12-gke.7/kubernetes-src.tar.gz

It is based on the Kubernetes source at:
  https://github.com/kubernetes/kubernetes/tree/v1.10.12-gke.7

For Kubernetes copyright and licensing information, see:
  /home/kubernetes/LICENSES
We can confirm that in our version of Kubernetes, the kubelet process is serving the cAdvisor web UI over that port:
sudo su -
netstat -anp | grep LISTEN | grep 4194
tcp6 0 0 :::4194 :::* LISTEN 1060/kubelet
If you run Kubernetes version 1.12 or later, the UI has been removed, so the kubelet no longer listens on port 4194. You can confirm this with the commands above. However, the metrics are still available since cAdvisor is part of the kubelet binary.
The kubelet binary exposes all of its runtime metrics and all of the cAdvisor metrics at the /metrics endpoint using the Prometheus exposition format:
http://localhost:8001/api/v1/nodes/gke-c-plnf4-default-pool-5eb56043-23p5/proxy/metrics/cadvisor
Among the output, metrics you can look for include the following (a quick way to fetch and filter them is shown after the list):
container_cpu_user_seconds_total
container_cpu_system_seconds_total
container_cpu_usage_seconds_total
container_memory_cache
container_memory_swap
container_memory_usage_bytes
container_memory_max_usage_bytes
container_fs_io_time_seconds_total
container_fs_io_time_weighted_seconds_total
container_fs_writes_bytes_total
container_fs_reads_bytes_total
container_network_receive_bytes_total
container_network_receive_errors_total
container_network_transmit_bytes_total
container_network_transmit_errors_total
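As a sketch of how you might spot-check one of these series without a full monitoring stack, you can fetch the endpoint through the same kubectl proxy and filter the output; the node name is the one used throughout this article, and grep/head are only there to keep the output readable:
curl -s http://localhost:8001/api/v1/nodes/gke-c-plnf4-default-pool-5eb56043-23p5/proxy/metrics/cadvisor | grep container_cpu_usage_seconds_total | head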
Some additional useful metrics can be found here:
/healthz
/healthz/ping
/spec
MachineInfo()
For example, to see the cAdvisor MachineInfo(), we could visit:
http://localhost:8001/api/v1/nodes/gke-c-plnf4-default-pool-5eb56043-23p5:10255/proxy/spec/
The pods endpoint provides the same output as kubectl get pods -o json for the pods running on the node:
http://localhost:8001/api/v1/nodes/gke-c-plnf4-default-pool-5eb56043-23p5:10255/proxy/pods/
Similarly, logs can also be retrieved by visiting:
http://localhost:8001/logs/kube-apiserver.log
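The same log can also be tailed from a terminal through the proxy, which is sometimes more convenient than the browser; tail is only used here to trim the output:
curl -s http://localhost:8001/logs/kube-apiserver.log | tail -n 20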
Monitoring is vital for understanding what is happening with our applications. Kubernetes helps us with a number of built-in tools that provide great insights into both the infrastructure layer (nodes) and the logical one (pods).
This article concentrated on the tools that focus on providing monitoring and metrics for users. Continue on to the second part of this series to learn about the included monitoring tools focused on workload scaling and life cycle management.
In Rancher, you can easily monitor and graph everything in your cluster, from nodes to pods to applications. The advanced monitoring tooling, powered by Prometheus, gives you real-time data about the performance of every aspect of your cluster. Watch our online meetup to see these advanced monitoring features demoed and discussed.