It can be rather challenging for legacy monitoring tools to monitor ephemeral and fast moving environments like Kubernetes. The good news though is, there are many new solutions that can help you with this.
When you are carrying out a monitoring operation for Kubernetes, it’s crucial to make sure that all the components are covered in it, including pod, container, cluster, and node. Also, there should be processes in place to combine the results of the monitoring and creating reports in order to take the correct measures.
Before moving forward, your DevOps team has to understand that monitoring a distributed system like Kubernetes is completely different from monitoring a simple client-server network. That is because monitoring network address or a server gives much lesser relevant information as compared to microservices.
Since Kubernetes is not self-monitoring, you need to come up with a monitoring strategy even before you choose the tool that can help you execute the strategy.
An ideal strategy will have highly tuned Kubernetes that can self-heal and protect itself against any downtime. The system will be able to use monitoring to identify critical issues before they even arise and then resort to self-healing.
Choosing monitoring tools and when to monitor
Every monitoring tool available for Kubernetes has its own set of pros and cons. That is why there is no one right answer for all types of requirements. In fact, most DevOps teams prefer to use a combination of monitoring tool sets to be able to monitor several everything simultaneously.
Another thing to note is that many DevOps teams often think about monitoring requirements much too late in the entire development process, which proves to be a disadvantage. To implement a healthy DevOps culture, its best to start monitoring early in the development process and there are other factors that need to be considered as well.
Monitoring during development – All the features being developed should include monitoring, and it should be handled just like the other activities in the development phase.
Monitoring non-functionals – Non-functionals like requests per second and response times should also be monitored which can help you identify small issues before they become big problems.
There are two levels to monitor the container environment of Kubernetes.
- Application process monitoring (APM) – This scans your custom code to find and locate any errors or bottlenecks
- Infrastructure monitoring – This helps collect metrics related to the container or the load like available memory, CPU load, and network I/O
Monitoring tools available for Kubernetes
The Grafana-Alertmanager-Prometheus (GAP) is a combination of three open source monitoring tools, which is flexible and powerful at the same time. You can use this to monitor your infrastructure and at the same time create alerts as well.
Part of the Cloud Native Computing Foundation (CNCF), Prometheus is a time series database. It is able to provide finely grained metrics by scraping data from the data points available on hosts and storing that data in its time series database. Though the large amount of data it captures also becomes one of its downsides since you cannot use it alone for long-term reporting or capacity management. That is where Grafana steps in since it allows you to front the data which is being scraped by Prometheus.
To connect to Grafana, all you have to do is add the Prometheus URL as a data source and then import the databases from it. Once that integration is done, you can connect Alertmanager with it to get timely monitoring alerts.
Sysdig is an open source monitoring tool which provides troubleshooting features but it’s not a full-service monitoring tool and neither can it store data to create trends. It is supported by a community, which makes Sysdig an affordable option for small teams.
Sysdig can implement RBAC into tooling, it has an agent-based solution for non-container platforms, and it also supports containerized implementation. Since it already has a service level agreement implementation, it allows you to check and get alerts according to response times, memory, and CPU. There are canned dashboards that you can use.
The only downside to using Sysdig is that to use it you need to install kernel headers. Though, the process has now been made much simpler with Sysdig able to rebuild a new module everytime a new kernel is present.
While Sysdig is the free and open source, there is premium support available for enterprise customers through Sysdig Monitor, and there are two types of solutions available- SaaS full-service monitoring and on-premise.
DataDog is a SaaS-only monitoring tool which integrates APM into its services. It provides flexibility with alert monitoring, and you also get access to dashboards through a UI. It can also provide APM for Python, Go, Ruby, and there will soon be a Java support as well.
You can connect DataDog to cloud provider environments, and it can consume data from sources like New Relic and Nagios. With its dashboards, you can overlay graphs, and it can process other APIs which expose data.
There is a service discovery option which allows DataDog to monitor dockerized containers across environments and hosts continuously.
As we mentioned above, monitoring Kubernetes is not an option but a crucial requirement, and it should be implemented right from the development phase.