Monitoring sounds like a terribly useful thing to do, and in many situations it is. However, rather than starting our discussion with monitoring itself, we should start with the reasons for considering it.
In IT, we are typically faced with non-functional requirements for business applications. The business functionality should be available according to specific rules that dictate, for example, the hours during which the application must be available and the maximum allowed response time for key functions (for 95% or 99% of the requests). In order to meet these requirements, we have to monitor the live behavior of the applications: to see what the end users experience and to act if what we (and they!) see is not as desired. In addition, by looking not only at the business application from the outside, through the eyes of the end users, but also at the underlying components, we can better understand and predict what is going on in the stack under the application, and therefore what may soon happen to the end user experience as well.
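To make a requirement such as "key functions respond within a maximum time for 95% of the requests" concrete, here is a minimal Python sketch. The function names, the sample response times and the 500 ms threshold are assumptions made up for illustration, and the nearest-rank method is only one of several common percentile definitions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that
    at least pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_requirement(samples_ms, pct, max_ms):
    """True if pct percent of the requests finished within max_ms."""
    return percentile(samples_ms, pct) <= max_ms


# Hypothetical response times in milliseconds, with one slow outlier.
response_times_ms = [120, 95, 180, 210, 130, 110, 900, 140, 160, 125]

p90 = percentile(response_times_ms, 90)
p95 = percentile(response_times_ms, 95)
```

With these ten samples the single slow request is enough to break a 95% requirement of 500 ms, while a 90% requirement would still be met. This is why the chosen percentile matters as much as the threshold.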
Monitoring encompasses the collection of metrics about the actual behavior of various components throughout the stack, in order to detect deviations from the desired state and behavior. Such deviations, and trends towards exceptions, should be identified and appropriate action must be taken, ideally proactively. Our monitoring mechanism should at the very least be capable of collecting the metrics across the stack, storing these metrics for further analysis and reporting, exposing the metrics through APIs and query facilities, analyzing the metrics in real time in order to detect exceptions and send out notifications to the proper authorities, and making the metrics available to dashboarding tools. The next figure illustrates these core requirements:
Prometheus, an open source project under the CNCF umbrella that has its home at https://prometheus.io/ and has been available since 2012, provides exactly these capabilities. It has experienced a rapid rise to prominence, becoming one of the premier solutions for gathering monitoring metrics into a time series database. Its close association with Kubernetes and the Kubernetes community has further contributed to this rapid ascent.
On 20th September, AMIS hosted a MeetUp on Prometheus and Grafana. This article provides the slides for this event as well as a link to the GitHub repository with all resources for the hands-on workshop we did during this meetup. The slides I presented during this session (see below) give a quick introduction to monitoring in general, and to Prometheus and how to use it for monitoring in particular. They then also discuss Grafana, a dashboarding tool that can do many things, including working together with Prometheus.
A quick overview of what Prometheus is about – at the core:
- Gather metrics into database
  - Scheduled pull (harvest, scrape) actions: HTTP/TCP requests that access exporters and built-in (scrape) endpoints
- Provide exporters (adapters) that expose metrics from technologies and components that are not Prometheus-aware
- Make metrics available to consuming systems and humans
  - Such as Grafana (for dashboarding), REST APIs, and the Prometheus UI (Graphs, Console, PromQL)
- Analyze metrics according to [alert] rules and determine if alerts are “firing”
- Act on firing alerts and send notifications
- Supports federation – global view over local environments and recovery of local environment
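The scrape-and-exporter model in the list above can be illustrated with a minimal sketch in Python, using only the standard library. It renders a made-up metric in the Prometheus text exposition format and serves it on /metrics, the path that Prometheus scrapes by default. The metric name legacy_queue_depth, its values, the helper names and port 8000 are all assumptions for illustration; a real exporter would normally be built on one of the official client libraries.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs. Escaping of special
    characters in label values is omitted to keep the sketch short.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Hypothetical readings from a component that is not Prometheus-aware."""
    return render_metric(
        "legacy_queue_depth",
        "Messages waiting in the legacy queue.",
        "gauge",
        [({"queue": "orders"}, 7), ({"queue": "invoices"}, 0)],
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers scrape requests on /metrics and rejects everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. A Prometheus server configured with this host and port as a scrape target would then pull the two gauge values on every scheduled scrape.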
The architecture of Prometheus is very similar to the figure describing the outline of a monitoring solution:
Prometheus contains functionality to evaluate alert conditions. When a condition is satisfied, the Alertmanager component can send notifications through various communication channels, such as Slack, email and chat. Prometheus also contains limited UI capabilities for browsing and analyzing the collected metrics.
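Prometheus alerting rules can carry a "for" duration: the condition has to hold for that long before the alert moves from pending to firing. The following Python sketch mimics that state logic in a simplified form. It is not how Prometheus itself is implemented; the class name, the rule name HighErrorRate, the 5% threshold and the timestamps are assumptions for illustration.

```python
class AlertRule:
    """Simplified model of an alert rule with a 'for' duration."""

    def __init__(self, name, condition, for_seconds):
        self.name = name
        self.condition = condition      # callable: metric value -> bool
        self.for_seconds = for_seconds  # how long the condition must hold
        self.active_since = None        # time the condition first became true

    def evaluate(self, value, now):
        """Return 'inactive', 'pending' or 'firing' for this evaluation."""
        if not self.condition(value):
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# Hypothetical rule: error rate above 5% for at least two minutes.
rule = AlertRule("HighErrorRate", lambda v: v > 0.05, for_seconds=120)

# (timestamp in seconds, observed error rate) at each evaluation cycle.
observations = [(0, 0.01), (60, 0.08), (120, 0.09), (180, 0.10), (240, 0.02)]
states = [rule.evaluate(value, now) for now, value in observations]
```

In this run the alert is pending at 60 and 120 seconds, fires at 180 seconds, and returns to inactive once the error rate drops. Only firing alerts would be handed to Alertmanager for notification.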
For advanced dashboards and extended capabilities for alerting and notifications, a companion product such as Grafana can be used, as shown in the next figure.
Grafana is a generic open source dashboard product. It supports many types of data sources, of which Prometheus is but one. Grafana queries data sources (such as Prometheus) periodically. It does not store any data. Grafana refreshes visualizations, evaluates alert conditions and triggers alerts/sends notifications.
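To give an impression of what a consumer such as a dashboard tool asks of Prometheus, the sketch below builds an instant query URL for the Prometheus HTTP API (/api/v1/query) and parses a response of result type "vector". It does not show Grafana's own data source implementation. The function names, the base URL on port 9090 and the hand-written sample response are assumptions for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def parse_vector(response_text):
    """Turn an instant-vector response into (labels, value) pairs."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each sample is [unix_timestamp, "value as string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


url = instant_query_url("http://localhost:9090", "up")

# Hand-written sample in the shape of a successful response.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "instance": "localhost:9100", "job": "node"},
                "value": [1537430400.0, "1"],
            }
        ],
    },
})

series = parse_vector(sample_response)
```

A consumer would fetch the URL on every refresh interval and redraw its panel from the parsed pairs; the data itself stays in Prometheus.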
Grafana has an extensive library of pre-built dashboards available as well as many plugins for visualizing data and importing data from many different sources. It supports user authentication and authorization and multi-tenancy. Grafana can easily be experimented with at its live playground: https://play.grafana.org/.
The GitHub repository with the workshop materials can be found here: https://github.com/lucasjellema/monitoring-workshop-prometheus-grafana. The main document is a 60-page PDF manual that takes you through installing Prometheus and its companion components as well as Grafana. In order to follow along, you need VirtualBox and Vagrant in your environment.
Alternatively, you could run a Prometheus hands-on completely in the cloud, at https://www.katacoda.com/courses/prometheus/getting-started. At Katacoda, a Prometheus instance is spun up for you in the cloud and you can walk through the basic steps of scraping the Node Exporter, which exposes Linux system metrics.