O11y guide: Keeping your cloud-native observability options open

This is the fourth article in a series covering my journey into the world of cloud-native observability. If you missed any of the previous articles, go back to the introduction for a quick update.

After laying the groundwork for this series in the initial article, I spent some time in the second article sharing who the observability players are. I also discussed the teams these players are on in the world of cloud-native o11y. In the third article, I looked at the ongoing discussion around monitoring pillars versus phases.

Having been a developer since my early days in IT, I have found it very interesting to explore the complexities of cloud-native o11y. Monitoring applications goes far beyond just writing and deploying code, especially in a cloud-native world. One thing remains the same: maintaining your organization’s architecture always requires both a vigilant outlook and an understanding of the open standards available.

In this fourth article, I’m going to look at architecture-level choices and share the open standards found in the open-source landscape.

As any architect will tell you, open standards are always a priority when you consider adding to your existing infrastructure. Does the candidate component adhere to a defined open standard? Is it at least compatible with open standards?

Open options

When an open standard exists, or in some early cases an open consensus where everyone has centered on a single technology or protocol, it gives an architect some peace of mind. You have options as to the final component you want to use, and as long as it is based on a standard, you can feel confident that you can swap it out in the future.

The Open Container Initiative (OCI) for container tooling in cloud-native environments is an example of one such standard. When your organization’s architecture uses such a standard, all components and systems that interact with your containers can be replaced at any point in the future, as long as the replacements comply with the standard. This creates choice, and choice is a good thing!

Open o11y projects

In cloud-native observability (o11y), there are several open-source projects to help you tackle your initial o11y tasks. Many are closely associated with the Cloud Native Computing Foundation (CNCF) as projects and promote open standards where possible. Some of them have even become unofficial open standards through their widespread default use in the o11y domain.

Let’s take a look at some of the most commonly encountered cloud-native o11y projects.

Prometheus

Prometheus is a graduated project under the CNCF umbrella, a status defined as “…considered stable and used in production”. It is listed as a monitoring system and time series database, but the project site itself advertises that it is used to power your metrics and alerting with a leading open-source monitoring solution.

What does Prometheus do for you?

It provides a flexible data model that allows you to identify time series data, which is a sequence of data points in time order, by specifying a metric name and optional key/value pairs called labels. Time series are stored in an efficient format in memory and on local disk. Scaling is done through functional sharding, which splits the data across servers, and through federation.
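To make the data model a bit more concrete, here is a minimal Python sketch of the idea, not Prometheus code: a time series is identified by a metric name plus a set of key/value labels, and its samples are kept as timestamp/value pairs. The names SeriesId, series_id, store, and append, as well as the http_requests_total metric and its label values, are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesId:
    """Identity of one time series: a metric name plus a set of labels."""
    name: str
    labels: tuple[tuple[str, str], ...] = ()

    def render(self) -> str:
        # Prometheus-style notation: metric_name{label="value",...}
        if not self.labels:
            return self.name
        pairs = ",".join(f'{key}="{value}"' for key, value in self.labels)
        return self.name + "{" + pairs + "}"


def series_id(name: str, **labels: str) -> SeriesId:
    # Sort the labels so the same label set always gives the same identity.
    return SeriesId(name, tuple(sorted(labels.items())))


# Hypothetical in-memory store: series identity -> (unix_timestamp, value) samples.
store: dict[SeriesId, list[tuple[float, float]]] = {}


def append(sid: SeriesId, timestamp: float, value: float) -> None:
    # Samples are appended in time order, one list per series.
    store.setdefault(sid, []).append((timestamp, value))


get_ok = series_id("http_requests_total", method="GET", status="200")
post_ok = series_id("http_requests_total", status="200", method="POST")

append(get_ok, 1700000000.0, 1027.0)
append(get_ok, 1700000015.0, 1031.0)
append(post_ok, 1700000000.0, 3.0)

for sid, samples in store.items():
    print(sid.render(), samples)
```

The same metric name with different label values yields two separate series here, which is what makes it possible to slice and aggregate along the label dimensions later on.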

Leveraging metrics data is done with a very powerful query language called PromQL, which we’ll cover in the next section. Alerts for your systems are defined using this query language, and the provided Alertmanager handles the notifications.

Multiple modes are provided for viewing the collected data, from the built-in expression browser to Grafana integration and a console template language. There are also a number of client libraries available to help you easily instrument existing services in your architecture. If you want to import existing third-party data into Prometheus, there are a number of integrations available for you to take advantage of.

Each server runs independently, making it an easy starting point that is reliable out of the box, with only local storage needed to get started. It is written in Go, and all binaries are statically linked for easy deployment and performance.

The Prometheus organization on GitHub hosts all the code bases for its projects.

PromQL

PromQL is officially part of the Prometheus project, but it deserves a mention in its own right as an unofficial standard that is widely used for querying ingested time series data. As stated in the Prometheus documentation:

Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus’s expression browser, or consumed by external systems via the HTTP API.
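As a rough illustration of the "consumed by external systems" part, the Python sketch below builds the URL for an instant query against the Prometheus HTTP API endpoint /api/v1/query. The base URL (a local server on the default port 9090), the http_requests_total metric, its job label, and the helper name instant_query_url are all assumptions for illustration. The sketch only constructs the URL and does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a Prometheus server running locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the HTTP API."""
    # The expression travels in the `query` parameter and must be URL-encoded.
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


# Select: the per-second rate of a (hypothetical) request counter over 5 minutes.
# Aggregate: sum those rates per job label.
expr = "sum by (job) (rate(http_requests_total[5m]))"

url = instant_query_url(PROMETHEUS_URL, expr)
print(url)

# Decoding the query string gives back the original expression.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Fetching that URL from a running server, for example with urllib.request.urlopen, returns a JSON document with the matching series. The same expression can be pasted into the expression browser to see it as a table or graph.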

There are many ways to learn how to write queries in PromQL, but a fun little project called PromLens provides an online demo to help you get up to speed on building, understanding, and troubleshooting PromQL queries. You can also easily spin up a Docker image with the tool set up on your local machine for exploration. Building queries with a view into your time series data is a huge boost to your productivity.

There’s a nice backstory on the origins of PromQL in an interview with its creator, Julius Volz.

OpenTelemetry

Another up-and-coming project is found in the incubating section of the CNCF site: it’s called OpenTelemetry (OTEL). It is a very fast-growing project with a focus on “high-quality, ubiquitous and portable telemetry to enable effective observability”.

This project helps you generate telemetry data from your applications and services, then forward it in what is now considered a standard format, the OpenTelemetry Protocol (OTLP), to a variety of monitoring tools. To generate telemetry data, you must first instrument your code, but OTEL makes this much easier with automatic instrumentation available for many existing languages.

You can find the community and its code in the open-telemetry organization on GitHub.

Jaeger

Prior to OTEL’s appearance, the CNCF project Jaeger provided a distributed tracing platform targeting the cloud-native microservices industry.

Jaeger is open-source, end-to-end distributed tracing for monitoring and troubleshooting transactions in complex distributed systems.

While the project is fully mature, it targets an older protocol and has recently retired its classic client libraries, advising users to migrate to its native support for the OTEL protocol standard.

Start your monitoring engine

This wraps up the brief overview of open-source projects and (un)official standards that you’ll encounter when getting started with cloud-native o11y. It brings me to the first step in getting practical, where we want to start exploring open-source projects, with the understanding that we are just getting started and have no massive cloud-native o11y issues so far.

Next up, I plan to get hands-on with Prometheus to gain some practical experience on my cloud-native o11y journey.