In the late 90s, when I studied telematics, we had that one course that did not really enthuse me too much. Maybe it was because I was down the path for the software aspects of the studies (one had to choose a path towards MSc) or maybe the topic in itself wasn’t exciting, I can’t remember. What I do remember is that it sounded and felt pretty much like what you can read up on Wikipedia:
Control theory deals with the control of dynamical systems in engineered processes and machines. The objective is to…
Systems. Feedback loops. Transfer functions. SISO/MIMO. Process variables. Use cases in mechanical and electrical systems.
Fast forward some twenty-plus years and I come across observability. A term that sounded familiar, and yet at first I wasn’t really connecting the dots. Then it hit me: oh yeah, control theory, riiiiiiight.
With the rise of the open source & open standards driven cloud native world, shepherded by the Cloud Native Computing Foundation (CNCF for short), observability has gone mainstream, from around 2015 on. First promoted by startups, with LightStep and Honeycomb coming to mind immediately, the observability (or: o11y as we kewl kids say) train is now, in 2020, at full speed. When will you board it? :)
Open source & open standards
Going forward, what’s up with o11y and where are we heading? I’d argue that as of the end of 2019 the direction is pretty evident:
This keynote at the last “real” KubeCon in San Diego at the end of 2019 (it was a bit rainy but still, I mean, San Diego, very nice) by the fabulous Liz Fong-Jones & Sarah Novotny wrapped up the OpenTracing+OpenCensus=OpenTelemetry story, and by the end of 2020 we can expect that to go GA, at least partially.
Why does it matter?
When you look at the telemetry part of observability, that is, how to get the signals (like metrics or traces) to a downstream place where you can analyze, slice & dice or simply dashboard them, you first ask yourself about the ROI of instrumentation, right? Sure, everyone gets it: you don’t want to “fly blind” and for that you need to see where you are and what part of your machine is lighting up like a Christmas tree. But what gives?
Next, you want mobility & portability. Not only for your code and configuration but also, and especially, for the knowledge in the brains of your engineers.
So the obvious answer is what we’ve been practicing for the past decades in other areas (IETF, W3C): make an open standard and compete on implementations. Yay, FTW!
OK. If you’re only interested in the business value of o11y and otel, now is a good time to stop reading. Close the tab and think a little about this topic. If you are an engineer and interested in learning more about where we are and how you can start contributing, I invite you to stick around and carry on.
Next up: logs & metrics
The OpenTelemetry (or: otel if you want to be seen as an insider) community made impressive progress over the past year. Looks like the tracing bits will soon be declared GA and with that considered stable.
We’re working on metrics (and have to figure out how to avoid a two-competing-standards situation with OpenMetrics, or OM for short) and logs. So, if you’re interested in either, I’d say that for now Gitter is the best place to jump into the deep end. Maybe read the draft specs first?
The OpenTelemetry Metrics API supports capturing measurements about the execution of a computer program at run time…
When I started to dig into the draft otel spec for metrics I came across a table that was hard to digest, so I redrew it (and yes, I will PR the spec to suggest an update); here it is:
Yeah, slightly more complicated than OM (née Prometheus model and exposition format) but, oh well, that’s the joy of standardization…
Equally exciting, but maybe a little less in the spotlight, are the otel logs specs:
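To make the OM side of that comparison a bit more tangible, here’s a minimal sketch in Python of a counter rendered in the Prometheus text exposition format (the `# HELP` and `# TYPE` lines followed by one sample per label set). Everything else is made up for illustration: the `Counter` class, its `add` and `expose` methods, the `escape_label_value` helper and the `http_requests_total` metric are hypothetical, and this is neither the otel Metrics API nor the Prometheus client library.

```python
from typing import Dict, Tuple

# A label set is stored as a sorted tuple of (name, value) pairs,
# which makes it hashable and gives a stable output order.
LabelSet = Tuple[Tuple[str, str], ...]


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class Counter:
    """Hypothetical monotonic counter, for illustration only."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[LabelSet, float] = {}

    def add(self, amount: float = 1.0, **labels: str) -> None:
        """Increase the counter for the given label set."""
        if amount < 0:
            raise ValueError("a counter can only go up")
        key: LabelSet = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def expose(self) -> str:
        """Render all samples in the text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(
                f'{name}="{escape_label_value(val)}"' for name, val in key
            )
            series = f"{self.name}{{{label_str}}}" if label_str else self.name
            lines.append(f"{series} {value:g}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests = Counter("http_requests_total", "Total HTTP requests.")
    requests.add(method="GET", code="200")
    requests.add(2, method="GET", code="200")
    requests.add(method="POST", code="500")
    print(requests.expose(), end="")
```

The point of the sketch is how little there is to the OM side: a name, a type, a set of labels and a number. The otel draft layers more choices on top of that, which is where the table comes in.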
Of all telemetry signals logs have probably the biggest legacy. Most programming languages have built-in logging…
Same story. Read it, implement it, comment on it. The community, and your customers, will thank you.
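If you want a feel for what “implement it” could look like on the logs side, here’s a small sketch that uses Python’s built-in `logging` module to emit one JSON object per log record, carrying a trace and span ID so a log line can be correlated with a trace. Treat the field names (`timestamp`, `severity_text`, `body`, `trace_id`, `span_id`) as my assumptions, loosely inspired by the draft log data model rather than copied from it; `JsonFormatter` and `make_logger` are hypothetical helpers, not part of any otel SDK.

```python
import io
import json
import logging
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (field names are assumptions)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "severity_text": record.levelname,
            "body": record.getMessage(),
            # Set via `extra=`; None when the caller did not pass trace context.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(entry)


def make_logger(name: str, stream: TextIO) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = make_logger("checkout", buffer)
    log.info(
        "payment failed for %s",
        "cart-42",
        extra={"trace_id": "abc123", "span_id": "def456"},
    )
    print(buffer.getvalue(), end="")
```

The IDs here are placeholder strings; in a real setup they would come from whatever tracing context your application already propagates.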
There’s plenty to do.
This ranges from making sure otel uptake is smooth across the stack (are you fluent in C++? consider helping out making Envoy support otel in v3) to alignment across CNCF specs and tooling. For example, with my SIG o11y hat on, I’m interested in seeing the Service Mesh Interface (SMI) metrics being aligned with otel (wanna help? comment on issue 199 and offer your POV).
Last but certainly not least, with my AWS hat on, I encourage you to check out the AWS Distro for OpenTelemetry. This is our downstream implementation of the OpenTelemetry APIs and SDK: let us know what you think and where/how you use it, please. Some inspiration needed? Have a look at the blog post series on the OSS blog on that topic!