We’re doing too much.
The modern DevOps engineer is expected to be a log detective, a mentat to previously unseen application code, an incident responder, a procurement liaison, a fiscal analyst for cloud pricing, an eBPF live debugger, a security guru, a real-time chart whisperer and dataviz expert, and half a dozen other things.
The breadth is only increasing. DevOps engineers are now also MLOps experts, AI toolchain selectors, LLM quality analysts.
This, for lack of a better word, is real ops (RealOps, if you’d like), or what we actually do every day, not the world of over-idealized conference talks, CIO whitepapers, and, well, a lot of blog posts.
Most of these things have little to do with automation to improve software delivery (the supposed focus of much DevOps work), and a lot to do with just thinking and, well, operating. How can we help operators think and then take action? How can we help, right now?
Metrics and monitoring are easy and awful
One way is with data (all kinds). But getting the right data—when you want it and how you want it—is itself a huge task. Consider just one part of all this: metrics & monitoring.
Even without explicitly instrumenting code, today’s apps generate massive amounts of operational data. If you run on any cloud provider, you’ve got system-level data. If you run Kubernetes, you’ve got cluster data (and a lot more). Then with a little bit of OpenTelemetry/SkyWalking/Jaeger/etc you have all the app-level request tracing you’d ever want. Cool.
All this data moves through your systems via who-knows-how-many tools, shippers, pollers, whatever, each one from some recent vintage of the CNCF, and eventually the data ends up in some sort of database that some people know how to access with some sort of query language, and there’s also something to do graphs behind a VPN and also an alerting thing (maybe the same thing?). And this is mostly fine from a strict operational standpoint once it’s all working! Just ignore it really!
Here’s Jason Dixon on monitoring & metrics back in 2011 (that’s year 3 BKE if you use the Kubernetes calendar):
We don’t realize it yet, but what we really need is more competition in this area, particularly from the bottom-up… How awesome would it be if, five years from now, monitoring software looked like EC2, CloudFoundry or Heroku? I like to envision it as the Voltron of monitoring software.
By this he meant a vibrant set of metrics and monitoring tools that played really well together.
CNCF lists 100+ monitoring tools in their tooling landscape (out of 1500 overall tools). It’s an underestimate, too. So yeah: we did get competition, and too much of it, in a bloated, complex, expensive ecosystem.
Here’s the real point: I’m talking about just metrics & monitoring. And that is just one of the dozen or so things expected of the modern DevOps engineer. (And who here wants to build yet another monitoring tool from scratch? Actually, don’t answer.)
Should’ve been a farmer?
Reading about this state of the world, you may have convinced yourself that this profession is awful and we should all do something else.
Well, despite that, I’d like to try to make this stuff better.
I know, I know. Operators are drowning in data, with too many tools, way too many responsibilities, and so the way to help is: software? This time we’ll get it right! Well.
ML can help
It’s 2024 and the term “automation”, of course, means something new.
Last year, Ben (my co-founder here at Cased) and I wanted to figure out what machine learning could do to help make work better for DevOps engineers. This was exactly the goal and the level of thinking. Not to build a better configuration tool, or another monitoring tool, or another deployment platform. Just, quite literally, to make things better across the board for the people operating software. Because DevOps engineers are supposed to be operating software for customers, not operating yet another DevOps tool.
To us that meant using ML innovations (LLMs, mostly) to reduce the cognitive load for everything DevOps engineers do. All their tools. Not to replace those tools (not yet at least!), but instead to soften their edges, to bring out the parts that are most useful, and to weave together all the very disparate software that powers our applications. We don’t think “DevOps is broken”, at least no more than usual. But we do think things can be better, and this is an important time to try.