
Infrastructure as Intent – O’Reilly




There’s an open secret in the world of DevOps: No one trusts the CMDB. The Configuration Management Database (CMDB) is supposed to be the “source of truth”: the central map of every server, service, and application in your enterprise. In theory, it’s the foundation for security audits, cost analysis, and incident response. In practice, it’s a work of fiction. The moment you populate a CMDB, it begins to rot. Engineers deploy a new microservice but forget to register it. An autoscaling group spins up 20 new nodes, but the database only records the original three…

We call this configuration drift, and for decades, our industry’s solution has been to throw more scripts at the problem. We write massive, brittle ETL (Extract-Transform-Load) pipelines that attempt to scrape the world and shove it into a relational database. It never works. The “world”, especially the modern cloud native world, moves too fast.

We realized we couldn’t solve this problem by writing better scripts. We had to change the fundamental architecture of how we sync data. We stopped trying to boil the ocean and fix the whole enterprise at once. Instead, we focused on one notoriously difficult environment: Kubernetes. If we could build an autonomous agent capable of reasoning about the complex, ephemeral state of a Kubernetes cluster, we could prove a pattern that works everywhere else. This article explores how we used the newly open-sourced Codex CLI and the Model Context Protocol (MCP) to build that agent. In the process, we moved from passive code generation to active infrastructure operation, transforming the “stale CMDB” problem from a data-entry job into a logic puzzle.

The Shift: From Code Generation to Infrastructure Operation with Codex CLI and MCP

The reason most CMDB initiatives fail is ambition. They try to track every switch port, virtual machine, and SaaS license simultaneously. The result is a data swamp: too much noise, not enough signal. We took a different approach. We drew a small circle around a specific domain: Kubernetes workloads. Kubernetes is the perfect testing ground for AI agents because it’s high-velocity and declarative. Things change constantly. Pods die; deployments roll over; services change selectors. A static script struggles to distinguish between a CrashLoopBackOff (a temporary error state) and a purposeful scale-down. We hypothesized that a large language model (LLM), acting as an operator, could understand this nuance. It wouldn’t just copy data; it would interpret it.

The Codex CLI turned this hypothesis into a tangible architecture by enabling a shift from “code generation” to “infrastructure operation.” Instead of treating the LLM as a junior programmer that writes scripts for humans to review and run, Codex empowers the model to execute code itself. We provide it with tools (executable functions that act as its hands and eyes) via the Model Context Protocol. MCP defines a clear interface between the AI model and the outside world, allowing us to expose high-level capabilities like cmdb_stage_transaction without teaching the model the complex internal API of our CMDB. The model learns to use the tool, not the underlying API.

The architecture of agency

Our system, which we call k8s-agent, consists of three distinct layers. This isn’t a single script running top to bottom; it’s a cognitive architecture.

The cognitive layer (Codex + contextual instructions): This is the Codex CLI running a specific system prompt. We don’t fine-tune the model weights. Infrastructure moves too fast for fine-tuning: A model trained on Kubernetes v1.25 would be hallucinating by v1.30. Instead, we use context engineering, the art of designing the environment in which the AI operates. This involves tool design (creating atomic, deterministic functions), prompt architecture (structuring the system prompt), and knowledge architecture (deciding what information to hide or expose). We feed the model a persistent context file (AGENTS.md) that defines its persona: “You are a meticulous infrastructure auditor. Your goal is to ensure the CMDB accurately reflects the state of the Kubernetes cluster. You must prioritize safety: Do not delete records unless you have positive confirmation that they are orphans.”

The tool layer: Using MCP, we expose deterministic Python functions to the agent.

  • Sensors: k8s_list_workloads, cmdb_query_service, k8s_get_deployment_spec
  • Actuators: cmdb_stage_create, cmdb_stage_update, cmdb_stage_delete

Note that we track workloads (Deployments, StatefulSets), not Pods. Pods are ephemeral; tracking them in a CMDB is an antipattern that creates noise. The agent understands this distinction, a semantic rule that is hard to enforce in a rigid script.
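As an illustration (not the authors’ actual implementation), a sensor tool of this shape can be sketched in a few lines. The sample data and helper structure below are hypothetical; in the real system the function would query the Kubernetes API before filtering:

```python
# Hypothetical sketch of a "sensor" tool that lists workloads, not Pods.
# A static sample stands in for a live Kubernetes API call so the
# workloads-only filtering rule is visible.

TRACKED_KINDS = {"Deployment", "StatefulSet"}  # Pods are deliberately excluded

def k8s_list_workloads(resources):
    """Return a digest of workload-level resources only."""
    return [
        {"kind": r["kind"], "name": r["name"], "namespace": r["namespace"]}
        for r in resources
        if r["kind"] in TRACKED_KINDS
    ]

sample = [
    {"kind": "Deployment", "name": "payment-processor-v1", "namespace": "prod"},
    {"kind": "Pod", "name": "payment-processor-v1-abc12", "namespace": "prod"},
    {"kind": "StatefulSet", "name": "orders-db", "namespace": "prod"},
]

print(k8s_list_workloads(sample))
```

The ephemeral Pod is filtered out before the agent ever sees it, which keeps both the context window and the CMDB free of churn.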

The state layer (the safety net): LLMs are probabilistic; infrastructure must be deterministic. We bridge this gap with a staging pattern. The agent never writes directly to the production database. It writes to a staged diff. This allows a human (or a policy engine) to review the proposed changes before they are committed.
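A minimal sketch of the staging pattern might look like the following. The function names mirror the actuator tools listed above, but the record structure and review flow here are assumptions for illustration:

```python
import json
from datetime import datetime, timezone

# Hypothetical staging area: the agent appends proposed changes here
# instead of writing to the production CMDB.
STAGED_CHANGES = []

def cmdb_stage_delete(record_id, reason):
    """Stage a deletion for review; nothing touches production here."""
    change = {
        "op": "delete",
        "record_id": record_id,
        "reason": reason,
        "staged_at": datetime.now(timezone.utc).isoformat(),
        "status": "pending_review",
    }
    STAGED_CHANGES.append(change)
    return change

def commit_approved(apply_fn):
    """A human or policy engine flips status to 'approved'; only then apply."""
    for change in STAGED_CHANGES:
        if change["status"] == "approved":
            apply_fn(change)
            change["status"] = "committed"

cmdb_stage_delete("svc-1042", "Workload absent; delete event found in history")
print(json.dumps(STAGED_CHANGES, indent=2))
```

The key property is that the probabilistic component can only ever produce a reviewable diff; the deterministic commit step stays under human or policy control.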

The OODA Loop in Action

How does this differ from a standard sync script? A script follows a linear path: Connect → Fetch → Write. If any step fails or returns unexpected data, the script crashes or corrupts data. Our agent follows the Observe-Orient-Decide-Act (OODA) loop, popularized by military strategists. Unlike a linear script that executes blindly, the OODA loop forces the agent to pause and synthesize information before taking action. This cycle allows it to handle incomplete data, verify assumptions, and adapt to changing conditions, traits essential for operating in a distributed system.

Let’s walk through a real scenario we encountered during our pilot, the Ghost Deployment, to explore the benefits of using an OODA loop. A developer had deleted a deployment named payment-processor-v1 from the cluster but forgot to remove the record from the CMDB. A typical script might pull the list of deployments, see that payment-processor-v1 is missing, and immediately issue a DELETE to the database. The risk is obvious: What if the API server was just timing out? What if the script had a bug in its pagination logic? The script blindly destroys data based on the absence of evidence.

The agent approach is fundamentally different. First, it observes: calling k8s_list_workloads and cmdb_query_service and noticing the discrepancy. Second, it orients: checking its context instructions to “verify orphans before deletion” and deciding to call k8s_get_event_history. Third, it decides: seeing a “delete” event in the logs, it reasons that the resource is genuinely gone, because the missing workload is corroborated by a deletion event. Finally, it acts: calling cmdb_stage_delete with a comment confirming the deletion. The agent didn’t just sync data; it investigated. It handled the ambiguity that usually breaks automation.
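The four steps above can be condensed into a sketch. This is a deterministic caricature of what the agent does through reasoning, with the tool calls stubbed out on canned data; the event shapes and helper names are assumptions:

```python
# Hypothetical sketch of the Observe-Orient-Decide-Act loop for the
# "ghost deployment" case. Stubs stand in for the real MCP tools.

def k8s_list_workloads():
    return {"payment-gateway"}                      # payment-processor-v1 is gone

def cmdb_query_service():
    return {"payment-gateway", "payment-processor-v1"}

def k8s_get_event_history(name):
    return [{"reason": "Killing"}, {"reason": "SuccessfulDelete"}]

def reconcile():
    # Observe: gather both views of the world.
    live, recorded = k8s_list_workloads(), cmdb_query_service()
    for orphan in recorded - live:
        # Orient: the context rules say "verify orphans before deletion".
        events = k8s_get_event_history(orphan)
        # Decide: absence of evidence is not enough; require a delete event.
        if any(e["reason"] == "SuccessfulDelete" for e in events):
            # Act: stage (never commit) the deletion, with a justification.
            yield ("stage_delete", orphan, "delete event found in history")
        else:
            yield ("flag_for_review", orphan, "missing but no delete event")

print(list(reconcile()))
```

Note the else branch: when corroborating evidence is missing, the agent escalates instead of destroying data, which is exactly where the linear Connect → Fetch → Write script goes wrong.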

Solving the “Semantic Gap”

This specific Kubernetes use case highlights a broader problem in IT operations: the “semantic gap.” The data in our infrastructure (JSON, YAML, logs) is full of implicit meaning. A label env: production changes the criticality of a resource. A status of CrashLoopBackOff means “broken,” but Complete means “finished successfully.” Traditional scripts require us to hardcode every permutation of this logic, resulting in thousands of lines of unmaintainable if/else statements. With the Codex CLI, we replace those thousands of lines of code with a few sentences of English in the system prompt: “Ignore Jobs that have completed successfully. Sync failing Jobs so we can track instability.” The LLM bridges the semantic gap. It understands what “instability” implies in the context of a Job status. We are describing our intent, and the agent is handling the implementation.
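To make the contrast concrete, here is a hypothetical fragment of the hand-coded rule table that the two English sentences replace. Every status string and branch below is an assumption for illustration; the point is that each new edge case means another branch, while the prompt generalizes:

```python
# The traditional approach: hardcode every status permutation.
FAILING_STATES = {"Failed", "BackoffLimitExceeded", "DeadlineExceeded"}

def should_sync_job(job):
    status = job.get("status", "Unknown")
    if status == "Complete":
        return False            # finished successfully: ignore
    if status in FAILING_STATES:
        return True             # failing: sync so instability is tracked
    return True                 # unknown states: sync conservatively

# What the agent gets instead: two sentences of intent.
SYSTEM_PROMPT_RULE = (
    "Ignore Jobs that have completed successfully. "
    "Sync failing Jobs so we can track instability."
)

print(should_sync_job({"status": "Complete"}))
print(should_sync_job({"status": "Failed"}))
```

The if/else version must be amended whenever Kubernetes grows a new failure mode; the intent version does not.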

Scaling Beyond Kubernetes

We started with Kubernetes because it’s the “hard mode” of configuration management. In a production environment with thousands of workloads, things change constantly. A typical script sees a snapshot and often gets it wrong. An agent, however, can work through the complexity. It might run its OODA loop multiple times to solve a single issue, checking logs, verifying dependencies, and confirming rules before it ever makes a change. This ability to chain reasoning steps allows it to handle the scale and uncertainty that break traditional automation.

But the pattern we established, agentic OODA loops via MCP, is universal. Once we proved the model worked for Pods and Services, we realized we could extend it. For legacy infrastructure, we could give the agent tools to SSH into Linux VMs. For SaaS management, we could give it access to Salesforce or GitHub APIs. For cloud governance, we can ask it to audit AWS Security Groups. The beauty of this architecture is that the “brain” (the Codex CLI) stays the same. To support a new environment, we don’t need to rewrite the engine; we just hand it a new set of tools.

However, moving to an agentic model forces us to confront new trade-offs. The most immediate is cost versus context. We learned the hard way that you shouldn’t give the AI the raw YAML of a Kubernetes deployment: it consumes too many tokens and distracts the model with irrelevant details. Instead, you create a tool that returns a digest, a simplified JSON object with only the fields that matter. This is context optimization, and it’s the secret to running agents cost-effectively.
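A digest tool can be sketched as a plain projection. The field selection below is a hypothetical example of what “only the fields that matter” might mean for an audit; a real deployment manifest carries far more:

```python
# Hypothetical "digest" tool: instead of handing the model raw deployment
# YAML (thousands of tokens), return only the fields the audit needs.

def deployment_digest(raw):
    meta = raw.get("metadata", {})
    spec = raw.get("spec", {})
    return {
        "name": meta.get("name"),
        "namespace": meta.get("namespace"),
        "env": meta.get("labels", {}).get("env"),
        "replicas": spec.get("replicas"),
        "image": spec["template"]["spec"]["containers"][0]["image"],
    }

raw = {
    "metadata": {
        "name": "payment-processor-v1",
        "namespace": "prod",
        "labels": {"env": "production", "team": "payments"},
    },
    "spec": {
        "replicas": 3,
        "template": {"spec": {"containers": [
            {"image": "registry.example.com/payments:2.4.1",
             "resources": {}, "env": []},
        ]}},
    },
}

print(deployment_digest(raw))
```

The model never sees the probes, volumes, or tolerations it doesn’t need, which cuts both token cost and the surface area for distraction.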

Conclusion: The Human in the Cockpit

There’s a fear that AI will replace the DevOps engineer. Our experience with the Codex CLI suggests the opposite. This technology doesn’t remove the human; it elevates them. It promotes the engineer from “script writer” to “mission commander.” The stale CMDB was never really a data problem; it was a labor problem. It was simply too much work for humans to track manually and too complex for simple scripts to automate. By introducing an agent that can reason, we finally have a mechanism capable of keeping up with the cloud.

We started with a small Kubernetes cluster. But the destination is an infrastructure that is self-documenting, self-healing, and fundamentally intelligible. The era of the brittle sync script is over. The era of infrastructure as intent has begun.
