Prompting is now obsolete. The leverage is in the loop.

You are still typing one message at a time

The perfectly worded prompt was the wrong thing to get good at. A prompt is a single transaction. You type, the model answers, the context evaporates, and the next thing needs you to type again. You are the runtime. Nothing moves unless you are sitting there feeding it, which makes you a faster typewriter, not a force multiplier.

What changed is that agents got capabilities. Not better text. Actual hands: the ability to run commands, read your repo, message your team, hit your APIs, and take the next action without being asked. Once an agent can act in a loop instead of just reply, the unit of work moves from the message to the standing objective. You stop prompting and start delegating.

If you run infrastructure this should land harder than it does for anyone else, because you already live this way. You do not watch a graph and manually decide to add a pod, you write an HPA. You do not eyeball disk and SSH in to rotate logs, you write the alert and the automation. A prompt is the manual SSH session. A loop is the control loop. You already know which one scales to a fleet. You just have not pointed that instinct at your agents yet.

1

A prompt does one thing, then forgets it ever happened

24/7

A loop holds the goal and keeps acting while you are offline

goal

You supply context and an objective, not a script of steps

fleet

The leverage is many agents at once, not one faster chat

What actually changed: agents got hands

What broke prompting open was agents getting tools. A new class of orchestration frameworks lets an ordinary agent run locally, connect to your messaging and ticketing systems, execute multi-step tasks, and keep going without a human in the chat for every step. None of them ship a smarter model. They give the agent real reach: hands instead of just a mouth. That single capability is the whole unlock, because once an agent can take an action and see the result, it can take the next one on its own.

The architecture these tools converge on is the same: a gateway and an agentic loop. The gateway handles connectivity, routing, and auth. The loop handles reasoning and tool execution, over and over, against a goal. Raw model calls never touch your inputs directly; an orchestration layer sits in between and decides what runs next. With that layer in place the agent stops answering you and starts operating.

A chatbot waits for input. An agent in a loop reasons through a problem, takes an action, observes what happened, and decides the next action, with no prompt from you between steps. You moved from operating a tool to supervising a worker. The job changed from writing the message to defining the mission and watching the dashboard.

Ten agents running a business while you sleep

Scale that up and you get a squad: autonomous agents running around the clock, one acting as lead, the rest specialized. They create tasks, claim them, execute, and review each other's work, a digital team whose agents behave like employees. Nobody in that setup is prompting. Somebody set the mission and walked away.

That pattern is being built into what amounts to a workforce console: a shared task board, threaded discussions between agents, and a live read-only dashboard so you can watch a squad run a real workload around the clock. You hand it context and goals. It runs. The human reads the board, approves the risky moves, and decides what the squad works on next. None of that is typing instructions. This is what we are building toward in our own SRE space, a console over a fleet of loops rather than a chat window over a single model.

You operate the tool

One chat window. You type every step, paste every log, and nothing moves unless you are present to move it.

→

One agent with hands

The agent can run commands and take actions in a loop, so a single objective produces many steps without you between them.

→

You supervise a fleet

A squad of agents creates tasks, claims them, executes, and reviews each other. You set the mission and watch the board.

The leverage comes from removing yourself from the inner loop and moving up to the mission layer, not from a sharper prompt.

Figure 1: The shift in three steps. The cost of the old model was that you were the runtime. The win of the new one is that the runtime is a standing fleet and you are the operator above it.

What a loop actually is

A loop is a standing objective plus the machinery to pursue it. You define a goal, hand the agent context and tools, and it iterates: act, observe, decide, repeat, until a real stop condition is met. Nothing here you have not built before in a different shape.
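The act, observe, decide, repeat cycle can be sketched in a few lines. This is a minimal illustration, not any framework's API: `run_loop`, `LoopResult`, and the three callbacks are hypothetical names, and the toy goal (three completed steps) stands in for a real objective.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    done: bool
    iterations: int
    history: list[str] = field(default_factory=list)


def run_loop(
    decide: Callable[[list[str]], str],
    act: Callable[[str], str],
    is_done: Callable[[list[str]], bool],
    max_iterations: int = 10,
) -> LoopResult:
    """Act, observe, decide, repeat until the stop condition holds."""
    history: list[str] = []
    for i in range(1, max_iterations + 1):
        action = decide(history)   # pick the next action from what has been observed
        observation = act(action)  # take the action through a tool and observe the result
        history.append(observation)
        if is_done(history):       # the stop condition, checked after every step
            return LoopResult(True, i, history)
    return LoopResult(False, max_iterations, history)


# Toy stand-ins (assumptions): a real `decide` would call a model,
# and a real `act` would run a command or hit an API.
def decide(history: list[str]) -> str:
    return f"step-{len(history) + 1}"


def act(action: str) -> str:
    return f"ran {action}"


def is_done(history: list[str]) -> bool:
    return len(history) >= 3


result = run_loop(decide, act, is_done)
print(result.done, result.iterations)
```

Nothing calls the loop body from outside between iterations, which is the point: the human supplies the callbacks and the goal once, then reads the result.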

One fact drives the whole architecture: the agent forgets everything between runs, the loop does not. The model is amnesiac by design. Continuity lives in the system around it, never in the model. Every real decision about a loop is a decision about where state lives, because the agent brings none of its own. Your reconciliation controllers are stateless for the same reason: the desired state lives in a store, not in the controller's head.

Strip away the product names and every standing agent system reduces to the same handful of moving parts. Whether you run a multi-agent squad framework, a coding agent that turns issues into PRs, or your own orchestration, you are assembling some combination of these. They are tool-agnostic on purpose. The pattern outlives whichever model or framework is hot this quarter.

trigger

The thing that fires without you typing. A schedule, a webhook, an inbound message, an event. This is what makes a loop a loop instead of a one-shot. It is your CronJob, not your kubectl.

context

What you hand the agent up front instead of re-explaining each turn. The goal, the constraints, the access. Set once, read every run. This is the spec, not a per-task prompt.

tools

The hands. The connectors that let the agent touch the real world: the repo, the shell, the alerting system, the ticket queue, the messaging app, the observability API.

memory

A state file on disk, usually markdown. It survives between runs. The agent forgets, the file does not. In a squad this is literal: a manifest for who does what, an operation file for task state, an intel file for learnings.

squad

Division of labor across agents. One leads, others specialize, each sees only its slice. This is what turns a single loop into a fleet, and it is separation of duties, not a crowd.

gate

The check that decides when work is actually done, and the human approval on anything risky. Without it the loop runs forever or ships garbage confidently. We come back to this, because it is where most squads fail.
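One way to keep the six parts honest is to write them down as a single spec and refuse to start a loop that is missing one. The sketch below is an assumption about how you might model it: `LoopSpec`, `validate`, and every field value are hypothetical, not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSpec:
    trigger: str             # what fires the run without a human typing
    context: str             # goal and constraints, set once and read every run
    tools: tuple[str, ...]   # the connectors this agent may use
    memory_path: str         # state file that survives between runs
    role: str                # this agent's slice of the squad
    gate: str                # the check that decides when work is done


def validate(spec: LoopSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec is complete."""
    problems = []
    if not spec.trigger:
        problems.append("no trigger: this is a one-shot, not a loop")
    if not spec.memory_path:
        problems.append("no memory: every run starts cold")
    if not spec.gate:
        problems.append("no gate: nothing decides when work is done")
    return problems


# Illustrative values only.
triage = LoopSpec(
    trigger="webhook: ci-failure",
    context="Group failing CI jobs by likely cause. Never merge.",
    tools=("read_pipeline", "post_to_channel"),
    memory_path="STATE.md",
    role="triage",
    gate="",
)

print(validate(triage))
```

Here the spec is rejected because nothing defines done, which is the failure the gate exists to prevent.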

The state file is the loop

Create one place the whole fleet reads and writes. The mature squad setups do this in plain markdown: a manifest that defines the squad, an operation file that tracks task state, an intel file that accumulates what the agents have learned. Call yours STATE.md if you like. This is the loop's only memory, and it is the difference between a workforce and a goldfish that happens to come in packs of ten.

At the start of each run the agent reads this first. At the end it writes back what happened and what comes next. Without it, every run starts cold no matter how many runs came before, and a ten-agent squad with no shared state is just ten chatbots talking past each other. With it, agent six knows what agent one decided this morning. This is your desired-state document. The fleet reconciles against it the way a controller reconciles against a spec.

The rule that bites people

Keep it short and structured. A state file that runs to thousands of lines is worse than none, because every iteration burns tokens and attention re-reading a novel before it does any work. Use plain sections: current goal, in progress, blocked, done, learnings. Treat it like a status page, not a changelog.
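A minimal sketch of the read-at-start, write-at-end cycle over those five sections follows. The markdown layout, the function names, and the 80-line budget are all assumptions made for illustration; the only parts taken from the text are the section names and the rule that the file stays short.

```python
SECTIONS = ("Current goal", "In progress", "Blocked", "Done", "Learnings")
MAX_LINES = 80  # assumed budget; pick whatever keeps each run's read cheap


def parse_state(text: str) -> dict[str, list[str]]:
    """Read the state file into one list of items per known section."""
    state: dict[str, list[str]] = {name: [] for name in SECTIONS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            title = line[3:].strip()
            current = title if title in state else None
        elif line.startswith("- ") and current is not None:
            state[current].append(line[2:].strip())
    return state


def render_state(state: dict[str, list[str]]) -> str:
    """Write the state back out, refusing to let it grow past the budget."""
    lines = ["# STATE"]
    for name in SECTIONS:
        lines += ["", f"## {name}"]
        lines += [f"- {item}" for item in state.get(name, [])]
    if len(lines) > MAX_LINES:
        raise ValueError(f"state file is {len(lines)} lines; budget is {MAX_LINES}")
    return "\n".join(lines) + "\n"


def finish_task(state, task, learning=None):
    """End-of-run write-back: move a task to Done and record what was learned."""
    updated = {name: list(state.get(name, [])) for name in SECTIONS}
    if task in updated["In progress"]:
        updated["In progress"].remove(task)
    updated["Done"].append(task)
    if learning:
        updated["Learnings"].append(learning)
    return updated


before = parse_state(
    "# STATE\n\n## Current goal\n- keep CI green\n\n"
    "## In progress\n- triage flaky test\n\n## Blocked\n\n## Done\n\n## Learnings\n"
)
after = finish_task(before, "triage flaky test", "retry masks a real timeout")
text = render_state(after)
print(text)
```

Raising when the file exceeds its budget is a deliberate choice in this sketch: it forces someone, human or agent, to prune the file rather than letting it turn into a changelog.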

Autonomous fleets drift, and they drift quietly

Every unattended squad hits the same wall. Left to run on its own, an agent fleet drifts. Small errors compound. One agent hallucinates a fact and the others treat it as ground truth, because to them it is just more context. Circular discussions emerge where agents agree with each other and add nothing. And if the lead agent gets a bad idea or gets stuck in a loop, the entire squad follows it off the cliff, confidently, all night.

The fix is the oldest idea in engineering: separate the doing from the sign-off, and never let the agent that did the work be the only thing that says it is done. A second agent told to "review this" with no objective standard is just a second optimist: it will agree more often than not. The check has to fail on something real. A test suite. A type checker. A plan that shows no unexpected diff. A policy engine that returns deny. A metric that did or did not move. "The agents discussed it and feel good" is not a gate. It is two hallucinations shaking hands.
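A gate that fails on something real can be as plain as a process exit code. The sketch below assumes the check is a command (a test suite, a type checker, a policy CLI) that exits non-zero on failure; `gate_passes` and `sign_off` are hypothetical names, and the two demo commands are stand-ins for a real check.

```python
import subprocess
import sys


def gate_passes(command: list[str], timeout_s: int = 300) -> bool:
    """Run an external check; only a zero exit code counts as a pass."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A check that hangs or cannot start is treated as a failure, never a pass.
        return False
    return completed.returncode == 0


def sign_off(agent_says_done: bool, command: list[str]) -> bool:
    """The doer's claim only triggers the check; the check makes the decision."""
    if not agent_says_done:
        return False
    return gate_passes(command)


# Stand-ins for a real check: one command that fails, one that passes.
failing_check = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing_check = [sys.executable, "-c", "import sys; sys.exit(0)"]

print(sign_off(True, failing_check))
print(sign_off(True, passing_check))
```

The agent's opinion never appears on the right-hand side of the decision, which is what separates this from a second agent asked to review.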

The trap

Activity is not value. A squad can generate mountains of research, content, and code overnight and feel incredibly productive while being completely disconnected from any outcome that matters. You wake up to a full board and an empty result.

The fix

Define the success metric up front and wire it into the loop as the stop condition. Tie done to something checkable outside the squad's own opinion. If the metric did not move, the work is not done, no matter how busy the board looks.

The stop condition is the whole ballgame

The quiet killer is a loop that thinks it finished when it did not. The agent is supposed to signal completion only when the job is truly done, and instead it signals early. The loop reads the signal, believes a half-done job is complete, and moves on or exits clean. No error. No crash. Just a confident wrong answer that looks exactly like success. In alerting terms it is a false negative on your own operation: a failure that raised no alarm. False negatives are the ones that hurt, because nobody is looking.

Figure 2: Two fleets with identical agents. The only difference is what is allowed to declare victory. The left one will eventually tell you a broken job shipped. The right one cannot, because a passing check is a fact and the squad's confidence is not.

Your stop condition has to be checkable by something other than the agents' own claim, and you add a hard iteration ceiling on top as a backstop. If the loop hits the ceiling without meeting the real condition, it halts and flags for a human. It does not keep grinding compute into the void, the same way a good retry has a budget and a circuit breaker.
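Put together, the rule reads: the agent's completion signal is a request to be checked, and the ceiling is what stops the loop if the check never passes. A minimal sketch, with `Outcome`, `run_until_verified`, and the toy agent all hypothetical:

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    DONE = "done"
    ESCALATED = "escalated"  # ceiling hit: halt and flag for a human


def run_until_verified(
    step: Callable[[int], bool],
    verify: Callable[[], bool],
    max_iterations: int,
) -> tuple[Outcome, int]:
    for i in range(1, max_iterations + 1):
        claimed_done = step(i)  # the agent's own completion signal
        if claimed_done and verify():
            return Outcome.DONE, i
        # A claim the external check rejects is ignored and the loop continues.
    return Outcome.ESCALATED, max_iterations


# Toy agent that claims success on every run, including the premature ones.
work = {"fixed": 0}


def eager_step(iteration: int) -> bool:
    work["fixed"] += 1
    return True


def three_fixes_landed() -> bool:
    return work["fixed"] >= 3


outcome, iterations = run_until_verified(eager_step, three_fixes_landed, max_iterations=5)
print(outcome, iterations)

# An agent that claims done while the check never passes ends at the ceiling.
stuck = run_until_verified(lambda i: True, lambda: False, max_iterations=4)
print(stuck)
```

The eager agent's first two claims are simply not believed, and the stuck agent ends in an escalation rather than a clean exit.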

Earn autonomy, do not assume it

Full autonomy sounds great until something goes wrong at 3am with nobody watching. So treat autonomy as a ladder. You start at the bottom and climb only when the fleet has earned the next rung. This is change management, not a feature flag, and it maps onto how the saner squad deployments actually run: powerful agents wrapped in approval workflows, role-based access, and audit trails, not turned loose with God-mode permissions.

L1

Suggests only. The fleet tells you what it would do. You do everything. A recommendation engine with no hands.

L2

Drafts the work. Agents open PRs, write plans, stage changes. You review and apply. The work is real, the commit is yours.

L3

Acts on low-risk. The squad executes the safe stuff on its own but needs human approval before anything publishes, merges, or touches money.

L4

Runs unattended, with audit logs. Fully autonomous on a bounded blast radius. Earned over weeks of clean L3 runs, never assumed on launch day.

Start every new fleet at L1 or L2. Run it for a week, read the board, correct what it gets wrong. Promote only what you would have approved unchanged. Build in leadership rotation and consensus on big decisions so one bad lead cannot drag the whole squad off a cliff. And give it periodic context resets, because the longer a squad runs unattended the more its own hallucinations calcify into facts it trusts.
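The ladder translates into a small policy function. This sketch follows the four rungs as described; the names `Autonomy`, `allowed_action`, and `may_promote` are hypothetical, and sending an L4 agent back to approval when an action falls outside its blast radius is an assumption rather than something the ladder spells out.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    L1 = 1  # suggests only
    L2 = 2  # drafts the work
    L3 = 3  # acts on low-risk
    L4 = 4  # runs unattended on a bounded blast radius


def allowed_action(level: Autonomy, risky: bool, in_blast_radius: bool = True) -> str:
    """Map a rung and a proposed action to what the fleet may do with it."""
    if level == Autonomy.L1:
        return "suggest"
    if level == Autonomy.L2:
        return "draft"
    if level == Autonomy.L3:
        # Risky here means it publishes, merges, or touches money.
        return "needs_approval" if risky else "execute"
    # L4. Assumption: outside the agreed blast radius it still asks first.
    return "execute" if in_blast_radius else "needs_approval"


def may_promote(approved_unchanged: list[bool]) -> bool:
    """Promote only if every reviewed run would have been approved unchanged."""
    return bool(approved_unchanged) and all(approved_unchanged)


print(allowed_action(Autonomy.L3, risky=True))
print(may_promote([True, True, False]))
```

A single corrected run blocks promotion in this version, which is stricter than a percentage threshold but matches the rule of promoting only what you would have approved unchanged.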

The part nobody puts on the slide: cost and blast radius

A single bad prompt is a wasted minute. A fleet of ten agents looping unattended overnight is a bill, and possibly an incident. Each agent run is a full model call dragging accumulated context behind it, and a squad multiplies that by however many workers you spun up, on whatever cadence you set them loose. The community deployments that brag about huge productivity multipliers are the same ones that quietly mention four- and five-figure monthly model bills when nobody put quotas in place.

measure

Run it manually for a few iterations and watch the per-run token cost before you let the fleet loose

× agents

Per-run cost times squad size times cadence (runs per agent per day) equals your real worst-case daily burn

quota

Quota-aware scheduling and off-peak batching are how heavy users avoid surprise overages

allowlist

Restrict tools to exactly what the mission needs. Everything else denied by default
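The worst-case arithmetic is worth writing down before the fleet runs. In the sketch below every number is illustrative, not measured: a per-run cost of 0.25 in your billing currency, ten agents, and a run every 30 minutes. The function names are hypothetical.

```python
def worst_case_daily_burn(cost_per_run: float, agents: int, runs_per_agent_per_day: int) -> float:
    """Per-run cost times squad size times cadence."""
    return cost_per_run * agents * runs_per_agent_per_day


def within_budget(daily_burn: float, daily_quota: float) -> bool:
    return daily_burn <= daily_quota


# Illustrative inputs only; measure your own per-run cost first.
cost_per_run = 0.25          # observed over a few manual iterations
agents = 10
runs_per_agent_per_day = 48  # one run every 30 minutes

daily = worst_case_daily_burn(cost_per_run, agents, runs_per_agent_per_day)
monthly = daily * 30

print(daily, monthly)
print(within_budget(daily, daily_quota=100.0))
```

With these inputs the fleet costs 120.0 per day and 3600.0 over a 30-day month, so a daily quota of 100.0 would already be exceeded on day one. Halving the cadence is usually the cheapest lever.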

This is a security boundary, not just a budget line

A fleet with unrestricted shell or connector access running unattended is the fastest way to turn a cost problem into a security incident. Scope the credentials to the blast radius you are willing to lose. Read access, not write, until the rung is earned. Run the risky workers in isolation so one compromised agent cannot reach the rest. Build the allowlist and the audit trail before the fleet runs, not after the postmortem.
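Deny-by-default is simple to enforce at the point where the agent calls a tool. This is a minimal sketch under stated assumptions: `ToolBox`, `ToolDenied`, and the two tool names are hypothetical, and a real deployment would also scope the underlying credentials rather than rely on this wrapper alone.

```python
from typing import Callable


class ToolDenied(PermissionError):
    """Raised when an agent asks for a tool outside its allowlist."""


class ToolBox:
    def __init__(self, tools: dict[str, Callable[..., str]], allowlist: frozenset[str]):
        self._tools = tools
        self._allowlist = allowlist
        self.audit: list[tuple[str, str]] = []  # (tool name, decision), in call order

    def call(self, name: str, *args: str) -> str:
        # Anything not explicitly allowed is denied, including unknown tools.
        if name not in self._allowlist or name not in self._tools:
            self.audit.append((name, "denied"))
            raise ToolDenied(name)
        self.audit.append((name, "allowed"))
        return self._tools[name](*args)


# Hypothetical tools: one read-only, one that changes production.
tools = {
    "read_logs": lambda service: f"last 100 lines of {service}",
    "restart_service": lambda service: f"restarted {service}",
}

# The first-responder mission only needs read access.
responder = ToolBox(tools, allowlist=frozenset({"read_logs"}))

print(responder.call("read_logs", "checkout"))
try:
    responder.call("restart_service", "checkout")
except ToolDenied as denied:
    print("denied:", denied)
print(responder.audit)
```

The audit list records the denied call as well as the allowed one, so the trail exists before the postmortem needs it.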

Fleets worth standing up if you run a platform

These are standing loops that pay for themselves the first week, mapped to work you are almost certainly doing by hand right now. Each one is a trigger, shared context, tools, a memory file, and a hard gate. Nothing more exotic than that.

On-call first responder

Fires on a high-severity alert. Pulls the logs, recent deploys, and related past incidents from shared memory, posts a structured first hypothesis before a human opens a terminal. Gate: read-only, never remediates on its own.

Config drift patrol

Runs on a schedule across environments, diffs live state against desired, and opens a reconcile PR for anything that wandered. Gate: the plan must be non-destructive or it escalates instead of acting.

CI failure triage squad

One agent watches the pipeline, another groups failures by likely cause, a third drafts the fix in isolation. They post a ranked board to the channel. Gate: humans merge, the squad only triages and drafts.

Dependency and CVE sweep

Nightly. Scans dependencies and images, cross-references what is actually deployed, opens bump PRs ranked by real exposure. Gate: the build and tests pass on the bumped version before the PR opens.

Cardinality governance

Watches every PR that touches instrumentation, flags high-cardinality tags and redundant series before they hit the bill. Gate: the policy check passes or the PR fails. No human in the inner loop.

Error budget watch

Fires when burn rate crosses a threshold. Reads SLO history, names the top contributors, posts a freeze recommendation with evidence. Gate: it recommends, a human flips the freeze until L4 is earned.
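As one concrete instance, the config drift patrol's gate can be sketched over plain dictionaries. Treating "destructive" as "reconciling would delete something that exists live" is an assumption made for this example, as are the function names and the sample settings; a real patrol would diff whatever plan output your tooling produces.

```python
from typing import Any

_MISSING = object()  # marks a key that is absent on one side


def diff_state(desired: dict[str, Any], live: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (live_value, desired_value)} for every key that differs."""
    changes = {}
    for key in sorted(desired.keys() | live.keys()):
        have = live.get(key, _MISSING)
        want = desired.get(key, _MISSING)
        if have != want:
            changes[key] = (have, want)
    return changes


def is_destructive(changes: dict[str, tuple[Any, Any]]) -> bool:
    # Assumption: a change is destructive if reconciling would remove a live key.
    return any(want is _MISSING for _have, want in changes.values())


def patrol(desired: dict[str, Any], live: dict[str, Any]) -> str:
    changes = diff_state(desired, live)
    if not changes:
        return "noop"
    if is_destructive(changes):
        return "escalate"           # gate: do not act, hand it to a human
    return "open_reconcile_pr"


desired = {"replicas": 3, "log_level": "info"}

print(patrol(desired, {"replicas": 3, "log_level": "info"}))
print(patrol(desired, {"replicas": 5, "log_level": "info"}))
print(patrol(desired, {"replicas": 3, "log_level": "info", "debug_port": 9229}))
```

The third case escalates because reconciling would delete a setting someone added by hand, and the patrol has no way to know whether that was a mistake or an emergency fix.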

The second fleet is where it compounds

Your first loop is small, single-purpose, heavily supervised. Your second one connects to the first. The triage squad writes ranked findings to a shared board. A second squad reads that board and drafts fixes for the top items in isolation. Neither needs the other to function. Together they move work from discovered to fix-drafted-and-waiting-for-review without you touching either one.

The shared intel file is where this pays off. Once a squad writes down how to triage a given failure, every future fleet that hits that failure reads the same learnings instead of rediscovering them. The agents compound rather than just run in parallel, the same way a good runbook outlives the incident that produced it. The operation gets smarter every night because the state it leaves behind is better than the state it started with.

What your job becomes

Once a few fleets are running, the shape of your day changes. You stop opening a chat window to ask a question. You start opening a board to review what the squads did overnight. The to-do list stops being a static pile and becomes a set of standing agents that keep converting goals into drafts, fixes, and ranked hypotheses before you are awake.

This does not mean you stop deciding what matters. The deciding moves up a level, from per-message to mission design. The fleet does the typing, so your attention goes to the three things that still need a human and probably always will: the approval checkpoint, the stop condition, and the next mission worth standing up.

01

Stop sending steps, start setting missions

The unit of work is no longer the message, it is the standing objective. Give an agent context, tools, and a goal once, and let it loop. If you are still typing the next step, you are the bottleneck.

02

Put the state in a file the whole fleet shares

Goal, in progress, blocked, done, learnings. The agents forget between runs, the file does not. A ten-agent squad with no shared state is ten chatbots talking past each other.

03

Never let the doer be the only sign-off

Squads drift, agents treat each other's hallucinations as facts, and a bad lead drags everyone off the cliff. The check has to fail on something real, not on the squad agreeing with itself.

04

Close the loop on an outcome, not a vibe

Premature completion is silent and it will tell you a broken job shipped. The stop condition must be checkable outside the fleet. Add a hard iteration ceiling as a backstop.

05

Activity is not value

A full board overnight means nothing if the metric did not move. Define success up front and wire it in as the stop condition, or you are paying compute to look busy.

06

Climb the autonomy ladder, do not skip it

L1 suggests, L4 runs unattended. Start at the bottom, run a week, promote only what you would have approved unchanged. Approval workflows and audit trails, not God-mode.

07

Budget the spend and scope the blast radius

Per-run cost times squad size times cadence is your real bill. Quota-aware scheduling, off-peak batching, read-only credentials, isolation for the risky workers. Do it before the fleet runs.

08

Move up to the mission layer

The approval checkpoint, the stop condition, the next fleet worth building. Those need a human. Pasting logs into a chat window does not. Stop being the runtime for your own agents.

You already run control loops for everything that matters in production. Reconciliation, autoscaling, alerting, retries with backoff and a circuit breaker. A standing agent is just a new kind of worker inside that same pattern, except this one can read, reason, and act across your whole stack. Stop hand-driving it through a chat window. Give it a mission, shared memory, a hard gate, and a blast radius you can afford, then go stand up the next one. The demo below runs a live loop you can watch hold or fail in front of you, the same rules, made executable.
