Field Notes · Agent Orchestration

After Loops

The industry went loops-bad, then loops-good. The destination is neither. What was missing from both consensuses, and where the practitioner conversation is heading next.

Loops were bad because nobody was watching them.

The early instinct about autonomous agent loops was right about the symptom and wrong about the cause. A loop with no oversight would run forever, hallucinate progress, consume money without producing output, retry a broken tool a hundred times, and report success having done nothing recoverable. The practitioners who called this bad were describing real failures.

The lesson they took was: minimize the loop. Keep humans in the path. One prompt, one result. Structure the task so the model never has to decide whether to continue. Enough real systems burned enough real money that "loops are dangerous" became a reasonable prior.

It was the right lesson from the wrong experiment. The experiment was not "loops." It was "unsupervised loops." Nobody was watching. Nobody was watching because nobody had built anything to do the watching.

The experiments that failed were not loop experiments. They were unsupervised-loop experiments. The lesson got shorter than the evidence.

Loops worked, so everyone shipped more loop.

The pendulum swung when enough constrained loops worked in enough demos that "multi-agent" became the entry ticket to the conversation. LangChain, CrewAI, AutoGen, and a dozen descendants organized around one shared claim: with the right framework, loops produce useful work.

The proof of concept was real. A loop with tool access, a clear task, and some retry logic could draft code, call APIs, fix its own tests, and return something a developer would use. The demo worked. The papers got cited. The benchmarks improved. Loops were good.

The framework competition that followed was loop competition. How many tools can one agent hold? How large a context window before it starts losing the thread? How do you route tool selection? The entire product surface area was loop surface: context, memory, tool routing, planning. Nobody stopped to ask what happens when a hundred loops run at once. The demo was never ten agents and a broken credential pool.

The loop got the credit. The constrained task in a controlled demo absorbed the complexity the loop never had to manage.

Look at the loops that actually shipped. Something was watching.

A loop that worked in production had something around it that the demo version did not have. A timeout. A max-step count. A spend cap that killed the run before it became a budget incident. A human operator refreshing the terminal every thirty seconds who would notice if nothing had changed in two minutes. A test suite that caught the bad output before it merged.
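Every item in that list is a few lines of ordinary code. Here is a minimal Python sketch of that constraint layer. It assumes a hypothetical `step(i)` callable that runs one loop iteration and returns whether the task is done and what the iteration cost. `Limits`, `run_guarded`, and the default numbers are illustrative, not taken from any framework.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Limits:
    max_steps: int = 50            # illustrative max-step count
    timeout_s: float = 600.0       # illustrative wall-clock budget
    max_spend_usd: float = 5.00    # illustrative spend cap


@dataclass
class RunResult:
    status: str        # "done", "max_steps", "timeout", or "spend_cap"
    steps: int         # iterations actually executed
    spend_usd: float   # cumulative cost reported by the loop


def run_guarded(step: Callable[[int], Tuple[bool, float]], limits: Limits,
                clock: Callable[[], float] = time.monotonic) -> RunResult:
    """Call `step` until it reports done or one of the boundaries is hit.

    `step(i)` is a hypothetical loop iteration returning (done, cost_usd).
    """
    start = clock()
    spend = 0.0
    for i in range(limits.max_steps):
        # Check the clock between iterations; this cannot interrupt a step in flight.
        if clock() - start > limits.timeout_s:
            return RunResult("timeout", i, spend)
        done, cost = step(i)
        spend += cost
        if done:
            return RunResult("done", i + 1, spend)
        # Kill the run before it turns into a budget incident.
        if spend >= limits.max_spend_usd:
            return RunResult("spend_cap", i + 1, spend)
    return RunResult("max_steps", limits.max_steps, spend)
```

A loop that never finishes and costs one dollar per step stops at three steps under a three-dollar cap: `run_guarded(lambda i: (False, 1.0), Limits(max_spend_usd=3.0))` returns status `"spend_cap"`. Because the timeout is only checked between iterations, a single hung tool call can still overrun it; a production guard would also need to cancel the in-flight step.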

None of that is the loop. All of it is supervision. The loop did the work. Something else watched the loop and enforced the boundary conditions. When the boundary held, the run succeeded. When it did not, the loop got blamed.

Credit attribution got misassigned at the system level. People who shipped production loops were actually shipping "supervised loop with adequate constraint layer" and calling it "a loop." People whose loops failed had shipped "loop with no watcher" and called it a loop failure. The variable was the supervision, not the loop, and the vocabulary to say so did not exist.

When a loop failed, the loop got blamed. When it succeeded, the loop got the credit. The constraint layer watching it was invisible in both cases.

Not a smarter agent at the top. A watcher that acts on signals.

Supervision is not a manager agent that coordinates workers. That is orchestration, which is another kind of loop one tier up. A supervisor watches a worker and takes action based on what it observes: restart when the worker stalls, cancel when the budget is exhausted, escalate when the failure pattern is unfamiliar. The supervisor's job is not to do the work. It is to notice when the work has gone wrong and act on that before the failure propagates.
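The restart, cancel, and escalate rules above can be sketched as a pure function from an observation to an action. This is a hypothetical Python illustration. The `Observation` fields, the 120-second stall threshold (echoing the two-minute operator heuristic), and the set of "known" error classes are assumptions, not a prescribed policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Action(Enum):
    CONTINUE = "continue"
    RESTART = "restart"
    CANCEL = "cancel"
    ESCALATE = "escalate"


@dataclass
class Observation:
    seconds_since_progress: float   # time since the worker last changed anything
    spend_usd: float                # cumulative spend for this run
    error: Optional[str] = None     # class of the last error, if the worker raised one


@dataclass
class Policy:
    stall_after_s: float = 120.0
    budget_usd: float = 5.0
    known_errors: FrozenSet[str] = frozenset({"RateLimited", "Timeout"})


def decide(obs: Observation, policy: Policy) -> Action:
    """Pick a supervisor action from one observation; the supervisor never does the work."""
    # Budget first: once the cap is reached, no restart is worth paying for.
    if obs.spend_usd >= policy.budget_usd:
        return Action.CANCEL
    if obs.error is not None:
        # Familiar failures get a restart; unfamiliar ones go up a tier.
        return Action.RESTART if obs.error in policy.known_errors else Action.ESCALATE
    if obs.seconds_since_progress >= policy.stall_after_s:
        return Action.RESTART    # stalled: no progress within the window
    return Action.CONTINUE
```

Nothing in `decide` calls a model or tries to finish the task. It only reads the worker's state and picks one of four responses, which is the difference between a supervisor and a manager agent.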

This is what Erlang, begun at Ericsson in 1986, solved with the supervision trees its OTP libraries formalized. It is what EVE Online re-derived for fleet combat in 2008. The agent layer is the third domain to need the pattern and the first to be missing it. I wrote the longer version in "The Supervisor Tier." The short version: a supervision tree with tiered restart policies is forty years old, it works, and the agent framework literature does not have it yet.
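For readers who have not used OTP: a supervisor there chooses a restart strategy (`one_for_one`, `one_for_all`, or `rest_for_one`) and a restart intensity, meaning how many restarts it tolerates within a time period before it terminates and leaves the decision to its own parent. The Python below is a toy that borrows those terms to show the shape of one supervisor node. It is not an OTP binding, and the child names and limits in the example are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


class Escalate(Exception):
    """Raised when a supervisor exceeds its restart intensity and must defer upward."""


@dataclass
class Supervisor:
    """Toy supervisor node using OTP's vocabulary; not an OTP binding."""
    children: List[str]
    strategy: str = "one_for_one"   # or "one_for_all", "rest_for_one"
    intensity: int = 3              # max restarts tolerated ...
    period_s: float = 60.0          # ... within this sliding window
    _restarts: Deque[float] = field(default_factory=deque)

    def on_child_exit(self, child: str, now: float) -> List[str]:
        """Return the children to restart after `child` exits at time `now`."""
        # Drop restart timestamps that have aged out of the window.
        while self._restarts and now - self._restarts[0] > self.period_s:
            self._restarts.popleft()
        if len(self._restarts) >= self.intensity:
            # One restart too many in the window: stop trying at this tier.
            raise Escalate(f"{child}: more than {self.intensity} restarts in {self.period_s}s")
        self._restarts.append(now)
        if self.strategy == "one_for_one":
            return [child]                      # only the failed child
        if self.strategy == "one_for_all":
            return list(self.children)          # every sibling
        if self.strategy == "rest_for_one":
            i = self.children.index(child)
            return self.children[i:]            # failed child and those started after it
        raise ValueError(f"unknown strategy: {self.strategy!r}")
```

Nesting is what makes the tiers. A parent `Supervisor` can treat an `Escalate` from a child supervisor as that child's exit and apply its own strategy and intensity one level up, so a routine fault is absorbed low in the tree and only a persistent one reaches a human.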

Signals are the other half. A loop that terminates silently, exiting with no structured output about what it did or why, cannot be supervised. A supervisor needs a terminal signal to act on: success with evidence, failure with a reason, escalation with the specific blocker. Without structured signals, the supervisor is reading tea leaves. With them, most routine failure handling runs in code, without LLM involvement and without token cost.
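A sketch of those three terminal signals as Python types, plus a dispatcher that turns them into supervisor actions with no model call. The type names, the action strings, and the retry limit are assumptions for illustration. The one design choice worth noting is that a success carrying no evidence is routed to verification rather than accepted.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Success:
    evidence: Tuple[str, ...]   # e.g. test results, commit ids, output paths


@dataclass(frozen=True)
class Failure:
    reason: str
    retryable: bool             # the loop's own judgment: transient or not


@dataclass(frozen=True)
class Escalation:
    blocker: str                # the specific thing this tier cannot resolve


Signal = Union[Success, Failure, Escalation]


def handle(signal: Signal, attempts: int, max_attempts: int = 3) -> str:
    """Map a terminal signal to a supervisor action in plain code, with no model call."""
    if isinstance(signal, Success):
        # Success with no evidence is treated as unverified rather than done.
        return "accept" if signal.evidence else "verify"
    if isinstance(signal, Failure):
        if signal.retryable and attempts < max_attempts:
            return "retry"
        return "fail"
    if isinstance(signal, Escalation):
        return "escalate"
    # A silent exit (None) or anything unstructured lands here: nothing to act on.
    raise TypeError(f"not a terminal signal: {signal!r}")
```

A loop that returns nothing at all hits the `TypeError` branch, which is the code-level version of a loop that cannot be supervised.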

The combination is what changes the shape of the system. A loop alone is a worker with no foreman. A supervisor alone has nothing to watch. The signal is what connects them: the structured terminal state the loop emits and the supervisor acts on.

The loop is the worker. The supervisor is the watcher. The signal is what makes the system governable. All three are load-bearing.

The practitioners who get there first will do it by scale, not by reading a blog post.

The current practitioner conversation is still "loops good," with the emphasis on loop sophistication. Bigger context windows. More tools per agent. Smarter planning. Better memory. The framework wars are still loop wars, and the practitioners competing in them have not yet hit the scale where the supervision gap becomes the visible bottleneck.

That scale is not far off for anyone building seriously. The jump from ten agents to a hundred is where the human operator stops being the supervision loop. The jump from a hundred to a thousand is where "supervision" stops being optional. Most people have not made the first jump. The lesson is hard to receive at ten agents. It is unavoidable at a hundred.

Most will get there by necessity: scale past the point where a human operator can absorb the supervision load, discover that nothing exists to replace them, and build the tier because they have to. That expensive route is optional; the tier can be built before the scale demands it.

"Loops bad" was right about failure, wrong about cause. "Loops good" fixed the framing but kept the supervision debt.

The loop was never the point. The watcher is.

Both consensuses were correct about something real. Unsupervised loops fail for real reasons, and constrained loops ship production work for real reasons. But both diagnoses stopped one tier short. The variable that separates "this loop works" from "this loop burned money and produced nothing" is not the loop. It is what watches it, what signal it emits at termination, and what acts on that signal without a human in the path.

The industry will arrive at supervisors and signals. It usually takes getting past the scale where a human operator can absorb the supervision load, discovering there is no mechanism to replace them, and building one. The pattern is forty years old. The substrate is new. The agent layer is the third domain to need it.