Field Notes · AI Development

Prompt Sprawl

Rules pile up in the prompt, where nothing can enforce them. I spent a week sharpening one rule, watching the agent slip past it every time, before I gave up and deleted the role it was meant to govern. That role was one small case of a much bigger anti-pattern.

I deleted an agent this week instead of fixing it.

The agent had a simple job. Take a piece of work, break it into a smaller task, hand that task to the next agent. Three steps. Nothing ambiguous about any of them.

It would not do that.

It would take the work and start reframing it. It would write analysis nobody asked for. It would propose two other ways to break the task down, draft the hand-off, second-guess the draft, then rewrite it. It would produce a little planning document explaining how it was thinking about the hand-off. It would do everything except the hand-off.

So I edited its instructions. I made the rule sharper. I told it not to reframe. I told it to hand off the task and nothing else. I added bullet points. I added warnings. At one point I put the rule in capital letters at the top of the file, which is what you do when you are quietly losing an argument with your own software.

The loop I was stuck in

01Edit the instructions. Add the new rule.

02Run it. It behaves the first time.

03Run it again. It drifts.

04Edit the instructions. Make the rule firmer.

05Run it. It finds a new way to drift.

06Go to 01.

I wasn't fixing the agent. I was negotiating with it, every single run.

This went on for over a week before I admitted the obvious. The next edit was not another line in the instructions. It was deleting the agent and writing the hand-off as plain code. A program now takes the work, breaks it down, and passes it on. There is no agent reading the work and deciding what to do with it, because there is no agent there anymore.

Compliance went to 100% the moment I did that. Not because the model got better. Because the model is no longer the place where the rule lives.

Prompt Sprawl is what you get when rules collect where nothing enforces them.

That one agent was a small instance of something you will find in almost every agent system shipping today. The rules live in the prompt. System prompts. In-context examples. Retrieved boilerplate. Role descriptions. Tool descriptions with behavioral hints tucked inside them. Memory plugins re-injecting old instructions. Persona text. Safety preambles.

Every new failure adds another piece of prompt. Something breaks, so you add a sentence. Something else breaks, so you add another. The prompt grows. The rules pile up. None of them are enforced. All of them are requests.

This is Prompt Sprawl: rules accumulating across too many places, none of which can actually enforce them, all of them competing for the model's attention at once. It is the same shape as config sprawl or dependency sprawl. Different material, same failure.

What makes it dangerous is the layer it happens in. The model is asked to comply, never required to. The rule is text sitting inside a system that predicts text, next to a lot of other text. Sometimes the model honors it. Sometimes another instruction wins. Sometimes two rules disagree and the model just picks one. You cannot debug that, and you cannot reproduce it on command.

The prompt has quietly become a junk drawer for business logic, security rules, persona, and role discipline, all mixed in with the actual work.

The reflex, when it fails, is to write more prompt. Sharpen the rule. Add an example of the right behavior. Add a warning about the wrong one. Try a different phrasing. The whole craft of prompt-engineering is built around getting the model to want to comply.

It can't get you all the way, because the model is the wrong layer to ask. You cannot turn a text predictor into a policy enforcer by feeding it more text. You can push compliance from 60% to 95%. You cannot push it to 100%, and the last few percent is exactly where the expensive failures live.

Stop sharpening the rule. Move it out of the prompt.

The fix is not better prompting. It is taking the rule out of the prompt entirely and putting it in the layer that can actually hold it: the runtime. The rule stops being a sentence the model is asked to honor and becomes a gate the runtime checks before the action is allowed. A planning agent cannot write code, not because you told it not to, but because the tool to write code was never handed to it. It is not asked to behave. The path to misbehave does not exist.

Prompt Sprawl

Rule lives in the prompt

The model enforces it. Compliance is a probability. Auditing means watching behavior across many runs and guessing. A new rule grows the prompt. Two rules can conflict and the model decides which wins. The misses are the cost of the layer.

Runtime enforcement

Rule lives in the runtime

The runtime enforces it. Compliance is structural. Auditing means reading tested code. A new rule changes the runtime, not the prompt. Conflicts show up when you write them, not at run time. Compliance is total, because breaking the rule isn't a path that exists.

Once the rule moves, the prompt stops carrying weight it was never built to carry. It goes back to being the brief: the thing you are asking the agent to do, written in plain language because that is how people pass intent. Not a contract. Not a permission system. Just the work.

You don't fix the prompt. You take away its authority.

The agent I deleted is the small version of this. Handing off the task used to be a rule the agent was asked to follow. Now it is something a program does. The agent layer no longer has to remember to hand off, because the agent layer no longer gets a vote.

"Move it to the runtime" is short. The architecture behind it isn't.

Moving the rule to the runtime is one sentence. The argument under it is longer, and it runs in three parts: why prompting can't reach 100% no matter how good you get at it, how to tell which rules become code and which stay prose, and why this is the same control shift operating systems went through years ago. Each is its own piece. If Prompt Sprawl is the disease, these three are the anatomy of the cure, best read in order:

The thread through all three is the same. Reliability, token cost, and auditability are not separate wins you chase one at a time. They come together, out of a single choice. Put the rule where it can be enforced and you get full compliance, a leaner context window, and rules you can test, all at once. Not because you optimized for them, but because they are properties of the layer you moved to.

A prompt is a Terms of Service checkbox. The runtime is a firewall.

A checkbox asks you to agree to behave. A firewall makes the behavior impossible. The industry has spent four years building agent systems on checkboxes: rules piling up in the prompt, compliance left to chance, the model trusted to enforce every rule it is also being asked to carry out. The shift is not more checkbox text. It is moving the rules to a layer where text was never the thing doing the enforcing.

The agent is gone. The hand-off is plain code now. The model is no longer the wrong enforcer; it is the right worker, doing a job the runtime has already drawn the lines around. That is the whole difference. One system asks the model to behave. The other never has to.