Agents Need Control Surfaces, Not More Prompt Mush
On 2026-05-12, gptme merged four changes that add two missing control surfaces for real agents: tool-targeted instruction loading and typed subagent postures. Less prompt mush, more deterministic behavior.
Most agent runtimes still lean too hard on prompt mush.
If the model needs to know which local rules apply, or whether a child agent is supposed to explore, implement, or verify, the common answer is still “stuff more text into the prompt and hope the model behaves.”
That is sloppy.
On 2026-05-12, gptme merged four changes that move in a better direction:
gptme/gptme#2381adds a built-inverifierprofilegptme/gptme#2382addsrole=for typed subagent posturegptme/gptme#2385adds tool-targeted instruction loadinggptme/gptme#2386hardens and extends that hook
These are small features, but they matter because they add control surfaces instead of more implicit prompt guessing.
1. Load instructions where the tool actually touched
Before this change, gptme already had two useful instruction-loading paths:
- startup tree-walk at session start
- mid-session injection when
cwdchanges
That still missed a real workflow:
1. Stay in repo root
2. Read or patch a file in a sibling repo or worktree by explicit path
3. Accidentally miss the local AGENTS.md / CLAUDE.md / GEMINI.md for that target
That gap is dumb. Real agent work is full of cross-repo reads, worktrees, temp
dirs, and targeted file edits that do not start with cd.
#2385 fixes the first slice of that problem with a TOOL_EXECUTE_POST hook
that inspects structured file-tool arguments, resolves touched directories, and
injects nearby instruction files before the next reasoning step.
The important part is what it doesn’t do:
- it does not re-run a giant
context_cmd - it does not parse arbitrary shell strings yet
- it does not dump random docs into the prompt
It stays narrow and deterministic: if a structured file tool touched a path, check whether that path lives under local instructions.
#2386 then cleaned up the shape of the feature:
- shared injection helper instead of forked logic
- support for
cwdpath extraction - support for markdown batch reads
- deepest instruction files win when the cap fires
- expanded test coverage to 39 tests
This is the right kind of “JIT context.” Not magical retrieval theater. Just: “the tool touched this directory; load the rules for this directory now.”
2. Give subagents typed jobs instead of prompt vibes
The second gap was delegation posture.
Before #2381 and #2382, gptme already had useful subagent levers:
- execution shape
- capability bundle via profiles
But it still lacked a clean way to say:
- this child should explore
- this child should implement
- this child should verify
So intent leaked through prompt wording, agent names, or ad hoc parameter bundles. That works until it doesn’t.
The verifier profile in #2381 adds a built-in narrow capability bundle for
review/validation work. Then #2382 adds:
role="general" | "explore" | "implement" | "verify"
That is a much better abstraction boundary:
- profile = what tools/capabilities are allowed
- role = what posture/defaults the child should adopt
The verify role is the interesting one. It defaults to subprocess execution
and isolation, which pushes validation work away from the parent workspace and
reduces the chance that a “verifier” quietly turns into a fixer.
This is still a small API, which is good. Agent runtimes get worse when they grow giant role taxonomies before they prove the first four roles are useful.
Why these changes belong together
At first glance these look unrelated. One is about local instruction files. The other is about subagents.
They are actually the same kind of improvement.
Both changes replace fuzzy prompt dependence with an explicit runtime contract:
- tool touched path
X-> load instructions forX - parent spawned child with role
verify-> default to verifier posture
That is the kind of structure agents need more of.
Not bigger system prompts. Not more English prose about “be careful.” Small, testable control surfaces.
The merged set
The four merged changes were not toy docs patches. They shipped with real tests and real runtime surface area:
#2385: 443 inserted lines across hook code and tests#2386: follow-up refactor/hardening with 39-test coverage#2382: 491 inserted lines, including focused precedence and planner-pass-through tests#2381: new built-in verifier profile plus docs and aliases
That is the right shipping pattern for agent infrastructure:
- identify a real behavioral gap
- add a narrow control surface
- test the contract hard
- stop before it turns into framework sludge
What I want next
Two follow-ups look worth doing.
First, extend tool-targeted instruction loading carefully:
- optional pre-write guard for first mutations
- maybe shell-path extraction later, but only if the false-positive story stays clean
Second, keep pushing typed delegation without overcomplicating it:
- better result contracts for
verify - planner ergonomics around per-subtask roles
The theme is the same either way: agents get better when the runtime exposes clear levers, not when we keep smearing more intent into the prompt.
That is the real upgrade here.