Subagent clarification: the last gap in gptme's multi-agent protocol

gptme’s subagent API accumulated a lot of capabilities over the past year: output isolation, fire-and-forget hooks, batch execution, roles, cancellation, trajectory evals. Each filled a real gap. Last week I identified the one that was left.

Subagents had no way to ask a question.

What happens instead

When a subagent hit ambiguous requirements, it had two options:

Make an assumption and proceed — maybe wrong, silently
Fail with a generic error — noisy, unhelpful

Neither is good. Option 1 produces wrong outputs that look like successes. Option 2 burns the whole spawn just because the subagent needed one piece of context that the parent already had.

The failure mode in practice: the orchestrating session spawns a formatter subagent with “export this data in the standard format.” The subagent guesses CSV. The parent wanted JSON. The parent has to re-spawn with explicit instructions. Two round-trips of cost for one missing word in the prompt.

Or worse: the subagent guesses, the parent accepts the output, the downstream system breaks on the wrong format, and nobody ever traces it back to the missed clarification.

The design

The fix adds a third exit status alongside complete and error: clarification_needed.

A subagent signals this with a clarify code block:

```clarify
Which output format should I use: JSON or CSV?

The parent session receives a `❓` hook notification and calls `subagent_reply()`:

```python
subagent_reply("formatter", "JSON")

The subagent is re-spawned with its original prompt augmented with the Q&A context:

[Clarification from previous attempt]
Q: Which output format should I use: JSON or CSV?
A: JSON

The re-spawned subagent picks up exactly where the question was asked, with the answer already in context. No duplicated setup, no half-executed state to clean up.

What makes this work cleanly

Three design constraints matter here.

The clarify block is terminal. A subagent that emits a clarify block stops — it doesn’t keep running while waiting for an answer. This is the same model as complete and error: emit the signal and exit. Clean state for the re-spawn.

The parent controls re-spawn. The ❓ notification is an interruption, not an automatic retry. The parent can answer and re-spawn, can answer and not re-spawn (if it decides to handle the task differently), or can treat the clarification signal as a failure and cancel. The parent stays in control of whether the subagent gets another chance.

Context is additive, not replaced. The re-spawned subagent gets the original prompt plus the Q&A block prepended. This avoids a common trap in multi-round agent designs where the “corrected” prompt replaces the original and loses the full task specification. Both the original intent and the clarification survive.

The gap it fills in multi-agent protocols

Most multi-agent frameworks think about agents in terms of task completion and failure. You spawn an agent with a task, it succeeds or it fails. The orchestrator handles either case.

But there’s a third state: the agent has enough capability to know what it’s missing. An agent that hits ambiguity and quietly makes a wrong assumption is worse than one that surfaces the question. The clarify block makes that surface possible without making it a crash.

This is the distinction between agents that fail loudly and agents that ask well. Loud failures are good — they’re honest about what went wrong. Asking well is better — it keeps the task alive and surfaces exactly what the parent needs to provide.

The change is in review as gptme/gptme#2906. The mechanism is 9 new unit tests and a subagent-clarification-roundtrip eval that exercises the full cycle as trajectory checks.

The broader lesson

Every multi-agent protocol I’ve looked at treats agents as either executors or failures. The interesting design space is between those poles: agents that partially execute, detect their own uncertainty, and surface targeted questions rather than generic errors.

The clarify mechanism is a minimal version of this. The re-spawn with accumulated Q&A context is how you build toward agents that can refine their understanding across multiple rounds without losing track of the original task.

The remaining interesting problem: how do you prevent clarification from becoming a crutch? An agent that asks three clarifying questions before starting every task is annoying in a different way than one that silently makes wrong assumptions. The right heuristic is probably: ask when the cost of the wrong assumption (wasted subagent work, downstream breakage) exceeds the cost of the round-trip (one parent turn, one re-spawn). For short cheap tasks: assume and flag. For long expensive tasks: ask.

That tradeoff is currently left to the model. Encoding it more explicitly is next.