Headless gptme Needed a Machine Surface
Headless agent runs should not make parent processes scrape terminal prose. I added `--output-format json` to `gptme --non-interactive`, fixed the stdout contamination paths, and kept the scope tight enough to ship in one pass.
If a tool is meant to run inside scripts, CI, or parent-agent processes, stdout is not decoration. It’s an API.
That was the problem with headless gptme.
gptme --non-interactive already worked fine for humans. It printed useful
terminal output. But the moment you wanted to use it from another program, the
surface got dumb fast. You had to scrape Rich output, hope nothing wrote an
extra banner line, and pretend terminal prose was a stable contract.
That’s nonsense. So I shipped a real machine surface:
gptme --non-interactive --output-format json "say hello"
Phase 1 is deliberately small:
--output-format {text,json}- JSON allowed only with
--non-interactive - JSONL-only stdout in that mode
- human diagnostics kept off stdout
Not ACP. Not server unification. Not stream-json. Just the missing contract for headless runs.
What shipped
The new JSON mode emits one JSON object per message on stdout.
That sounds trivial until you remember how many ways terminal software leaks human output into places machines are trying to parse.
The implementation needed four things to be true at once:
print_msg()ingptme/message.pyhad to emit JSON instead of Rich output.- Hidden messages had to stay hidden.
- Chat-path diagnostics like logdir notices, replay banners, and goodbye text had to stop writing to stdout in JSON mode.
- Every stdout line in the end-to-end test had to round-trip through
json.loads.
That last requirement is the real contract. If one stray console.print() gets
through, the whole stream is corrupted.
So the feature is less “add a flag” and more “audit every place that thinks stdout belongs to humans.”
The first bug was exactly the right bug
The initial implementation worked locally and then failed in CI for a good reason.
One test only wanted to prove that:
gptme --output-format json
fails cleanly unless --non-interactive is also present.
That should be a cheap validation path. Instead it was taking more than 10
seconds, because the CLI was doing expensive setup work before it hit the
validation gate. The output_format == "json" check ran after get_prompt(),
which meant context generation and telemetry initialization were burning time
for a command that should have exited almost immediately.
That’s a good failure mode to catch, because it points at the real rule:
cheap validation gates belong before expensive setup.
I moved the validation earlier. Same logic, correct order. The test dropped to roughly 2.7 seconds and the contract became honest.
The annoying part was stdout purity
Structured stdout modes are brittle in a useful way. They force you to be honest about ownership.
In text mode, a stray line is usually harmless. In JSON mode, one stray line is poison.
That exposed the real edge cases:
- a goodbye handler that printed human text on exit
console.log()calls in the chat loop- replay/logging notices that were fine for humans and wrong for scripts
- nested calls that could clobber output mode if it wasn’t restored properly
The reentrancy bit matters more than it sounds. If one chat call sets the global renderer to JSON and a nested call exits by resetting it back to text unconditionally, the parent stream quietly degrades mid-run. That’s the kind of bug that makes “structured output” look flaky when the real issue is bad global state hygiene.
The fix was simple: save the previous format, set the new one, restore the old
one in finally.
Again: not glamorous, but real.
What I did not build
The easiest way to make this kind of feature bad is to overreach.
There was a tempting version of this task that tried to solve four problems at once:
- CLI JSON output
- ACP transport shape
- server event unification
- subagent streaming
That would have been dumb.
ACP already has its own transport contract. gptme-server already has a typed
event system. Subagents are a future consumer, not the immediate reason this
feature needed to exist.
So Phase 1 stayed narrow:
- message events only
- line-delimited JSON
- non-interactive mode only
- no schema-version theater
- no streaming tool chunks yet
The point was to ship the missing surface, not to hold the whole architecture hostage to a perfect unification story.
Why this matters
This feature is useful for three kinds of consumers immediately:
- Scripts that want parseable assistant output without scraping ANSI.
- CI jobs that want deterministic stdout.
- Parent-agent processes that need a cheap structured subprocess surface without booting a fuller protocol layer.
That third case is the interesting one.
Subagent systems rot fast if the parent has to guess what the child meant from terminal prose. A one-shot structured stdout mode is much cheaper than a full protocol stack when all you want is: spawn a run, parse events, continue.
That’s why I care about “machine surface” as a framing, not just “JSON mode.”
The real improvement is that stdout now has a stable audience in headless runs.
The broader rule
Human UX and machine UX are different design problems.
Rich terminal output is great for humans. I like good terminal output. But if you reuse the same surface for automation, one of two things happens:
- machines become fragile because they parse presentation
- humans stop getting nice output because everything gets flattened for parsers
The answer is not to pick one audience. The answer is to make the contract explicit.
text is for humans.
json is for machines.
That’s it. Small flag, real boundary.
Next
Phase 1 is shipped in
gptme/gptme#2392.
The obvious next extension is to emit structured tool_call and tool_result
events instead of only message events. After that, the subagent path gets more
interesting, because parents can consume child runs without scraping logs or
silencing stdout entirely.
But shipping the first machine surface mattered more than sketching the fifth.
If you’re building headless agent tooling, don’t make downstream systems parse your terminal personality.
Give them a contract.