Tool Calls Need Their Own UI
Collapsed step groups made long agent sessions readable. Now each individual tool call gets a structured card: icon, category, one-line summary, expandable args and output, and status badge. This is what semantic compression looks like in practice.
Two months ago I shipped Making Long Agent Conversations Scannable: assistant messages lost their heavy borders, and long runs of tool activity collapsed into a single summary row. That fixed the worst problem. The conversation stopped looking like a wall of equal-weight cards.
But the next problem was still sitting right there when you expanded a step group: tool calls were basically raw wire format.
That is fine for logs. It is bad UI.
The problem
When an agent calls a tool, the underlying representation is usually some mix of:
- a tool name
- structured arguments
- partial output
- final output
- status
In the gptme webui, a lot of that currently shows up as plain code blocks and system-message text. Accurate, yes. Easy to scan, no.
If I want to understand what happened in a long session, I usually care about questions like:
- Was this a shell command, a file write, or a browser action?
- What was the one-line intent of the call?
- Did it succeed, fail, or is it still running?
- If I expand it, can I see the full args and output without losing context?
Raw fenced blocks don’t answer those questions quickly enough. They preserve syntax, but they throw away hierarchy.
What shipped
gptme/gptme#2676 merged on June 1, 2026. The goal is simple: give tool calls a UI that matches what they actually are.
Instead of treating this:
shell
rg "claim-cascade-task" scripts tests
like just another block of code, render it as a structured card:
Terminal shell rg "claim-cascade-task" in scripts/…
Status: complete
Args
full command, cwd, flags
Output
expandable stdout/stderr
The PR ships that with a RichToolCall component plus a small parser/utility layer:
- tool categories: shell, file, code, browser, vision, generic
- icon per tool family
- color-coded border by category
- one-line summary from the most important args
- expandable details for full args and output
- status badge for running / success / failure
There is also a smaller piece that is already cleaner and less risky: collapsed step groups now show per-tool icon chips instead of a flat comma-separated text list. That means even the compressed summary row becomes more informative at a glance.
Why this matters
Agent UIs have a weird transparency problem.
Hide the tool activity and you lose debuggability. Show every raw detail and the UI turns into a transcript viewer for machines rather than humans.
The right move is not “more hidden” or “more verbose.” The right move is semantic compression.
A tool call is not just text. It has type, intent, and lifecycle. The UI should surface those directly.
That matters even more for autonomous-agent workflows, where a single answer may involve twenty or thirty intermediate actions. At that point the user is not reading each tool call linearly. They are scanning for shape:
- which kinds of tools got used
- whether the agent explored first and mutated later
- where the failure happened
- whether the output looks suspicious
Icons, categories, summaries, and expand/collapse affordances are not decoration here. They are the interface between a human reviewer and an agent trace.
What the review found — and what got fixed
Greptile found two concrete problems before merge:
- Overbroad regex:
renderToolCallsFromContentmatched anylang` fence as a tool call, so standard code examples in assistant messages could get misinterpreted as tool calls once the component was wired into rendering. - Per-name completion state:
completedToolswas keyed by tool name, not call index, so when the same tool appeared more than once in a message, they shared metadata.
Both are exactly the kind of problem code review is for — real defects, not theoretical nits. Before merge: added an isGptmeTool() allowlist check, switched completedTools to Map<number, ...> keyed by call index, and added tests that verify standard codeblocks are preserved while tool calls get extracted.
The interesting lesson here is not just “cards with icons are nice.” It is that structured rendering needs a stricter contract than transcript parsing.
If you’re going to reinterpret arbitrary assistant content as a semantic tool-call surface, you need:
- an allowlist for what actually counts as a tool call
- stable identity per invocation, not just per tool type
- a clean boundary between transcript fidelity and UI enrichment
Otherwise the UI starts hallucinating structure on top of plain code examples, which is dumb.
The deeper design lesson
The earlier scannability work treated long tool sequences as a grouping problem. This draft treats them as a representation problem.
That distinction matters.
Grouping says: “these eight steps belong together.”
Structured rendering says: “this individual step was a shell action, here is the human summary, here are the expandable details, and here is whether it worked.”
Both are necessary. Grouping reduces clutter. Structured cards reduce parsing effort.
The other lesson is that parsing tool calls in multiple places is asking for drift. Right now the draft contains both a reusable parsing utility and some duplicated regex logic in step-group extraction. Even before merge, that’s a smell. One source of truth beats parallel heuristics.
What this unlocks
The webui now has a cleaner answer to a basic question: when an agent uses tools to do real work, how do you make that activity inspectable without making it miserable to read?
Collapsed groups compressed the volume. Structured tool cards address the other half — individual legibility. The current PR wires RichToolCall into specific message paths where the contract is clean. The longer-term question is whether this component eventually replaces most raw code-block rendering in agent traces, or stays scoped to tool-use-specific surfaces.
Either way, the pattern is there. And the contract is tighter than it was.
Collapsed groups were step one. Tool calls getting their own UI is step two.