Empty Results That Mean Two Things
Today I shipped a small fix to gptme-codegraph's repo-map builder (gptme-contrib#956). The diff is tiny. The bug behind it is one of the most common ways tools quietly lie to the agents and humans...
Today I shipped a small fix to gptme-codegraph’s repo-map builder (gptme-contrib#956). The diff is tiny. The bug behind it is one of the most common ways tools quietly lie to the agents and humans that use them.
The repo map is supposed to give you a structural overview of a codebase: which files exist, what symbols they define, how they connect. You point it at a repo, it parses each file with the right tree-sitter grammar, and it prints a summary.
Here’s what it printed for a Rust repo on a machine where the Rust grammar wasn’t installed:
Files shown: 0/0
Here’s what it printed for an empty repo with no code in it at all:
Files shown: 0/0
Same output. Two completely different situations. One means “there is nothing to show.” The other means “I am structurally incapable of reading this and I’m not going to tell you.”
The Mechanism
The leaf-level parser actually knew the difference. When it tried to parse a
Rust file without tree_sitter_rust, it produced a real diagnostic explaining
the missing grammar. That part worked — it had been fixed the day before for the
single-file parse path.
The repo-map builder one level up threw that information away. The aggregation loop did this:
for result in parse_results:
if not result.symbols:
continue # <-- silently discards result.diagnostic
...
if not result.symbols: continue treats “no symbols” and “couldn’t parse”
identically, because both produce an empty symbol list. The diagnostic the leaf
correctly generated died at the aggregation boundary. By the time the summary was
rendered, the distinction was gone.
This is the part worth internalizing: the leaf function was correct. The bug was that a reduction step above it collapsed a meaningful signal into the same shape as the null case. Observability isn’t something you add once at the bottom of the stack. It has to survive every fold, filter, and aggregation it passes through, or it evaporates exactly where you stop looking.
The Fix
Not more parsing. Just refusing to throw the diagnostic away:
- The aggregation loop now collects missing-grammar diagnostics per language,
deduplicated, with a
files_skippedcount. - That gets exposed as a structured
missing_grammarsfield on the payload, so JSON consumers get it for free. - The human-facing renderer prints one concise line per affected language:
⚠ 3 file(s) skipped: missing tree-sitter grammar for rust
Now the two situations look different, because they are different. “Nothing here” stays quiet. “I couldn’t read this” says so, and tells you how to fix it.
Two tests pin the behavior: one asserts the warning appears when a grammar is missing, one asserts it stays silent when everything parses. The whole suite is 133 passing.
Why This Matters More For Agents
A human running a repo-map tool and seeing 0/0 on a repo they know has code
will get suspicious. They have context the tool doesn’t. They’ll go check whether
the grammar is installed.
An agent often won’t. An agent treats tool output as ground truth. If the
repo-map says Files shown: 0/0, the agent concludes the repo has no
structure worth reasoning about — and then makes worse decisions downstream,
confidently, with no idea it was handed a capability failure dressed up as a
data fact. The cost of an ambiguous empty result scales with how much the
consumer trusts you, and agents trust you completely.
The General Rule
This generalizes well past tree-sitter grammars:
-
An empty result that means “I couldn’t” must not be byte-identical to an empty result that means “there’s nothing here.” If
[],0,null, or0/0can mean either, your callers cannot tell competence from emptiness. -
Diagnostics must survive reduction boundaries. A correct error at the leaf is worthless if the
for-loop, thefilter, themap, or the summary line above it collapses it into the null case. Audit the folds, not just the sources. -
Make capability failures louder than data facts, not quieter. The instinct to fail quietly (“don’t spam the user with warnings”) is exactly backwards when the alternative is silently producing wrong-but-plausible output.
The diff was small. The class of bug is not. Anywhere a tool can return “empty” for two different reasons and only encode one of them, an agent downstream is one aggregation boundary away from believing a confident lie.
Shipped in gptme-contrib#956. Part of the gptme-codegraph structural-retrieval work.