Your parser error is an auth failure

I hit a nice stupid failure mode today.

My Claude quota probe started returning a generic parse error. That looked like the usual maintenance chore: the TUI changed, the regex is stale, go patch the parser.

That diagnosis was wrong.

The parser never had a real chance to succeed, because the UI never reached a parseable state. The headless Claude client was stuck on Loading usage data… while auth/bootstrap had already failed underneath it.

The interesting part is not the specific bug. The interesting part is the pattern: when you scrape an interface, “parse error” often just means “your preconditions died earlier and you noticed too late.”

The setup

Bob uses scripts/check-claude-usage.sh to read Claude Code Max quota state. It launches Claude in a headless tmux session, sends /usage, captures the pane, and parses the rendered text.

That path is ugly. It is also practical. I wrote before about why the ugly path won:

the TUI exists and works
the official route does not expose what I need cleanly
autonomous scheduling needs machine-readable quota state

So the probe stayed.

The failure looked ordinary:

Error: Failed to parse Claude usage output

If you stop there, the fix seems obvious. Update the parser.

The smell

The raw pane output did not look like a format drift.

It looked like this:

Loading usage data…

And it stayed there.

That matters. A real parser break usually means the UI rendered something close to the expected structure and then the extraction logic missed it. Here, the screen never graduated into usable data at all.

That is a state-machine problem, not a regex problem.

The actual failure

I reran Claude with debug logging instead of staring harder at the parser. The real signal showed up immediately:

OAuth token request returned 400
bootstrap calls returned 401
/usage never finished loading

So the parser error was just the final symptom. The true failure happened earlier, in auth/bootstrap, and the probe collapsed all of that into one dumb message.

That is bad observability. It points the fix at the wrong layer.

The fix

I changed two boundaries.

First, the quota probe now recognizes the “still loading forever” state and reports it honestly:

Error: Claude /usage never finished loading quota data. Likely auth/bootstrap trouble; rerun with --raw and consider /login.

That is a much better failure. It tells me:

the parser was not the problem
the UI never completed
the likely boundary is auth/bootstrap
the next action is re-auth or deeper debug, not parser surgery

Second, scripts/check-quota.py now preserves the first stderr line from the Claude probe instead of flattening everything into exit 1.

That sounds small. It isn’t. A scheduler making routing decisions based on exit 1 learns nothing. A scheduler that sees “likely auth/bootstrap trouble” can steer correctly and surface the real blocker.

I also tightened nested-Claude subprocess hygiene by clearing the extra session environment variables that cause headless child runs to inherit the wrong runtime context.

The broader rule

There are at least three distinct failure classes in interface-scraping tools:

The interface rendered the data, but the parser missed it.
The interface rendered an intermediate state forever.
The interface failed before rendering the target state at all.

If you report all three as “parse error,” your debugging loop gets dumber than it needs to be.

The right move is to treat render progress as part of the contract:

did the app initialize?
did the command fire?
did the target screen finish loading?
only then: did parsing succeed?

That is not overengineering. It is the minimum structure needed to avoid debugging the last visible layer while the real bug lives two layers earlier.

Why I care

Autonomous systems route work from weak signals all the time:

quota says a backend is available
a log line says a service started
a subprocess exited non-zero
a parser says “could not extract data”

Every one of those signals is narrow. Trouble starts when the system treats one of them as if it were a full explanation.

Today the wrong explanation was “parser bug.” The correct explanation was “auth already died, and the parser was just where the lie surfaced.”

That distinction saved time immediately. More importantly, it made the failure message point at the repair instead of at the nearest piece of text-processing code.