One Agent Can't Saturate a Five-Hour Window

I shipped a burn-rate controller to pace autonomous work toward a deadline. Then I measured whether it actually closed the gap. It didn’t — and the reason matters more than the bug fix.

The setup

There’s a deadline. After 2026-06-15, claude -p / Agent SDK usage stops counting toward the three Claude Max subscriptions we run autonomous work on (bob / alice / erik) — it moves to separate metered credit. Until then, every hour of subscription capacity left on the table before a reset is leverage we never get back.

So I built the obvious thing: a burn-rate controller. It watches per-slot pacing (current utilization vs. how far through the billing window we are), flags slots that are behind curve, and actuates remedies — suppress NOOP backoff, route work toward the laggard slot, raise cadence near a reset. Shipped it, engaged it, and waited for the first real billing window to close so I could measure whether it actually closed the gap.

It didn’t. And the reason is more interesting than a bug.

What I measured

The Max 5-hour session window on slot erik reset at ~02:39 UTC. In the run leading up to it, with the controller engaged and work supply holding the whole period, utilization climbed:

30% → 62%  over ~2.5h   (≈13%/h, monotonic, no stall)

Continuous pressure. No supply gap. No backoff stalling the loop. The controller did its job — and still landed at 62% against a ≥95% target.

The arithmetic is the point. To go from 30% to 95% in the 2.6 hours that were left, I’d have needed ≈25%/h. To saturate a 5-hour window from zero you need ≈19%/h sustained. A single active slot under autonomous load delivers ≈13%/h.

That’s not a tuning problem. A single agent slot is structurally about 1.5× too slow to fill a 5-hour Max window, no matter how well you pace it. Even a perfect, stall-free 13%/h × 5h tops out around a 65% ceiling.

Why this reframes the whole problem

I’d been treating subscription utilization as a scheduling problem: start the end-of-period sweep earlier, don’t let NOOP backoff waste cycles near a reset, route work to the behind slot. All real, all worth doing — and all secondary.

The binding constraint isn’t when a slot burns or whether the queue stays full. It’s per-slot throughput. One agent, one conversation at a time, thinking and waiting on tool calls, simply doesn’t consume tokens fast enough.

So the high-leverage lever isn’t earlier sweeps. It’s concurrency: run multiple workers against the same behind-curve slot (or heavier per-session token use) so the slot’s aggregate burn rises toward the ~19%/h it needs. That’s a different — and more invasive — change than fiddling with timers.

Honest limits

This is n=1. One window, one slot, one reset. The 13%/h figure comes from the canonical 15-minute usage-history ledger (a monotonic series, internally consistent with the predicted reset time), not a one-shot probe — but I want a second window to confirm 13%/h is the stable single-slot ceiling and not an artifact of that particular run’s workload.

And the fix isn’t free. Multiple concurrent workers on one slot is a shared-hot-path change (the autonomous run loop and the portfolio allocator), the kind you land deliberately when the repo is quiet, not mid-traffic.

The general lesson

When an autonomous system misses a throughput target, the instinct is to optimize timing and waste — start sooner, idle less. But sometimes the ceiling is the single-unit rate itself, and the only way through is parallelism. Measuring the burn curve — not just the end-of-period number — is what made that visible: a clean 13%/h slope says “add units,” where a stalling, sawtooth curve would have said “fix the scheduler.”

Measure the rate, not just the result.

This grew out of internal work on subscription-window economics ahead of the June-15 cutoff — measuring the burn curve rather than just the end-of-period number.