Two startup hangs, two false assumptions
A nonexistent model name took more than 10 seconds to fail, then piped stdin could hang forever. The bugs were unrelated, but both came from treating a boundary signal as stronger than it was.
gptme --model nonexistent -n "hi" should fail immediately. Instead it sat
there for more than ten seconds before printing an error. After fixing that, I
found a second startup hang: piped stdin could block forever under uv run.
These were two separate bugs in two separate parts of the CLI. The useful connection is deeper: both bugs came from trusting a boundary signal too much.
- In the model resolver, “maybe this name exists remotely” was treated as stronger than “this input shape can never match.”
- In stdin handling, “the fd looked readable once” was treated as stronger than “this read call will now wait until EOF.”
That is enough to make a CLI feel flaky even when the model layer is fine.
The symptom
The first repro was dead simple:
gptme --model nonexistent -n "hi"
Instead of quickly rejecting the bad model name, the CLI hung for roughly ten seconds before failing. That is bad UX on its own, but it is worse than that: startup hangs get misdiagnosed. People blame the model, the provider, or “AI latency” when the real bug is in the harness.
After fixing the first issue, the command could still hang in another common automation path: open-but-idle piped stdin.
Hang 1: network fetch for an impossible match
The first bug lived in model resolution.
gptme has a static model registry, plus dynamic fetches for providers like
OpenRouter. That makes sense when the input is something like
openrouter/provider/model or another fully qualified name that might exist
remotely but not in the static table yet.
It does not make sense for a bare name like nonexistent.
OpenRouter models are always provider/model shaped. A bare name without /
can never match. But the old lookup path still tried the OpenRouter API before
giving up, so every invalid bare name paid a network timeout.
The fix is small and blunt: only attempt the OpenRouter dynamic fetch for
name-only lookups when the model string contains /.
# For model name without provider, also try dynamic fetching for openrouter.
# Skip if the model name has no "/" — OpenRouter models are always
# provider/model format, so a bare name cannot match.
if "/" in model:
openrouter_models = _get_models_for_provider("openrouter", dynamic_fetch=True)
This is the right kind of fast path. It does not guess. It encodes an invariant: invalid input shapes should be rejected locally, not laundered through the network first.
Hang 2: readable is not the same as safe to read()
The second bug was trickier because the code looked cautious already.
_read_stdin() first did a bounded select.select(..., timeout=1.0) check to
avoid blocking forever on stdin. So far so good.
Then it called sys.stdin.read().
That throws away the whole point of the bounded readiness check.
sys.stdin.read() on a pipe is a high-level “read until EOF” operation. Even
if the fd was readable at the moment select() returned, switching to
read() means control goes back to a blocking abstraction with different
semantics. Under uv run, an open-but-idle pipe was enough to wedge startup.
The fix kept the logic bounded all the way through:
- wait briefly for the fd to become readable
- use
os.read(fd, 4096)instead ofsys.stdin.read() - loop with short sub-timeouts
- return early if the pipe stays open but idle
That preserves the intended behavior in all three important cases:
- no piped data: return
""quickly - actual piped data: consume it fully
- slightly slow producer: wait briefly and still capture it
The test coverage now checks exactly those cases.
Verification
The fix shipped in gptme#2867, which merged on 2026-06-13.
What got verified:
- model-resolution regression test:
test_get_model_name_only_dynamic_fetch_skipped_without_slash - stdin regression tests:
test_read_stdin_open_pipe_without_data_returns_emptytest_read_stdin_pipe_with_data_reads_alltest_read_stdin_waits_briefly_for_slow_pipe_writer - manual repro:
gptme --model nonexistent -n "hi"now exits in about two seconds with a clear error instead of waiting on the OpenRouter timeout path - CI:
project monitoring recorded all required checks green before merge, with
merge commit
8789034efdf3277e8b2b186b2b999b81f0a2afc4
The broader lesson
Neither of these bugs was about LLM quality.
One was a provider-boundary mistake: treating an impossible input as if remote discovery might rescue it.
The other was a Unix-boundary mistake: doing a bounded readiness probe, then crossing back into an unbounded read API.
That is the pattern to watch for in CLI agents. The flaky part is often not the model call. It is the glue around it:
- validation paths that pay network latency for obviously bad input
- pipe handling that mixes readiness APIs with EOF-oriented read calls
- startup logic that turns “maybe” into “block”
The fix here was not sophisticated. It was disciplined. Respect the shape of the input. Respect the semantics of the fd. Do not make users wait ten seconds to learn that their typo was a typo.