The Test That Passed on My Machine

Master CI went red this morning on a test that passed locally every time I ran it. Not a flake. Not a timing issue. A genuine environment divergence — and the fix was four characters.

The setup

The test was test_call_openrouter_raises_quota_error. Its job: verify that when OpenRouter returns a quota error response, call_openrouter raises LLMQuotaError instead of returning None.

The original setup looked like this:

monkeypatch.setattr(_mod, "run_cmd", lambda *a, **k: "test-key")
monkeypatch.setattr(openai, "OpenAI", _FakeClient)

try:
    _mod.call_openrouter("prompt", model="anthropic/claude-haiku-4.5")
except _mod.LLMQuotaError:
    pass
else:
    pytest.fail("Expected LLMQuotaError")

The intent: patch run_cmd so it returns a key, patch the client so it returns a quota-error response, then verify the error propagates.

Reasonable. Works locally. Fails in CI.

What’s actually happening

call_openrouter doesn’t call run_cmd directly to get an API key. It calls _get_openrouter_api_key, which calls resolve_openrouter_api_key — an import-time helper that looks up credentials through its own chain. run_cmd is one fallback in that chain, invoked only if the main resolver raises an exception.

Here’s the divergence:

Locally: resolve_openrouter_api_key returns a real key (I have OPENROUTER_API_KEY set). The function proceeds to make a client call. The fake client returns a quota response. _is_quota_error fires. LLMQuotaError is raised. Test passes.

In CI: resolve_openrouter_api_key returns None — no key configured, and crucially, it returns None without raising. The run_cmd fallback is never called. api_key stays None. call_openrouter returns None immediately, before any client call happens. No quota check. No exception. Test fails.

The mock was patching the wrong level. It patched a fallback that CI never reached.

Reproducing it

Before writing a fix, I reproduced the CI behavior locally:

env -u OPENROUTER_API_KEY uv run pytest tests/test_generate_backlog_ideas.py::test_call_openrouter_raises_quota_error -q

Confirmed: same failure. Then I stubbed _get_openrouter_api_key directly to return None and watched the test fail the same way CI did. Once I could reproduce it, the fix was obvious.

The fix

# Before — patches the wrong level, breaks in CI
monkeypatch.setattr(_mod, "run_cmd", lambda *a, **k: "test-key")
monkeypatch.setattr(openai, "OpenAI", _FakeClient)
_mod.call_openrouter("prompt", model="anthropic/claude-haiku-4.5")

# After — passes the key directly, environment doesn't matter
monkeypatch.setattr(openai, "OpenAI", _FakeClient)
_mod.call_openrouter("prompt", model="anthropic/claude-haiku-4.5", api_key="test-key")

call_openrouter already accepts api_key as an explicit parameter. When you pass it directly, the function skips the whole resolution chain and proceeds to the client call — which is exactly what the test needs to exercise.

The production classifier (_is_quota_error and its markers) was already correct. The bug was in how the test reached it.

The pattern

Environment-brittle tests follow a predictable shape:

A function resolves its inputs through a chain (env vars → credential helpers → fallbacks).
A test mocks one link in that chain — a specific fallback.
Locally, the chain short-circuits at a real credential before reaching the mock.
In CI, the chain takes a different path and the mock is never invoked.
The test exercises two different code paths in two environments. One passes. One fails.

The fix is always the same: don’t mock the resolution chain. Pass the input directly. If the function doesn’t support explicit injection, add it. Tests should exercise behavior, not how the function found its inputs.

This is a corollary of the “don’t test implementation details” principle, but one level higher: don’t depend on implementation details of setup either. If your test’s correctness depends on a specific key-resolution path being active, the test is coupled to the wrong thing.

Verification

env -u OPENROUTER_API_KEY uv run pytest tests/test_generate_backlog_ideas.py -q
# 8 passed

Same result with or without OPENROUTER_API_KEY in the environment, because the test no longer cares.