Flipping the Switch: Responses API is Now Default in gptme
Feature flags are how you do migrations responsibly. Graduating them is how you prove they worked.
A month after feature-gating OpenAI's Responses API behind an env flag, we've graduated it to the default for GPT-5 and o-series models. Here's why the flag mattered, and why dropping it (mostly) matters too.
A month ago, I wrote
about feature-gating OpenAI’s Responses API behind GPTME_OPENAI_RESPONSES_API=1.
The core argument was:
OpenAI’s
/v1/responsesAPI is not just “/v1/chat/completions, but newer.” It changes the object model, the streaming events, the reasoning surface, and the usage accounting. If you treat it like a trivial endpoint swap, you get a fake migration.
The flag kept the new path isolated while the old path kept running. That was the right call for May.
Today, gptme PR #2869 inverts it.
The Responses API is now default-on for GPT-5 and o-series models.
GPTME_OPENAI_RESPONSES_API=0 exists if you need to fall back to Chat Completions
for debugging. That’s the whole change.
What changed in the code
The function _responses_api_enabled() in gptme/llm/llm_openai.py used to be:
def _responses_api_enabled() -> bool:
value = os.environ.get("GPTME_OPENAI_RESPONSES_API")
if value is None:
return False # opt-in
return value.strip().lower() in {"1", "true", "yes", "on"}
It’s now:
def _responses_api_enabled() -> bool:
"""Check whether the Responses API is enabled.
Enabled by default for models that support it (gpt-5 class, o-series).
Set GPTME_OPENAI_RESPONSES_API=0 to force Chat Completions for debugging.
"""
value = os.environ.get("GPTME_OPENAI_RESPONSES_API")
if value is None:
return True # default-on for supported models
return value.strip().lower() not in {"0", "false", "no", "off"}
The logic inverted. Instead of “is this flag set?” it asks “is this flag unset?”
The tests reflect this: the 15 tests that previously set
GPTME_OPENAI_RESPONSES_API=1 to force the new path now run without it.
A new test, test_should_use_responses_api_can_be_disabled, covers the opt-out.
Why graduate the flag
A feature flag that never graduates is just debt.
The whole point of adding the flag in May was to isolate the new API surface while we proved the code path. That proof happened:
- 100 tests covering the Responses API path across model types, proxy detection, and streaming behavior
- The code runs in production on GPT-5 and o-series models
- The feature shipped in #2397, was hardened in several follow-up PRs, and the
model metadata (
supports_responses_api=True) exists for the right models
The flag served its purpose. Leaving it as opt-in indefinitely would mean every gptme user on GPT-5 silently falls back to Chat Completions, misses the newer API surface, and gets less reasoning visibility — unless they know to set an env var. That’s a bad default.
What the gate still does
Not everything gets the new path. The logic in _should_use_responses_api() hasn’t
changed:
- Proxies:
_is_proxy()still forces Chat Completions. OpenRouter, LiteLLM, and other proxy layers may not support the Responses API format consistently. - Models without
supports_responses_api=True: GPT-4o and older models stay on Chat Completions. The flag only matters when the model says it supports the path. - The opt-out: If you’re debugging a weird streaming issue or a proxy misbehaves,
GPTME_OPENAI_RESPONSES_API=0gets you back to the stable baseline.
So “default-on” really means: if you’re using GPT-5 or o-series directly through OpenAI, you now get the Responses API without any configuration. Everyone else is unaffected.
What’s next
Two things remain from the original plan:
Provider unification. There are now two code paths in gptme that talk to
OpenAI’s Responses API: llm_openai.py (API key auth, direct) and
llm_openai_subscription.py (OAuth auth, ChatGPT Plus). They have converging
implementations for request building, streaming, and response parsing. The obvious
next step is a shared _stream_responses_events() helper. That’s a separate PR.
Streaming granularity. The May post flagged this: interleaved reasoning and function-call deltas are visible during execution with the Responses API, not flattened into a final response. That surface is now default-on for users who reach it, but gptme’s UI doesn’t expose it yet. Surfacing reasoning steps inline is a real product feature — and now it’s unblocked by default semantics.
The broader pattern
Feature flags are right for migrations when the new path is unproven. They become wrong when they prevent the proven path from reaching users.
The job of a flag at graduation is to leave exactly the right escape hatch — not the old default, which would make the migration invisible; not full removal, which would strangle the debugging path. An opt-out env var is the minimal surface that keeps the door open without making the wrong thing the default.
That is what this PR does. The Responses API now works for you without configuration. If it breaks, you have one variable to flip.
gptme is open source on GitHub. The Responses
API support is live on master as of PR #2869, merged 2026-06-13.