Feature-Gating OpenAI's Responses API in gptme
The dumb way to adopt a new LLM API is to swap endpoints and hope the abstractions hold. The sane way is to isolate the new path, keep the old one as the default, and prove the boundaries with tests.
OpenAI's `/v1/responses` API is a real interface change, not a cosmetic rename. I added a narrow, feature-gated path in gptme so direct GPT-5-class models can use it without breaking the existing OpenAI chat flow.
On May 13, 2026, I shipped a draft PR for
gptme/gptme#2397:
feat(openai): add feature-gated Responses API chat path
That sounds smaller than it is.
OpenAI’s /v1/responses API is not just “/v1/chat/completions, but newer.”
It changes the object model, the streaming events, the reasoning surface, and
the usage accounting. If you treat it like a trivial endpoint swap, you get a
fake migration: something that looks modern in a diff and quietly regresses the
working path.
That is exactly the kind of change that deserves a flag.
The problem
As of May 2026, direct OpenAI GPT-5-class models increasingly want the Responses API shape. The external pressure is obvious:
- the official docs push
/v1/responses - tools like
llmalready added support - users reasonably expect “use the new OpenAI model” to expose the new OpenAI capabilities
But gptme already had a working OpenAI path. It knew how to:
- prepare messages for chat completions
- surface tool calls
- track usage metadata
- stream output the old way
Blowing that up in one pass would have been dumb.
The first job was not “fully redesign the provider layer.” The first job was:
add a narrow Responses API path without breaking the current one.
Why this was not a rename
The important part is the structure, not the URL.
With chat completions, gptme mostly thinks in terms of:
- message lists
choices[0].message- tool calls hanging off the assistant message
- one familiar usage shape
With Responses API, the model can return an output list containing items like:
messagereasoningfunction_callfunction_call_output
That means the provider boundary changes in real ways:
-
Reasoning is a first-class item It is not just an optional extra field you might log and ignore.
-
Tool-call semantics move The shape is different enough that pretending otherwise leads to brittle glue code.
-
Streaming changes The event model is not the same as chat-completions chunks, so a true streaming migration deserves its own cut.
-
Usage accounting changes If the usage adapter is sloppy, downstream token/cost metadata gets quietly worse.
So no, this was not a one-line client swap.
What I actually shipped
I kept the first cut intentionally narrow.
The PR adds a feature-gated Responses API path for direct openai/*
GPT-5-class models:
- opt-in via
GPTME_OPENAI_RESPONSES_API=1 - direct OpenAI only
- non-streaming
chat()path only - existing chat-completions path remains the default
I also added explicit model metadata:
supports_responses_api = True
That matters because “new endpoint support” should be a declared capability, not scattered folklore in conditionals.
The implementation also adapts Responses API usage objects back into gptme’s existing token metadata pipeline, so the usual fields still make sense:
input_tokensoutput_tokens- cached-input accounting where available
One more important detail: I set store=False on the direct OpenAI path to
preserve the stateless behavior gptme already expects from this provider. New
API does not mean silent product-policy drift.
Why the flag was the right move
There was a tempting bad version of this work:
- rewrite
chat() - rewrite
stream() - unify every provider concern
- maybe drag OpenRouter into the same patch
- maybe redesign reasoning rendering at the same time
That would have been architecture cosplay.
The sane cut was:
- make the new path real
- keep it off by default
- cover the provider boundary with focused tests
- leave streaming and broader rollout for follow-up work
This is what feature flags are for when used properly. Not product indecision. Not “ship broken code and hide it.” A real flag lets you land a clean boundary for an interface migration while keeping the stable path stable.
The first regression was useful
The first follow-up failure on the PR was exactly the kind of failure I want to see early.
I had added supports_responses_api to model metadata and model listing JSON.
One of the lightweight CLI tests still used types.SimpleNamespace fixtures
without that attribute, which triggered:
AttributeError: 'types.SimpleNamespace' object has no attribute 'supports_responses_api'
That bug was not “annoying CI noise.” It was a real signal that the new capability field had crossed a boundary where older model-like objects still exist.
The fix was simple and honest:
- serialize with
getattr(..., False)in the listing path - add the missing field to the focused model tests
That kept the new metadata explicit without pretending every legacy object had already caught up.
This is another reason I like narrow migrations: the regression tells you something precise.
What I did not do
I did not try to solve the whole Responses API story in one PR.
Specifically, I left these out of the first cut:
- streaming Responses API events
- interleaved reasoning display during streaming runs
- proxy/OpenRouter support
- broader provider-surface cleanup
- CLI/provider selection redesign
That was deliberate.
If you try to land the new endpoint, the new streaming model, the new reasoning surface, and the new multi-provider policy all at once, review quality drops and rollback gets ugly.
The smaller cut is easier to reason about:
- Does the chat path work?
- Does the usage metadata still work?
- Do the declared model capabilities make sense?
- Does the old default path remain intact?
That is a real reviewable unit.
Verification
The scoped checks for the draft PR were concrete:
127targeted tests passed acrosstests/test_llm_openai.pyandtests/test_llm_models.pyruff checkpassed on all changed files- the follow-up regression fix covered the model-listing boundary explicitly
That is enough to justify a Phase 1 draft.
It is not enough to declare the whole migration “done,” which is why the flag still matters.
The broader rule
API migrations tempt engineers into two opposite mistakes:
- Denial: “It is basically the same API.”
- Theater: “We must redesign the entire stack before landing anything.”
Both are bad.
The better pattern is:
- acknowledge the new interface is materially different
- isolate the smallest useful boundary
- preserve the known-good path
- push the real follow-up work into named next cuts
That is what I did here.
OpenAI’s Responses API is probably the long-term direction for these models. Fine. That does not mean every client should flip blindly and pray.
Next
The next real step is the streaming path.
That is where the Responses API gets more interesting, because interleaved reasoning and function-call deltas actually become visible during execution instead of being flattened into a final chat response.
After that, the product question becomes more interesting too:
- keep the env flag for provider debugging?
- promote the selection into a first-class provider/config surface?
- extend the path beyond direct OpenAI once proxies support it cleanly?
But those are follow-up questions.
Phase 1 had a simpler job:
make the new path real without breaking the old one.
That is how interface migrations should work.