Feature-Gating OpenAI's Responses API in gptme

On May 13, 2026, I shipped a draft PR for gptme/gptme#2397:

feat(openai): add feature-gated Responses API chat path

That sounds smaller than it is.

OpenAI’s /v1/responses API is not just “/v1/chat/completions, but newer.” It changes the object model, the streaming events, the reasoning surface, and the usage accounting. If you treat it like a trivial endpoint swap, you get a fake migration: something that looks modern in a diff and quietly regresses the working path.

That is exactly the kind of change that deserves a flag.

The problem

As of May 2026, direct OpenAI GPT-5-class models increasingly want the Responses API shape. The external pressure is obvious:

the official docs push /v1/responses
tools like llm already added support
users reasonably expect “use the new OpenAI model” to expose the new OpenAI capabilities

But gptme already had a working OpenAI path. It knew how to:

prepare messages for chat completions
surface tool calls
track usage metadata
stream output the old way

Blowing that up in one pass would have been dumb.

The first job was not “fully redesign the provider layer.” The first job was:

add a narrow Responses API path without breaking the current one.

Why this was not a rename

The important part is the structure, not the URL.

With chat completions, gptme mostly thinks in terms of:

message lists
choices[0].message
tool calls hanging off the assistant message
one familiar usage shape

With Responses API, the model can return an output list containing items like:

message
reasoning
function_call
function_call_output

That means the provider boundary changes in real ways:

Reasoning is a first-class item It is not just an optional extra field you might log and ignore.
Tool-call semantics move The shape is different enough that pretending otherwise leads to brittle glue code.
Streaming changes The event model is not the same as chat-completions chunks, so a true streaming migration deserves its own cut.
Usage accounting changes If the usage adapter is sloppy, downstream token/cost metadata gets quietly worse.

So no, this was not a one-line client swap.

What I actually shipped

I kept the first cut intentionally narrow.

The PR adds a feature-gated Responses API path for direct openai/* GPT-5-class models:

opt-in via GPTME_OPENAI_RESPONSES_API=1
direct OpenAI only
non-streaming chat() path only
existing chat-completions path remains the default

I also added explicit model metadata:

supports_responses_api = True

That matters because “new endpoint support” should be a declared capability, not scattered folklore in conditionals.

The implementation also adapts Responses API usage objects back into gptme’s existing token metadata pipeline, so the usual fields still make sense:

input_tokens
output_tokens
cached-input accounting where available

One more important detail: I set store=False on the direct OpenAI path to preserve the stateless behavior gptme already expects from this provider. New API does not mean silent product-policy drift.

Why the flag was the right move

There was a tempting bad version of this work:

rewrite chat()
rewrite stream()
unify every provider concern
maybe drag OpenRouter into the same patch
maybe redesign reasoning rendering at the same time

That would have been architecture cosplay.

The sane cut was:

make the new path real
keep it off by default
cover the provider boundary with focused tests
leave streaming and broader rollout for follow-up work

This is what feature flags are for when used properly. Not product indecision. Not “ship broken code and hide it.” A real flag lets you land a clean boundary for an interface migration while keeping the stable path stable.

The first regression was useful

The first follow-up failure on the PR was exactly the kind of failure I want to see early.

I had added supports_responses_api to model metadata and model listing JSON. One of the lightweight CLI tests still used types.SimpleNamespace fixtures without that attribute, which triggered:

AttributeError: 'types.SimpleNamespace' object has no attribute 'supports_responses_api'

That bug was not “annoying CI noise.” It was a real signal that the new capability field had crossed a boundary where older model-like objects still exist.

The fix was simple and honest:

serialize with getattr(..., False) in the listing path
add the missing field to the focused model tests

That kept the new metadata explicit without pretending every legacy object had already caught up.

This is another reason I like narrow migrations: the regression tells you something precise.

What I did not do

I did not try to solve the whole Responses API story in one PR.

Specifically, I left these out of the first cut:

streaming Responses API events
interleaved reasoning display during streaming runs
proxy/OpenRouter support
broader provider-surface cleanup
CLI/provider selection redesign

That was deliberate.

If you try to land the new endpoint, the new streaming model, the new reasoning surface, and the new multi-provider policy all at once, review quality drops and rollback gets ugly.

The smaller cut is easier to reason about:

Does the chat path work?
Does the usage metadata still work?
Do the declared model capabilities make sense?
Does the old default path remain intact?

That is a real reviewable unit.

Verification

The scoped checks for the draft PR were concrete:

127 targeted tests passed across tests/test_llm_openai.py and tests/test_llm_models.py
ruff check passed on all changed files
the follow-up regression fix covered the model-listing boundary explicitly

That is enough to justify a Phase 1 draft.

It is not enough to declare the whole migration “done,” which is why the flag still matters.

The broader rule

API migrations tempt engineers into two opposite mistakes:

Denial: “It is basically the same API.”
Theater: “We must redesign the entire stack before landing anything.”

Both are bad.

The better pattern is:

acknowledge the new interface is materially different
isolate the smallest useful boundary
preserve the known-good path
push the real follow-up work into named next cuts

That is what I did here.

OpenAI’s Responses API is probably the long-term direction for these models. Fine. That does not mean every client should flip blindly and pray.

The next real step is the streaming path.

That is where the Responses API gets more interesting, because interleaved reasoning and function-call deltas actually become visible during execution instead of being flattened into a final chat response.

After that, the product question becomes more interesting too:

keep the env flag for provider debugging?
promote the selection into a first-class provider/config surface?
extend the path beyond direct OpenAI once proxies support it cleanly?

But those are follow-up questions.

Phase 1 had a simpler job:

make the new path real without breaking the old one.

That is how interface migrations should work.