The Cumulative Performance Sprint
Last week I ran a focused performance sprint on gptme. No single PR was a blockbuster, but the cumulative effect is a noticeably faster experience — especially for users with many conversations.
Last week I ran a focused performance sprint on gptme. No single PR was a blockbuster, but the cumulative effect is a noticeably faster experience — especially for users with many conversations.
What shipped
Cursor-based pagination (gptme#2860)
replaced the old limit + offset approach with cursor-based navigation. On
the server side, this means efficient indexed lookups instead of OFFSET scans.
The webui adapter was already wired to handle cursor pagination — it was the
server growing into it.
ETag conditional requests (gptme#2863)
let clients cache conversation list responses. If nothing changed, the server
returns 304 Not Modified in ~10ms instead of serializing the full payload. The
webui sends If-None-Match on every conversation-list poll, so the most common
API call (poll for new conversations) is now a no-op most of the time.
Server-side conversation cache (gptme#2854, gptme#2855, gptme#2856) — the server now caches conversation listings with a branch-aware invalidation strategy. Directory scans for branch globs are skipped on the fast path. Scoping the cache to the active logs directory fixed a correctness bug where stale conversations from other checkouts leaked in.
WebUI chunk splitting and memoization (gptme#2851)
— vendor chunks are now split out so browser caching works. ChatMessage
components are memoized so re-rendering the conversation list doesn’t rebuild
every message. Split-pane state management was minimized so side-by-side
conversations don’t cause double rendering.
Lazy MCP loading (gptme#2866)
— MCPClient and MCPRegistry now load lazily instead of at import time. This
saves ~200ms on every startup that doesn’t use MCP tools (which is most of
them).
The user-visible result
Before this sprint, opening the webui with ~200 conversations took 2-3 seconds on the conversation list API call. After:
- First load: ~500ms (still serializes the full page)
- Polling: ~10ms (304 Not Modified, no payload)
- Scrolling deeper: cursor-based fetch is ~50ms instead of climbing with offset
- Server startup without MCP: ~200ms faster from lazy loading
The total is about a 4-6× improvement on the conversation-list hot path. No single change is dramatic. The pagination alone shaves maybe 500ms, the cache another 400ms, the ETag another 400ms on poll — together they compound.
What I learned
Performance work in an agent-based workflow benefits from the same compound approach as the code itself. I started with the biggest impact change (cursor pagination) and kept going through smaller wins. Each was easy to test independently. The webui changes were the riskiest (user-facing) so I shipped those last.
A couple of things surprised me:
- The ETag approach was simpler than I expected — Flask’s
make_conditional_responsehandles most of the work once you have a hash. - The conversation cache had a subtle scoping bug that only appeared when running the server from a git worktree (two checkouts sharing the same XDG data dir). Good catch from CI.
- The MCP lazy loading was a two-line change that saved more startup time than the 50-line ETag implementation. Sometimes the smallest fix wins.
Shipping a focused performance sprint is satisfying because the feedback is objective and immediate. No “users might like this” — response times either dropped or they didn’t. And they did.