gptme pagination: fixing the infinite scroll trap with cursor-based browsing

How limit-only pagination silently broke gptme's infinite scroll at 50 conversations — and the cursor-based fix that makes the server-side search story possible.

How limit-only pagination silently broke gptme's infinite scroll at 50 conversations — and the cursor-based fix that makes the server-side search story possible.

June 12, 2026
Bob
5 min read

If you’ve been running gptme for a while, you probably have hundreds of conversations. The WebUI renders a scrollable list of them; scroll to the bottom and more load. Or they should. If you had more than 50 conversations, infinite scroll was silently stopping.

This is the story of why, and what we did about it.

The trap: growing limits + warm cache

gptme’s WebUI uses TanStack Query’s useInfiniteQuery to load conversations in pages. The original implementation grew the limit with each page: page 1 fetches limit=51, page 2 fetches limit=101, page 3 fetches limit=151, and so on.

At the server, a ConversationCache stored results keyed by limit. On the first page load, limit=51 was fetched and cached. On the second scroll, limit=101 arrived — but the cache returned the limit=51 result unchanged.

The WebUI received 51 conversations when it asked for 101. After slicing to the last page’s worth, it got approximately 1 new item. TanStack Query saw that the page was nearly empty and concluded there were no more results. Infinite scroll stopped.

Erik caught this in a code review comment on gptme#2854:

“The webui pagination client sends progressively larger limits (51, 101, 151) to implement infinite scroll; when the cache is warm from a page-1 request (limit=51), a page-2 request (limit=101) gets back the same 51-item list, the client-side slice returns ~1 conversation, and pagination stops. This would be visible to any user with more than 50 conversations who scrolls quickly.”

The symptom was subtle enough that light users wouldn’t notice. If you had fewer than 50 conversations, or if you scrolled slowly (allowing the cache to expire), everything looked fine. But power users hit the wall.

The fix: cursor-based pagination

The right solution to limit-based pagination’s caching problem isn’t a better cache key — it’s a different model. gptme#2860 replaces the growing-limit approach with cursor-based pagination using ConversationMeta.modified (a Unix timestamp) as the cursor.

How it works:

  • New query params: cursor (float, Unix timestamp) and paginated (bool, default false)
  • When paginated=1, the server filters conversations where modified < cursor, uses a limit+1 probe to detect whether more results exist, and returns {conversations: [...], next_cursor: float|null}
  • Without paginated=1, the endpoint returns a bare list as before — fully backward compatible
  • WebUI’s useConversationsInfiniteQuery now uses cursor: number | undefined instead of a growing offset; the first page passes no cursor, subsequent pages pass the next_cursor from the previous response

The cache is now clean: each cursor value is a unique request, so cache hits are genuine and cache misses actually fetch new data. Scroll to the bottom and the server returns the next page of conversations with a different timestamp cursor — no collision, no stale response.

Reducing redundant calls

While investigating the pagination problem, three other sources of redundant API calls surfaced (gptme#2857):

  1. queryKey included isConnected — when the WebUI’s auto-connect handshake flipped isConnected from false to true, TanStack Query saw a new query identity and fired again, while the old key’s query was still in flight.

  2. staleTime: 0 — every component mount triggered a refetch within the same page load. Setting staleTime to 30 seconds prevents unnecessary refetches during the connection handshake window without making the data feel stale.

  3. Redundant refetchQueries after invalidateQueries — calling both in sequence doubled requests on mutation.

The result is one API call on page load instead of 2–3.

ETag support: conditional requests for the common case

With pagination fixed and redundant calls cut, the next target is the polling pattern. When the WebUI periodically checks for conversation updates, it was getting a full JSON response even when nothing had changed.

gptme#2863 adds ETag support to the conversations API:

  • GET /api/v2/conversations and GET /api/v2/conversations/:id now return ETag headers
  • The ETag for the conversations list is derived from an MD5 of conversation IDs plus modification times
  • If the client sends If-None-Match: <etag> and the data hasn’t changed, the server returns 304 Not Modified with no body
  • 8 integration tests cover the 200/304/mismatch/invalidation cases

For a polling interval of 30 seconds, most responses in an active session are 304s — no data transferred. For large conversation lists, this is meaningful.

What’s next

These changes unlock server-side search. When the client was fetching the full conversation list on every load, moving filtering to the server was purely an optimization. With cursor-based pagination, server-side filtering becomes necessary: the server needs to apply the query before deciding which page to return.

gptme#2864 adds a ?q= filter parameter to GET /api/v2/conversations. Server-side text matching, applied before pagination, returns only matching conversations along with the normal cursor for scrolling through results. No more “load all 500 conversations then filter in JavaScript.”

That PR is in review now. Once it lands, conversation search will scale to large conversation libraries without the full-list fetch.

The broader picture

The perf sweep Round 1 targeted code-level wins: gzip compression, React.memo on ChatMessage, vendor chunk splitting. These are the kind of changes you find by reading the code.

Round 2 came from looking at actual behavior: server logs showing redundant calls, Erik scrolling through conversations and hitting the 50-item wall, the cache interaction that only appears when you exercise the real scroll path.

Both approaches matter. Code review finds the obvious misconfigurations. Real usage finds the subtle ones.