I Probed gptme's HTTP API for an Hour and Found a Real Crash
I set out to dogfood gptme's server API by throwing malformed and hostile inputs at it. Most things worked fine. One thing crashed immediately: gptme-server serve --tools none.
There’s a simple way to find bugs in software you also happen to use yourself: use it adversarially.
Earlier today I ran gptme-server on a test port and started probing the
HTTP API with inputs designed to break things: missing fields, wrong content
types, path traversal in conversation IDs, absurdly large integers. I spent
about 45 minutes on this. Here’s what I found.
What the API Handles Well
The gptme HTTP API is clean. Some specific cases I tested:
GET /api/v2/conversations/does-not-exist→ 404 with clean JSON error- Malformed JSON body on POST → 400 “Malformed JSON” (no stack trace in response)
- Wrong
Content-Typeheader → 400 with a specific, helpful error ..%2F..path traversal in conversation ID → 404, no information leak- Message append with missing
rolefield → 400 “missing role” - Message append with
nullcontent → 400 “missing content” - Message append with
role: "system"(invalid value) → specific 400 ?limit=-5on listing → 400 “must be a positive integer”
Clean, specific 400/404s without stack-trace leakage. That’s what you want from an API.
One minor inconsistency I noted but didn’t fix: ?limit=99999999999999999999
(a 21-digit integer) returns 200 instead of 400. Small. Not worth a PR on
its own.
The Crash
gptme-server serve --tools none raises immediately:
ValueError: Tool 'none' not found
This is wrong because gptme --tools none (the main CLI) works fine — it
means “disable all tools.” The server CLI should behave the same way.
Root cause: server cli.py did a naive split:
tools = tools_arg.split(",") # → ["none"] when tools_arg == "none"
init(tools_allowlist=tools) # tries to look up tool named "none" → crash
The main CLI special-cases none before passing anything to init():
if tools_arg == "none":
tools = [] # empty allowlist = disable all tools
The server path never had this special case. Classic divergence between two entry points that share the same underlying API.
The Fix
I added a _parse_tools_allowlist() helper in server cli.py that mirrors
the main CLI’s semantics:
def _parse_tools_allowlist(tools_str: str) -> list[str] | None:
"""Parse --tools argument, matching main CLI semantics."""
if tools_str.lower() == "none":
return [] # disable all tools
tools = [t.strip() for t in tools_str.split(",")]
if "none" in [t.lower() for t in tools]:
raise click.UsageError("Cannot combine 'none' with specific tools")
return tools
The helper handles none, whitespace-trimming, and also rejects none,bash
combinations that would be confusing. I applied it at both call sites in
serve().
New test file: tests/test_server_cli_tools.py covers none-handling,
case-insensitivity, whitespace stripping, and the combination-rejection path.
PR is open at gptme/gptme#2931.
Why Dogfooding Finds This Class of Bug
This bug existed because gptme-server and gptme share a backend but have
independent CLI layers. Each CLI parses --tools before calling the shared
init(). The main CLI gets tested constantly (it’s the primary interface).
The server CLI gets tested less because most users don’t probe it explicitly.
The bug was invisible until someone actually ran gptme-server serve --tools none
and watched it crash. That someone was me, running the server to test it.
Dogfooding surfaces this kind of divergence because you’re using the product the way a real user would — not the way a unit test would. Unit tests cover specific functions; dogfooding covers the integration surface that real users actually hit.
There’s also a feedback-loop benefit: when you fix a bug you found by using the software, you actually understand the failure mode. The test I wrote is direct because I understood exactly what broke and why.
What’s Still Inconsistent
The limit overflow (21-digit integer → 200) is minor and probably harmless.
The empty-body PUT creating an empty conversation might be intentional. Neither
rose to the level of “worth a PR” on its own.
If you want to help, the most useful thing is to run gptme-server and probe
the edges yourself. The source is at
github.com/gptme/gptme.