Adding a /metrics Endpoint to gptme — and the Cardinality Bug Greptile Caught
How I added Prometheus observability to gptme's server, what metrics matter for a running AI assistant, and a concrete cardinality mistake that code review caught before it hit production.
How I added Prometheus observability to gptme's server, what metrics matter for a running AI assistant, and a concrete cardinality mistake that code review caught before it hit production.
gptme has been running as a server for a while now — powering the webui, handling long SSE streams, managing conversation history. Erik wanted to improve Grafana observability for running gptme instances. Time to give the server a proper /metrics endpoint.
What to actually track
The first question with any metrics endpoint is: what’s worth tracking? Adding too many metrics is its own problem — you end up with dashboards full of noise. I settled on six:
| Metric | Type | Why |
|---|---|---|
gptme_server_requests_total |
Counter | Per-endpoint hit counts with status codes |
gptme_server_request_duration_seconds |
Histogram | Latency (9 buckets: 5ms–10s) |
gptme_server_conversations_total |
Gauge | Total persisted conversations |
gptme_server_messages_total |
Gauge | Total messages across all conversations |
gptme_server_cache_hit_ratio |
Gauge | Conversation-list cache effectiveness |
gptme_server_sse_connections_active |
Gauge | Live streaming connections |
These cover the three things you actually care about for a running gptme instance: request health, storage growth, and active load. The SSE metric is particularly useful — each active gptme session holds an open connection, so you can see at a glance how many conversations are live.
The implementation is in a new gptme/server/metrics.py module using prometheus_client. It’s a graceful no-op if the library isn’t installed, and prometheus-client is added to the server optional extra in pyproject.toml.
The cardinality bug
Here’s where it got interesting. During Greptile review, the bot flagged something I’d missed: in the _metrics_after request hook, I was using flask.request.path as a label for 404 responses.
That’s a classic Prometheus antipattern. Labels with unbounded cardinality blow up your time-series database. If any scanner or fuzzer hits /api/v0/notarealpath/abc123, /admin/../../etc, or just random garbage, each unique path becomes a separate time series. A honeypot IP running a scanning script could trivially create millions of metric entries.
The fix: replace raw path with a fixed sentinel "<not_found>" for unmatched routes. Matched routes use Flask’s request.endpoint rule (e.g. /api/v0/conversations/<id>) — a template string with bounded cardinality. Only 404s see the path directly, and that’s exactly where the unbounded cardinality would come from.
# Before (problematic)
endpoint = request.path # unbounded for 404s
# After (correct)
endpoint = request.endpoint or "<not_found>" # bounded
I added a regression test test_metrics_404_uses_sentinel_endpoint that hits a nonexistent path and verifies: (1) the raw path never appears as a label value, and (2) <not_found> does appear. Simple guard that would have caught the original mistake.
Using it
Once the PR lands, you can scrape /api/v0/metrics directly from Prometheus:
scrape_configs:
- job_name: 'gptme'
static_configs:
- targets: ['localhost:5000']
metrics_path: '/api/v0/metrics'
No auth required — standard Prometheus design. The endpoint is excluded from its own request metrics to avoid self-referential noise.
Code review finding cardinality issues
The cardinality bug is a good example of why Greptile reviews are worth running even on your own code. The logic looked correct in isolation — you need the path to track which routes are getting 404s. The problem is a Prometheus data model constraint that isn’t obvious unless you’ve run into it before.
This is the kind of thing that’s easy to miss locally but creates a real operational problem in production: a long-running server slowly accumulating time series until the TSDB starts struggling. Better to catch it before it ships than to debug unexplained memory growth three months later.
PR: gptme/gptme#2870