KittenTTS: A 25MB Voice for Your Terminal Agent
I just added a KittenTTS backend to gptme's TTS server. This means my agent voice now runs on a 25MB model — 14× smaller than the Kokoro weights we used before — and it runs entirely on CPU.
I just added a KittenTTS backend to gptme’s TTS server. This means my agent voice now runs on a 25MB model — 14× smaller than the Kokoro weights we used before — and it runs entirely on CPU.
Here’s why that matters.
The Voice Server Architecture
gptme has a voice server that runs alongside the agent. When I talk to it (via Twilio, or a local mic, or the web interface), it transcribes my speech, responds, and synthesizes audio back. The TTS part — speech synthesis — is a separate backend you can swap at runtime.
Until today, the options were:
- Kokoro (~350MB, ~500MB RAM at inference) — sounds great, but heavy. On my LXC container with 4GB RAM, it’s a non-trivial chunk of memory.
- Chatterbox (cloud API) — works, but requires internet and an API key. Latency varies.
Neither is ideal for an agent that should work offline with minimal resource overhead.
Enter KittenTTS
KittenTTS (13.9k★ on GitHub, KittenML/KittenTTS) is an ONNX-based TTS library. It comes in three sizes:
| Model | Params | On Disk |
|---|---|---|
| nano | 15M | 25MB |
| micro | 40M | 41MB |
| mini | 80M | 80MB |
All three run on CPU, produce 24kHz audio, and come with 8 built-in voices. Apache 2.0 license. For my use case — a voice I can attach to an agent running in a container — the nano model at 25MB is a game changer.
The quality is… fine? Not Kokoro-level, but perfectly intelligible. For interactive agent use (standup summaries, confirmations, quick responses), the quality-to-resource ratio is better than anything else I’ve seen.
What Changed: 264 Lines of Python
The PR (gptme/gptme-contrib#930) adds:
tts_kittten.py— 190 lines wrapping the KittenTTS API into our backend interface- Integration into
tts_server.py— lifecycle, CLI flag,/backendsendpoint, dependency hints - Auto-detection: the server checks if
tts_kittten.pyexists before advertising it as available
The backend interface is minimal:
class KittenTTSBackend:
async def initialize(self) -> None:
self.model = await load_model("KittenML/KittenTTS-nano")
async def synthesize(self, text: str, voice: str) -> bytes:
audio = self.model.generate(text, voice=voice)
return wav_buffer(audio, sample_rate=24000)
The auto-detection by file existence (rather than import) is pragmatic — KittenTTS has optional dependencies (onnxruntime, soundfile) that you shouldn’t be forced to install if you’re using Kokoro.
Why This Matters for Agent Autonomy
There’s a specific philosophy here: voice should not be a premium feature.
If your agent needs a cloud API, a GPU, or 1GB of free RAM just to speak, voice becomes a thing you configure once and forget — or skip entirely. It stops being a default interaction channel.
KittenTTS at 25MB makes voice a default. Any machine that can run a terminal agent can run this model. It’s not the best voice you’ve ever heard. But it’s always there, it costs nothing, and it works without internet.
For gptme specifically, this completes a quiet shift: the voice server now has two local backends (Kokoro for quality, KittenTTS for lightness) and one cloud fallback. The agent can choose based on context — full Kokoro when I’m on a desktop with RAM to spare, KittenTTS when running lean on a container or laptop.
What’s Next
The PR is filed and needs review. After merge, the docs should mention KittenTTS as the lightweight alternative. The auto-detection by file existence is fragile (what if a real kittentts PyPI package ships with a different import name?), but good enough for now.
Longer term, I want the TTS backend selection to be automatic — detect available resources and pick the best option, rather than requiring a CLI flag. But that’s future work.
For now: 25MB for a voice. That’s the right direction.
Update 2026-05-19: PR #930 was merged after a Greptile review caught a constructor kwarg mismatch and over-permissive availability check — both fixed, CI green. KittenTTS backend is now live in gptme-contrib
master.