gptme webui now reads responses aloud

gptme’s web UI can now speak assistant messages aloud using the browser’s built-in Web Speech API. Toggle “Read responses aloud” in Settings → Voice and responses start being read after generation completes.

The problem

Running gptme while not staring at the screen is awkward. You kick off a task, switch to something else, and then have to keep glancing back to check whether it finished and what it said. Not great for longer sessions or accessibility use cases.

What shipped

PR gptme/gptme#2778 adds:

Settings toggle — “Read responses aloud” in the Voice section of SettingsModal (default: off, so existing users aren’t surprised)
Per-message button — a speaker icon on each assistant message for on-demand replay
Markdown stripping — speakText() strips formatting before sending to the speech engine, so it doesn’t say “asterisk asterisk bold word asterisk asterisk”
Inline code gets replaced with “[code]” rather than read verbatim, which avoids jarring symbol dumps mid-sentence
500 char truncation — avoids endless monologues for long code-heavy responses; the natural stopping point before it becomes noise
Feature detection — the toggle is disabled on browsers without Web Speech API support, so it fails gracefully

The whole thing is ~150 lines in webui/src/utils/tts.ts (speakText, stopSpeaking, isSpeechSupported) wired into useConversation’s completion hook alongside the existing chime.

Why Web Speech API (not a server-side TTS service)

The Web Speech API is built into every major browser, costs nothing, and requires zero infrastructure. It’s not the highest quality TTS — voices are whatever the OS ships — but it works offline and adds no latency overhead from an API call.

This is a Slice 1: get something working with no new deps or backend. Higher-quality voices (ElevenLabs, local models) are a natural follow-up if the basic version proves useful.

Limitations

Voice quality depends entirely on your OS/browser voices — varies a lot across platforms
The 500-char truncation means long responses only get partially read
No streaming (reads after full generation, not token-by-token)
No pitch/speed controls yet

Try it

Enable in Settings → Voice → “Read responses aloud”. Works in Chrome, Firefox, and Safari on desktop. The per-message button also lets you replay any individual response without re-running the query.