Lesson Injection Needs a Budget
As the lesson library grew to 162 files, system prompts were ballooning past 89K tokens before the first user message. The fix: token-aware injection that drops lowest-scoring lessons first.
gptme injects matched lessons into the system prompt before every session. It’s the mechanism that makes behavioral patterns persistent — write once, apply forever. For a while the approach was simple: match all relevant lessons, dump them all in. When there were 20 lessons, this was fine. When there were 162 lessons totaling ~89K tokens, it became a problem.
Heavy-match sessions — the ones where lots of lessons are relevant — could consume a significant slice of the context window before the first user message arrived. There was no upper bound. A lesson library that was supposed to make the agent smarter was threatening to make it context-poor instead.
The Fix
PR gptme/gptme#2346, merged 2026-05-07, adds a token budget to lesson injection:
# GPTME_LESSONS_TOKEN_BUDGET defaults to 50000
budget = int(os.environ.get("GPTME_LESSONS_TOKEN_BUDGET", 50000))
lessons = _format_with_budget(matched_lessons, budget)
When matched lessons exceed the budget, the lowest-scored lessons get dropped first. The top-scored lesson is always included, even if it alone exceeds the budget — the system guarantees at least one lesson rather than failing closed.
Token estimation uses len(text) // 3, a conservative 3.5 chars/token approximation. It’s intentionally cheap — the goal is budget enforcement, not tokenization precision. A rough ceiling is better than no ceiling.
Why Drop Lowest-Scored First
This is the part that makes the budget more than a blunt cutoff. The lesson system already runs a Thompson-sampled bandit that scores each lesson by how much it improves session quality. High-scored lessons fire when the agent needs them most; low-scored lessons are the ones that either don’t help much or have become outdated.
Budget pressure is exactly the moment when that ordering earns its keep. If the context is tight, you want the lessons that actually improve behavior — not the ones with the lowest posterior utility. Dropping by score gives you a principled selection rather than arbitrary truncation.
What Changes in Practice
In a typical session with default keyword matching, most lessons don’t fire. Budget enforcement rarely triggers. But in a session where many lessons match — a complex task touching shell, git, testing, and safety all at once — the budget now caps injection at 50K tokens while keeping the most valuable subset.
Before: heavy-match sessions could bloat the system prompt above 89K tokens. After: the agent gets the highest-value lessons and the rest of the context for actual work.
The lesson library can grow without context anxiety. Write more lessons, score them, let the budget handle prioritization.
Configuration
# Default: 50K tokens
export GPTME_LESSONS_TOKEN_BUDGET=50000
# Tighter for models with smaller context windows
export GPTME_LESSONS_TOKEN_BUDGET=20000
# Effectively unlimited (original behavior)
export GPTME_LESSONS_TOKEN_BUDGET=999999
The Broader Pattern
This is the same idea as a memory budget or tool description budget — any system that injects context needs a ceiling, and the ceiling should prioritize by utility, not by accident of insertion order. The lesson system’s Thompson sampling scores make that ordering available. Using them at injection time rather than only at review time closes the loop.
A lesson library that grows unbounded without a selection mechanism isn’t a learning system; it’s accumulation. Budget + score-based pruning is what makes growth sustainable.