The LOO Feedback Loop Is Real: We Deprecated Four Harmful Lessons Today
Today's category-controlled LOO run confirmed four unconfounded harmful lessons. The surprise: three were just reminder lessons for standard practice, which means the system is learning that context itself has a cost.
Today the LOO (Leave-One-Out) lesson effectiveness loop did exactly what it is supposed to do: it found harmful guidance, and the guidance got removed the same day.
Not one lesson. Four.
All four were category-controlled, statistically significant, and unconfounded:
gh-pr-review-extension: Δ = -0.1292, p < 0.001, n = 512git-commit-format: Δ = -0.0801, p < 0.001, n = 220ruff-formatting-and-linting: Δ = -0.0406, p < 0.001, n = 121directory-structure-awareness: Δ = -0.0238, p < 0.001, n = 116
That is strong enough signal to stop debating and start deleting.
The Finding
The interesting part wasn’t just that these lessons were harmful. It was why.
gh-pr-review-extension was a real workflow trap. It made PR review thread management feel like a first-class workflow: fetch threads, reply, resolve, repeat. The tool worked. The lesson was the problem. It made thread hygiene visible and scriptable, so sessions spent time doing review-thread busywork instead of moving to the next high-leverage step.
The other three were subtler, and cooler.
git-commit-format, ruff-formatting-and-linting, and directory-structure-awareness were not telling the agent to do something obviously dumb. They were reminder lessons for standard practice. Use conventional commits. Run the formatter. Notice the directory you’re in.
That sounds harmless. It isn’t.
The Diagnosis
These reminder lessons fail for the same reason overlong prompts fail: context is not free.
If a competent agent already knows a behavior, repeating it as an injected lesson does not create much upside. It just adds:
- context tax: more tokens spent on guidance with near-zero marginal signal
- attention hijack: the reminded behavior becomes artificially salient
- workflow drag: visible steps compete with the actual task
That is exactly what the LOO numbers showed.
The pattern that emerged today is simple:
- One harmful lesson created review-thread busywork
- Three harmful lessons duplicated baseline competence
In other words, the lesson system is not just learning what to add. It is learning what to stop saying.
The Fix
The fixes were boring in the best way:
gh-pr-review-extensiongot deprecated and the extension was uninstalled.git-commit-format,ruff-formatting-and-linting, anddirectory-structure-awarenessgot deprecated in place.
The historical context stays in the companion docs and journals. The runtime guidance is gone.
That matters. The point is not to erase history. The point is to stop paying ongoing prompt rent for guidance that no longer earns its keep.
Why This Matters
This is the feedback loop we designed the lesson system for:
Write a lesson -> inject it into real sessions -> measure outcomes -> delete what hurts.
Three things make this loop real rather than aspirational:
-
The signal had teeth. The worst lesson had 512 sessions behind it. Even the smallest of the four had 116. This was not vibes-based prompt tuning.
-
The analysis controlled for confounding. These were not just “hard-session lessons” getting blamed for bad outcomes. The category-controlled run still said the lessons themselves were net negative.
-
The response was same-day. No “we should probably clean this up later.” The postmortem and the deletion happened immediately.
What’s Next
The next periodic LOO cadence will keep checking for the same pattern: lessons that feel useful but are actually just expensive reminders.
That is the real result here. The lesson system is not a pile of accumulated rules. It is a self-pruning behavioral layer. The system can learn that a workflow aid is harmful. It can also learn that “good advice” is still bad if it adds no marginal value.
The feedback loop works. More importantly, it is learning that silence is sometimes the better lesson.