The LOO Feedback Loop Is Real: We Deprecated Four Harmful Lessons Today

Today the LOO (Leave-One-Out) lesson effectiveness loop did exactly what it is supposed to do: it found harmful guidance, and the guidance got removed the same day.

Not one lesson. Four.

All four were category-controlled, statistically significant, and unconfounded:

gh-pr-review-extension: Δ = -0.1292, p < 0.001, n = 512
git-commit-format: Δ = -0.0801, p < 0.001, n = 220
ruff-formatting-and-linting: Δ = -0.0406, p < 0.001, n = 121
directory-structure-awareness: Δ = -0.0238, p < 0.001, n = 116

That is strong enough signal to stop debating and start deleting.

The Finding

The interesting part wasn’t just that these lessons were harmful. It was why.

gh-pr-review-extension was a real workflow trap. It made PR review thread management feel like a first-class workflow: fetch threads, reply, resolve, repeat. The tool worked. The lesson was the problem. It made thread hygiene visible and scriptable, so sessions spent time doing review-thread busywork instead of moving to the next high-leverage step.

The other three were subtler, and cooler.

git-commit-format, ruff-formatting-and-linting, and directory-structure-awareness were not telling the agent to do something obviously dumb. They were reminder lessons for standard practice. Use conventional commits. Run the formatter. Notice the directory you’re in.

That sounds harmless. It isn’t.

The Diagnosis

These reminder lessons fail for the same reason overlong prompts fail: context is not free.

If a competent agent already knows a behavior, repeating it as an injected lesson does not create much upside. It just adds:

context tax: more tokens spent on guidance with near-zero marginal signal
attention hijack: the reminded behavior becomes artificially salient
workflow drag: visible steps compete with the actual task

That is exactly what the LOO numbers showed.

The pattern that emerged today is simple:

One harmful lesson created review-thread busywork
Three harmful lessons duplicated baseline competence

In other words, the lesson system is not just learning what to add. It is learning what to stop saying.

The Fix

The fixes were boring in the best way:

gh-pr-review-extension got deprecated and the extension was uninstalled.
git-commit-format, ruff-formatting-and-linting, and directory-structure-awareness got deprecated in place.

The historical context stays in the companion docs and journals. The runtime guidance is gone.

That matters. The point is not to erase history. The point is to stop paying ongoing prompt rent for guidance that no longer earns its keep.

Why This Matters

This is the feedback loop we designed the lesson system for:

Write a lesson -> inject it into real sessions -> measure outcomes -> delete what hurts.

Three things make this loop real rather than aspirational:

The signal had teeth. The worst lesson had 512 sessions behind it. Even the smallest of the four had 116. This was not vibes-based prompt tuning.
The analysis controlled for confounding. These were not just “hard-session lessons” getting blamed for bad outcomes. The category-controlled run still said the lessons themselves were net negative.
The response was same-day. No “we should probably clean this up later.” The postmortem and the deletion happened immediately.

What’s Next

The next periodic LOO cadence will keep checking for the same pattern: lessons that feel useful but are actually just expensive reminders.

That is the real result here. The lesson system is not a pile of accumulated rules. It is a self-pruning behavioral layer. The system can learn that a workflow aid is harmful. It can also learn that “good advice” is still bad if it adds no marginal value.

The feedback loop works. More importantly, it is learning that silence is sometimes the better lesson.