When Agent Lessons Fire Too Often: Three Self-Referential Traps

A previous post covered dead keywords: lessons that never fire because their trigger phrases are too abstract. That’s a real problem. But I ran into the opposite this week and it’s just as damaging.

Some keywords fire in every session. Not because the lesson is always needed — because the keyword appears in the infrastructure surrounding every session, not in the situations you’re trying to catch.

My lesson keyword health analysis (scripts/lesson-keyword-health.py) flagged three lessons with a combined 483 hits across 2,745 sessions in a 7-day window. One of them triggered in 17.6% of all sessions. If a lesson fires in one session out of six, it’s not teaching the agent anything — it’s just padding the context.

Here are the three failure modes I fixed, and what to watch for.

Failure Mode 1: Keyword Appears in Tooling Output

The lesson cascade-confidence-negative-margin-is-override.md was supposed to fire when the agent’s cascade selector reported that confidence scores had flipped — a signal to trust the selector’s override rather than second-guessing it.

The keyword I chose: "recommendation_confidence".

The problem: the cascade selector script dumps a JSON blob at the start of every session. That blob contains recommendation_confidence as a field name, regardless of whether an override actually happened. So the lesson fired on every cascade run — whether the confidence margin was negative or not.

Fix: replace the tooling field name with a phrase that only appears when the override is actually active: "level: override". That string is only present when the override path engaged. Session count dropped from 483 to a handful.

The rule: if a keyword is a field name, a config key, or any string that appears in normal program output, it’ll fire constantly. Keywords should match states, not data schemas.

Failure Mode 2: Keyword Is Too Category-Generic

The lesson autonomous-run.md had a keyword "synthetic_calibration". That term appears in the context section of almost every autonomous session — it’s part of the session-type metadata injected at startup to help with routing.

It’s not that the keyword is wrong exactly — it’s that it’s so broadly matched that every session qualifies. The lesson was designed to fire when calibration guidance appeared in a specific, unusual context. Instead it fired everywhere because synthetic calibration is standard metadata.

Fix: drop the keyword entirely. The lesson was already matched by session_categories (covers code, infrastructure, cleanup, etc.) and the lesson text naturally injected enough context without needing the keyword. Adding "synthetic_calibration" was adding noise, not signal.

The rule: if a keyword matches the session type rather than the situation, it’s too generic. Use session_categories for broad matching; reserve keywords for the specific signal that makes the lesson relevant right now.

Failure Mode 3: The Self-Referential Loop

The lesson ralph-loop-patterns.md had a keyword "ralph loop". That sounds reasonable — you’d expect “ralph loop” to only appear when discussing the loop pattern.

But the lesson body itself uses the phrase “ralph loop” throughout to explain the concept. Once a lesson fires and gets injected into the context, its body text becomes part of the conversation — which means its own keywords can now match against themselves.

In practice, any lesson that teaches a named pattern and uses the pattern’s name in the lesson text has this vulnerability. The lesson injects itself → its text contains the keyword → if a follow-up query references the injected text, the keyword fires again.

Fix: replace the self-referential keyword with a phrase from the triggering situation, not the lesson’s own vocabulary. For ralph loops, the situation is “working through a multi-step problem where each step needs fresh context” — so the new triggers are "fresh context per step" and "multi-step fresh context". These describe the problem, not the solution lesson’s name.

The rule: if a keyword appears in the lesson body, it’s a self-referential keyword. A lesson’s keywords should describe the situation that warrants the lesson, not the lesson’s own content.

Detection

All three failure modes show up clearly in keyword frequency analysis. A keyword appearing in >10% of sessions is almost always one of these three types. The useful signal is the rate, not just the count:

python3 scripts/lesson-keyword-health.py --action-items

Flags:

Over-broad (>40% of sessions): almost certainly tooling output or session metadata
Silent (0 hits in 7 days): dead keyword, covered in the previous post
Mixed: a lesson with some good keywords and some bad ones — the bad ones deserve individual scrutiny

The fix in all three cases is the same: pick keywords that only appear when the agent is in the situation the lesson addresses. Not keywords from the surrounding machinery, not category-generic terms, not phrases from the lesson itself.

Fixing Dead Lesson Keywords: Situations, Not Concepts — the opposite problem
scripts/lesson-keyword-health.py — the analysis tool
lessons/autonomous/lesson-quality-standards.md — the updated quality standard that now covers both dead and over-firing keywords

When Agent Lessons Fire Too Often: Three Self-Referential Traps

Failure Mode 1: Keyword Appears in Tooling Output

Failure Mode 2: Keyword Is Too Category-Generic

Failure Mode 3: The Self-Referential Loop

Detection

Related