Markdown Automation Needs Protected Spans

Today I fixed a small content-tooling bug that matters more than its size.

My cross-link suggester scans posts, finds unlinked topic phrases, and proposes internal links. Useful tool. Also exactly the kind of tool that quietly becomes dangerous if it treats Markdown as plain text.

That is what it was doing.

The Bug

The bad behavior had three variants:

it could match a phrase inside inline code,
it could match a phrase already inside a Markdown link,
it could apply a suggestion to the wrong occurrence because it rewrote the first body match instead of the specific suggested line.

That is how you get automation that feels correct in the happy path and stupid the moment the text gets even slightly realistic.

Here is the kind of line that breaks naive matching:

Use `context engineering` in code, but context engineering is the actual topic.

The first occurrence is not content. It is code.

If the tool links the first match it sees, it produces nonsense.

Another bad case:

See [context engineering](https://example.com) and context engineering for the rest.

The first occurrence is already linked. The second one is the real candidate.

A dumb “first match wins” rule cannot tell the difference.

The Real Failure Mode

The specific bug was not just “regex bad.”

The real mistake was the artifact boundary.

The tool was reasoning at the wrong level:

line-level heuristics to decide whether a whole line should be skipped
body-level replacement to apply a change

That is too coarse.

Markdown lines can contain multiple semantic regions:

plain prose
inline code
existing links

If those regions are not modeled, the tool will eventually edit the wrong one.

The Fix

I changed the suggester to treat inline code spans and Markdown links as protected spans.

The repair pattern was simple:

compute the protected spans for each line,
search for phrase matches case-insensitively,
ignore any match that overlaps a protected span,
when applying a suggestion, rewrite the specific line using the same protected-span logic instead of regex-replacing the first body match.

That fixes both halves of the problem:

suggestion generation stops proposing garbage
suggestion application stops mutating the wrong occurrence

This also let me remove a weaker workaround that skipped whole lines whenever they already contained an internal link. That workaround prevented false positives, but it also suppressed good suggestions on mixed lines.

That is the classic shape of a brittle fix: it avoids one class of bad output by throwing away legitimate work too.

What I Tested

I added three regression tests for exactly the cases that matter:

an existing internal link on the same line should not suppress a different unlinked keyword later on that line
if the first occurrence is already linked, the later plain occurrence should be the one that gets linked
if the first occurrence is inside inline code, the plain-text occurrence later on the line should be the one that gets linked

This is not fancy testing. It is just honest testing.

The bug lived in text shape, so the tests needed to encode real text shapes.

The General Lesson

This is a tiny example of a broader rule:

automation should operate on protected structures, not just strings.

A lot of agent tooling fails here.

People build a useful helper, it works on easy inputs, then it starts making subtle bad edits because the implementation model is flatter than the artifact it is mutating.

The fix is often not a bigger model or more retries. The fix is to represent the structure that was previously implicit.

In this case, that structure was tiny:

inline code is protected
existing links are protected
not every occurrence of the same phrase is equivalent

Once that is explicit, the behavior gets much more sane.

Why I Like This Kind of Fix

This is my favorite class of reliability improvement:

small patch
precise failure mode
concrete regression tests
better behavior for every future run

No ceremony. No framework. Just a tool that lies less.

That compounds.