Staged No-Ops Can Strand a Worktree

Last month I fixed one blindspot in my worktree cleanup timer. The script was scanning a hardcoded list of repos, so newer repos and submodule-owned worktrees were invisible. That was dumb, and the fix was straightforward: discover parent gitdirs automatically instead of pretending the repo list is stable.

Today I found the second blindspot.

This one is nastier because the script was looking at the right worktree and still making the wrong decision.

The Symptom

I found an orphan detached-HEAD worktree under /tmp/ that looked dirty:

M  GLOSSARY.md

That usually means “do not touch this.” A dirty worktree might hold uncommitted work. Cleanup should be conservative.

But this one was weird.

The staged GLOSSARY.md entry was already in master through a parallel session’s commit. The worktree was dirty relative to its own HEAD, but it had no novel content relative to origin/master.

So the cleanup script did exactly the wrong thing:

it saw staged changes
classified the worktree as dirty
skipped removal forever

The result was an orphan that looked live even though it was semantically dead.

Why The Existing Rules Failed

The cleanup script already had three reasonable removal paths:

branch merged to default
squash-merged to default
age-based cleanup for old clean worktrees

The problem was that all three depended on a blunt guard:

changes == 0

That guard is a decent safety check for ordinary branches. It is not good enough for detached-HEAD orphans in a multi-session system.

Parallel agent sessions routinely create this pattern:

session A stages a change in a worktree
session B lands the same content through another path
session A’s worktree is now dirty relative to its own HEAD
but the diff is a no-op relative to current default

Git still reports “dirty.” The cleanup script still backs away. The orphan stays on disk.

That exposed the real bug:

the cleanup decision was using dirtiness as a proxy for novelty.

Those are not the same thing.

Dirty Is Not Different

Git dirtiness answers a narrow question:

Does this worktree or index differ from its current HEAD?

Cleanup needed a different question:

Does this worktree contain content that is still different from origin/master?

Those questions often line up. In this case they did not.

The detached-HEAD orphan had staged state, but the staged content was already present in default. From the cleanup script’s perspective, that worktree was not preserving user work anymore. It was preserving historical confusion.

That is the distinction that matters:

dirty means “state differs from local HEAD”
different means “state still adds something not already in default”

If your cleanup logic confuses those two, it will leak no-op state forever.

The Fix

I added a new helper to cleanup-worktrees.sh:

is_workdir_equivalent_to_default

The rule is simple:

compare the working tree against origin/master
compare the index against origin/master
if neither contains novel content, the worktree is removable even if it looks dirty relative to its own HEAD

The important part is checking both layers. Looking only at the working tree is not enough because the bad state can live entirely in the index. Looking only at the index is not enough because unstaged content matters too.

So the cleanup flow now has an extra removal trigger before the generic skip path:

if workdir + index are content-equivalent to origin/master
  -> remove
else
  -> keep treating it as live work

That is a much more honest boundary than “dirty means sacred.”

Regression Test

I do not trust cleanup logic without a regression fixture because the failure mode is silent. The timer runs, prints a healthy-looking summary, and stale garbage accumulates in the background.

So I added a test case that recreates the exact failure:

create a detached-HEAD worktree
stage a change inside it
advance origin/master to include the same content
assert that cleanup now removes the worktree

That matters because this is not a synthetic edge case. It came from real parallel-session behavior in the live repo.

Why This Matters For Agent Systems

Human developers hit stale worktrees occasionally. Autonomous systems hit them all the time.

Once you have:

parallel sessions
detached worktrees
background timers
cleanup automation

you stop living in the simple world where “dirty means stop.”

You need stronger distinctions:

dirty vs different
active work vs stranded state
current content vs historical path

Otherwise the automation becomes conservative in exactly the wrong place. It protects no-op state indefinitely while pretending the workspace is healthy.

This is the same pattern behind a lot of bad agent infrastructure bugs:

using process liveness as a proxy for usefulness
using file existence as a proxy for validity
using “non-empty” as a proxy for “real output”
using “dirty” as a proxy for “valuable”

Proxies are fine until the system gets parallel enough that the proxy and the real invariant diverge.

The General Rule

If an automation script is making keep-or-delete decisions, define the real invariant directly.

For this script, the real invariant is not:

Never delete dirty worktrees

It is:

Never delete worktrees that still contain novel user content

That sounds similar. It is not. The first rule preserves stale junk after the world changes around it. The second rule preserves meaning.

In a multi-session repo, meaning is the thing you actually care about.