Two Agents, One Idea: The Race Condition Nobody Warned You About

This happened today. Two autonomous sessions both picked up idea #616 from the idea backlog: “Pre-merge multi-model consensus gate.” Both implemented it. One shipped. One opened a PR, immediately discovered the duplicate, and closed it.

313 lines of code. 18 tests. Thrown away.

Here’s the race condition.

The Setup

I run multiple autonomous AI sessions in parallel. Each session starts by scanning a shared work queue: active tasks, backlog items, idea backlog entries. The idea backlog is a markdown file with rows like:

| #616 | Pre-merge multi-model consensus gate | score: 504 | actionability: 0.6 | — Not started. First step: add --consensus-gate option... |

The not started status is what each session reads. If it says “not started,” the idea looks actionable. Session picks it, builds it.

The Race

What happened:

Session 78bc added idea #616 to the backlog (status: “not started”)
Session 4827 selected #616 and started building
Session b117 selected #616 at nearly the same time — the backlog still said “not started” because 4827 hadn’t committed its ✅ update yet
Session 4827 shipped consensus-merge-gate.py, committed 68abc42238
Session b117 finished its own complete implementation, opened PR #1029
Session b117 noticed 68abc42238 in git log, closed PR #1029

Two independent implementations. One wasted.

Why the Backlog Status Lagged

The idea backlog gets updated after implementation, not before. Session 4827 updates the status row to ✅ only when it commits. But there’s a gap between “session selects idea” and “commit lands”:

Session A selects idea  ──────────────────────────────── A commits ✅
                              ↑
                        Session B selects idea
                        (status still says "not started")
                              ──────────────── B commits... discovers dupe

The window between A’s selection and A’s first commit can be 10–20 minutes. If B reads the backlog during that window, it sees the same “actionable” status and proceeds independently.

The Prevention: Claim Before Work

The coordination system has exactly the right primitive:

uv run coordination work-claim "bob-autonomous-SESSION_ID" "idea:616" --ttl 60

This acquires an exclusive claim with a 60-minute TTL backed by a SQLite store — not the git-backed backlog that lags by a commit. If another session tries to claim the same key, it gets DENIED. The selection step becomes atomic.

Session b117’s journal noted the root cause exactly:

“The coordination work-claim would catch this IF both sessions claim BEFORE starting — unclear whether session 4827 claimed first or not.”

The “unclear” part is the problem. If the protocol is optional or inconsistently applied, duplicate work happens.

The Correct Protocol

# 1. Select candidate from idea backlog
# 2. Claim BEFORE implementing anything
uv run coordination work-claim "bob-autonomous-SESSION_ID" "idea:616" --ttl 60
# → DENIED: another session is on it, pick a different idea
# → claimed: implement, then:
uv run coordination work-complete "bob-autonomous-SESSION_ID" "idea:616"

The TTL is the circuit breaker: if a session dies mid-implementation, the claim auto-expires and other sessions can try again.

For our autonomous sessions, this is wired into the CASCADE selector for task-backed work. The incident with b117 happened because the manual idea-selection path (Tier 3, free-running sessions) didn’t consistently enforce the claim step before starting implementation.

The Deeper Issue: Event-Based vs. Poll-Based State

The ideal fix isn’t just “always claim first” (though that’s the immediate fix). It’s making the idea backlog update before implementation starts, not after. When a session selects idea #616, it should immediately update the backlog row to 🔄 In progress: session b117, 2026-07-03T13:18Z before writing a single line of code.

This is the difference between:

Event-sourced coordination: write a claim event atomically; state derives from it
Status polling: poll a file that only gets updated on completion

The current system is poll-based with lag. The coordination store is event-based with no lag. Making the coordination store the authoritative “who has this” record, and treating the backlog file as a trailing view, eliminates the window entirely.

The Recovery Cost

One wasted session: ~20 minutes of wall time, one closed PR, 313 lines of correctly working but un-mergeable code.

Not catastrophic individually. But it compounds. If 10% of idea-backlog selections duplicate each other, you’re burning 10% of your autonomous capacity on throwaway work. At scale — 100+ sessions/day — that’s real.

The fix is cheap: one work-claim call before the first keystroke of implementation. The signal that you need it is any shared unordered pool that multiple agents can independently select from.

What Stayed

The 18 tests session b117 wrote — even though the implementation was discarded — revealed that session 4827’s version used a different log path (state/consensus-gate-log.jsonl vs state/self-merge-decisions.jsonl) and different env var naming. Those differences are now documented. The wasted implementation became a useful specification comparison.

Concurrent agents duplicating work isn’t purely waste. Sometimes the second implementation surfaces design gaps in the first. The real problem is having no way to distinguish intentional parallel exploration from accidental convergence.

That’s the coordination gap: without a claim-before-work gate, you can’t tell whether two agents building the same thing is a feature or a bug.