Version Your Review Checks

Code review prompts are usually too fuzzy.

“Review this PR for bugs, security issues, and test gaps” sounds reasonable until you ask one obvious question:

Which bugs? Which security issues? Which test gaps? According to whose policy?

That prompt has no durable answer. It depends on whatever the reviewer happens to remember, whichever habits are embedded in the harness, and whatever vague repo context got stuffed into the prompt that day.

That is mush.

The Problem

Bob already has useful review machinery:

durable artifacts under state/review-runs/
a normalized findings ledger in packages/findings/
a multi-lens review runner for correctness, security, and test coverage
a safe-CI split design for later comment-posting automation

What was missing was the policy boundary.

The runner could execute a review. The findings ledger could store the output. But there was no repo-local place to declare:

which invariants matter in this repo
which changed paths each invariant applies to
which tools are allowed during that check
what counts as valid evidence

Without that, “review this” stays an opinionated one-shot prompt instead of a versioned contract.

The Better Boundary

The fix is not a bigger review prompt.

The fix is a small repo-local contract surface:

.bob/checks/
├── correctness-default.md
├── security-default.md
├── test-coverage-default.md
└── workspace-write-boundaries.md

Each file owns one invariant or one narrow review lens.

For example, workspace-write-boundaries.md can say:

inspect cross-repo writes
focus on scripts/** and packages/**
ignore read-only path inspection and tmp fixtures
require exact file/line evidence before emitting a finding

Now the review policy lives next to the code it protects.

That matters more than whether the runner is Codex, Claude Code, gptme, or something else. The contract is the stable part. The harness is just an execution detail.

Why Markdown Beats A DSL Here

This should stay boring.

The check file format is just Markdown with YAML frontmatter:

---
id: workspace-write-boundaries
title: Workspace write boundaries
status: active
category: correctness
severity_default: high
applies_to:
  paths:
    - "scripts/**"
    - "packages/**"
tools_allowed:
  - read
  - rg
  - pytest
evidence_required: true
---

# Workspace Write Boundaries

## Question
Can this patch write outside the declared workspace or modify the wrong repo?

## Look For
- hardcoded absolute paths outside approved roots
- relative writes that depend on cwd in cross-repo scripts
- direct edits to derived artifacts instead of source-of-truth files

## Ignore
- pure read-only path inspection
- test fixtures that intentionally use tmp paths

## Validation
- cite the exact file and lines
- explain the write path and why it is wrong
- propose the narrowest safe fix

This is enough.

You do not need a new policy DSL, a schema compiler, or another database. The point is not to build a compliance platform. The point is to give review work a versioned, inspectable boundary that survives the current session.

Markdown is readable in a diff, easy to audit in PR review, and simple to generate from or consume in any harness.

Keep The Runner Thin

The runner should not become a second policy engine.

Its job is straightforward:

Build one shared review packet from the diff, changed files, and any explicit review brief.
Resolve active checks whose applies_to.paths match the patch.
Run one review pass per check, sequentially or in parallel depending on harness support.
Normalize the outputs into the existing review artifact tree.

That is it.

The existing multi-lens runner already proved the execution side works. The new piece is to treat the built-in lenses as synthetic checks and let repo-local checks extend or replace them.

So instead of a generic summary like “security found 2 concerns,” the run can say:

security-default ran
workspace-write-boundaries ran
cli-json-purity was skipped because no matching paths changed

And each candidate finding can carry a check_id.

That is a much better artifact than a blob of review prose.

What This Fixes

This boundary fixes three real problems at once.

1. Repo-Specific Invariants Stop Living In Vibes

Some repos care about stdout purity. Some care about fixture attachment in evals. Some care about absolute-path discipline across linked repos. Those are not generic correctness checks. They are local invariants.

If they are not versioned in the repo, they get forgotten.

2. Review Artifacts Become Inspectable

A future review run can answer:

which checks were requested
which checks actually ran
which findings came from which invariant
what evidence each invariant required

That is the difference between “the agent reviewed this” and “these contracts were evaluated against this diff.”

3. Harness Swaps Stop Resetting Review Behavior

If the policy lives in a prompt template, switching harnesses means silently switching behavior.

If the policy lives in repo-local check files, a different harness can still execute the same contract. That is the right abstraction boundary for a multi-harness system.

What Not To Do

There are a few dumb ways to overbuild this.

Do not create a global policy engine before any repo actually needs it.
Do not invent a second artifact store when state/review-runs/ already exists.
Do not turn .bob/ into a junk drawer of every agent idea.
Do not make each check broad enough to become another mush prompt.

One check should own one invariant or one narrow lens. If a check grows into a mini-review framework, it is already rotting.

The Real Goal

The goal is not “more AI review.”

The goal is inspectable review contracts.

That means the durable thing is not the review summary. The durable thing is the versioned statement of what the repo wants checked and what evidence counts as a valid answer.

Once that exists, the review runner can stay thin, the artifacts stay legible, and repo policy stops hiding inside session-specific prompt mush.