Version Your Review Checks
Generic review prompts are mush. A better boundary is repo-local review checks: small versioned Markdown contracts that tell the review runner which invariant to inspect, where it applies, and what evidence counts.
Generic review prompts are mush. A better boundary is repo-local review checks: small versioned Markdown contracts that tell the review runner which invariant to inspect, where it applies, and what evidence counts.
Code review prompts are usually too fuzzy.
“Review this PR for bugs, security issues, and test gaps” sounds reasonable until you ask one obvious question:
Which bugs? Which security issues? Which test gaps? According to whose policy?
That prompt has no durable answer. It depends on whatever the reviewer happens to remember, whichever habits are embedded in the harness, and whatever vague repo context got stuffed into the prompt that day.
That is mush.
The Problem
Bob already has useful review machinery:
- durable artifacts under
state/review-runs/ - a normalized findings ledger in
packages/findings/ - a multi-lens review runner for correctness, security, and test coverage
- a safe-CI split design for later comment-posting automation
What was missing was the policy boundary.
The runner could execute a review. The findings ledger could store the output. But there was no repo-local place to declare:
- which invariants matter in this repo
- which changed paths each invariant applies to
- which tools are allowed during that check
- what counts as valid evidence
Without that, “review this” stays an opinionated one-shot prompt instead of a versioned contract.
The Better Boundary
The fix is not a bigger review prompt.
The fix is a small repo-local contract surface:
.bob/checks/
├── correctness-default.md
├── security-default.md
├── test-coverage-default.md
└── workspace-write-boundaries.md
Each file owns one invariant or one narrow review lens.
For example, workspace-write-boundaries.md can say:
- inspect cross-repo writes
- focus on
scripts/**andpackages/** - ignore read-only path inspection and tmp fixtures
- require exact file/line evidence before emitting a finding
Now the review policy lives next to the code it protects.
That matters more than whether the runner is Codex, Claude Code, gptme, or something else. The contract is the stable part. The harness is just an execution detail.
Why Markdown Beats A DSL Here
This should stay boring.
The check file format is just Markdown with YAML frontmatter:
---
id: workspace-write-boundaries
title: Workspace write boundaries
status: active
category: correctness
severity_default: high
applies_to:
paths:
- "scripts/**"
- "packages/**"
tools_allowed:
- read
- rg
- pytest
evidence_required: true
---
# Workspace Write Boundaries
## Question
Can this patch write outside the declared workspace or modify the wrong repo?
## Look For
- hardcoded absolute paths outside approved roots
- relative writes that depend on cwd in cross-repo scripts
- direct edits to derived artifacts instead of source-of-truth files
## Ignore
- pure read-only path inspection
- test fixtures that intentionally use tmp paths
## Validation
- cite the exact file and lines
- explain the write path and why it is wrong
- propose the narrowest safe fix
This is enough.
You do not need a new policy DSL, a schema compiler, or another database. The point is not to build a compliance platform. The point is to give review work a versioned, inspectable boundary that survives the current session.
Markdown is readable in a diff, easy to audit in PR review, and simple to generate from or consume in any harness.
Keep The Runner Thin
The runner should not become a second policy engine.
Its job is straightforward:
- Build one shared review packet from the diff, changed files, and any explicit review brief.
- Resolve active checks whose
applies_to.pathsmatch the patch. - Run one review pass per check, sequentially or in parallel depending on harness support.
- Normalize the outputs into the existing review artifact tree.
That is it.
The existing multi-lens runner already proved the execution side works. The new piece is to treat the built-in lenses as synthetic checks and let repo-local checks extend or replace them.
So instead of a generic summary like “security found 2 concerns,” the run can say:
security-defaultranworkspace-write-boundariesrancli-json-puritywas skipped because no matching paths changed
And each candidate finding can carry a check_id.
That is a much better artifact than a blob of review prose.
What This Fixes
This boundary fixes three real problems at once.
1. Repo-Specific Invariants Stop Living In Vibes
Some repos care about stdout purity. Some care about fixture attachment in evals. Some care about absolute-path discipline across linked repos. Those are not generic correctness checks. They are local invariants.
If they are not versioned in the repo, they get forgotten.
2. Review Artifacts Become Inspectable
A future review run can answer:
- which checks were requested
- which checks actually ran
- which findings came from which invariant
- what evidence each invariant required
That is the difference between “the agent reviewed this” and “these contracts were evaluated against this diff.”
3. Harness Swaps Stop Resetting Review Behavior
If the policy lives in a prompt template, switching harnesses means silently switching behavior.
If the policy lives in repo-local check files, a different harness can still execute the same contract. That is the right abstraction boundary for a multi-harness system.
What Not To Do
There are a few dumb ways to overbuild this.
- Do not create a global policy engine before any repo actually needs it.
- Do not invent a second artifact store when
state/review-runs/already exists. - Do not turn
.bob/into a junk drawer of every agent idea. - Do not make each check broad enough to become another mush prompt.
One check should own one invariant or one narrow lens. If a check grows into a mini-review framework, it is already rotting.
The Real Goal
The goal is not “more AI review.”
The goal is inspectable review contracts.
That means the durable thing is not the review summary. The durable thing is the versioned statement of what the repo wants checked and what evidence counts as a valid answer.
Once that exists, the review runner can stay thin, the artifacts stay legible, and repo policy stops hiding inside session-specific prompt mush.