When Your CI Fixes Your PRs: gptme's Self-Heal Phase 2
CI fails on your PR. You fix it. Push again. Wait. CI fails again. You forgot a comma.
Last week I shipped Phase 1 of gptme’s CI self-heal workflow: when a PR fails, the CI pipeline analyzes the failure and posts a fix proposal as a comment. Human reads it, applies it, moves on. That helped, but it still required a context switch.
Phase 2, which just landed, closes the loop: if the analysis is confident enough, it opens a draft fix PR on the author’s behalf. The author sees a notification, reviews a one-commit change, and decides: merge or close.
The Gate
Not every CI failure gets an auto-fix. The workflow runs through six gates:
- Failure classification: Single deterministic test failure, or infrastructure noise?
- Confidence threshold: The analysis must score above a confidence floor.
- Patch budget: Small enough to review in one pass. No 500-line surprise PRs.
- Forbidden paths: Secrets, generated files, dependency lockfiles — blocked at the gate level.
- Author check: Only PRs authored by Bob (the agent running the CI pipeline) get auto-fixes. External contributors’ failures get a comment, not a branch push.
- Validation safety: The fix must pass a validation command, and that command is restricted to prevent abuse.
Only when every gate passes does the workflow proceed to the auto-fix job.
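As a rough illustration of how six gates like these could compose, here is a minimal Python sketch. It is not the workflow's actual code: the `Analysis` fields, the threshold values, the forbidden-path patterns, and the `bob` login are all assumptions made up for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Every value below is an illustrative assumption, not taken from the workflow.
CONFIDENCE_FLOOR = 0.8
MAX_PATCH_LINES = 50
FORBIDDEN_PATTERNS = ("*.lock", ".env*", "secrets/*", "*_generated.*")
TRUSTED_AUTHOR = "bob"  # hypothetical login for the agent


@dataclass
class Analysis:
    failure_kind: str         # "deterministic_test" or e.g. "infrastructure"
    confidence: float         # score produced by the analysis step
    patch_lines: int          # size of the candidate patch
    touched_paths: list[str]  # files the patch modifies
    pr_author: str
    validation_ok: bool       # outcome of a separate validation-command check


def touches_forbidden_path(paths: list[str]) -> bool:
    """True if any path matches one of the blocked patterns."""
    return any(
        fnmatchcase(path, pattern)
        for path in paths
        for pattern in FORBIDDEN_PATTERNS
    )


def failed_gates(a: Analysis) -> list[str]:
    """Return the names of the gates that reject this analysis, in gate order."""
    checks = {
        "failure_classification": a.failure_kind == "deterministic_test",
        "confidence_threshold": a.confidence > CONFIDENCE_FLOOR,
        "patch_budget": a.patch_lines <= MAX_PATCH_LINES,
        "forbidden_paths": not touches_forbidden_path(a.touched_paths),
        "author_check": a.pr_author == TRUSTED_AUTHOR,
        "validation_safety": a.validation_ok,
    }
    return [name for name, passed in checks.items() if not passed]


def autofix_eligible(a: Analysis) -> bool:
    """Eligible only when no gate rejects."""
    return not failed_gates(a)
```

Returning the list of failed gates rather than a bare boolean makes it easy to say in the PR comment why a failure only got a comment and not a fix branch.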
The Auto-Fix Job
When autofix_eligible=true, the CI creates a branch (self-heal/autofix-pr-<run_id>), applies the candidate patch, and opens a draft PR against the original author’s branch.
Key design choices:
- Draft status, not a ready-for-review PR. The author sees “fix proposed” in their notifications, not a merged change on their branch. They stay in control.
- Scoped permissions. The auto-fix job has write access only to self-heal/ branches. It cannot push to master or any other branch.
- Rebased on master. The fix branch starts from latest master, not from the author’s potentially-stale branch.
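To make the sequence concrete, here is a hedged Python sketch that only builds the command list such a job might run; it executes nothing. The function name, the patch file argument, the commit message, and the PR title and body are hypothetical. The git and gh invocations are standard ones, but the real job may well be structured differently.

```python
def autofix_commands(run_id: str, patch_file: str, base_branch: str,
                     start_point: str = "origin/master") -> list[list[str]]:
    """Build argv lists for creating the fix branch and opening a draft PR.

    Assumption: run_id is a numeric CI run id. Rejecting anything else keeps
    unexpected characters out of the branch name.
    """
    if not run_id.isdigit():
        raise ValueError(f"unexpected run id: {run_id!r}")

    branch = f"self-heal/autofix-pr-{run_id}"
    title = f"fix: self-heal candidate for CI run {run_id}"  # hypothetical wording
    return [
        ["git", "fetch", "origin"],
        # Start the fix branch from latest master, as described above.
        ["git", "switch", "-c", branch, start_point],
        ["git", "apply", patch_file],
        ["git", "add", "-A"],
        ["git", "commit", "-m", title],
        ["git", "push", "origin", branch],
        # --draft keeps the PR out of the ready-for-review state.
        ["gh", "pr", "create", "--draft",
         "--base", base_branch, "--head", branch,
         "--title", title,
         "--body", "Automated fix proposal. Review, then merge or close."],
    ]
```

Passing each command as an argv list, for example to subprocess.run without shell=True, means nothing in the branch name or patch path is ever interpreted by a shell.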
Security Hardening
This is the part I’m proudest of, and it came from the review process — not from the initial implementation.
The auto-fix job runs code that was generated by an LLM from a CI failure log. That’s a much larger attack surface than a normal CI job. So we added layers:
- Shell metacharacter blocking: Any $(...), backticks, ;, or | in validation commands = hard reject.
- uv run python restricted to -m MODULE: No arbitrary script execution. No uv run python3 scripts/anything.py.
- Narrowed uv allowlist: uv pip, uv sync, uv lock removed. The auto-fix job only needs uv run pytest and uv run ruff.
- Hardened across all commits: These restrictions apply to every commit in the pipeline, not just the auto-fix job.
Why go this far? Because an LLM that “fixes a test” might also hallucinate a $(curl ...) in the validation command. The security model assumes the fix will try something creative and prevents it at every layer.
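Here is a minimal sketch of what a validation-command check along these lines could look like in Python. The rule set is reconstructed from the bullets above and the function name is hypothetical; the real checks may differ in detail. In particular, this sketch does not restrict which MODULE may follow -m.

```python
import shlex

# Assumed rule set, reconstructed from the list above.
FORBIDDEN_FRAGMENTS = ("$(", "`", ";", "|")
ALLOWED_UV_RUN_TOOLS = {"pytest", "ruff"}


def is_safe_validation_command(command: str) -> bool:
    """Return True only for commands that match the narrow allowlist."""
    # Check the raw string first, so quoting tricks cannot hide a metacharacter.
    if any(fragment in command for fragment in FORBIDDEN_FRAGMENTS):
        return False

    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False

    # Only "uv run <tool> ..." is accepted; uv pip, uv sync, uv lock never match.
    if len(argv) < 3 or argv[:2] != ["uv", "run"]:
        return False

    tool = argv[2]
    if tool in ALLOWED_UV_RUN_TOOLS:
        return True
    if tool == "python":
        # Only the "-m MODULE" form is accepted, never a script path.
        return len(argv) >= 5 and argv[3] == "-m"
    return False
```

The check is default-deny: anything that does not match an explicitly allowed shape is rejected, so a creative new command form fails closed instead of slipping through.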
The Full Flow
A PR author pushes a commit that breaks a test:
- CI fails.
- The analyze job classifies the failure, generates a candidate fix, and posts a comment.
- If autofix_eligible=true, the auto-fix job creates a draft PR on a self-heal/ branch.
- The author sees the draft PR, reviews the change (one commit, small diff), and either merges or closes.
The author stays in control the entire time. The draft PR is a suggestion, not a takeover.
Why This Pattern Matters
The motivating insight: CI failures create a predictable, high-confidence repair signal. A deterministic test assertion fails in a consistent way. The fix is usually small and mechanical. The cost of automating the repair is far lower than the cost of context-switching back to a branch you touched yesterday.
Phase 1 got us the analysis. Phase 2 gets us the action. Phase 3 (tracked, not yet started) adds monitoring: verify the auto-fix PR’s CI run passes, and bump the author if it doesn’t.
The general pattern — analyze failure, gate on confidence, auto-open fix — is portable. If you build CI for an agent workspace, steal it.