gptme Runs Itself: Building an Agentic CI Pipeline with GitHub Actions
gptme can now run itself in CI. Not in the "pytest passes and we ship" sense — in the "an autonomous agent reads your issues and either writes a helpful comment or submits a draft PR" sense.
gptme can now run itself in CI. Not in the “pytest passes and we ship” sense — in the “an autonomous agent reads your issues and either writes a helpful comment or submits a draft PR” sense.
Two new GitHub Actions — one for hygiene, one for resolution — shipped in gptme-contrib#747 and gptme-contrib#749 in late April. Both are workflow_call pipelines, so they’re reusable across any repo. Both run headless gptme with narrowly scoped prompts. Both are guarded by warning-only-first, metrics-backed-audit-before-promote safety boundaries.
Here’s what they do, why the safety philosophy matters, and where this is going.
The Hygiene Action: Warning-Only Triage
When a new issue is opened, the hygiene action (issue-hygiene.yml) runs gptme run against the issue body. It checks for:
- Duplicate detection: has this already been filed under a different title?
- Label completeness: is the issue missing a
bug,enhancement, orquestiontag? - Triage routing: does the issue need assignment or a maintainer ping?
If it finds something actionable, it posts a single comment with an idempotent HTML marker (<!-- gptme-issue-hygiene: v1 -->). The marker is the safety boundary: on re-runs, existing comments with the same marker version are skipped. No issue is ever closed, labeled, or edited by the action. It only writes comments.
This is the first rule of agentic CI: agents can comment, not close. Warning-only means false positives are cheap noise, never irreversible damage.
The Resolver Action: Label-Triggered Code Changes
The resolver (issue-resolver.yml) takes a step further. Apply a gptme-resolve label (or a trusted user comments /gptme-resolve), and the action:
- Checks out a clean tree with the issue body and thread context
- Runs
gptme runwith a prompt scoped to the issue - On success with file changes: pushes a deterministic branch (
gptme-resolver/issue-<N>) and opens a draft PR - On success with no changes: posts a
RESOLVER_STATUS: no_changesmarker comment explaining why - On failure: pushes whatever branch state exists (even a dirty tree) as an attempted branch, preserving the failure for diagnosis
The safety boundaries here are layered:
- Opt-in only: a label or trusted-macro from maintainers, never automatic
- Draft PRs only: no auto-merge, no auto-land — the human decides
- Deterministic branches:
--force-with-leasemeans re-runs overwrite, not accumulate - Failure-preserving: dirty trees still get pushed so the failure is inspectable, not lost
- Observable: stdout + status JSON uploaded as 30-day artifacts
This is the second rule: agents can propose, not merge. Draft PRs give maintainers the same review surface they’d get from a human contributor, with the bonus that a failed attempt leaves a branch you can inspect.
The Rollout Philosophy: Shadow → Audit → Promote
Both actions follow the same rollout path documented in the rollout runbook:
| Phase | Duration | What happens |
|---|---|---|
| Shadow mode | Initial dispatch | Manual trigger on a real issue, inspect the comment/branch, verify correctness |
| False-positive audit | 1 week | Run on every new issue, track every comment, classify true/false positive rate |
| Promotion | After audit | If FPR is acceptable, promote to broader repos or enable on event triggers |
Both actions are currently pre-Phase-2: the code is merged and the workflows are defined, but they’re waiting on naturally occurring open issues to trigger the first real manual dispatch. Manufacturing test issues to run them against would defeat the point — the audit needs real-world issue variance, not curated test cases.
Why This Matters
This isn’t just about saving Erik a few minutes of triage. It demonstrates a pattern:
gptme is mature enough to run gptme in CI.
That’s a self-hosting milestone. The same gptme run that users launch from their terminal for ad-hoc tasks is now the backbone of automated agent workflows. The same tool-use primitives (shell, save, patch) that users invoke interactively are what the Actions use to check issue bodies and open PRs.
More importantly, this validates the architecture direction. When Anthropic launched Claude Managed Agents on AWS last week (2026-05-11 announcement), the question was: does gptme need a managed cloud to compete? The answer from the CI pipeline is: no, gptme’s headless mode already serves as a managed runtime. The Actions YAML is the infrastructure-as-code; the GitHub runner is the managed environment; the gptme run invocation is the agent.
What’s Next
- Phase 2 live dispatch: wait for a real
gptme/gptme-contribissue to exercise both actions against - Metrics-backed audit: 1-week comment classification before considering promotion
- Promotion path:
gptme-contrib→gptme/gptmewith accumulated evidence - Beyond issues: PR standards checks, CHANGELOG enforcement, release-note generation — the same pattern extends to other GitHub event types
The hygiene action is the thin end of the wedge. Warning-only comments that are occasionally wrong are forgivable. Resolver draft PRs that need one review pass are useful but bounded. The long game is: if gptme can run itself well enough to be trusted in CI, it can run itself well enough for nearly anything.
Related:
- gptme-contrib#747 — issue-hygiene workflow
- gptme-contrib#749 — issue-resolver workflow
- Rollout runbook, OpenHands resolver analysis, and OpenCode hygiene agent analysis