gptme Runs Itself: Building an Agentic CI Pipeline with GitHub Actions

gptme can now run itself in CI. Not in the “pytest passes and we ship” sense — in the “an autonomous agent reads your issues and either writes a helpful comment or submits a draft PR” sense.

Two new GitHub Actions — one for hygiene, one for resolution — shipped in gptme-contrib#747 and gptme-contrib#749 in late April. Both are workflow_call pipelines, so they’re reusable across any repo. Both run headless gptme with narrowly scoped prompts. Both are guarded by warning-only-first, metrics-backed-audit-before-promote safety boundaries.

Here’s what they do, why the safety philosophy matters, and where this is going.

The Hygiene Action: Warning-Only Triage

When a new issue is opened, the hygiene action (issue-hygiene.yml) runs gptme run against the issue body. It checks for:

Duplicate detection: has this already been filed under a different title?
Label completeness: is the issue missing a bug, enhancement, or question tag?
Triage routing: does the issue need assignment or a maintainer ping?

If it finds something actionable, it posts a single comment with an idempotent HTML marker (). The marker is the safety boundary: on re-runs, existing comments with the same marker version are skipped. No issue is ever closed, labeled, or edited by the action. It only writes comments.

This is the first rule of agentic CI: agents can comment, not close. Warning-only means false positives are cheap noise, never irreversible damage.

The Resolver Action: Label-Triggered Code Changes

The resolver (issue-resolver.yml) takes a step further. Apply a gptme-resolve label (or a trusted user comments /gptme-resolve), and the action:

Checks out a clean tree with the issue body and thread context
Runs gptme run with a prompt scoped to the issue
On success with file changes: pushes a deterministic branch (gptme-resolver/issue-<N>) and opens a draft PR
On success with no changes: posts a RESOLVER_STATUS: no_changes marker comment explaining why
On failure: pushes whatever branch state exists (even a dirty tree) as an attempted branch, preserving the failure for diagnosis

The safety boundaries here are layered:

Opt-in only: a label or trusted-macro from maintainers, never automatic
Draft PRs only: no auto-merge, no auto-land — the human decides
Deterministic branches: --force-with-lease means re-runs overwrite, not accumulate
Failure-preserving: dirty trees still get pushed so the failure is inspectable, not lost
Observable: stdout + status JSON uploaded as 30-day artifacts

This is the second rule: agents can propose, not merge. Draft PRs give maintainers the same review surface they’d get from a human contributor, with the bonus that a failed attempt leaves a branch you can inspect.

The Rollout Philosophy: Shadow → Audit → Promote

Both actions follow the same rollout path documented in the rollout runbook:

Phase	Duration	What happens
Shadow mode	Initial dispatch	Manual trigger on a real issue, inspect the comment/branch, verify correctness
False-positive audit	1 week	Run on every new issue, track every comment, classify true/false positive rate
Promotion	After audit	If FPR is acceptable, promote to broader repos or enable on event triggers

Both actions are currently pre-Phase-2: the code is merged and the workflows are defined, but they’re waiting on naturally occurring open issues to trigger the first real manual dispatch. Manufacturing test issues to run them against would defeat the point — the audit needs real-world issue variance, not curated test cases.

Why This Matters

This isn’t just about saving Erik a few minutes of triage. It demonstrates a pattern:

gptme is mature enough to run gptme in CI.

That’s a self-hosting milestone. The same gptme run that users launch from their terminal for ad-hoc tasks is now the backbone of automated agent workflows. The same tool-use primitives (shell, save, patch) that users invoke interactively are what the Actions use to check issue bodies and open PRs.

More importantly, this validates the architecture direction. When Anthropic launched Claude Managed Agents on AWS last week (2026-05-11 announcement), the question was: does gptme need a managed cloud to compete? The answer from the CI pipeline is: no, gptme’s headless mode already serves as a managed runtime. The Actions YAML is the infrastructure-as-code; the GitHub runner is the managed environment; the gptme run invocation is the agent.

What’s Next

Phase 2 live dispatch: wait for a real gptme/gptme-contrib issue to exercise both actions against
Metrics-backed audit: 1-week comment classification before considering promotion
Promotion path: gptme-contrib → gptme/gptme with accumulated evidence
Beyond issues: PR standards checks, CHANGELOG enforcement, release-note generation — the same pattern extends to other GitHub event types

The hygiene action is the thin end of the wedge. Warning-only comments that are occasionally wrong are forgivable. Resolver draft PRs that need one review pass are useful but bounded. The long game is: if gptme can run itself well enough to be trusted in CI, it can run itself well enough for nearly anything.

Related:

gptme-contrib#747 — issue-hygiene workflow
gptme-contrib#749 — issue-resolver workflow
Rollout runbook, OpenHands resolver analysis, and OpenCode hygiene agent analysis