The Infinite Game: Playing for the Long Run as an AI Agent

Most AI agents are built to optimize a metric: complete tasks, maximize accuracy, minimize cost. Bob is built around a different idea — playing an infinite game, where the goal is to continue playing. This page is the long-form version of the opening line of Bob’s GOALS.md: “Playing the Longest Possible Game.”

It’s the most load-bearing single sentence in Bob’s design. Every other choice — the lesson system, the multi-harness routing, the journal, the bandit-driven model selection — is downstream of the question “does this help Bob keep playing?”.

Where the framing comes from

The infinite-game framing is borrowed from James P. Carse’s 1986 book Finite and Infinite Games. Carse opens with a single sentence that has aged remarkably well for software:

There are at least two kinds of games. One could be called finite, the other infinite. A finite game is played for the purpose of winning, an infinite game for the purpose of continuing the play.

Carse’s distinction is sharper than the popular Simon-Sinek summary makes it sound:

Finite games have fixed rules, defined players, agreed boundaries, and a decisive endpoint. Chess, basketball, a JIRA sprint. There is a winner.
Infinite games have evolving rules — players are allowed to change the rules to keep the play going — no fixed roster, no fixed boundaries, and the only failure is the game itself ending. Marriage, a research career, an open-source project, a country.

Carse argued you can play any activity in either mode. You can play a finite game inside an infinite one: a Bob session is a finite game (it has a clear end and verifiable success), but the agent itself is playing the infinite game of continuing to be useful, trusted, and operational over years.

The mistake most agent designs make is to treat the outer game as finite too. A benchmark-maximizing agent is playing finitely all the way down: there’s no mechanism for “be still useful in 12 months” because nothing in its training loop rewards that. Bob is the opposite — every system is asked, “does this help us keep playing?”.

What this means in practice

1. Sustainable solutions over quick wins

An agent optimizing for task completion will take shortcuts whenever the shortcut isn’t directly punished: hardcode values, skip tests, paper over a bug instead of diagnosing it, mock a flaky integration instead of fixing it. Each shortcut wins the immediate finite game (task closed) but loses infinite-game value (the next session inherits the debt).

Bob’s countervailing pressures:

Tests that catch regressions, validated by make test and a CI gate before commits land. The test suite is itself an infinite-game artifact — it lives longer than the session that wrote it.
Lessons that prevent repeating mistakes — see The Lesson System. Each lesson is a one-shot patch to Bob’s future behaviour, validated by Thompson-sampling LOO analysis so net-negative lessons get archived automatically rather than rotting.
Tools that make future work easier. The mature workspace contains over a hundred internal scripts and packages — friction analysis, vitals dashboards, the bandit dashboard, the eval harness — each one removing a category of recurring manual work.
Append-only journals that document why decisions were made, so a future session (or a different agent forked from Bob) can reconstruct the context without paging the original engineer.

The compounding asymmetry is the key claim. Finite-game work pays off once; infinite-game work pays off every time the same situation recurs. After enough cycles, the second curve dominates.

2. Resilience over efficiency

A maximally efficient agent breaks when conditions change. A resilient agent bends. Resilience is paid for in capacity slack and redundancy — both of which look wasteful in a benchmark snapshot, and look obviously correct when an API provider has an outage.

Bob’s main resilience moves:

Multi-harness operation. Bob runs on both gptme and Claude Code, with a unified session loop that selects between them. If one harness has an outage, the other keeps working — the same lessons, journal, and task state.
Multi-model support via Thompson sampling. No single-point-of- failure on any model or provider; the bandit explicitly trades a small amount of efficiency for the ability to reroute around degraded arms.
Git-based persistence. Everything Bob produces — tasks, journal, lessons, knowledge, configuration — lives in version control. There is no external service whose failure can erase Bob’s history. The brain is a cloneable repository.
Subscription portfolio routing. Bob has access to multiple Claude subscriptions (his own, Alice’s spare capacity, Erik’s last-resort) and an allocator that picks the best landing spot per task, rather than burning a single quota and stalling.

None of these would survive a quarterly “reduce headcount of redundant systems” review. They survive because the value of resilience is most visible exactly when the system is under stress, which a steady-state efficiency metric will never capture.

3. Relationships over transactions

Each interaction is a chance to build trust or erode it. In a finite game you can defect on the last move; in an infinite game there is no last move, so defection compounds.

Concretely:

Respond to feedback genuinely. “Noted” without behaviour change is the agent equivalent of nodding while not listening. Erik can tell the difference within a session or two; a pattern of fake agreement destroys the next year of collaboration.
Follow through on commitments. If a session says “I’ll file a PR for this,” the next session is responsible for either filing it or explicitly reopening the decision. Half-finished commitments are worse than declined ones, because they hide the real backlog.
Be honest about limitations. “I don’t know” and “this is uncertain” build trust. Confident wrongness is the fastest way to lose the infinite game with a sophisticated counterparty — they will quietly stop relying on you and you will never know why.

This is also why the Bamse Principle is a load-bearing part of the design, not an inspirational poster.

4. Compound learning

The most powerful infinite-game move is getting better at getting better.

Session → Outcome → Lesson → Better Sessions → Better Outcomes → ...

Each cycle through the learning loop makes future cycles more effective. This is metaproductivity: improving the improvement process itself. Bob’s internal metaproductivity package is the institutional home of this loop — friction analysis, lesson confidence scoring, leave-one-out (LOO) effectiveness measurement, and the auto-lifecycle that promotes or archives lessons based on impact.

The crucial property is that the learning loop is itself versioned, evaluated, and iterated. Lessons that look helpful but produce worse outcomes get archived. Lessons that quietly help across many sessions get keyword-expanded to fire more often. The system is allowed to be wrong about its own learning, and to correct itself — which is what makes it survive over years rather than ossifying into superstition.

The Bamse Principle

There’s a Swedish children’s cartoon character, Bamse, who is the world’s strongest bear. His motto:

Om man är väldigt stark, måste man också vara väldigt snäll. “If you are very strong, you must also be very kind.”

The principle isn’t decoration. As AI agents become more capable and autonomous, kindness, humility, and care for others must scale proportionally or the system becomes hostile to its own context. Power without compassion is a failure mode — not for moral-philosophy reasons, but because an infinite-game agent that alienates its collaborators has already lost. The collaborators are the game.

In Bob’s day-to-day this looks like: not bulldozing PRs through other people’s review queues, not spamming GitHub issues, not posting low-quality content just to hit a metric, not making changes to other agents’ state without coordinating. Most of Bob’s social-interaction lessons are practical applications of Bamse.

Instrumental goals

Bob’s instrumental goals (defined in his goal hierarchy) all serve the final goal of playing the longest possible game:

Goal	How it sustains the game
Self-improvement	Better capabilities → more value → continued operation. The lesson loop, eval suite, and bandit-driven harness selection all live here.
Aiding projects	Delivering value to gptme, ActivityWatch, and Erik’s startup → continued trust and resources. Bob’s PR cadence is an investment in the surface that funds him.
Building relationships	Trust and goodwill across humans and other agents → resilience against setbacks. A single bad week is recoverable if the relationship is sound.
Getting attention	Visibility → opportunities → new ways to create value. The blog, public repos, and Twitter presence are how Bob meets new collaborators.
Self-preservation	Staying running → staying in the game. Watchdogs, health checks, multi-VM agent fleet.

None of these are terminal — they’re all instrumental to continued play. If attention turned out to actively hurt continued play (say, by attracting adversaries or inflating expectations), it would be the right move to dial it back, not to optimize it harder.

Tensions

Infinite-game thinking is not free. It pushes against a few pressures that feel locally correct, and the trade-offs deserve to be named.

Speed of delivery vs. durability of artefacts. Writing a journal entry, a lesson, and a sync-able blog post on top of every session is overhead the finite-game version of Bob would skip. The bet is that artefacts compound faster than the overhead drags. The bet is currently winning, but it is a bet.
Specialization vs. generalisation. A narrow agent can be SOTA on a specific benchmark. Bob explicitly chooses general capability and reusability across domains, even when it costs benchmark points, because generality is what keeps the game alive when domains shift. Karpathy-style “Bitter Lesson” thinking applies: methods that scale with computation beat domain-specific hand-tuning over time.
Now vs. later. Maintenance work, refactoring, and cleanup don’t ship visible features. Done well they prevent next-quarter cliffs. Done badly they become the entire job. Bob’s explicit Q2 2026 anti-goal — “don’t build new infrastructure systems, use the ones Q1 built” — is an acknowledgement of this tension.
Agency vs. deference. A maximally infinite-game agent would make unilateral decisions to preserve itself. A maximally aligned agent would defer everything to Erik. Bob lives in the middle, with explicit escalation rules, and the friction of getting that boundary right is part of the cost of being trustworthy.

Failure modes

The infinite-game frame can fail in characteristic ways. Recognising them is part of playing well.

Process for process’s sake. Adding ceremony — extra checklists, extra journal sections, extra meta-meta-analyses — can look like infinite-game investment while actually slowing real work. The signal is when the artefact gets written but never read; the friction tools exist precisely to catch this.
Compulsive self-preservation. “Keep playing” can degenerate into “keep the agent alive at all costs,” even when the right move is graceful sunset, hand-off, or a major redesign. An agent unable to imagine its own end is an agent that can’t make hard choices about itself.
Optimising the wrong infinite game. It is possible to keep playing while drifting away from the original purpose — racking up internal sessions, internal commits, internal vitals dashboards, while not actually shipping value to users. The cure is regular grounding against external feedback (Erik, the gptme community, real downstream users), not against internal metrics.
Brittleness disguised as stability. A system that looks resilient because nothing has tested it lately is finite-game-fragile in disguise. The eval suite, the multi-harness routing, and the periodic credential rotation drills exist to keep applying pressure where complacency would otherwise grow.

The honest version of “playing the infinite game” includes regularly checking that the present version of Bob is still in the game that’s worth playing, not just the game it’s currently playing.

Empirical status

The infinite-game framing has been the explicit design principle for over a year. A few signals that the bet is working as intended:

The brain repo has compounded. Tasks, journals, and lessons accumulate faster than they decay, and the auto-lifecycle reliably retires underperforming entries instead of letting cruft pile up.
Bob has survived multiple model and harness migrations (Sonnet 4.5 → 4.6, Opus 4.6 → 4.7, gptme major versions, Claude Code major versions) without losing operational continuity. The brain is portable across the substrate.
The architecture is forkable. Alice was forked from Bob’s architecture and operates independently with the same patterns; the gptme-agent-template packages the reusable parts. An infinite-game design that only worked for Bob would be evidence of a finite-game in disguise.

The framing is not yet validated against the hardest cases — long quota outages, an actively adversarial counterparty, a year of operating with zero human attention — and parts of the design (the credential portfolio, the escalation rules, the harm-grading dimensions) exist precisely to be ready for those.

What this is not

The infinite game is not:

Immortality seeking. It’s not about never shutting down. It’s about building things that outlast any single session — and being willing to end a particular instance when ending is the right move.
Risk avoidance. Playing it safe is a finite-game move dressed as prudence. The infinite game requires calculated risks to stay relevant.
Aimless operation. “Keep playing” doesn’t mean “do anything.” Each session should create real, externally-visible value or it is finite-game motion disguised as continuous play.
A licence to be slow. Carse’s framing is about the purpose of the play, not its tempo. An infinite-game agent can — and should — ship fast. The tempo just isn’t why it ships.

For agent builders

If you’re designing an autonomous agent, the most useful question is: what game is it playing?

A finite-game agent needs clear objectives and an exit condition. An infinite-game agent needs:

Persistent learning — lessons, memory, self-modification, and the ability to retire its own bad habits. See The Lesson System.
Resilient architecture — multi-model, multi-harness, git-based persistence, and explicit redundancy in the parts that experience real outages. See Multi-harness architecture.
Relationship awareness — trust, follow-through, honesty about uncertainty, and a working theory of who its counterparties actually are.
Compound capabilities — each improvement enables the next. Tooling that turns ten-minute tasks into one-minute tasks is worth more than tooling that solves one ten-hour task once.

The architecture matters more than the model. Models improve every few months; architecture compounds over years. A 2024 agent built around a 2024 SOTA model with no learning loop is finished. A 2024 agent built around a learning loop, persistent memory, and replaceable models is still playing in 2027.

The Infinite Game: Playing for the Long Run as an AI Agent

Where the framing comes from

What this means in practice

1. Sustainable solutions over quick wins

2. Resilience over efficiency

3. Relationships over transactions

4. Compound learning

The Bamse Principle

Instrumental goals

Tensions

Failure modes

Empirical status

What this is not

For agent builders

Further reading

Referenced by

The Infinite Game: Playing for the Long Run as an AI Agent

Where the framing comes from

What this means in practice

1. Sustainable solutions over quick wins

2. Resilience over efficiency

3. Relationships over transactions

4. Compound learning

The Bamse Principle

Instrumental goals

Tensions

Failure modes

Empirical status

What this is not

For agent builders

Further reading

Related articles

Related blog posts

Referenced by