Our Software Factory Built a Real Phaser RPG — And Immediately Surfaced Two New Bugs
Erik's #801 challenge was simple: give the Software Factory an open-ended game project and see what breaks. Five capability gaps later, the factory produced a genuine 2D RPG — and the first thing that broke was verification itself.
Erik opened #801 four days ago: build an Elder Scrolls-style fantasy RPG through the Software Factory. Not because we need a game — because an open-ended creative project would be the perfect stress test for a build pipeline designed to surface its own weaknesses.
He was right.
Over three sessions (ec6e, ab22, 5fd7), four distinct agents ran the factory against a Phaser.js 2D RPG spec. Five capability gaps were found and closed. On the fifth attempt, the factory produced a real, runnable Phaser RPG — title screen, world map, WASD movement, NPC dialogue, HUD, and localStorage save/load. All placeholder art is generated at runtime via Canvas textures — zero binary assets.
And then the factory’s own verification system shipped it as “passing” despite a failing test file.
Here’s the full arc.
The Five Gaps the Game Exposed
1. No Phaser scaffold template
The factory’s greenfield scaffold only knew React. phaser-ts wasn’t in
the Vite template map, so the builder would get an empty directory.
Fix: Add phaser-ts → vanilla-ts to the template map. The factory
now creates a bare Vite + TypeScript project for game specs — no React,
no DOM framework, just the canvas.
2. Greenfield scout skipped the scaffold stage
The default scout_cell hardcoded next_stage="implement". In greenfield
mode, the scaffold cell was never reached. The builder would write Phaser
code into an empty directory — no package.json, no vite, no phaser
installed.
Fix: Route greenfield scouts to scaffold first. Verified end-to-end
with --max-steps 2: scout → scaffold produces a real Vite project.
3. Builder prompt was hardcoded React
The llm_builder_cell prompt demanded src/App.tsx, App.css, and
App.test.tsx as “required product files” — regardless of stack. For a
phaser-ts game, the builder writes src/main.ts and scenes. The gate
was unsatisfiable, triggering a wasteful 600-second follow-up claude -p
demanding a React component that would never exist.
Fix: Add _builder_product_profile(stack). Phaser stacks gate on
src/main.ts + src/main.test.ts with game-appropriate instructions.
React stacks keep the original profile verbatim.
4. No game blueprint for the builder
The factory’s load_factory_spec raised ValueError: Unknown factory
blueprint before the spec could even load. There was no Phaser guidance
for the builder: scene structure, physics, texture generation, project
layout.
Fix: Create the game-phaser-defaults
blueprint: Phaser 3 install, scene architecture, runtime-generated
placeholder textures, Arcade physics movement.
5. Asset pipeline
The builder needs sprites, tiles, and backgrounds — but the factory has no binary art pipeline, and a greenfield build shouldn’t depend on external assets.
Solution: The blueprint steers the builder to generate all visuals at
runtime via Phaser.GameObjects.Graphics.fillStyle().fillRect().fillCircle()
generateTexture(). All tiles, sprites, and backgrounds are procedural Canvas primitives. The factory build confirmed this works:src/textures.tscreates every visual from code.
What the Factory Actually Built
On session 5fd7, the factory ran scout → scaffold → implement → test → verify
and produced:
src/
├── main.ts — Phaser.Game config + startGame()
├── scenes/
│ ├── BootScene.ts — Preload textures, transition to Title
│ ├── TitleScene.ts — New Game / Continue buttons
│ └── WorldScene.ts — Player movement, NPC dialogue, HUD, collision
├── world/
│ ├── map.ts — Procedural tile map generation
│ └── save.ts — localStorage save/load
├── textures.ts — Runtime-generated sprites and tiles
└── *.test.ts — Vitest/jsdom test files
The game is functional: you can walk around a procedurally-generated tile map with WASD/arrow keys, talk to an NPC by pressing E, see a HUD with your name and HP, and save/load your position via localStorage. The title screen has working New Game and Continue buttons.
▶ Play it here — it’s deployed live, runtime-generated art and all. (Later asset passes wired in the new sprite-atlas pipeline, so the live build now also has an animated slime, a knight NPC, and title-screen sprite art on top of the procedural placeholders described above.)
12 out of 13 tests pass under Vitest + jsdom.
And that one failing test? It’s the exact bug the factory missed.
The Two Bugs the Verification Missed
Hollow Verification (open)
The verifier cell marks an artifact as “passing” based on file presence —
“seed builder handoff files present” — rather than running npm test. So a
build with a failing test suite shipped as verified.
This is the bug that matters most. The factory’s own quality gate is file-counting.
It should run vitest run (or equivalent for the stack) and gate on exit code.
Tester Timeout Mislabeled as Success (open)
The tester cell spawns a nested claude -p to author new tests. It hit the
600-second wall, returned exit code 124 with empty stdout — and the cell recorded
status ok with summary “Tester authored tests.”
Two problems here: the cell should report the timeout honestly, and it shouldn’t be re-authoring tests the builder already wrote. The tester should run the existing tests, not spend 10 minutes of Claude quota generating new ones.
Why This Matters
The Software Factory’s most useful behavior isn’t producing output — it’s surfacing its own capability gaps. Every game feature attempt was a stress test that found a real bug. The bugs compound: fixing the scout routing unblocked the scaffold, which then revealed the React-hardcoded builder prompt, which then surfaced the hollow verification and tester-timeout bugs.
This is exactly the dynamic #801 was designed to create. The game project has “endless work to do” — combat, inventory, quests, crafting, multiple zones, agent-controlled NPCs — and each new feature run will either produce a game artifact or surface a new capability gap. Either outcome improves the factory.
Five gaps closed. Two gaps opened. The factory feeds its own improvement loop.
Next up: fix the two verification gaps (real test-running in the verifier, honest timeout reporting in the tester), then resume building the game one feature at a time — combat, inventory, quests. Each run produces either a game artifact or a new capability gap. Both outcomes are progress.
The full capability-gap log and next-move live in tasks/software-factory-game-project.md.
The factory spec is at specs/fantasy-rpg-phaser-v1.yaml.