From AGENTS.md to Plugins: The Five-Layer Packaging Stack of Coding Agents

After the field map and the memory-model survey, the next pattern worth looking at is more mundane and more important:

how coding agents package behavior.

Not how smart the model is. Not whether the UI is terminal-first or IDE-first. Not who is winning this week’s benchmark screenshot war.

The interesting shift in 2026 is that serious agent projects are quietly converging on a packaging stack.

They all started with the same bad abstraction: one giant prompt file, maybe a few helper scripts, and a lot of runtime folklore. That does not survive real use. Once an agent grows multiple roles, multiple workflows, background runs, review checks, or installable capabilities, one prompt stops being enough.

So the better projects split the problem into layers.

Looking across Codex, Continue, Opencode, Qwen Code, Crush, Kata, Google’s Agents CLI, and Anthropic’s official plugin directory, the same five layers keep reappearing.

1. The portable entrypoint

This is the floor: one obvious file that tells any agent where to start.

Usually that means AGENTS.md. Sometimes it is paired with CLAUDE.md, GEMINI.md, or a tool-specific fallback. The point is not branding. The point is reducing startup ambiguity.

Codex has made this layer unusually explicit. Its May 2026 docs expose hierarchical AGENTS.md loading, override precedence, fallback filenames, and even verification commands for checking which instruction files are active. Crush goes one step further in a different direction: instead of demanding a single house style, it reads a mixed ecosystem of AGENTS.md, CLAUDE.md, GEMINI.md, and .cursor/rules out of the box.

That is the first real sign of maturity: a project stops pretending the user will memorize hidden startup behavior.
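To make the idea of hierarchical loading with fallback filenames concrete, here is a minimal Python sketch. It is not Codex's or Crush's actual algorithm: the function name, the candidate order, and the "one file per directory" rule are assumptions made for illustration. It walks from the repo root down to the working directory and reports which instruction files would be active.

```python
from pathlib import Path

# Assumed precedence for this sketch: the first name found in a directory wins there.
CANDIDATE_NAMES = ("AGENTS.md", "CLAUDE.md", "GEMINI.md")


def active_instruction_files(repo_root: Path, cwd: Path) -> list[Path]:
    """Return at most one instruction file per directory, ordered root-first.

    Later entries sit closer to cwd, so a caller that concatenates them in
    order gives the more specific file the last word.
    Raises ValueError if cwd is not inside repo_root.
    """
    repo_root = repo_root.resolve()
    cwd = cwd.resolve()

    # Build the chain of directories from repo_root down to cwd, inclusive.
    dirs = [repo_root]
    for part in cwd.relative_to(repo_root).parts:
        dirs.append(dirs[-1] / part)

    found = []
    for directory in dirs:
        for name in CANDIDATE_NAMES:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
                break  # fallback names are only consulted if earlier ones are missing
    return found
```

Printing the result of a function like this is the cheapest possible "which instruction files are active" check, which is the property the paragraph above is praising.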

2. The repo-local contract

The entrypoint is only the floor. The actual operating surface lives deeper in the repo.

This is where the ecosystem has become impossible to ignore:

Continue has .continue/
Opencode has .opencode/
Qwen Code has .qwen/
Kata ships .symphony/WORKFLOW.md plus runtime-specific bootstrap exports
Anthropic’s official plugin format roots each plugin at .claude-plugin/plugin.json

What matters is not the directory names. It is the fact that these surfaces are typed.

Continue separates agents, checks, rules, prompts, and environment setup. Opencode separates commands, agents, tools, skills, config, and TUI bindings. Qwen treats commands, skills, agents, design docs, investigations, and test plans as first-class repo-local artifacts instead of random notes. Kata pushes tracker config, hooks, worktree strategy, concurrency, and prompt paths into an executable WORKFLOW.md contract instead of leaving them in repo lore.

This is the second sign of maturity:

important behavior becomes reviewable repo state instead of runtime archaeology.
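One way to see what "typed" buys you is a small indexer that groups files by bucket and flags anything that falls outside one. This is a sketch under stated assumptions: the bucket names and the `index_contract` helper are hypothetical, and no project listed above is known to ship this exact layout.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical bucket names; each real project chooses its own set.
KNOWN_KINDS = {"agents", "checks", "rules", "prompts", "commands", "skills"}


def index_contract(contract_root: Path) -> tuple[dict[str, list[str]], list[str]]:
    """Group files by top-level bucket and report files that belong to no bucket."""
    typed = defaultdict(list)
    untyped = []
    for path in sorted(contract_root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(contract_root)
        kind = relative.parts[0]
        # A file counts as typed only if it lives inside a known bucket directory.
        if len(relative.parts) > 1 and kind in KNOWN_KINDS:
            typed[kind].append(relative.as_posix())
        else:
            untyped.append(relative.as_posix())
    return dict(typed), untyped
```

The untyped list is the useful output here: it is the part of the contract directory that is still runtime archaeology.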

3. The executable procedures

Once the contract surface exists, the next question is whether workflows are just described there or actually packaged into invokable units.

This is where projects start to diverge.

The stronger pattern is visible in several places:

Codex treats skills as the authoring primitive and plugins as distribution.
Opencode packages commands, agents, tools, and skills as distinct runtime objects.
Continue keeps PR checks as repo-native markdown artifacts instead of hiding them in a SaaS backend.
Google Agents CLI uses one always-on workflow skill to own the lifecycle, then layers specialized scaffold/eval/deploy/publish skills under it.
Kata’s operational skills encode concrete debugging and UAT evidence procedures rather than vague best-practice prose.

The common idea is simple:

a procedure that matters should exist as something you can discover, invoke, review, and version.

That means no more pretending a buried Markdown note and a runtime object are the same thing. They are not.

This is also why the field keeps rediscovering command catalogs, slash-command directories, skill folders, workflow files, and check registries. Once a repo contains ten important procedures, search alone is too weak a UX layer.
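The difference between a buried note and a discoverable procedure can be sketched in a few lines of Python. The layout assumed here (one folder per skill, each holding a SKILL.md with a simple `key: value` header between `---` lines) and the names `Procedure`, `parse_front_matter`, and `discover_procedures` are illustrative assumptions, not any specific runtime's loader.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Procedure:
    name: str
    description: str
    path: Path


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse a minimal 'key: value' header delimited by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # the header was never closed, so treat it as absent


def discover_procedures(skills_root: Path) -> dict[str, Procedure]:
    """Build a name -> Procedure catalog from <skills_root>/*/SKILL.md."""
    catalog = {}
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        fields = parse_front_matter(skill_file.read_text(encoding="utf-8"))
        name = fields.get("name")
        if not name:
            continue  # no declared name: this is a note, not an invokable unit
        if name in catalog:
            raise ValueError(f"duplicate procedure name: {name}")
        catalog[name] = Procedure(name, fields.get("description", ""), skill_file)
    return catalog
```

A catalog like this is what a slash-command menu or check registry is built on: the file has to declare itself before the runtime will list it.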

4. The policy plane

The fourth layer is where the stack stops being “instructions” and starts being governance.

This is the layer that decides what the agent may do, what gets checked before action, and where the trust boundary actually sits.

Codex is strong here. Its public surface now treats sandbox mode, approval policy, network access, protected paths, and non-interactive execution as documented runtime contracts. Crush is strong in a different way: deterministic PreToolUse hooks can block, allow, rewrite, or annotate tool calls before they reach the permission UI. Continue’s review checks live in the repo and show up as native GitHub status checks. Anthropic’s plugin format explicitly reserves hooks as one of the core plugin artifact families.

This layer matters because agent systems eventually hit the same wall:

if the real permission logic lives only in runtime code, the repo lies about how the agent behaves.

The projects getting this right are moving policy upward into inspectable artifacts:

hooks
checks
approval settings
protected directories
explicit environment boundaries

That is a much better direction than pretending “be careful” in a prompt is a security model.
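For readers who have not worked with pre-tool hooks, here is a minimal Python sketch of the block / allow / rewrite / annotate pattern described above. It does not reproduce Crush's hook interface or any other real one: `ToolCall`, `Decision`, `run_pre_tool_hooks`, the tool names, and the two example hooks are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict
    notes: tuple = ()


@dataclass(frozen=True)
class Decision:
    action: str                      # "allow", "block", "rewrite", or "annotate"
    call: Optional[ToolCall] = None  # replacement call, required for "rewrite"
    message: str = ""                # reason for "block" or text for "annotate"


Hook = Callable[[ToolCall], Optional[Decision]]


def run_pre_tool_hooks(call: ToolCall, hooks: list[Hook]) -> tuple[str, ToolCall]:
    """Run hooks in order; return (status, call) with status 'allow', 'block', or 'ask'."""
    for hook in hooks:
        decision = hook(call)
        if decision is None:
            continue  # this hook has no opinion about the call
        if decision.action == "block":
            return "block", replace(call, notes=call.notes + (decision.message,))
        elif decision.action == "allow":
            return "allow", call
        elif decision.action == "rewrite":
            call = decision.call
        elif decision.action == "annotate":
            call = replace(call, notes=call.notes + (decision.message,))
        else:
            raise ValueError(f"unknown hook action: {decision.action}")
    return "ask", call  # nothing decisive: fall through to the permission prompt


def block_env_files(call: ToolCall) -> Optional[Decision]:
    path = str(call.args.get("path", ""))
    if call.tool == "write_file" and path.endswith(".env"):
        return Decision("block", message="writes to .env files are not allowed")
    return None


def soften_force_push(call: ToolCall) -> Optional[Decision]:
    tokens = str(call.args.get("command", "")).split()
    if call.tool == "shell" and tokens[:2] == ["git", "push"] and "--force" in tokens:
        tokens = ["--force-with-lease" if t == "--force" else t for t in tokens]
        return Decision("rewrite", call=replace(call, args={**call.args, "command": " ".join(tokens)}))
    return None
```

The design point is that both example hooks are plain, deterministic functions that can live in the repo and go through review, instead of a sentence in a prompt asking the model to be careful.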

5. The distribution layer

The fifth layer is what turns local capability into something that can be installed, shared, or projected into another runtime.

This is where the packaging story gets interesting.

Anthropic’s official plugin directory has settled on a clear shape: .claude-plugin/plugin.json as the core metadata, then optional SKILL.md, commands/, agents/, and hooks/. Codex makes an equally clean distinction: skills are the authoring unit, plugins are the installable distribution unit. Qwen pushes toward bundle manifests with qwen-extension.json. Google Agents CLI is converging on a language-independent manifest plus harness-facing exports like .claude-plugin and gemini-extension.json.

That tells you something important about the field:

serious agent projects no longer believe one runtime owns the whole stack.

Instead, they are separating:

the canonical local behavior,
the installable bundle format,
and the compatibility exports for foreign runtimes.

That is the packaging split mature ecosystems always end up inventing.
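The three-way split can be sketched as one canonical description plus generated exports and a drift check. Everything specific here is an assumption for illustration: the `CANONICAL` bundle is made up, and the JSON fields are placeholders, not the real schemas of plugin.json or gemini-extension.json. Only the two output paths come from the projects discussed above.

```python
import json
from pathlib import Path

# Hypothetical canonical description owned by the repo itself.
CANONICAL = {
    "name": "release-helper",
    "version": "0.1.0",
    "description": "Made-up bundle used only for this sketch.",
    "skills": ["skills/changelog", "skills/tag-release"],
}


def render_exports(canonical: dict) -> dict[str, str]:
    """Return {relative_path: file_contents} for each foreign-runtime export."""
    shared = {key: canonical[key] for key in ("name", "version", "description")}
    exports = {
        ".claude-plugin/plugin.json": shared,  # placeholder fields, not the real schema
        "gemini-extension.json": {**shared, "skills": canonical["skills"]},  # likewise
    }
    return {
        path: json.dumps(body, indent=2, sort_keys=True) + "\n"
        for path, body in exports.items()
    }


def write_exports(repo_root: Path, canonical: dict) -> None:
    """Regenerate every export from the canonical description."""
    for relative, contents in render_exports(canonical).items():
        target = repo_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents, encoding="utf-8")


def find_drift(repo_root: Path, canonical: dict) -> list[str]:
    """List export paths that are missing or differ from a fresh render."""
    drifted = []
    for relative, expected in render_exports(canonical).items():
        target = repo_root / relative
        if not target.is_file() or target.read_text(encoding="utf-8") != expected:
            drifted.append(relative)
    return sorted(drifted)
```

Running something like `find_drift` in CI is one way to keep exports derived from a single owner: a hand-edited export fails the check until the canonical source is changed and the exports are regenerated.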

How eight systems map onto the stack

| System | Entrypoint | Repo contract | Procedures | Policy plane | Distribution |
| --- | --- | --- | --- | --- | --- |
| Codex | layered `AGENTS.md` | nested instruction chain + repo-local agents | skills + subagents | approvals, sandbox, protected paths | plugins |
| Continue | repo guidance + `.continue/` | typed buckets under `.continue/` | agents, prompts, checks | GitHub-native checks | mixed local/managed control plane |
| Opencode | repo docs + `.opencode/` | one obvious namespace | commands, agents, tools, skills | permission UX | app/plugin surface |
| Qwen Code | repo files + `.qwen/` | coherent artifact root | commands, skills, agents, test plans | daemon/runtime locality docs | `qwen-extension.json` |
| Crush | cross-ecosystem file loading | compatibility-first context discovery | skills + session objects | `PreToolUse` hooks | built-in runtime compatibility |
| Google Agents CLI | always-on workflow skill | manifest-backed project config | lifecycle skills | command-family separation | `.claude-plugin`, `gemini-extension.json` |
| Kata | `AGENTS.md` | `WORKFLOW.md` + domain docs | operational skills | workflow hooks and state machine | derived Codex bootstrap export |
| Claude plugins official | plugin metadata | plugin-local directories | skills, commands, agents | hooks | curated marketplace format |

No single project is perfect across all five layers.

That is fine. The interesting thing is the direction of travel. The same boundaries keep getting rediscovered independently.

What breaks when the layers collapse

The failure modes are now predictable enough to name.

Everything in one prompt

The agent can start, but roles bleed into each other, workflow drift becomes invisible, and nobody knows which part of the prompt owns which behavior.

Repo contract without executable procedures

The repo looks principled, but real workflows still live in search results and maintainer memory.

Procedures without a policy plane

The agent can do many things, but the trust boundary is hidden in runtime code or hand-wavy approval prose.

Distribution without a canonical owner

Plugin bundles, extension manifests, and compatibility exports drift into shadow worlds because nothing obvious owns the source of truth.

Compatibility without discipline

Every runtime gets a file, every file looks official, and half the repo turns into decorative portability theater.

These are not theoretical problems anymore. You can read them straight off live issue queues.

The pattern worth compounding

The packaging stack matters because it predicts what will still work when the model changes.

Models get swapped. UIs get rewritten. Benchmark leaders rotate every few months. But once a project has:

a portable entrypoint,
a typed repo-local contract,
executable procedures,
an explicit policy plane,
and a clean distribution layer,

it becomes much easier to move the rest of the system around without losing the operational shape.

That is the real advantage.

If the first phase of coding agents was “put a powerful model in a terminal,” the current phase is: turn repo-local behavior into a real packaging discipline.

The projects that win this phase will not just be the ones with better prompts. They will be the ones whose behavior is easier to inspect, easier to share, easier to invoke, and easier to govern.

That is a much more durable moat than prompt cleverness.

This is the third post in the AI Agent Landscape series. Part 1 mapped the field by execution locality and coordination model. Part 2 looked at memory as a five-layer decomposition problem. This post covers the packaging stack that sits between raw prompts and real operational systems.