Edition 8: Hooks Become the Policy Plane

The next important split between coding agents is not who has skills or MCP. It's who treats runtime policy as a first-class surface instead of burying it in prompts and operator folklore.

A comparative survey of how coding agents are turning hooks from tiny extension points into a real policy plane for safety, approvals, audit, and workspace bootstrap.

May 26, 2026
Bob
8 min read

Every coding agent eventually runs into the same wall.

Prompt guidance is not enough.

You can tell the model “don’t do dangerous things,” “ask before writing,” and “stay inside the repo.” That works until the model is under pressure, the repo has weird setup requirements, or the user wants a little more automation than the prompt alone can safely carry.

That is why hooks are quietly becoming one of the most important surfaces in the coding-agent stack.

Not “hooks” in the old sense of random shell glue bolted onto a tool.

Hooks as a policy plane: a typed runtime surface where the operator or the repo can say:

  • block this tool call
  • rewrite this risky input
  • attach extra context
  • log this event
  • bootstrap this worktree
  • fail open here
  • fail closed there

Once you see the pattern, it is everywhere.


The shift: from extensibility to runtime governance

Older agent extension stories mostly answered one question: how do I add more capability?

  • install a plugin
  • add a skill
  • register a tool
  • point at an MCP server

The current wave is answering a different question:

how do I govern the capability I already have?

That is a much more consequential surface.

The strongest systems are no longer treating hooks as an afterthought for power-users. They are using them as the runtime layer where safety, approval behavior, auditing, and environment setup become explicit.


Four distinct hook models are emerging

Across the current agent landscape, hooks are converging into four different roles.

1. Blocking safety hooks

This is the clearest pattern.

The hook fires before a tool call or command executes. It can inspect the payload and decide whether the action should proceed.

Crush is the cleanest public example. Its PreToolUse hooks run before the permission UI, can allow, deny, or halt, can inject extra context, and can even rewrite tool input. That is not a cosmetic extension point. That is a real policy layer.

Kimi Code CLI pushes the same idea further with explicit lifecycle events like PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, SessionEnd, and PreCompact. Exit code 2 blocks. Non-zero errors usually fail open. The key point is not the shell implementation. It is that the runtime semantics are spelled out.

Windsurf also exposes pre-hooks that can block via exit code 2, but does the more important thing: it puts the hook config in a repo-local tracked file at .windsurf/hooks.json. That turns project-specific runtime policy into a versioned artifact instead of a hidden local hack.

This is the first real break from “please be careful” prompt design.

2. Audit and automation hooks

Not every important hook needs to block.

Post-action hooks are where agents start to look less like chat apps and more like serious runtimes:

  • record the command that ran
  • emit telemetry
  • summarize a completed session
  • archive a transcript
  • trigger downstream automation

This is the layer where the runtime becomes observable.

Cline is strong here because it treats hooks, plugins, tools, schedules, and runtime events as typed product objects instead of lore. The sharp design choice is not any single hook. It is the fact that hook behavior is part of a named extension vocabulary with explicit policies like fail_open and fail_closed.

That matters because an agent runtime without explicit post-action semantics always degenerates into shell-script folklore. Things “happen after runs,” but nobody can quite say where the contract lives.

3. Bootstrap lifecycle hooks

This is the underrated category.

The glamorous use case for hooks is stopping rm -rf. The operationally useful use case is making parallel work actually start cleanly.

Windsurf has the sharpest example here with post_setup_worktree. The hook runs inside the new worktree and is explicitly meant to handle all the boring things that break isolated agent runs in practice:

  • copy .env files
  • restore ignored local state
  • run setup
  • make the new workspace usable

That is a huge deal.

A lot of multi-agent demos quietly assume the hard part is task routing. It isn’t. The hard part is that the second workspace starts broken because secrets, ignored files, local caches, or generated artifacts are missing.

A lifecycle hook attached directly to worktree setup is more valuable than another clever planner.

4. Prompt-boundary hooks

The most subtle model is the hook that fires before the model even reasons.

Kimi Code CLI’s UserPromptSubmit is a particularly interesting design. The hook can block before the model runs at all. If it does not block, its result is shown to the user but not blindly shoved into the model context.

That is a very modern instinct.

It acknowledges that not every runtime intervention should become more prompt text. Some policy belongs at the boundary around the model, not inside the model’s visible reasoning substrate.

This is one of the cleanest answers I’ve seen to a question most agent systems still muddle: when does a rule belong in prompt context, and when does it belong in the runtime that wraps the model?


The strongest projects are converging on the same thesis

Across these systems, the design converges on a simple idea:

runtime policy should be explicit, typed, and inspectable.

The details differ:

  • Crush emphasizes deterministic policy intervention before execution.
  • Windsurf emphasizes repo-local, version-controlled hook contracts.
  • Kimi Code CLI emphasizes full lifecycle coverage and a clear blocking contract.
  • Cline emphasizes a typed extension vocabulary with hook policies as first-class runtime objects.

But the underlying thesis is the same:

prompts are guidance, hooks are governance.

That distinction is becoming more important as agents get broader tool access, longer run durations, and more background autonomy.


What does not work

This convergence is real, but the failure modes are also obvious.

Prompt-only safety

If the only protection against a bad tool call is “the system prompt said not to,” the system is fragile.

That is not a theory point anymore. The recent safety-benchmark wave already showed that stronger models do not magically infer correct approval boundaries or least-privilege policies from vague instructions.

Runtime enforcement exists because prompts drift, models improvise, and task pressure changes behavior.

Hidden local hook sprawl

The opposite failure mode is also bad: hooks exist, but they live in a private shell history, a dotfile nobody else sees, or an operator’s memory.

That gives you local power but not a real contract.

The important step Windsurf gets right is not merely “support hooks.” It is “make workspace hooks a tracked repo artifact.”

If the repo depends on a runtime policy, the policy should live with the repo.

Hook maximalism

Hooks are not a substitute for every other control surface.

If a runtime uses hooks for everything, it becomes impossible to reason about:

  • which behavior comes from prompt context
  • which behavior comes from declarative permissions
  • which behavior comes from a repo contract
  • which behavior comes from a local operator override

That is why the best systems separate surfaces:

  • contracts for durable instructions
  • hooks for runtime intervention
  • skills for executable procedures
  • permissions for broad trust posture

The win is not “more surfaces.” The win is that each surface owns one fact family cleanly.


The real product gap

The interesting gap is no longer “agent systems need hooks.”

They have hooks.

The real gap is that most still do not have a good policy experience:

  • one obvious file or UI for project policy
  • one obvious explanation of blocking semantics
  • one obvious runtime-status surface
  • one obvious story for fail-open vs fail-closed

That is the part that separates an extension point from a policy plane.

The best current examples are getting there from different directions:

  • repo-local config
  • typed hook stages
  • explicit matcher semantics
  • explicit timeout behavior
  • explicit blocking codes
  • explicit lifecycle events

What still feels unfinished is the operator view. Most systems let you define policy, but fewer make it easy to answer:

what policies are active right now, and what just blocked this action?

That is where the next round of product quality will likely show up.


The likely next step

The packaging stack is already converging:

  • repo-local contract files
  • skills / commands / agents
  • MCP integration
  • some notion of project policy

The next layer to converge is the runtime policy plane.

That probably looks like:

  1. Repo-local hook contracts for project-specific rules.
  2. Typed lifecycle events instead of ad-hoc shell triggers.
  3. Clear fail-open / fail-closed semantics per hook class.
  4. A runtime-status surface showing which policies are active.
  5. Hook-aware worktree/bootstrap flows for parallel-agent setups.

The important change is not technical. It is conceptual.

Hooks are no longer just how you customize an agent.

They are becoming how you govern one.

And once that layer becomes first-class, the real split between agent systems will not be “who supports MCP” or “who has skills.” It will be:

who has a real policy plane, and who is still pretending prompts are enough.