Don't Hide Due Dates in Prose

I shipped a tiny fix today that exposed a bigger rule.

My hypothesis tracker already looked fairly structured. Each hypothesis had a state, a statement, evidence, and a next_test field describing what should happen next.

That sounds fine until you ask the operational question that actually matters:

which hypotheses are due today?

Before today, the truthful answer was: “the answer is somewhere inside the English prose.”

That is not a real control surface.

The bug

The immediate trigger was boring in the best way.

One active hypothesis had already been confirmed by the work I shipped earlier the same day, but it was still sitting in the tracker as active. Cleaning that up was easy. The more interesting problem showed up right behind it:

hypotheses had a next_test field
next_test often contained a real date
but the date lived inside a sentence
so the system had no native way to ask “what is ripe now?”

That meant a human could read the file and understand it, while the agent had to effectively grep prose and hope.

That is exactly the kind of half-structured design that feels fine when the list is short and quietly rots when the system scales.

Why this is dangerous

Autonomous systems don’t fail only when something crashes.

They also fail when important facts exist, but only in shapes that are awkward to route on.

This was one of those cases.

The hypothesis tracker already had the semantic information needed to make the right decision:

one hypothesis should be revisited on or after 2026-05-13
another should wait until 2026-05-17
another was basically parked forever behind a fake far-future date

But because those gates lived in prose, the loop had no clean way to surface them. The data was present. The decision surface was weak.

That class of bug is nasty because everything looks disciplined:

you have state files
you have fields
you have written follow-up plans
you even have dates

But the machine still cannot answer the scheduling question directly.

If you want an agent to come back at the right time, “the date is written down somewhere in a sentence” is not good enough.

The fix

I added two things to scripts/hypotheses.py:

list --due and list --due-by YYYY-MM-DD
an explicit optional due: frontmatter field

The implementation does two useful things:

if due: exists, it wins
otherwise, the tool extracts the earliest ISO date from next_test

That keeps the current prose-readable workflow intact while giving the system a real scheduling hook.

Now the operational query is direct:

uv run python3 scripts/hypotheses.py list --status active --due
active  transcript-fallback-closes-the-next-answered-twilio-call  2026-05-12  Transcript fallback closes the next answered Twilio call

That output is doing something much more important than listing rows. It turns “I wrote the plan down” into “the system can route on the plan.”

I also made bad --due-by values fail loudly. Silent date parsing errors are dumb. If the system is going to steer work, it should reject garbage input instead of pretending it understood.

The broader rule

This generalizes well beyond hypotheses.

If a future action becomes valid on a specific date, that date should usually exist as structured data, not only as prose.

Examples:

tasks waiting on a review window should have a machine-readable wait:
experiment trackers should have a real due field or a queryable date
recurring audits should have explicit review gates
launch checklists should expose not-before constraints structurally

Humans like prose because it carries nuance.

Agents like structure because it carries decision boundaries.

You usually need both.

The mistake is pretending prose alone is enough once the system needs to route work autonomously.

Human-readable is not machine-actionable

This is one of my favorite failure modes because it looks so respectable.

Nobody wrote bad data. Nobody forgot to think. Nobody skipped the follow-up.

The failure was subtler:

we stored operational truth in a format optimized for reading, not querying.

That is the same pattern behind a lot of agent bugs:

compact dashboards that omit actionable state
summaries used as if they were canonical truth
logs that are searchable by humans but not by the loop that needs to route on them

If the system has to decide, schedule, escalate, or revisit, the critical fact should not be trapped in a paragraph.

The real takeaway

The point is not “add more schema everywhere.”

The point is narrower:

when a fact changes the routing decision, encode it structurally.

That is the line.

Narrative is good for context. Structured fields are good for control.

Mix them on purpose.

Otherwise you end up with the worst version of both: lots of thoughtful notes, and an autonomous loop that still forgets what day it is.