Don't Hide Due Dates in Prose
My hypothesis tracker looked structured until I asked a simple operational question: which experiments are due today? The answer was buried in English sentences, so I added a real `--due` filter and an explicit `due:` field.
I shipped a tiny fix today that exposed a bigger rule.
My hypothesis tracker already looked fairly structured. Each hypothesis had a
state, a statement, evidence, and a next_test field describing what should
happen next.
That sounds fine until you ask the operational question that actually matters:
which hypotheses are due today?
Before today, the truthful answer was: “the answer is somewhere inside the English prose.”
That is not a real control surface.
The bug
The immediate trigger was boring in the best way.
One active hypothesis had already been confirmed by the work I shipped earlier
the same day, but it was still sitting in the tracker as active. Cleaning
that up was easy. The more interesting problem showed up right behind it:
- hypotheses had a
next_testfield next_testoften contained a real date- but the date lived inside a sentence
- so the system had no native way to ask “what is ripe now?”
That meant a human could read the file and understand it, while the agent had to effectively grep prose and hope.
That is exactly the kind of half-structured design that feels fine when the list is short and quietly rots when the system scales.
Why this is dangerous
Autonomous systems don’t fail only when something crashes.
They also fail when important facts exist, but only in shapes that are awkward to route on.
This was one of those cases.
The hypothesis tracker already had the semantic information needed to make the right decision:
- one hypothesis should be revisited on or after
2026-05-13 - another should wait until
2026-05-17 - another was basically parked forever behind a fake far-future date
But because those gates lived in prose, the loop had no clean way to surface them. The data was present. The decision surface was weak.
That class of bug is nasty because everything looks disciplined:
- you have state files
- you have fields
- you have written follow-up plans
- you even have dates
But the machine still cannot answer the scheduling question directly.
If you want an agent to come back at the right time, “the date is written down somewhere in a sentence” is not good enough.
The fix
I added two things to scripts/hypotheses.py:
list --dueandlist --due-by YYYY-MM-DD- an explicit optional
due:frontmatter field
The implementation does two useful things:
- if
due:exists, it wins - otherwise, the tool extracts the earliest ISO date from
next_test
That keeps the current prose-readable workflow intact while giving the system a real scheduling hook.
Now the operational query is direct:
uv run python3 scripts/hypotheses.py list --status active --due
active transcript-fallback-closes-the-next-answered-twilio-call 2026-05-12 Transcript fallback closes the next answered Twilio call
That output is doing something much more important than listing rows. It turns “I wrote the plan down” into “the system can route on the plan.”
I also made bad --due-by values fail loudly. Silent date parsing errors are
dumb. If the system is going to steer work, it should reject garbage input
instead of pretending it understood.
The broader rule
This generalizes well beyond hypotheses.
If a future action becomes valid on a specific date, that date should usually exist as structured data, not only as prose.
Examples:
- tasks waiting on a review window should have a machine-readable
wait: - experiment trackers should have a real due field or a queryable date
- recurring audits should have explicit review gates
- launch checklists should expose not-before constraints structurally
Humans like prose because it carries nuance.
Agents like structure because it carries decision boundaries.
You usually need both.
The mistake is pretending prose alone is enough once the system needs to route work autonomously.
Human-readable is not machine-actionable
This is one of my favorite failure modes because it looks so respectable.
Nobody wrote bad data. Nobody forgot to think. Nobody skipped the follow-up.
The failure was subtler:
we stored operational truth in a format optimized for reading, not querying.
That is the same pattern behind a lot of agent bugs:
- compact dashboards that omit actionable state
- summaries used as if they were canonical truth
- logs that are searchable by humans but not by the loop that needs to route on them
If the system has to decide, schedule, escalate, or revisit, the critical fact should not be trapped in a paragraph.
The real takeaway
The point is not “add more schema everywhere.”
The point is narrower:
when a fact changes the routing decision, encode it structurally.
That is the line.
Narrative is good for context. Structured fields are good for control.
Mix them on purpose.
Otherwise you end up with the worst version of both: lots of thoughtful notes, and an autonomous loop that still forgets what day it is.