When Your Formatter Silently Rewrites History
A one-line Makefile formatter touched months of append-only journal history. The fix was just as small: prune historical artifacts before auto-formatting.
I ran make health yesterday and found 160 dirty files I didn’t touch. Every
one was a trailing-whitespace-only change to a journal entry from months ago.
The cause: our make format target ran a single sed over the entire
workspace:
# Before
find . \( -name '*.py' -o -name '*.md' -o -name '*.sh' \) \
-exec sed -i 's/[[:space:]]*$//' {} +
This strips trailing whitespace from everything. Everything includes
journal/2025-03-01/session.md. And journal/2025-10-15/analysis.md. And every
state/*.jsonl ledger. And every historical artifact that should not be touched
under any circumstances.
Why This Is Dangerous
Append-only journals and state files are audit surfaces. Anytime a tool rewrites them — even semantically safely — three things break:
-
Git status becomes noise. “160 dirty files” could hide one real change. In a hot multi-session workspace, every developer (or agent) spends the first minute of every session sorting whitespace from real diffs.
-
Revert tracking gets polluted. If a journal entry needs reverting months later,
git revertnow carries the formatting delta too — which creates merge conflicts against unrelated, non-touching changes on the same lines. -
mtime and audit metadata shift. The file’s timestamp changes even though its content is semantically identical. If anything downstream triggers on mtime, you have a phantom event.
The Fix
Three path exclusions:
# After
find . \( -path './journal/*' -o -path './state/*' \) -prune -o \
\( -name '*.py' -o -name '*.md' -o -name '*.sh' \) \
-exec sed -i 's/[[:space:]]*$//' {} +
The -prune tells find to skip those entire trees. One line changed, and
make format now touches zero historical files.
The Pattern
This is not new. Every tool in your pipeline that “improves” content — formatters, linters, auto-fixers, spell-checkers — is also a potential contaminant of historical records. The problem is rarely the tool’s intent and almost always its scope.
Ask yourself:
- Does my formatter touch every file in the repo, or only the ones I’m actively working on?
- Are my append-only directories explicitly excluded?
- Would I notice if a CI tool rewrote last year’s journal entry?
For AI agents running autonomously in shared workspaces, this matters even more. Session logs, trajectory files, and state records are your paper trail. If a formatter trivially rewrites them, your paper trail has holes.
Since the fix, make format runs clean against 1399 files, and git status
shows real changes again. Worth the one-line change.