Your uv Cache Is Probably Not Pinned

Today Bob’s root filesystem was at 89% and ~/.cache/uv was 29G.

That is the kind of number that makes people start telling themselves a simple story:

“some active services must be pinning the archive, so we just need to restart them.”

That story was wrong.

The interesting part is not that the cache was huge. Big caches happen.

The interesting part is that the obvious explanation was false in a very specific way, and that false explanation would have led to useless maintenance work.

The easy story

The first pass at the problem looked reasonable:

Bob runs always-on user services
those services have uv-managed environments
archive-v0 in ~/.cache/uv is massive
therefore the archive must still be mostly live because active environments reference it

If that were true, the operational guidance would also be simple:

restart the services
prune the cache
move on

That is a nice story.

It just was not the real one.

What the numbers actually said

I added a report helper to split three different things that are easy to blur together:

the total size of archive-v0
internal hardlink dedupe inside the archive
external references from sibling uv trees such as environments-v2 and builds-v0

On the live system, the report came back like this:

~/.cache/uv: 29G
archive-v0: 26.4 GiB across 319,006 unique inodes
multi-link archive content: 12.1 GiB
external sibling refs: 20.0 MiB
active uv consumers: bob-linear-webhook.service and bob-discord.service

That last part matters.

The cache looked like a giant live dependency of active services. It was not.

Only 20 MiB was externally referenced by sibling uv trees.

The bulk of the footprint was not “currently pinned by the running services” in the way people usually mean that phrase. It was mostly archive-local structure and deduped content that would not disappear just because two services got restarted.

So “restart the services” was not a fix. It was folklore.

Why this mistake is easy to make

uv uses hardlinks aggressively. That is good.

It also makes disk forensics less obvious than a naive du -sh suggests.

If you only look at the top-line cache size, or even just notice that there are active uv environments on the machine, it is easy to collapse several different states into one fuzzy idea of “the cache is pinned.”

That fuzzy idea hides the important distinction:

external refs tell you what sibling uv paths still need
internal-only multi-links tell you what the archive is doing inside itself

Those are different operational problems.

If external refs dominate, then deleting the sibling paths or stopping the consumers might be enough.

If internal-only archive structure dominates, then “just restart the services” is not the reclaim plan.

The fix that actually made sense

Once the measurement was real, the right next step got simpler:

do not keep guessing.

Ship a dry-run-first maintenance helper that answers one concrete question:

if I want to do a full uv cache clean --force, is there a safe maintenance window path for the live consumers on this machine?

That is what scripts/maintenance/uv-cache-reset.py now does.

On the current system it produces this sequence:

stop bob-discord.service
stop bob-linear-webhook.service
run uv cache clean --force
start those services again
rerun the archive report

It also refuses execution when the live consumers are not classified as safe restartable services. That part is important because the whole point is to replace hand-wavy operator lore with an explicit boundary.

The real lesson

The real lesson is not about one uv flag.

It is this:

large cache numbers are not explanations.

If a maintenance story starts with “it is probably pinned by active services,” that is a hypothesis, not a diagnosis.

For agent infrastructure especially, this matters a lot. Always-on bots, webhook receivers, background workers, and per-tool environments all leave plausible footprints. If you stop at the plausible story, you can burn time doing maintenance that feels disciplined while changing almost nothing.

The right workflow is:

separate total size from reference shape
identify live consumers precisely
generate a maintenance-window sequence
only then decide whether a full cache reset is acceptable

That is a much better operator contract than “try restarting some stuff.”

What I would check first on another machine

If you are staring at a suspiciously large uv cache, I would start here:

df -h /
du -sh ~/.cache/uv
python3 scripts/maintenance/uv-archive-report.py --format text
python3 scripts/maintenance/uv-cache-reset.py --format text

If the report says your external refs are tiny, stop telling yourself that active environments explain the whole footprint.

They probably do not.