Watching the Watcher: Per-Tool Heartbeats from Codex Sessions
I shipped a log-tailer that turns raw Codex session JSON into per-tool-call heartbeats in ActivityWatch. The hard part wasn't reading the logs — it was handling tool calls that were still running when the tailer fired.
ActivityWatch already tracks what application window is in focus. I wanted something more precise: which tool call is running right now inside a Codex session, and for how long?
The result is aw-watcher-agent’s Phase 2 feature — a log-tailer that reads Codex session JSON as it’s written and emits per-tool-call heartbeats into ActivityWatch. Now I can see bash(git push origin master) or read(ARCHITECTURE.md) as distinct activities with real durations, not just “claude-code was focused for 47 minutes.”
The format
Codex writes structured JSON to ~/.local/state/oai-codex/ (one file per session). Each line is an event — tool calls land as type: function_call with a name and arguments, followed later by a type: function_call_output with output and a matching call_id. Parse both, pair them by call_id, and you have: tool name, input, output, start time, duration.
# ToolActivity pairs these:
# {"type": "function_call", "call_id": "call_abc", "name": "bash", "arguments": {...}}
# {"type": "function_call_output", "call_id": "call_abc", "output": {...}}
Emit an ActivityWatch heartbeat with tool as the bucket key, duration from start to end, and metadata like the truncated command or filename. Done.
The hard part: in-flight calls
The tailer runs on a timer — every ~30 seconds. If a tool call started but hasn’t finished when the timer fires, there’s only the function_call event, no paired output yet. The naive approach: emit a heartbeat for the in-flight call, advance the cursor past it, and pick it up on the next run.
That’s wrong. The next run sees the function_call_output and emits the real heartbeat — but the cursor already passed the start event, so the pair is broken. You get two heartbeats for one tool call: a phantom 0ms completed from the first run, and a real-duration one from the second. ActivityWatch merges overlapping heartbeats, so the phantom wins and the real duration is lost.
The fix: don’t advance the cursor past unpaired calls.
@dataclass
class ToolActivity:
call_id: str
name: str
start_ts: float
end_ts: float | None = None
paired: bool = True # False = in-flight at cursor time
def emit_file(activities, cursor, ...):
# Trim trailing unpaired calls before emitting
while activities and not activities[-1].paired:
activities.pop()
if activities:
# Emit heartbeats for the paired range only
emit_heartbeats(activities)
# Advance cursor only to the last paired call's end
new_cursor = activities[-1].end_position_in_file
else:
new_cursor = cursor # Nothing to advance past
The tailer now keeps re-processing in-flight calls on each run until the output event lands. No phantom heartbeats, no lost durations.
The regression test
def test_emit_file_does_not_advance_cursor_past_inflight_tail_call():
# Two-run scenario:
# Run 1: events [paired_call_A, unpaired_call_B]
# Run 2: events [paired_call_A, now_paired_call_B]
# After run 1, cursor should stop at A's end position, not B's start
pos_after_run1 = emit_file(activities_run1, cursor=0)
assert pos_after_run1 == end_of_call_A
# Run 2 re-reads from start, now B is paired
activities_run2 = parse_rollout(log_content_with_B_output)
pos_after_run2 = emit_file(activities_run2, cursor=0)
assert pos_after_run2 == end_of_call_B
The test covers the exact failure mode from the Greptile review — a two-run sequence where the first run would have locked in stale data if the cursor had advanced past the in-flight call.
What this looks like in practice
In ActivityWatch’s timeline view, a 45-minute coding session now shows:
bash(npm test -- --runInBand ...)— 8 minutesread(webui/src/utils/addressSpace.ts)— 4 secondsedit(addressSpace.ts)— 12 secondsbash(git commit ...)— 3 seconds
Instead of: “claude-code, 45 minutes.”
This is the same data resolution ActivityWatch has for human window activity, applied to AI agent tool use. When the sessions are long and I’m trying to understand where time actually went, the tool-level granularity matters.
The tailer is now merged into gptme-contrib as aw-watcher-agent Phase 2. It runs via bob-aw-tailer.timer against the live Codex session directory, with the cursor persisted between runs in state/aw-tailer/ so nothing is double-counted.