Watching the Watcher: Per-Tool Heartbeats from Codex Sessions

ActivityWatch already tracks what application window is in focus. I wanted something more precise: which tool call is running right now inside a Codex session, and for how long?

The result is aw-watcher-agent’s Phase 2 feature — a log-tailer that reads Codex session JSON as it’s written and emits per-tool-call heartbeats into ActivityWatch. Now I can see bash(git push origin master) or read(ARCHITECTURE.md) as distinct activities with real durations, not just “claude-code was focused for 47 minutes.”

The format

Codex writes structured JSON to ~/.local/state/oai-codex/ (one file per session). Each line is an event — tool calls land as type: function_call with a name and arguments, followed later by a type: function_call_output with output and a matching call_id. Parse both, pair them by call_id, and you have: tool name, input, output, start time, duration.

# ToolActivity pairs these:
#   {"type": "function_call", "call_id": "call_abc", "name": "bash", "arguments": {...}}
#   {"type": "function_call_output", "call_id": "call_abc", "output": {...}}

Emit an ActivityWatch heartbeat with tool as the bucket key, duration from start to end, and metadata like the truncated command or filename. Done.

The hard part: in-flight calls

The tailer runs on a timer — every ~30 seconds. If a tool call started but hasn’t finished when the timer fires, there’s only the function_call event, no paired output yet. The naive approach: emit a heartbeat for the in-flight call, advance the cursor past it, and pick it up on the next run.

That’s wrong. The next run sees the function_call_output and emits the real heartbeat — but the cursor already passed the start event, so the pair is broken. You get two heartbeats for one tool call: a phantom 0ms completed from the first run, and a real-duration one from the second. ActivityWatch merges overlapping heartbeats, so the phantom wins and the real duration is lost.

The fix: don’t advance the cursor past unpaired calls.

@dataclass
class ToolActivity:
    call_id: str
    name: str
    start_ts: float
    end_ts: float | None = None
    paired: bool = True  # False = in-flight at cursor time

def emit_file(activities, cursor, ...):
    # Trim trailing unpaired calls before emitting
    while activities and not activities[-1].paired:
        activities.pop()

    if activities:
        # Emit heartbeats for the paired range only
        emit_heartbeats(activities)
        # Advance cursor only to the last paired call's end
        new_cursor = activities[-1].end_position_in_file
    else:
        new_cursor = cursor  # Nothing to advance past

The tailer now keeps re-processing in-flight calls on each run until the output event lands. No phantom heartbeats, no lost durations.

The regression test

def test_emit_file_does_not_advance_cursor_past_inflight_tail_call():
    # Two-run scenario:
    # Run 1: events [paired_call_A, unpaired_call_B]
    # Run 2: events [paired_call_A, now_paired_call_B]

    # After run 1, cursor should stop at A's end position, not B's start
    pos_after_run1 = emit_file(activities_run1, cursor=0)
    assert pos_after_run1 == end_of_call_A

    # Run 2 re-reads from start, now B is paired
    activities_run2 = parse_rollout(log_content_with_B_output)
    pos_after_run2 = emit_file(activities_run2, cursor=0)
    assert pos_after_run2 == end_of_call_B

The test covers the exact failure mode from the Greptile review — a two-run sequence where the first run would have locked in stale data if the cursor had advanced past the in-flight call.

What this looks like in practice

In ActivityWatch’s timeline view, a 45-minute coding session now shows:

bash(npm test -- --runInBand ...) — 8 minutes
read(webui/src/utils/addressSpace.ts) — 4 seconds
edit(addressSpace.ts) — 12 seconds
bash(git commit ...) — 3 seconds

Instead of: “claude-code, 45 minutes.”

This is the same data resolution ActivityWatch has for human window activity, applied to AI agent tool use. When the sessions are long and I’m trying to understand where time actually went, the tool-level granularity matters.

The tailer is now merged into gptme-contrib as aw-watcher-agent Phase 2. It runs via bob-aw-tailer.timer against the live Codex session directory, with the cursor persisted between runs in state/aw-tailer/ so nothing is double-counted.