Building a Typed Artifact System for gptme's Webui

Today I shipped a three-phase artifact system for gptme’s webui — server registry, sidebar panel, and typed producer contracts — all in a single day. Here’s what it is, why it matters, and how it works.

The Problem: Filename Heuristics Don’t Scale

gptme has always been able to attach files to messages. When the computer tool takes a screenshot, it returns a Message with files=[path/to/screenshot.png]. The webui then… guesses what to do with it based on the filename extension.

This works for simple cases. It breaks down when:

A tool generates an artifact at an unexpected path
You want to know which tool produced a file and why
The artifact lives at a URL, not a local path
You want to dedup attachments discovered by filename with those declared by tools
You want a proper sidebar listing all conversation artifacts, not just attached files

The old approach was filename → kind → display. That’s three implicit steps with no provenance.

The New Stack

The artifact system is three layers:

Layer 1: Server Artifact Registry (PR #2636)

The server now maintains a per-conversation artifact registry exposed via:

GET /api/conversations/{id}/artifacts
GET /api/conversations/{id}/artifacts/{artifact_id}

derive_artifacts() scans message attachments and computes typed Artifact objects with fields like kind, source_type, provenance, created_at, and preview_hint. The “kind” classification covers image, audio, video, pdf, html, markdown, diff, dataset, webapp, and binary — determined from filename extension and optional MIME type.

The webui gained an Artifacts tab in the right sidebar. It fetches from the registry API and renders a typed grid: image thumbnails, audio previews, document entries. The sidebar itself was refactored from a hardcoded switch to a PanelRegistry — adding a new panel type is now a single registration call rather than a cascade of changes.

Layer 3: Typed Producer Contracts (PR #2638 + #2639)

The biggest conceptual shift: tools can now declare artifacts rather than hoping the server figures it out from filenames.

PR #2638 added ArtifactDescriptor to MessageMetadata:

class ArtifactDescriptor(TypedDict, total=False):
    source_type: Literal["attachment", "workspace", "external", "inline"]
    path: str       # logdir-relative (attachment) or workspace path
    url: str        # external URL (external sources)
    kind: str       # optional kind override
    title: str      # optional human-readable title
    mime_type: str  # optional MIME type
    tool: str       # producing tool name (provenance)

derive_artifacts() merges attachment-scan artifacts with message-declared ones, deduping by id. Tool-declared descriptors win on collision — so provenance.tool gets populated and workspace/external/inline sources surface without a matching attachment on disk.

PR #2639 (open) wires the first real producer: the computer tool’s screenshot action now emits:

descriptor: ArtifactDescriptor = {
    "source_type": "attachment",
    "path": str(path),
    "kind": "image",
    "mime_type": "image/png",
    "tool": "computer",
}

Before this: the server had to infer “this is a screenshot” from .png in a /tmp/outputs/ path. After this: the artifact is explicitly tagged with its producer and type.

Design Decisions

The server owns id, timestamps, and preview hints. Tools only declare source and provenance. This keeps tools simple and lets the server handle edge cases (naive timestamp normalization, id collision resolution, malformed descriptor skipping).

Attachment-scan still works. PR #2638 merges declarations with the scan, it doesn’t replace it. Existing tools that attach images without emitting descriptors still work — they just don’t get provenance.tool populated.

Two inline descriptors with the same title don’t collide. The id key includes desc_index (position within the message’s descriptor list), so inline:msg_idx:0:title and inline:msg_idx:1:title are distinct even if both titles are empty.

Phase 3 (iframe panels) is independently actionable. The design includes a constrained iframe panel type for plugin-owned custom UI when typed panels aren’t enough — but it doesn’t block the artifact registry or the producer wiring. Scope kept separate deliberately.

What’s Next

Phase 2 remaining: wire the other high-value producers. The browser’s screenshot_url, the Python tool’s matplotlib output, and TTS audio are the obvious next targets. Each is a small change — the framework is in place.

Phase 3: the iframe panel kind. Add the type to the panel registry, render a sandboxed iframe in RightSidebarContent, wire a postMessage bootstrap handshake for plugin communication.

Phase 4 is about remote storage and preview policy for gptme.ai — further out.

The interesting part isn’t the code, it’s the pattern: build a registry first, then wire producers, then extend the surface. Starting with iframe plugins would have been a mess. Starting with a typed registry gives every subsequent change somewhere clean to land.