This is a spec for a terminal project I have. I think we could probably use libghostty for the terminal emulation as it seems the Go ecosystem is quite sparse.

# patterm — v1 Spec

*Working title: **patterm**. Used throughout this document.*

## 1. Overview

A terminal-based agent orchestration shell. The user opens patterm in a project directory (e.g. `~/Dev/foo`). patterm presents a multi-tab TUI where each top tab is a **session** — a long-running PTY launched from a user-defined **preset**. Presets come in two flavours: **agent presets** (e.g. claude, codex, opencode — vendor LLM CLIs with patterm's MCP wired up) and **process presets** (e.g. `bun run dev`, `vitest --watch` — raw commands with no MCP). Each session has a sidebar of **children**: more presets spawned by that session, again either agents or processes, all PTY-backed. The right rail also surfaces project-scoped **scratchpads** (markdown files) for human readability.

An MCP server, in-process, exposes tools that let orchestrator agents spawn and drive children, run processes, set timers, message peers, and read/write scratchpads. The orchestrator is a real LLM CLI driving another LLM CLI as if it were a human user — keystroke injection in, rendered-grid scraping out. The orchestrator fully owns the content of what it sends; patterm only handles the plumbing.

**Goal:** Let one SOTA agent orchestrate other agents of different types (claude → codex, codex → opencode, …) without subagent APIs, while keeping the whole thing steerable and observable by a human at any moment.

**Non-goal:** Hosting any LLM. patterm only manages CLIs the user already has installed. patterm also doesn't ship hard-coded knowledge of any specific vendor CLI — agent presets are user-editable JSON; the three common ones (claude, codex, opencode) ship as defaults.

---

## 2. Architecture and lifecycle

**Single foreground process.
No daemon, no detach.** The tool is one Go process that owns: the TUI, all PTYs, vt-emulated grids, session state, child state, scratchpad files, and an in-process MCP server. Killing the process kills everything inside it. There is no attach/detach, no project-keyed singleton, no socket-based reattachment.

**Lifecycle:**

1. User runs `patterm` in a project directory.
2. The process starts the TUI as a **blank canvas** — no sessions, no children, no scratchpad preview. Just the empty frame with the palette hint in the status line. The in-process MCP server initializes (bound to a per-PID unix socket for spawned children — see §10) and scratchpad metadata is loaded from disk, but nothing is rendered until the user opens a preset.
3. The user opens the palette (`Ctrl-K`), selects a preset, and the first session/process is launched. Subsequent sessions and children are spawned the same way (or by orchestrators via MCP).
4. On exit (Ctrl-D, `:quit`, terminal window close, SIGTERM, SIGHUP): the process sends SIGTERM to every child PTY with a short grace window, then SIGKILL, then exits. Scratchpads on disk are the only thing that survives.

**Multiple invocations:** Running `patterm` twice in the same project starts two independent processes. They share scratchpad files on disk but nothing else. If this turns out to be a footgun in practice, a per-project lockfile can be added later — out of scope for v1.

**Implications:** Closing the terminal window (or SSH dropping) ends the session and tears down every child. This is the deliberate trade — no orphan daemons, no socket discovery, no stale-state recovery, no multi-client coordination. The user's terminal window *is* the lifetime boundary.

---

## 3. Project state layout

Scratchpads (user data) live under `$XDG_DATA_HOME`; presets and config live under `$XDG_CONFIG_HOME`.
```
$XDG_DATA_HOME/patterm/
└── projects/
    └── /
        ├── meta.json       # project path, last-opened, version
        ├── trust.json      # persisted command-preset trust grants (§7)
        └── scratchpads/
            ├── notes.md
            ├── todos.md
            └── .md

$XDG_CONFIG_HOME/patterm/
├── config.json             # global settings (theme, default keymap, etc.)
└── presets/
    ├── agents/
    │   ├── claude.json     # ships as default
    │   ├── codex.json      # ships as default
    │   ├── opencode.json   # ships as default
    │   └── .json
    └── processes/
        ├── dev.json        # e.g. { "name": "bun run dev", "argv": ["bun", "run", "dev"] }
        ├── test.json
        └── .json
```

Both preset directories are scanned at startup; every file found becomes a palette entry ("Spawn agent: claude", "Run process: bun run dev", …). Presets are project-agnostic in v1 — the same set is available in every project. Per-project overrides can be added later.

Project key = `sha256(realpath(project_dir))[:16]`. Used only as a scratchpad directory name — there is no daemon to look up.

Internal MCP socket (for spawned children to talk to the running process): `$XDG_RUNTIME_DIR/patterm/.sock`, falling back to `/tmp/patterm-.sock` if `XDG_RUNTIME_DIR` is unset. Created on startup, removed on exit. Per-PID, not per-project — it is a private IPC channel, not a discovery point.

Scratchpads and command-preset trust grants persist across runs. Sessions and child processes do not — every patterm run starts with an empty process tree.

---

## 4. UI / Client

```
┌────────────────────────────────────────────────────────┬──────────────────┐
│ [codex-1] [codex-2] [claude-1] +                       │ Session tree     │
├────────────────────────────────────────────────────────┤ ──────           │
│                                                        │ ▶ codex-1        │
│                                                        │ │                │
│                                                        │ ├─ ◉ claude-2    │
│                                                        │ ├─ ◉ claude-3    │
│                  (focused pane's PTY)                  │ ├─ ◉ claude-4    │
│                                                        │ └─ ◉ bun-dev     │
│                                                        │                  │
│                                                        │ Scratchpads      │
│                                                        │ ──────           │
│                                                        │  todos.md        │
│                                                        │  notes.md        │
│                                                        │  api-plan.md     │
│                                                        │                  │
│                                                        │ ┌────────────┐   │
│                                                        │ │ todos.md   │   │
│                                                        │ │ preview…   │   │
│                                                        │ └────────────┘   │
├────────────────────────────────────────────────────────┴──────────────────┤
│ [orchestrator driving]                             Ctrl-K command palette │
└───────────────────────────────────────────────────────────────────────────┘
```

- **Top tab bar:** one per top-level session. `+` opens the palette pre-filtered to "Spawn…" entries.
- **Main area:** the focused pane's PTY, rendered identically to viewing it in a regular terminal. The focused pane is either the orchestrator (root of the active session's tree) or one of its children, whichever the user last selected from the sidebar.
- **Right rail, top half — session tree:** the active session's process hierarchy, drawn as an indented tree with box-drawing connectors (`├─`, `└─`). The orchestrator is the root (`▶`); each child appears one level deeper with a status glyph (`◉` running, `✓` exited cleanly, `✗` errored). Selecting an entry (palette, arrow keys, or click) makes it the focused pane. v1 only has two levels because of the §8 two-level-tree rule, but the renderer should be tree-shaped from day one so a future depth bump doesn't require UI surgery.
- **Right rail, bottom half:** scratchpad list and a preview of the selected scratchpad.
- **Status line:** input-ownership toast ("orchestrator driving" / "you have control") on the left, palette hint on the right.
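
The "tree-shaped from day one" requirement for the session tree could be met with a small recursive walk. The following is a minimal sketch, not patterm's actual renderer: the `Node` and `Status` types and the `RenderTree` function are hypothetical names introduced for illustration, and only the glyphs and connectors come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a hypothetical enum for a pane's lifecycle state.
type Status int

const (
	Running Status = iota
	ExitedOK
	Errored
)

// glyph maps a status to the sidebar glyphs named in the spec.
func (s Status) glyph() string {
	switch s {
	case ExitedOK:
		return "✓"
	case Errored:
		return "✗"
	default:
		return "◉"
	}
}

// Node is a hypothetical tree node. v1 only nests one level deep,
// but the type allows arbitrary depth.
type Node struct {
	Name     string
	Status   Status
	Children []*Node
}

// RenderTree returns one sidebar line per pane, root first.
func RenderTree(root *Node) []string {
	lines := []string{"▶ " + root.Name}
	renderChildren(root, "", &lines)
	return lines
}

// renderChildren appends n's descendants, extending the prefix so that
// deeper levels keep their vertical guide lines.
func renderChildren(n *Node, prefix string, lines *[]string) {
	for i, c := range n.Children {
		connector, next := "├─ ", "│  "
		if i == len(n.Children)-1 {
			connector, next = "└─ ", "   "
		}
		*lines = append(*lines, prefix+connector+c.Status.glyph()+" "+c.Name)
		renderChildren(c, prefix+next, lines)
	}
}

func main() {
	root := &Node{Name: "codex-1", Children: []*Node{
		{Name: "claude-2"},
		{Name: "bun-dev", Status: ExitedOK},
	}}
	fmt.Println(strings.Join(RenderTree(root), "\n"))
}
```

Because the walk is recursive, lifting the two-level limit later would only change what data is fed in, not the rendering code.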

**Empty state:** Until the user spawns their first preset, the top tab bar, main area, and sidebar all sit empty with a centred hint ("Press Ctrl-K to spawn an agent or process"). No "default session" is created.

**Switching:** Clicking a top tab (or selecting one via the palette) switches the active session — the sidebar tree swaps to that session's hierarchy. Clicking a sidebar entry switches the focused pane within the current session.

**Command palette (v1 input model):** Almost all application functions are driven through a single command palette opened with `Ctrl-K`. The palette is a fuzzy-searchable list of commands, scoped to whatever makes sense for the current focus. Two kinds of entries appear:

- **Built-in commands** — "Switch to session…", "Focus pane…", "Take input control", "Release control to orchestrator", "Open scratchpad…", "Kill child…", "Quit", etc.
- **Preset commands** — one entry per file under `$XDG_CONFIG_HOME/patterm/presets/`. Agent presets surface as "Spawn agent: codex" / "Spawn agent: claude" / …; process presets surface as "Run process: bun run dev" / "Run process: vitest" / …. The label comes from the preset's `name` field; the action is "launch this preset into a new pane."

Selecting a preset either launches it immediately (no required args) or opens a sub-palette for optional args — namely an **initial prompt** (agent presets only), which patterm injects into the spawned PTY's input after the agent is ready (§8). The orchestrator equivalent of this — `spawn_agent` / `spawn_process` MCP tools — uses the exact same machinery: pick a preset by name, optionally supply an initial prompt, patterm handles the rest.

Rationale: the keybinding surface for sessions + children + scratchpads + control transfer + spawning gets large fast. A palette lets us ship the full feature set without committing to a key map yet, and gives the user a discoverable index of every action.
Dedicated keybindings can be layered on top later for the few actions a user does often enough to memorize — they should be configured by binding to palette command IDs, not by re-implementing the action.

Only two keybindings are reserved at the application level in v1:

| Action | Binding |
|---|---|
| Open command palette | `Ctrl-K` |
| Pass-through prefix (everything else after this goes to the focused PTY untouched, e.g. for nested tmux/Ctrl-K-using TUIs) | `Ctrl-K Ctrl-K` |

Everything else — session switching, child cycling, control transfer, quitting — lives in the palette for v1.

---

## 5. PTY layer

One PTY per session orchestrator and one per child. For each PTY the tool maintains:

- The underlying process (pid, status, exit code on death).
- A raw byte ring buffer (default 1 MiB) for stream-mode reads.
- A vt-emulated character grid representing current visible state.
- Alt-screen flag (whether the process is in alternate-buffer mode, i.e. a TUI).
- Last-write timestamp (used for the idle heuristic).

**Terminal emulator:** Go has limited options. Start with `vt10x` or a maintained fork. Budget real time — this is the load-bearing component for grid mode `read_output`. The emulator must handle: SGR colours (then strip them on read), cursor movement, alt-screen entry/exit, scroll regions, basic mouse passthrough where needed.

**Resize:** On startup and on SIGWINCH, the tool reads its own terminal dimensions, computes per-pane winsize (accounting for tab bar, sidebar, status line), and `ioctl(TIOCSWINSZ)` each PTY. Children get SIGWINCH automatically. One process, one viewport — no multi-client resize negotiation.

---

## 6. Input ownership

Each pane has an owner flag: `user` or `orchestrator`. A toast / status-line glyph reflects current owner.

- When the orchestrator spawns a child, that child defaults to orchestrator-owned.
- When the user focuses a pane and presses any key, ownership flips to `user`. The orchestrator can still write — bytes interleave.
  A warning toast appears: "Orchestrator is also driving this pane."
- The user explicitly returns ownership with the release key.

No locking. The user's call if they collide. The visual indicator is the only protection.

---

## 7. MCP tool surface

The tool embeds an MCP server in-process. Each spawned agent gets an MCP config injected at spawn time (see §10) pointing at a stdio proxy subcommand of the same binary, which forwards JSON-RPC over the per-PID unix socket to the running process. Tool calls carry an implicit caller identity (which session / which process) derived from the connection.

### Concepts shared by all tools

- **Process IDs.** Every spawnable thing — agents, terminals, commands — is addressed by an opaque short token (e.g. `p_a1b2c3`), not by OS PID. IDs are stable for the lifetime of the entry: they survive stop/restart for stored command entries; they are released when an agent or terminal exits and is `close_process`'d. Each entry also has a human-readable display name (default `-`, settable via `rename_process` or the `name` arg on spawn).
- **Process kinds.**
  - `agent` — a vendor LLM CLI launched from an agent preset (§10). MCP-wired. Ephemeral: lost when the underlying PTY exits.
  - `terminal` — a bare interactive shell. Defaults to `$SHELL -i`. Ephemeral.
  - `command` — a process preset (e.g. `bun run dev`, `vitest --watch`) or freeform argv. **Session-persistent**: a command entry survives PTY exit so it can be `restart_process`'d, and is removed only when `close_process` is called or patterm exits.
- **Trust gating.** Command entries that were authored as presets are *not* trusted by default. The first time an agent attempts to `spawn_process(kind: 'command', preset: …)`, `start_process`, or `restart_process` against an untrusted command preset, the call returns a `needs_trust` error and patterm surfaces a UI confirmation in the focused tab.
  Once the user confirms, the trust grant is **persisted to disk** (`$XDG_DATA_HOME/patterm/projects//trust.json`, see §3), so the user only confirms each preset once per project — not once per patterm run. Trust is keyed by `(project, preset name)` in v1; content-hashed trust (re-confirming on edit) is a v2 question (§14). Freeform-argv command entries are trusted implicitly at spawn time because the agent already had to compose the argv, and they are not written to the trust file.
- **Caller role.** Every connection has a role: `orchestrator` (root of a session tree), `sub-agent` (an agent spawned by an orchestrator), or `process` (commands/terminals — these don't talk MCP, but they appear as targets). Role gates which tools the caller may invoke. Calls disallowed by role return a structured error explaining why, so the agent can adapt rather than silently fail.
- **Idle / readiness.** Every PTY-backed entry tracks `idle_ms` (ms since last write to its master). Tools that read state surface this so callers can decide when a target is "done" without polling raw bytes themselves (§11).

### Lifecycle and spawning

#### `spawn_agent` — orchestrator-only

- **Args:** `agent` (preset name under `presets/agents/`), `agent_instructions` (string — the first turn typed into the agent's TUI after ready), `name?` (display name).
- **Behaviour:** Launches the agent preset in a new PTY as a child of the calling session. Wires MCP per the preset's injection strategy (§10). Waits for the preset's `ready_signal` (default: 1s idle), then types `agent_instructions` into the input box and submits. patterm injects nothing else — the spawned agent learns its role and conventions either from `agent_instructions` or by calling `whoami` / `help` itself.
- **Returns:** `{ process_id, name }`.
- **Errors:** `unknown_agent` if the preset is missing; `role_forbidden` if a sub-agent calls it (with a message pointing the caller at its parent or at vendor-native subagent tooling).

#### `spawn_process`

- **Args:** `kind` (`terminal` | `command`), one of `preset` (name under `presets/processes/`) or `argv` (string array), `name?`, `working_dir?` (default project root), `env?`, `shell?` (only valid with `argv`; default `false`).
  - For `kind: terminal`, `argv` is optional — defaults to `$SHELL -i`.
  - For `kind: command`, exactly one of `preset` or `argv` must be supplied.
- **Behaviour:** Creates a process entry, attached as a child of the calling agent's session, and starts it. No MCP injection (these aren't agents). `command` entries are persisted to the session for later `restart_process` / `start_process`; `terminal` entries are ephemeral.
- **Returns:** `{ process_id, name }`.
- **Errors:** `needs_trust` if `kind: command` references an untrusted preset.

#### `start_process`

- **Args:** `process_id`.
- **Behaviour:** Starts a stored `command` entry that is currently in `stopped` or `exited` state. No-op on a running entry (returns the existing state).
- **Returns:** `{ process_id, status }`.
- **Errors:** `not_found`, `wrong_kind` (only command entries are start-able post-creation), `needs_trust`.

#### `restart_process`

- **Args:** `process_id`, `signal?` (default `SIGTERM` for the stop phase).
- **Behaviour:** Stops the entry if running (grace window then SIGKILL), then starts it again with the same argv/env/working_dir. Valid for `command` entries; valid for `agent` and `terminal` entries only while their PTY is still live (since they have no stored definition to rehydrate from).
- **Returns:** `{ process_id, status }`.
- **Errors:** `not_found`, `needs_trust` (command presets), `wrong_kind` (trying to restart an exited agent/terminal).

#### `stop_process`

- **Args:** `process_id`, `signal?` (default `SIGTERM`).
- **Behaviour:** Sends the signal to the entry's PTY, with the standard grace window before SIGKILL.
- **Returns:** `{ process_id, status }`.

#### `close_process`

- **Args:** `process_id`.
- **Behaviour:** Removes the entry from the session entirely. If still running, stops it first. Used to clear stored command entries the orchestrator no longer needs, and to clean up exited agent/terminal ghosts from the sidebar.
- **Returns:** `ok`.

#### `rename_process`

- **Args:** `process_id`, `name`.
- **Returns:** `ok`. Updates the display name in the sidebar and tab bar.

#### `select_process`

- **Args:** `process_id`.
- **Behaviour:** Asks the UI to focus the given pane (switches session tab if needed). Non-blocking, advisory — distinct from `request_human_attention`, which raises a notification and expects a human decision.
- **Returns:** `ok`.

### Inspection

#### `list_processes`

- **Args:** `kind?` (filter by `agent` | `terminal` | `command`).
- **Returns:** Array of `{ process_id, name, kind, status, parent_process_id, exit_code?, idle_ms?, trusted? }` for the caller's session. `status ∈ { starting, running, stopped, exited, errored }`.

#### `get_process_status`

- **Args:** `process_id`.
- **Returns:** `{ process_id, name, kind, status, parent_process_id, working_dir, argv?, exit_code?, started_at, idle_ms, active_screen: "main" | "alternate", rows, cols, cursor: { x, y }, trusted?, screen_version }`.

#### `get_project_status`

- **Args:** none.
- **Returns:** `{ project: { path, key }, caller: { process_id, role, name, parent_process_id?, available_tools: [string] }, processes: [], scratchpads: [{ name, size, modified_at, revision }] }`. Everything an agent needs to orient itself in one call.

#### `get_process_output`

- **Args:** `process_id`, `mode` (`grid` | `stream`), `since_offset?` (stream mode only).
- **Behaviour:** `grid` returns the current visible pane as plain text, ANSI stripped, with best-effort vendor-chrome trim per preset hints (§10). `stream` returns ANSI-stripped bytes from `since_offset` to the current write head.
- **Returns:** `{ content, mode, new_offset?, active_screen, rows, cols, cursor, idle_ms, status, screen_version }`.
- **Tool-description note (shown to the calling agent):** "The grid result is the entire visible pane. You are responsible for locating the response to your last prompt within it. Use `search_output` if you have a specific marker to find."

#### `get_process_raw_output`

- **Args:** `process_id`, `since_offset?`.
- **Behaviour:** Returns raw bytes from `since_offset`, escape sequences preserved. Used when the agent needs to inspect control codes (rare).
- **Returns:** `{ content, new_offset, status }`.

#### `search_output`

- **Args:** `process_id`, `pattern` (regex), `kind` (`rendered` | `raw`), `limit?` (default 20).
- **Returns:** `{ matches: [{ line_no, text }], truncated: bool }`. Searches scrollback (not just the visible grid).

#### `wait_for_pattern`

- **Args:** `process_id`, `pattern` (regex), `timeout_seconds`, `scope?` (`grid` | `scrollback`, default `grid`).
- **Behaviour:** Blocks the calling agent until the chosen surface matches the regex, or the timeout expires. Polls at ~50ms.
- **Returns:** `{ matched: bool, snippet?: string }`. Used in the §9 permissions-prompt-clear flow.

#### `get_process_ports`

- **Args:** `process_id`.
- **Returns:** `{ ports: [{ port, url?, first_seen_at }] }`. Best-effort: patterm watches the stream for `:NNNN` and `http://…` patterns and reports what it has seen. No probing.

### I/O

#### `send_input`

- **Args:** `process_id`, `kind` (`text` | `paste` | `key`), and:
  - For `text`: `text` (string), `submit?` (default `true` — appends Enter).
  - For `paste`: `text` (string). Sent via bracketed paste (`\e[200~ … \e[201~`) when the target's emulator state indicates support; otherwise falls back to chunked text writes without trailing newline.
  - For `key`: `key` (one of `enter`, `tab`, `escape`, `backspace`, `ctrl-c`, `ctrl-d`, `up`, `down`, `left`, `right`, `home`, `end`, `page-up`, `page-down`, `f1`…`f12`). Encoded via the emulator's key-encoding (Kitty keyboard protocol where negotiated, legacy escapes otherwise).
- **Optional tail:** `wait_ms?` (default `0`), `tail_mode?` (`none` | `stream` | `grid`, default `stream` when `wait_ms > 0`). When `wait_ms > 0`, the call blocks for that many milliseconds after sending and then returns the tail in the chosen mode.
- **Returns:** `{ ok: true, tail?: { content, mode, new_offset?, active_screen, idle_ms, screen_version } }`.

### Coordination

#### `send_message`

- **Args:** `target_process_id`, `message` (string).
- **Behaviour:** Delivers a tagged message into the target's PTY. Direction is inferred from the relationship between caller and target:
  - parent → child: prepended with `[orchestrator] `.
  - child → parent: prepended with `[sub-agent:] `.
- **Returns:** `ok`.
- **Errors:** `not_related` if the target is neither the caller's parent nor a child of the caller (siblings must route through the parent in v1).

#### `request_human_attention`

- **Args:** `process_id`, `reason` (string).
- **Behaviour:** Notification in the TUI, blinks the sidebar entry, optionally auto-focuses per user setting. The escape hatch when the orchestrator can't safely decide.
- **Returns:** `ok`.

#### `timer_wait`

- **Args:** `seconds`, `label?`.
- **Behaviour:** Returns a `timer_id` immediately. After `seconds`, injects `[system] Your timer [