This is a spec for a terminal project I have. I think we could probably use libghostty for the terminal emulation, as the Go ecosystem seems quite sparse.

# patterm — v1 Spec

*Working title: **patterm**. Used throughout this document.*

## 1. Overview

A terminal-based agent orchestration shell. The user opens patterm in a project directory (e.g. `~/Dev/foo`). patterm presents a multi-tab TUI where each top tab is a **session** — a long-running PTY launched from a user-defined **preset**. Presets come in two flavours: **agent presets** (e.g. claude, codex, opencode — vendor LLM CLIs with patterm's MCP wired up) and **process presets** (e.g. `bun run dev`, `vitest --watch` — raw commands with no MCP).

Each session has a sidebar of **children**: more presets spawned by that session, again either agents or processes, all PTY-backed. The right rail also surfaces project-scoped **scratchpads** (markdown files) for human readability.

An MCP server, in-process, exposes tools that let orchestrator agents spawn and drive children, run processes, set timers, message peers, and read/write scratchpads.

The orchestrator is a real LLM CLI driving another LLM CLI as if it were a human user — keystroke injection in, rendered-grid scraping out. The orchestrator fully owns the content of what it sends; patterm only handles the plumbing.

**Goal:** Let one SOTA agent orchestrate other agents of different types (claude → codex, codex → opencode, …) without subagent APIs, while keeping the whole thing steerable and observable by a human at any moment.

**Non-goal:** Hosting any LLM. patterm only manages CLIs the user already has installed. patterm also doesn't ship hard-coded knowledge of any specific vendor CLI — agent presets are user-editable JSON; the three common ones (claude, codex, opencode) ship as defaults.

---

## 2. Architecture and lifecycle

**Single foreground process. No daemon, no detach.**

The tool is one Go process that owns: the TUI, all PTYs, vt-emulated grids, session state, child state, scratchpad files, and an in-process MCP server. Killing the process kills everything inside it. There is no attach/detach, no project-keyed singleton, no socket-based reattachment.

**Lifecycle:**

1. User runs `patterm` in a project directory.
2. The process starts the TUI as a **blank canvas** — no sessions, no children, no scratchpad preview. Just the empty frame with the palette hint in the status line. The in-process MCP server initializes (bound to a per-PID unix socket for spawned children — see §10) and scratchpad metadata is loaded from disk, but nothing is rendered until the user opens a preset.
3. The user opens the palette (`Ctrl-K`), selects a preset, and the first session/process is launched. Subsequent sessions and children are spawned the same way (or by orchestrators via MCP).
4. On exit (Ctrl-D, `:quit`, terminal window close, SIGTERM, SIGHUP): the process sends SIGTERM to every child PTY with a short grace window, then SIGKILL, then exits. Scratchpads and trust grants on disk are the only things that survive.

**Multiple invocations:** Running `patterm` twice in the same project starts two independent processes. They share scratchpad files (and trust grants) on disk but nothing else. If this turns out to be a footgun in practice, a per-project lockfile can be added later — out of scope for v1.

**Implications:** Closing the terminal window (or SSH dropping) ends the session and tears down every child. This is the deliberate trade — no orphan daemons, no socket discovery, no stale-state recovery, no multi-client coordination. The user's terminal window *is* the lifetime boundary.

---

## 3. Project state layout

Scratchpads and trust grants (user data) live under `$XDG_DATA_HOME`; presets and config live under `$XDG_CONFIG_HOME`.
```
$XDG_DATA_HOME/patterm/
└── projects/
    └── /
        ├── meta.json          # project path, last-opened, version
        ├── trust.json         # persisted command-preset trust grants (§7)
        └── scratchpads/
            ├── notes.md
            ├── todos.md
            └── .md

$XDG_CONFIG_HOME/patterm/
├── config.json                # global settings (theme, default keymap, etc.)
└── presets/
    ├── agents/
    │   ├── claude.json        # ships as default
    │   ├── codex.json         # ships as default
    │   ├── opencode.json      # ships as default
    │   └── .json
    └── processes/
        ├── dev.json           # e.g. { "name": "bun run dev", "argv": ["bun", "run", "dev"] }
        ├── test.json
        └── .json
```

Both preset directories are scanned at startup; every file found becomes a palette entry ("Spawn agent: claude", "Run process: bun run dev", …). Presets are project-agnostic in v1 — the same set is available in every project. Per-project overrides can be added later.

Project key = `sha256(realpath(project_dir))[:16]`. Used only to name the per-project data directory (scratchpads and trust grants); there is no daemon to look up.

Internal MCP socket (for spawned children to talk to the running process): `$XDG_RUNTIME_DIR/patterm/.sock`, falling back to `/tmp/patterm-.sock` if `XDG_RUNTIME_DIR` is unset. Created on startup, removed on exit. Per-PID, not per-project — it is a private IPC channel, not a discovery point.

Scratchpads and command-preset trust grants persist across runs. Sessions and child processes do not — every patterm run starts with an empty process tree.

---

## 4. UI / Client

```
┌────────────────────────────────────────────────────────┬──────────────────┐
│ [codex-1] [codex-2] [claude-1] +                       │ Session tree     │
├────────────────────────────────────────────────────────┤ ──────           │
│                                                        │ ▶ codex-1        │
│                                                        │ │                │
│                                                        │ ├─ ◉ claude-2    │
│                                                        │ ├─ ◉ claude-3    │
│                  (focused pane's PTY)                  │ ├─ ◉ claude-4    │
│                                                        │ └─ ◉ bun-dev     │
│                                                        │                  │
│                                                        │ Scratchpads      │
│                                                        │ ──────           │
│                                                        │ todos.md         │
│                                                        │ notes.md         │
│                                                        │ api-plan.md      │
│                                                        │                  │
│                                                        │ ┌────────────┐   │
│                                                        │ │ todos.md   │   │
│                                                        │ │ preview…   │   │
│                                                        │ └────────────┘   │
├────────────────────────────────────────────────────────┴──────────────────┤
│ [orchestrator driving]                             Ctrl-K command palette │
└───────────────────────────────────────────────────────────────────────────┘
```

- **Top tab bar:** one per running top-level session. Exited top-level sessions disappear from the tab bar; historical output is still available through the underlying process state / logs where supported, but dead panes do not stay in the navigation chrome. `+` opens the palette pre-filtered to "Spawn…" entries.
- **Main area:** the focused pane's PTY, rendered inside patterm's main viewport. The viewport starts below the tab bar and excludes the right rail and bottom status line. The focused pane is either the orchestrator (root of the active session's tree) or one of its running children, whichever the user last selected from the sidebar.
- **Right rail, top half — session tree:** the active tab's running process hierarchy, drawn as an indented tree with box-drawing connectors (`├─`, `└─`). The active top-level session is the root (`▶` when focused); direct children appear one level deeper with a running glyph (`◉`). Exited / killed processes are removed from the visible tree. Selecting an entry (palette, arrow keys, or click) makes it the focused pane. v1 only has two levels because of the §8 two-level-tree rule, but the renderer should be tree-shaped from day one so a future depth bump doesn't require UI surgery.
- **Right rail, bottom half:** scratchpad list and a preview of the selected scratchpad.
- **Status line:** input-ownership toast ("orchestrator driving" / "you have control") on the left, palette hint on the right.

**Empty state:** Until the user spawns their first preset, the top tab bar, main area, and sidebar all sit empty with a hint ("Press Ctrl-K to spawn an agent or process") centred in the main viewport, not the full host terminal. No "default session" is created.

**Switching:** Clicking a top tab (or selecting one via the palette) switches the active session — the sidebar tree swaps to that tab's hierarchy only. Clicking a sidebar entry switches the focused pane within the current session. If the focused pane exits, focus falls back to another running top-level session if one exists; otherwise the UI returns to the empty state.

**Viewport and chrome ownership:** patterm owns the tab bar, right rail, status line, and any empty-state text. Child PTYs own only the main viewport. This is a hard UI boundary: child terminal output must not be allowed to clear, wrap into, or position the cursor inside patterm chrome. When rendering live child output, patterm may rewrite or clip destructive terminal sequences (for example clear-line / clear-screen / cursor-positioning sequences) and must redraw chrome after focused child output so the outer frame wins any rendering conflict. The goal is that agent TUIs behave as if their terminal size is exactly the main viewport, even though patterm is drawing additional UI around them.

**Command palette (v1 input model):** Almost all application functions are driven through a single command palette opened with `Ctrl-K`. The palette is a fuzzy-searchable list of commands, scoped to whatever makes sense for the current focus. Two kinds of entries appear:

- **Built-in commands** — "Switch to session…", "Focus pane…", "Take input control", "Release control to orchestrator", "Open scratchpad…", "Kill child…", "Quit", etc.
- **Preset commands** — one entry per file under `$XDG_CONFIG_HOME/patterm/presets/`. Agent presets surface as "Spawn agent: codex" / "Spawn agent: claude" / …; process presets surface as "Run process: bun run dev" / "Run process: vitest" / …. The label comes from the preset's `name` field; the action is "launch this preset into a new pane."

Selecting a preset either launches it immediately (no required args) or opens a sub-palette for optional args — namely an **initial prompt** (agent presets only), which patterm injects into the spawned PTY's input after the agent is ready (§8). The orchestrator equivalent of this — `spawn_agent` / `spawn_process` MCP tools — uses the exact same machinery: pick a preset by name, optionally supply an initial prompt, patterm handles the rest.

Rationale: the keybinding surface for sessions + children + scratchpads + control transfer + spawning gets large fast. A palette lets us ship the full feature set without committing to a key map yet, and gives the user a discoverable index of every action. Dedicated keybindings can be layered on top later for the few actions a user does often enough to memorize — they should be configured by binding to palette command IDs, not by re-implementing the action.

Only two keybindings are reserved at the application level in v1:

| Action | Binding |
|---|---|
| Open command palette | `Ctrl-K` |
| Pass-through prefix (everything else after this goes to the focused PTY untouched, e.g. for nested tmux/Ctrl-K-using TUIs) | `Ctrl-K Ctrl-K` |

Everything else — session switching, child cycling, control transfer, quitting — lives in the palette for v1.

---

## 5. PTY layer

One PTY per session orchestrator and one per child. For each PTY the tool maintains:

- The underlying process (pid, status, exit code on death).
- A raw byte ring buffer (default 1 MiB) for stream-mode reads.
- A vt-emulated character grid representing current visible state.
- Alt-screen flag (whether the process is in alternate-buffer mode, i.e. a TUI).
- Last-write timestamp (used for the idle heuristic).

**Terminal emulator:** Go has limited options. Start with `vt10x` or a maintained fork. Budget real time — this is the load-bearing component for grid-mode `get_process_output`. The emulator must handle: SGR colours (then strip them on read), cursor movement, alt-screen entry/exit, scroll regions, basic mouse passthrough where needed.

**Resize:** On startup and on SIGWINCH, the tool reads its own terminal dimensions, computes the main viewport winsize (accounting for tab bar, sidebar, and status line), and applies `ioctl(TIOCSWINSZ)` to each PTY with that viewport size — never the full host terminal size. The headless emulator for each child is resized to the same grid. Children get SIGWINCH automatically. One process, one viewport — no multi-client resize negotiation.

---

## 6. Input ownership

Each pane has an owner flag: `user` or `orchestrator`. A toast / status-line glyph reflects current owner.

- When the orchestrator spawns a child, that child defaults to orchestrator-owned.
- When the user focuses a pane and presses any key, ownership flips to `user`. The orchestrator can still write — bytes interleave. A warning toast appears: "Orchestrator is also driving this pane."
- The user explicitly returns ownership with the palette's "Release control to orchestrator" command.

No locking. The user's call if they collide. The visual indicator is the only protection.

---

## 7. MCP tool surface

The tool embeds an MCP server in-process. Each spawned agent gets an MCP config injected at spawn time (see §10) pointing at a stdio proxy subcommand of the same binary, which forwards JSON-RPC over the per-PID unix socket to the running process. Tool calls carry an implicit caller identity (which session / which process) derived from the connection.

### Concepts shared by all tools

- **Process IDs.** Every spawnable thing — agents, terminals, commands — is addressed by an opaque short token (e.g. `p_a1b2c3`), not by OS PID. IDs are stable for the lifetime of the entry: they survive stop/restart for stored command entries; they are released when an agent or terminal exits and is `close_process`'d. Each entry also has a human-readable display name (default `-`, settable via `rename_process` or the `name` arg on spawn).
- **Process kinds.**
  - `agent` — a vendor LLM CLI launched from an agent preset (§10). MCP-wired. Ephemeral: lost when the underlying PTY exits.
  - `terminal` — a bare interactive shell. Defaults to `$SHELL -i`. Ephemeral.
  - `command` — a process preset (e.g. `bun run dev`, `vitest --watch`) or freeform argv. **Session-persistent**: a command entry survives PTY exit so it can be `restart_process`'d, and is removed only when `close_process` is called or patterm exits.
- **Trust gating.** Command entries that were authored as presets are *not* trusted by default. The first time an agent attempts to `spawn_process(kind: 'command', preset: …)`, `start_process`, or `restart_process` against an untrusted command preset, the call returns a `needs_trust` error and patterm surfaces a UI confirmation in the focused tab. Once the user confirms, the trust grant is **persisted to disk** (`$XDG_DATA_HOME/patterm/projects//trust.json`, see §3), so the user only confirms each preset once per project — not once per patterm run. Trust is keyed by `(project, preset name)` in v1; content-hashed trust (re-confirming on edit) is a v2 question (§14). Freeform-argv command entries are trusted implicitly at spawn time because the agent already had to compose the argv, and they are not written to the trust file.
- **Caller role.** Every connection has a role: `orchestrator` (root of a session tree), `sub-agent` (an agent spawned by an orchestrator), or `process` (commands/terminals — these don't talk MCP, but they appear as targets). Role gates which tools the caller may invoke.
  Calls disallowed by role return a structured error explaining why, so the agent can adapt rather than silently fail.
- **Idle / readiness.** Every PTY-backed entry tracks `idle_ms` (milliseconds since the child last wrote output to its PTY, i.e. since patterm last read bytes from the master side). Tools that read state surface this so callers can decide when a target is "done" without polling raw bytes themselves (§11).

### Lifecycle and spawning

#### `spawn_agent` — orchestrator-only

- **Args:** `agent` (preset name under `presets/agents/`), `agent_instructions` (string — the first turn typed into the agent's TUI after ready), `name?` (display name).
- **Behaviour:** Launches the agent preset in a new PTY as a child of the calling session. Wires MCP per the preset's injection strategy (§10). Waits for the preset's `ready_signal` (default: 1s idle), then types `agent_instructions` into the input box and submits. patterm injects nothing else — the spawned agent learns its role and conventions either from `agent_instructions` or by calling `whoami` / `help` itself.
- **Returns:** `{ process_id, name }`.
- **Errors:** `unknown_agent` if the preset is missing; `role_forbidden` if a sub-agent calls it (with a message pointing the caller at its parent or at vendor-native subagent tooling).

#### `spawn_process`

- **Args:** `kind` (`terminal` | `command`), one of `preset` (name under `presets/processes/`) or `argv` (string array), `name?`, `working_dir?` (default project root), `env?`, `shell?` (only valid with `argv`; default `false`).
  - For `kind: terminal`, `argv` is optional — defaults to `$SHELL -i`.
  - For `kind: command`, exactly one of `preset` or `argv` must be supplied.
- **Behaviour:** Creates a process entry, attached as a child of the calling agent's session, and starts it. No MCP injection (these aren't agents). `command` entries are persisted to the session for later `restart_process` / `start_process`; `terminal` entries are ephemeral.
- **Returns:** `{ process_id, name }`.
- **Errors:** `needs_trust` if `kind: command` references an untrusted preset.

#### `start_process`

- **Args:** `process_id`.
- **Behaviour:** Starts a stored `command` entry that is currently in `stopped` or `exited` state. No-op on a running entry (returns the existing state).
- **Returns:** `{ process_id, status }`.
- **Errors:** `not_found`, `wrong_kind` (only command entries are start-able post-creation), `needs_trust`.

#### `restart_process`

- **Args:** `process_id`, `signal?` (default `SIGTERM` for the stop phase).
- **Behaviour:** Stops the entry if running (grace window then SIGKILL), then starts it again with the same argv/env/working_dir. Valid for `command` entries; valid for `agent` and `terminal` entries only while their PTY is still live (since they have no stored definition to rehydrate from).
- **Returns:** `{ process_id, status }`.
- **Errors:** `not_found`, `needs_trust` (command presets), `wrong_kind` (trying to restart an exited agent/terminal).

#### `stop_process`

- **Args:** `process_id`, `signal?` (default `SIGTERM`).
- **Behaviour:** Sends the signal to the process running in the entry's PTY, with the standard grace window before SIGKILL.
- **Returns:** `{ process_id, status }`.

#### `close_process`

- **Args:** `process_id`.
- **Behaviour:** Removes the entry from the session entirely. If still running, stops it first. Used to clear stored command entries the orchestrator no longer needs, and to clean up exited agent/terminal ghosts from the sidebar.
- **Returns:** `ok`.

#### `rename_process`

- **Args:** `process_id`, `name`.
- **Returns:** `ok`. Updates the display name in the sidebar and tab bar.

#### `select_process`

- **Args:** `process_id`.
- **Behaviour:** Asks the UI to focus the given pane (switches session tab if needed). Non-blocking, advisory — distinct from `request_human_attention`, which raises a notification and expects a human decision.
- **Returns:** `ok`.
### Inspection

#### `list_processes`

- **Args:** `kind?` (filter by `agent` | `terminal` | `command`).
- **Returns:** Array of `{ process_id, name, kind, status, parent_process_id, exit_code?, idle_ms?, trusted? }` for the caller's session. `status ∈ { starting, running, stopped, exited, errored }`.

#### `get_process_status`

- **Args:** `process_id`.
- **Returns:** `{ process_id, name, kind, status, parent_process_id, working_dir, argv?, exit_code?, started_at, idle_ms, active_screen: "main" | "alternate", rows, cols, cursor: { x, y }, trusted?, screen_version }`.

#### `get_project_status`

- **Args:** none.
- **Returns:** `{ project: { path, key }, caller: { process_id, role, name, parent_process_id?, available_tools: [string] }, processes: [], scratchpads: [{ name, size, modified_at, revision }] }`. Everything an agent needs to orient itself in one call.

#### `get_process_output`

- **Args:** `process_id`, `mode` (`grid` | `stream`), `since_offset?` (stream mode only).
- **Behaviour:** `grid` returns the current visible pane as plain text, ANSI stripped, with best-effort vendor-chrome trim per preset hints (§10). `stream` returns ANSI-stripped bytes from `since_offset` to the current write head.
- **Returns:** `{ content, mode, new_offset?, active_screen, rows, cols, cursor, idle_ms, status, screen_version }`.
- **Tool-description note (shown to the calling agent):** "The grid result is the entire visible pane. You are responsible for locating the response to your last prompt within it. Use `search_output` if you have a specific marker to find."

#### `get_process_raw_output`

- **Args:** `process_id`, `since_offset?`.
- **Behaviour:** Returns raw bytes from `since_offset`, escape sequences preserved. Used when the agent needs to inspect control codes (rare).
- **Returns:** `{ content, new_offset, status }`.

#### `search_output`

- **Args:** `process_id`, `pattern` (regex), `kind` (`rendered` | `raw`), `limit?` (default 20).
- **Returns:** `{ matches: [{ line_no, text }], truncated: bool }`. Searches scrollback (not just the visible grid).

#### `wait_for_pattern`

- **Args:** `process_id`, `pattern` (regex), `timeout_seconds`, `scope?` (`grid` | `scrollback`, default `grid`).
- **Behaviour:** Blocks the calling agent until the chosen surface matches the regex, or the timeout expires. Polls every ~50 ms.
- **Returns:** `{ matched: bool, snippet?: string }`. Used in the §9 permissions-prompt-clear flow.

#### `get_process_ports`

- **Args:** `process_id`.
- **Returns:** `{ ports: [{ port, url?, first_seen_at }] }`. Best-effort: patterm watches the stream for `:NNNN` and `http://…` patterns and reports what it has seen. No probing.

### I/O

#### `send_input`

- **Args:** `process_id`, `kind` (`text` | `paste` | `key`), and:
  - For `text`: `text` (string), `submit?` (default `true` — appends Enter).
  - For `paste`: `text` (string). Sent via bracketed paste (`\e[200~ … \e[201~`) when the target's emulator state shows it has enabled bracketed-paste mode (`\e[?2004h`); otherwise falls back to chunked text writes without trailing newline.
  - For `key`: `key` (one of `enter`, `tab`, `escape`, `backspace`, `ctrl-c`, `ctrl-d`, `up`, `down`, `left`, `right`, `home`, `end`, `page-up`, `page-down`, `f1`…`f12`). Encoded via the emulator's key-encoding (Kitty keyboard protocol where negotiated, legacy escapes otherwise).
- **Optional tail:** `wait_ms?` (default `0`), `tail_mode?` (`none` | `stream` | `grid`, default `stream` when `wait_ms > 0`). When `wait_ms > 0`, the call blocks for that many milliseconds after sending and then returns the tail in the chosen mode.
- **Returns:** `{ ok: true, tail?: { content, mode, new_offset?, active_screen, idle_ms, screen_version } }`.

### Coordination

#### `send_message`

- **Args:** `target_process_id`, `message` (string).
- **Behaviour:** Delivers a tagged message into the target's PTY. Direction is inferred from the relationship between caller and target:
  - parent → child: prepended with `[orchestrator] `.
  - child → parent: prepended with `[sub-agent:] `.
- **Returns:** `ok`.
- **Errors:** `not_related` if the target is neither the caller's parent nor a child of the caller (siblings must route through the parent in v1).

#### `request_human_attention`

- **Args:** `process_id`, `reason` (string).
- **Behaviour:** Raises a notification in the TUI, blinks the sidebar entry, and optionally auto-focuses it per user setting. The escape hatch when the orchestrator can't safely decide.
- **Returns:** `ok`.

#### `timer_wait`

- **Args:** `seconds`, `label?`.
- **Behaviour:** Returns a `timer_id` immediately. After `seconds`, injects `[system] Your timer [