This is a spec for a terminal project I have.

I think we could probably use libghostty for the terminal emulation, as the Go ecosystem for terminal emulators seems quite sparse.

# patterm — v1 Spec

*Working title: **patterm**. Used throughout this document.*

## 1. Overview

A terminal-based agent orchestration shell. The user opens patterm in a project directory (e.g. `~/Dev/foo`). patterm presents a multi-tab TUI where each top tab is a **session** — a long-running PTY launched from a user-defined **preset**. Presets come in two flavours: **agent presets** (e.g. claude, codex, opencode — vendor LLM CLIs with patterm's MCP wired up) and **process presets** (e.g. `bun run dev`, `vitest --watch` — raw commands with no MCP). Each session has a sidebar of **children**: more presets spawned by that session, again either agents or processes, all PTY-backed. The right rail also surfaces project-scoped **scratchpads** (markdown files) for human readability.

An MCP server, in-process, exposes tools that let orchestrator agents spawn and drive children, run processes, set timers, message peers, and read/write scratchpads. The orchestrator is a real LLM CLI driving another LLM CLI as if it were a human user — keystroke injection in, rendered-grid scraping out. The orchestrator fully owns the content of what it sends; patterm only handles the plumbing.

**Goal:** Let one SOTA agent orchestrate other agents of different types (claude → codex, codex → opencode, …) without subagent APIs, while keeping the whole thing steerable and observable by a human at any moment.

**Non-goal:** Hosting any LLM. patterm only manages CLIs the user already has installed. patterm also doesn't ship hard-coded knowledge of any specific vendor CLI — agent presets are user-editable JSON; the three common ones (claude, codex, opencode) ship as defaults.

---

## 2. Architecture and lifecycle

**Single foreground process. No daemon, no detach.**

The tool is one Go process that owns: the TUI, all PTYs, vt-emulated grids, session state, child state, scratchpad files, and an in-process MCP server. Killing the process kills everything inside it. There is no attach/detach, no project-keyed singleton, no socket-based reattachment.

**Lifecycle:**

1. User runs `patterm` in a project directory.
2. The process starts the TUI as a **blank canvas** — no sessions, no children, no scratchpad preview. Just the empty frame with the palette hint in the status line. The in-process MCP server initializes (bound to a per-PID unix socket for spawned children — see §10) and scratchpad metadata is loaded from disk, but nothing is rendered until the user opens a preset.
3. The user opens the palette (`Ctrl-K`), selects a preset, and the first session/process is launched. Subsequent sessions and children are spawned the same way (or by orchestrators via MCP).
4. On exit (the palette's "Quit" command, terminal window close, SIGTERM, SIGHUP): the process sends SIGTERM to every child process with a short grace window, then SIGKILL, then exits. Scratchpads and persisted trust grants on disk (§3) are the only things that survive.
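
The SIGTERM, grace window, SIGKILL cascade in step 4 could look roughly like the sketch below. It is a minimal illustration, not patterm's implementation: `stopWithGrace` and `demoStop` are hypothetical names, the grace duration is an assumption (the spec only says "short"), and `sleep` stands in for a real child.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// stopWithGrace sends SIGTERM, waits up to grace for the process to exit,
// then falls back to SIGKILL. It reports whether the fallback was needed.
func stopWithGrace(cmd *exec.Cmd, grace time.Duration) (killed bool) {
	done := make(chan struct{})
	go func() {
		_ = cmd.Wait() // reaps the child; the exit error is not needed here
		close(done)
	}()
	_ = cmd.Process.Signal(syscall.SIGTERM)
	select {
	case <-done:
		return false
	case <-time.After(grace):
		_ = cmd.Process.Kill() // SIGKILL on Unix
		<-done
		return true
	}
}

// demoStop starts a stand-in child (`sleep`) and stops it again.
func demoStop(grace time.Duration) (killed bool, err error) {
	cmd := exec.Command("sleep", "30")
	if err := cmd.Start(); err != nil {
		return false, err
	}
	return stopWithGrace(cmd, grace), nil
}

func main() {
	killed, err := demoStop(2 * time.Second)
	fmt.Println("needed SIGKILL:", killed, "err:", err)
}
```

On exit the same routine would run once per child, ideally in parallel so that one stubborn process does not delay the others.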

**Multiple invocations:** Running `patterm` twice in the same project starts two independent processes. They share scratchpad files on disk but nothing else. If this turns out to be a footgun in practice, a per-project lockfile can be added later — out of scope for v1.

**Implications:** Closing the terminal window (or SSH dropping) ends the session and tears down every child. This is the deliberate trade — no orphan daemons, no socket discovery, no stale-state recovery, no multi-client coordination. The user's terminal window *is* the lifetime boundary.

---

## 3. Project state layout

Scratchpads (user data) live under `$XDG_DATA_HOME`; presets and config live under `$XDG_CONFIG_HOME`.

```
$XDG_DATA_HOME/patterm/
└── projects/
    └── <project-key>/
        ├── meta.json          # project path, last-opened, version
        ├── trust.json         # persisted command-preset trust grants (§7)
        └── scratchpads/
            ├── notes.md
            ├── todos.md
            └── <agent-written>.md

$XDG_CONFIG_HOME/patterm/
├── config.json                # global settings (theme, default keymap, etc.)
└── presets/
    ├── agents/
    │   ├── claude.json        # ships as default
    │   ├── codex.json         # ships as default
    │   ├── opencode.json      # ships as default
    │   └── <user-defined>.json
    └── processes/
        ├── dev.json           # e.g. { "name": "bun run dev", "argv": ["bun", "run", "dev"] }
        ├── test.json
        └── <user-defined>.json
```

Both preset directories are scanned at startup; every file found becomes a palette entry ("Spawn agent: claude", "Run process: bun run dev", …). Presets are project-agnostic in v1 — the same set is available in every project. Per-project overrides can be added later.

Project key = `sha256(realpath(project_dir))[:16]`. Used only as a scratchpad directory name — there is no daemon to look up.
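
A minimal sketch of the key derivation, assuming the digest is hex-encoded and `[:16]` means the first 16 hex characters (the spec does not say whether it means characters or bytes). `projectKey` and `keyFromResolved` are illustrative names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// keyFromResolved hashes an already-resolved absolute path and keeps the
// first 16 hex characters of the SHA-256 digest (assumed encoding).
func keyFromResolved(resolved string) string {
	sum := sha256.Sum256([]byte(resolved))
	return hex.EncodeToString(sum[:])[:16]
}

// projectKey resolves relative segments and symlinks first, so that two
// routes to the same directory produce the same key.
func projectKey(dir string) (string, error) {
	abs, err := filepath.Abs(dir)
	if err != nil {
		return "", err
	}
	resolved, err := filepath.EvalSymlinks(abs)
	if err != nil {
		return "", err
	}
	return keyFromResolved(resolved), nil
}

func main() {
	key, err := projectKey(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "resolve project dir:", err)
		os.Exit(1)
	}
	fmt.Println(key)
}
```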

Internal MCP socket (for spawned children to talk to the running process): `$XDG_RUNTIME_DIR/patterm/<pid>.sock`, falling back to `/tmp/patterm-<pid>.sock` if `XDG_RUNTIME_DIR` is unset. Created on startup, removed on exit. Per-PID, not per-project — it is a private IPC channel, not a discovery point.
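
The path selection above is small enough to sketch directly. `socketPath` is a hypothetical helper; creating the parent directory and removing the socket on exit are left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// socketPath returns the per-PID socket location: under the runtime dir
// when one is set, otherwise the /tmp fallback named in the spec.
func socketPath(runtimeDir string, pid int) string {
	if runtimeDir == "" {
		return fmt.Sprintf("/tmp/patterm-%d.sock", pid)
	}
	return filepath.Join(runtimeDir, "patterm", fmt.Sprintf("%d.sock", pid))
}

func main() {
	fmt.Println(socketPath(os.Getenv("XDG_RUNTIME_DIR"), os.Getpid()))
}
```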

Scratchpads and command-preset trust grants persist across runs. Sessions and child processes do not — every patterm run starts with an empty process tree.

---

## 4. UI / Client

```
┌────────────────────────────────────────────────────────┬──────────────────┐
│ [codex-1] [codex-2] [claude-1]                       + │  Session tree    │
├────────────────────────────────────────────────────────┤  ──────          │
│                                                        │  ▶ codex-1       │
│                                                        │  │               │
│                                                        │  ├─ ◉ claude-2   │
│                                                        │  ├─ ◉ claude-3   │
│              (focused pane's PTY)                      │  ├─ ◉ claude-4   │
│                                                        │  └─ ◉ bun-dev    │
│                                                        │                  │
│                                                        │  Scratchpads     │
│                                                        │  ──────          │
│                                                        │  todos.md        │
│                                                        │  notes.md        │
│                                                        │  api-plan.md     │
│                                                        │                  │
│                                                        │  ┌────────────┐  │
│                                                        │  │ todos.md   │  │
│                                                        │  │ preview…   │  │
│                                                        │  └────────────┘  │
├────────────────────────────────────────────────────────┴──────────────────┤
│ [orchestrator driving]                          Ctrl-K  command palette  │
└───────────────────────────────────────────────────────────────────────────┘
```

- **Top tab bar:** one per running top-level session. Exited top-level sessions disappear from the tab bar; historical output is still available through the underlying process state / logs where supported, but dead panes do not stay in the navigation chrome. `+` opens the palette pre-filtered to "Spawn…" entries.
- **Main area:** the focused pane's PTY, rendered inside patterm's main viewport. The viewport starts below the tab bar and excludes the right rail and bottom status line. The focused pane is either the orchestrator (root of the active session's tree) or one of its running children, whichever the user last selected from the sidebar.
- **Right rail, top half — session tree:** the active tab's running process hierarchy, drawn as an indented tree with box-drawing connectors (`├─`, `└─`). The active top-level session is the root (`▶` when focused); direct children appear one level deeper with a running glyph (`◉`). Exited / killed processes are removed from the visible tree. Selecting an entry (palette, arrow keys, or click) makes it the focused pane. v1 only has two levels because of the §8 two-level-tree rule, but the renderer should be tree-shaped from day one so a future depth bump doesn't require UI surgery.
- **Right rail, bottom half:** scratchpad list and a preview of the selected scratchpad.
- **Status line:** input-ownership toast ("orchestrator driving" / "you have control") on the left, palette hint on the right.

**Empty state:** Until the user spawns their first preset, the top tab bar, main area, and sidebar all sit empty with a hint ("Press Ctrl-K to spawn an agent or process") centred in the main viewport, not the full host terminal. No "default session" is created.

**Switching:** Clicking a top tab (or selecting one via the palette) switches the active session — the sidebar tree swaps to that tab's hierarchy only. Clicking a sidebar entry switches the focused pane within the current session. If the focused pane exits, focus falls back to another running top-level session if one exists; otherwise the UI returns to the empty state.

**Viewport and chrome ownership:** patterm owns the tab bar, right rail, status line, and any empty-state text. Child PTYs own only the main viewport. This is a hard UI boundary: child terminal output must not be allowed to clear, wrap into, or position the cursor inside patterm chrome. When rendering live child output, patterm may rewrite or clip destructive terminal sequences (for example clear-line / clear-screen / cursor-positioning sequences) and must redraw chrome after focused child output so the outer frame wins any rendering conflict. The goal is that agent TUIs behave as if their terminal size is exactly the main viewport, even though patterm is drawing additional UI around them.

**Command palette (v1 input model):**

Almost all application functions are driven through a single command palette opened with `Ctrl-K`. The palette is a fuzzy-searchable list of commands, scoped to whatever makes sense for the current focus. Two kinds of entries appear:

- **Built-in commands** — "Switch to session…", "Focus pane…", "Take input control", "Release control to orchestrator", "Open scratchpad…", "Kill child…", "Quit", etc.
- **Preset commands** — one entry per file under `$XDG_CONFIG_HOME/patterm/presets/`. Agent presets surface as "Spawn agent: codex" / "Spawn agent: claude" / …; process presets surface as "Run process: bun run dev" / "Run process: vitest" / …. The label comes from the preset's `name` field; the action is "launch this preset into a new pane."

Selecting a preset either launches it immediately (no required args) or opens a sub-palette for optional args — namely an **initial prompt** (agent presets only), which patterm injects into the spawned PTY's input after the agent is ready (§8). The orchestrator equivalent of this — `spawn_agent` / `spawn_process` MCP tools — uses the exact same machinery: pick a preset by name, optionally supply an initial prompt, patterm handles the rest.

Rationale: the keybinding surface for sessions + children + scratchpads + control transfer + spawning gets large fast. A palette lets us ship the full feature set without committing to a key map yet, and gives the user a discoverable index of every action. Dedicated keybindings can be layered on top later for the few actions a user does often enough to memorize — they should be configured by binding to palette command IDs, not by re-implementing the action.

Only two keybindings are reserved at the application level in v1:

| Action | Binding |
|---|---|
| Open command palette | `Ctrl-K` |
| Pass-through prefix (everything else after this goes to the focused PTY untouched, e.g. for nested tmux/Ctrl-K-using TUIs) | `Ctrl-K Ctrl-K` |

Everything else — session switching, child cycling, control transfer, quitting — lives in the palette for v1.

---

## 5. PTY layer

One PTY per session orchestrator and one per child. For each PTY the tool maintains:

- The underlying process (pid, status, exit code on death).
- A raw byte ring buffer (default 1 MiB) for stream-mode reads.
- A vt-emulated character grid representing current visible state.
- Alt-screen flag (whether the process is in alternate-buffer mode, i.e. a TUI).
- Last-write timestamp (used for the idle heuristic).
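
One way to back stream-mode reads is a ring buffer addressed by absolute offsets, so a caller's `since_offset` stays meaningful after old bytes are overwritten. The sketch below is an assumption about the design, not something the spec mandates; `Ring`, `NewRing`, and `ReadSince` are illustrative names, the capacity must be greater than zero, and locking is omitted.

```go
package main

import "fmt"

// Ring keeps the most recent len(buf) bytes of PTY output. Offsets are
// absolute: they count every byte ever written, not positions in buf.
type Ring struct {
	buf  []byte
	head int64 // absolute offset of the next byte to be written
}

func NewRing(capacity int) *Ring {
	return &Ring{buf: make([]byte, capacity)}
}

// Write appends p, overwriting the oldest bytes once the buffer is full.
func (r *Ring) Write(p []byte) (int, error) {
	for _, b := range p {
		r.buf[r.head%int64(len(r.buf))] = b
		r.head++
	}
	return len(p), nil
}

// ReadSince returns the bytes from offset up to the write head, plus the
// offset to pass on the next call. An offset that has already been
// overwritten is clamped to the oldest byte still held.
func (r *Ring) ReadSince(offset int64) ([]byte, int64) {
	oldest := r.head - int64(len(r.buf))
	if oldest < 0 {
		oldest = 0
	}
	if offset < oldest {
		offset = oldest
	}
	if offset > r.head {
		offset = r.head
	}
	out := make([]byte, 0, r.head-offset)
	for i := offset; i < r.head; i++ {
		out = append(out, r.buf[i%int64(len(r.buf))])
	}
	return out, r.head
}

func main() {
	r := NewRing(8) // tiny capacity for the demo; the spec default is 1 MiB
	r.Write([]byte("hello world"))
	data, next := r.ReadSince(0)
	fmt.Printf("%q %d\n", data, next)
}
```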

**Terminal emulator:** Go has limited options. Start with `vt10x` or a maintained fork. Budget real time for this: it is the load-bearing component for grid-mode `get_process_output`. The emulator must handle: SGR colours (then strip them on read), cursor movement, alt-screen entry/exit, scroll regions, basic mouse passthrough where needed.

**Resize:** On startup and on SIGWINCH, the tool reads its own terminal dimensions, computes the main viewport winsize (accounting for tab bar, sidebar, and status line), and sets each PTY to that viewport size with `ioctl(TIOCSWINSZ)`, never to the full host terminal size. The headless emulator for each child is resized to the same grid. Children get SIGWINCH automatically. One process, one viewport: no multi-client resize negotiation.
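
The viewport arithmetic can be sketched as below. The chrome dimensions (`TabBarRows`, `StatusRows`, `RailCols`) are hypothetical, since the spec does not fix them, and the clamp to a 1x1 minimum is an assumption to keep the winsize valid on very small host terminals.

```go
package main

import "fmt"

// Chrome describes how much of the host terminal patterm reserves.
// The concrete sizes are hypothetical; the spec does not fix them.
type Chrome struct {
	TabBarRows int
	StatusRows int
	RailCols   int
}

// viewport returns the rows and columns left for the focused child PTY.
func viewport(hostRows, hostCols int, c Chrome) (rows, cols int) {
	rows = hostRows - c.TabBarRows - c.StatusRows
	cols = hostCols - c.RailCols
	if rows < 1 {
		rows = 1
	}
	if cols < 1 {
		cols = 1
	}
	return rows, cols
}

func main() {
	rows, cols := viewport(40, 120, Chrome{TabBarRows: 1, StatusRows: 1, RailCols: 20})
	fmt.Println(rows, cols) // 38 100
}
```

The resulting pair is what would be written to every PTY and to every headless emulator, so both always agree on the grid size.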

---

## 6. Input ownership

Each pane has an owner flag: `user` or `orchestrator`. A toast / status-line glyph reflects current owner.

- When the orchestrator spawns a child, that child defaults to orchestrator-owned.
- When the user focuses a pane and presses any key, ownership flips to `user`. The orchestrator can still write — bytes interleave. A warning toast appears: "Orchestrator is also driving this pane."
- The user explicitly returns ownership with the "Release control to orchestrator" palette command (§4).

No locking. The user's call if they collide. The visual indicator is the only protection.
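
The ownership rules amount to a two-state flag with no lock. A minimal sketch, with `Pane`, `OnUserKey`, and `Release` as hypothetical names; it assumes the warning toast is raised at the moment the user takes over an orchestrator-owned pane, which the spec leaves open.

```go
package main

import "fmt"

type Owner string

const (
	OwnerUser         Owner = "user"
	OwnerOrchestrator Owner = "orchestrator"
)

// Pane carries only the advisory owner flag. Writes from either side are
// never blocked, matching the "no locking" rule.
type Pane struct {
	Owner Owner
}

// OnUserKey flips ownership to the user. It reports whether this keypress
// took the pane over from the orchestrator (assumed trigger for the toast).
func (p *Pane) OnUserKey() (tookOver bool) {
	tookOver = p.Owner == OwnerOrchestrator
	p.Owner = OwnerUser
	return tookOver
}

// Release hands the pane back to the orchestrator.
func (p *Pane) Release() {
	p.Owner = OwnerOrchestrator
}

func main() {
	p := &Pane{Owner: OwnerOrchestrator} // orchestrator-spawned child
	fmt.Println(p.OnUserKey(), p.Owner)
	p.Release()
	fmt.Println(p.Owner)
}
```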

---

## 7. MCP tool surface

The tool embeds an MCP server in-process. Each spawned agent gets an MCP config injected at spawn time (see §10) pointing at a stdio proxy subcommand of the same binary, which forwards JSON-RPC over the per-PID unix socket to the running process. Tool calls carry an implicit caller identity (which session / which process) derived from the connection.

### Concepts shared by all tools

- **Process IDs.** Every spawnable thing — agents, terminals, commands — is addressed by an opaque short token (e.g. `p_a1b2c3`), not by OS PID. IDs are stable for the lifetime of the entry: they survive stop/restart for stored command entries; they are released when an agent or terminal exits and is `close_process`'d. Each entry also has a human-readable display name (default `<kind>-<n>`, settable via `rename_process` or the `name` arg on spawn).
- **Process kinds.**
  - `agent` — a vendor LLM CLI launched from an agent preset (§10). MCP-wired. Ephemeral: lost when the underlying PTY exits.
  - `terminal` — a bare interactive shell. Defaults to `$SHELL -i`. Ephemeral.
  - `command` — a process preset (e.g. `bun run dev`, `vitest --watch`) or freeform argv. **Session-persistent**: a command entry survives PTY exit so it can be `restart_process`'d, and is removed only when `close_process` is called or patterm exits.
- **Trust gating.** Command entries that were authored as presets are *not* trusted by default. The first time an agent attempts to `spawn_process(kind: 'command', preset: …)`, `start_process`, or `restart_process` against an untrusted command preset, the call returns a `needs_trust` error and patterm surfaces a UI confirmation in the focused tab. Once the user confirms, the trust grant is **persisted to disk** (`$XDG_DATA_HOME/patterm/projects/<key>/trust.json`, see §3), so the user only confirms each preset once per project — not once per patterm run. Trust is keyed by `(project, preset name)` in v1; content-hashed trust (re-confirming on edit) is a v2 question (§14). Freeform-argv command entries are trusted implicitly at spawn time because the agent already had to compose the argv, and they are not written to the trust file.
- **Caller role.** Every connection has a role: `orchestrator` (root of a session tree), `sub-agent` (an agent spawned by an orchestrator), or `process` (commands/terminals — these don't talk MCP, but they appear as targets). Role gates which tools the caller may invoke. Calls disallowed by role return a structured error explaining why, so the agent can adapt rather than silently fail.
- **Idle / readiness.** Every PTY-backed entry tracks `idle_ms` (milliseconds since the process last produced output, i.e. since patterm last read bytes from the PTY master). Tools that read state surface this so callers can decide when a target is "done" without polling raw bytes themselves (§11).

### Lifecycle and spawning

#### `spawn_agent` — orchestrator-only
- **Args:** `agent` (preset name under `presets/agents/`), `agent_instructions` (string — the first turn typed into the agent's TUI after ready), `name?` (display name).
- **Behaviour:** Launches the agent preset in a new PTY as a child of the calling session. Wires MCP per the preset's injection strategy (§10). Waits for the preset's `ready_signal` (default: 1s idle), then types `agent_instructions` into the input box and submits. patterm injects nothing else — the spawned agent learns its role and conventions either from `agent_instructions` or by calling `whoami` / `help` itself.
- **Returns:** `{ process_id, name }`.
- **Errors:** `unknown_agent` if the preset is missing; `role_forbidden` if a sub-agent calls it (with a message pointing the caller at its parent or at vendor-native subagent tooling).
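
A sketch of the two error paths, assuming a structured error is simply a code plus a human-readable message. `ToolError`, `checkSpawnAgent`, and `codeOf` are hypothetical names and the message wording is illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

type Role string

const (
	RoleOrchestrator Role = "orchestrator"
	RoleSubAgent     Role = "sub-agent"
)

// ToolError is a hypothetical structured error: a stable code for the
// caller to branch on, plus a message explaining what to do instead.
type ToolError struct {
	Code    string
	Message string
}

func (e *ToolError) Error() string { return e.Code + ": " + e.Message }

// checkSpawnAgent applies the role gate first, then the preset lookup.
func checkSpawnAgent(caller Role, agent string, presets map[string]bool) error {
	if caller != RoleOrchestrator {
		return &ToolError{
			Code:    "role_forbidden",
			Message: "only the session orchestrator may spawn agents; ask your parent or use vendor-native subagent tooling",
		}
	}
	if !presets[agent] {
		return &ToolError{Code: "unknown_agent", Message: "no agent preset named " + agent}
	}
	return nil
}

// codeOf extracts the error code, or "" when err is nil or not a ToolError.
func codeOf(err error) string {
	var te *ToolError
	if errors.As(err, &te) {
		return te.Code
	}
	return ""
}

func main() {
	presets := map[string]bool{"claude": true, "codex": true}
	fmt.Println(checkSpawnAgent(RoleSubAgent, "codex", presets))
	fmt.Println(checkSpawnAgent(RoleOrchestrator, "gemini", presets))
	fmt.Println(checkSpawnAgent(RoleOrchestrator, "codex", presets))
}
```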

#### `spawn_process`
- **Args:** `kind` (`terminal` | `command`), one of `preset` (name under `presets/processes/`) or `argv` (string array), `name?`, `working_dir?` (default project root), `env?`, `shell?` (only valid with `argv`; default `false`).
  - For `kind: terminal`, `argv` is optional — defaults to `$SHELL -i`.
  - For `kind: command`, exactly one of `preset` or `argv` must be supplied.
- **Behaviour:** Creates a process entry, attached as a child of the calling agent's session, and starts it. No MCP injection (these aren't agents). `command` entries are persisted to the session for later `restart_process` / `start_process`; `terminal` entries are ephemeral.
- **Returns:** `{ process_id, name }`.
- **Errors:** `needs_trust` if `kind: command` references an untrusted preset.
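
The argument rules for `spawn_process` can be checked up front, before any process entry is created. This is a sketch: `SpawnProcessArgs` and `validate` are hypothetical names, only the fields the rules touch are included, and rejecting `preset` together with `argv` on a terminal is an assumption the spec does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// SpawnProcessArgs mirrors the subset of arguments the rules depend on.
type SpawnProcessArgs struct {
	Kind   string   `json:"kind"`
	Preset string   `json:"preset,omitempty"`
	Argv   []string `json:"argv,omitempty"`
	Shell  bool     `json:"shell,omitempty"`
}

// validate enforces the per-kind argument rules.
func validate(a SpawnProcessArgs) error {
	hasPreset := a.Preset != ""
	hasArgv := len(a.Argv) > 0
	switch a.Kind {
	case "terminal":
		// argv is optional here; the caller falls back to `$SHELL -i`.
		if hasPreset && hasArgv {
			return errors.New("supply preset or argv, not both")
		}
	case "command":
		if hasPreset == hasArgv {
			return errors.New("command requires exactly one of preset or argv")
		}
	default:
		return fmt.Errorf("unknown kind %q", a.Kind)
	}
	if a.Shell && !hasArgv {
		return errors.New("shell is only valid with argv")
	}
	return nil
}

func main() {
	fmt.Println(validate(SpawnProcessArgs{Kind: "command", Preset: "dev"}))
	fmt.Println(validate(SpawnProcessArgs{Kind: "command"}))
	fmt.Println(validate(SpawnProcessArgs{Kind: "terminal"}))
}
```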

#### `start_process`
- **Args:** `process_id`.
- **Behaviour:** Starts a stored `command` entry that is currently in `stopped` or `exited` state. No-op on a running entry (returns the existing state).
- **Returns:** `{ process_id, status }`.
- **Errors:** `not_found`, `wrong_kind` (only command entries are start-able post-creation), `needs_trust`.

#### `restart_process`
- **Args:** `process_id`, `signal?` (default `SIGTERM` for the stop phase).
- **Behaviour:** Stops the entry if running (grace window then SIGKILL), then starts it again with the same argv/env/working_dir. Valid for `command` entries; valid for `agent` and `terminal` entries only while their PTY is still live (since they have no stored definition to rehydrate from).
- **Returns:** `{ process_id, status }`.
- **Errors:** `not_found`, `needs_trust` (command presets), `wrong_kind` (trying to restart an exited agent/terminal).

#### `stop_process`
- **Args:** `process_id`, `signal?` (default `SIGTERM`).
- **Behaviour:** Sends the signal to the entry's PTY, with the standard grace window before SIGKILL.
- **Returns:** `{ process_id, status }`.

#### `close_process`
- **Args:** `process_id`.
- **Behaviour:** Removes the entry from the session entirely. If still running, stops it first. Used to clear stored command entries the orchestrator no longer needs, and to clean up exited agent/terminal ghosts from the sidebar.
- **Returns:** `ok`.

#### `rename_process`
- **Args:** `process_id`, `name`.
- **Returns:** `ok`. Updates the display name in the sidebar and tab bar.

#### `select_process`
- **Args:** `process_id`.
- **Behaviour:** Asks the UI to focus the given pane (switches session tab if needed). Non-blocking, advisory — distinct from `request_human_attention`, which raises a notification and expects a human decision.
- **Returns:** `ok`.

### Inspection

#### `list_processes`
- **Args:** `kind?` (filter by `agent` | `terminal` | `command`).
- **Returns:** Array of `{ process_id, name, kind, status, parent_process_id, exit_code?, idle_ms?, trusted? }` for the caller's session. `status ∈ { starting, running, stopped, exited, errored }`.

#### `get_process_status`
- **Args:** `process_id`.
- **Returns:** `{ process_id, name, kind, status, parent_process_id, working_dir, argv?, exit_code?, started_at, idle_ms, active_screen: "main" | "alternate", rows, cols, cursor: { x, y }, trusted?, screen_version }`.

#### `get_project_status`
- **Args:** none.
- **Returns:** `{ project: { path, key }, caller: { process_id, role, name, parent_process_id?, available_tools: [string] }, processes: [<list_processes entry>], scratchpads: [{ name, size, modified_at, revision }] }`. Everything an agent needs to orient itself in one call.

#### `get_process_output`
- **Args:** `process_id`, `mode` (`grid` | `stream`), `since_offset?` (stream mode only).
- **Behaviour:** `grid` returns the current visible pane as plain text, ANSI stripped, with best-effort vendor-chrome trim per preset hints (§10). `stream` returns ANSI-stripped bytes from `since_offset` to the current write head.
- **Returns:** `{ content, mode, new_offset?, active_screen, rows, cols, cursor, idle_ms, status, screen_version }`.
- **Tool-description note (shown to the calling agent):** "The grid result is the entire visible pane. You are responsible for locating the response to your last prompt within it. Use `search_output` if you have a specific marker to find."
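
For `stream` mode, ANSI stripping can be approximated with a regular expression. The sketch below covers CSI and OSC sequences only, which is an assumption about what "ANSI stripped" needs to remove; a production version would more likely reuse the emulator's parser. `stripANSI` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// ansi matches two families of escape sequences:
//   - CSI: ESC [ parameter bytes, intermediate bytes, one final byte
//   - OSC: ESC ] payload terminated by BEL or by ESC \
var ansi = regexp.MustCompile("\x1b\\[[0-9;:<=>?]*[ -/]*[@-~]|\x1b\\][^\x07\x1b]*(?:\x07|\x1b\\\\)")

// stripANSI removes the sequences above and leaves all other bytes intact.
func stripANSI(s string) string {
	return ansi.ReplaceAllString(s, "")
}

func main() {
	fmt.Printf("%q\n", stripANSI("\x1b[31mred\x1b[0m ok"))
	fmt.Printf("%q\n", stripANSI("\x1b]0;window title\x07$ "))
}
```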

#### `get_process_raw_output`
- **Args:** `process_id`, `since_offset?`.
- **Behaviour:** Returns raw bytes from `since_offset`, escape sequences preserved. Used when the agent needs to inspect control codes (rare).
- **Returns:** `{ content, new_offset, status }`.

#### `search_output`
- **Args:** `process_id`, `pattern` (regex), `kind` (`rendered` | `raw`), `limit?` (default 20).
- **Returns:** `{ matches: [{ line_no, text }], truncated: bool }`. Searches scrollback (not just the visible grid).
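
A sketch of the search and truncation behaviour over a slice of scrollback lines. `Match` and `searchLines` are hypothetical names, and 1-based `line_no` is an assumption the spec does not state.

```go
package main

import (
	"fmt"
	"regexp"
)

type Match struct {
	LineNo int    `json:"line_no"` // 1-based (assumed)
	Text   string `json:"text"`
}

// searchLines returns up to limit matching lines. truncated is true only
// when at least one further match exists beyond the limit.
func searchLines(lines []string, pattern string, limit int) ([]Match, bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, false, err
	}
	if limit <= 0 {
		limit = 20 // the documented default
	}
	var out []Match
	for i, line := range lines {
		if !re.MatchString(line) {
			continue
		}
		if len(out) == limit {
			return out, true, nil
		}
		out = append(out, Match{LineNo: i + 1, Text: line})
	}
	return out, false, nil
}

func main() {
	lines := []string{"ok", "error: a", "fine", "error: b"}
	matches, truncated, err := searchLines(lines, "^error", 1)
	fmt.Println(matches, truncated, err)
}
```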

#### `wait_for_pattern`
- **Args:** `process_id`, `pattern` (regex), `timeout_seconds`, `scope?` (`grid` | `scrollback`, default `grid`).
- **Behaviour:** Blocks the calling agent until the chosen surface matches the regex, or the timeout expires. Polls at ~50ms.
- **Returns:** `{ matched: bool, snippet?: string }`. Used in the §9 permissions-prompt-clear flow.
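
The blocking behaviour is a poll loop with a deadline. In this sketch the `read` callback stands in for "render the chosen surface as text"; `waitForPattern` and its signature are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// waitForPattern polls read() every interval until the regex matches or
// the timeout expires. It returns the matched text as the snippet.
func waitForPattern(read func() string, pattern string, timeout, interval time.Duration) (bool, string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, "", err
	}
	deadline := time.Now().Add(timeout)
	for {
		s := read()
		if loc := re.FindStringIndex(s); loc != nil {
			return true, s[loc[0]:loc[1]], nil
		}
		if !time.Now().Before(deadline) {
			return false, "", nil
		}
		time.Sleep(interval)
	}
}

func main() {
	screens := []string{"thinking...", "thinking...", "Allow Bash(ls)? [y/N]"}
	i := 0
	read := func() string {
		s := screens[i]
		if i < len(screens)-1 {
			i++
		}
		return s
	}
	matched, snippet, err := waitForPattern(read, `\[y/N\]`, time.Second, 50*time.Millisecond)
	fmt.Println(matched, snippet, err)
}
```

A real implementation would probably also wake on PTY output rather than rely on the timer alone, but the fixed interval matches the ~50ms polling described above.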

#### `get_process_ports`
- **Args:** `process_id`.
- **Returns:** `{ ports: [{ port, url?, first_seen_at }] }`. Best-effort: patterm watches the stream for `:NNNN` and `http://…` patterns and reports what it has seen. No probing.
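
A best-effort scanner along these lines might look as follows. The exact pattern is an assumption: it accepts 4 or 5 digit ports, optionally preceded by an `http://` or `https://` host, and it drops duplicates. `Port` and `scanPorts` are hypothetical names and `first_seen_at` is omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// portRe matches ":NNNN" or ":NNNNN", optionally preceded by a URL host.
var portRe = regexp.MustCompile(`(https?://[^\s:/]+)?:(\d{4,5})\b`)

type Port struct {
	Port int
	URL  string // empty when only a bare ":NNNN" was seen
}

// scanPorts reports each distinct port once, in order of first appearance.
func scanPorts(s string) []Port {
	seen := map[int]bool{}
	var out []Port
	for _, m := range portRe.FindAllStringSubmatch(s, -1) {
		n, err := strconv.Atoi(m[2])
		if err != nil || n < 1 || n > 65535 || seen[n] {
			continue
		}
		seen[n] = true
		p := Port{Port: n}
		if m[1] != "" {
			p.URL = m[1] + ":" + m[2]
		}
		out = append(out, p)
	}
	return out
}

func main() {
	out := "Local:   http://localhost:5173/\nlistening on :8080\n"
	fmt.Printf("%+v\n", scanPorts(out))
}
```

Because nothing is probed, false positives are possible (any `:NNNN` in log text counts), which is consistent with the best-effort contract.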

### I/O

#### `send_input`
- **Args:** `process_id`, `kind` (`text` | `paste` | `key`), and:
  - For `text`: `text` (string), `submit?` (default `true` — appends Enter).
  - For `paste`: `text` (string). Sent via bracketed paste (`\e[200~ … \e[201~`) when the target's emulator state indicates support; otherwise falls back to chunked text writes without trailing newline.
  - For `key`: `key` (one of `enter`, `tab`, `escape`, `backspace`, `ctrl-c`, `ctrl-d`, `up`, `down`, `left`, `right`, `home`, `end`, `page-up`, `page-down`, `f1`…`f12`). Encoded via the emulator's key-encoding (Kitty keyboard protocol where negotiated, legacy escapes otherwise).
- **Optional tail:** `wait_ms?` (default `0`), `tail_mode?` (`none` | `stream` | `grid`, default `stream` when `wait_ms > 0`). When `wait_ms > 0`, the call blocks for that many milliseconds after sending and then returns the tail in the chosen mode.
- **Returns:** `{ ok: true, tail?: { content, mode, new_offset?, active_screen, idle_ms, screen_version } }`.
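
The three input kinds reduce to different byte encodings. The sketch below uses legacy escape sequences only and leaves out function keys, Kitty keyboard protocol encoding, and chunking of large pastes. `Input`, `encode`, and `legacyKeys` are hypothetical names.

```go
package main

import "fmt"

// legacyKeys maps key names to the byte sequences a conventional terminal
// sends. Function keys are omitted from this sketch.
var legacyKeys = map[string]string{
	"enter": "\r", "tab": "\t", "escape": "\x1b", "backspace": "\x7f",
	"ctrl-c": "\x03", "ctrl-d": "\x04",
	"up": "\x1b[A", "down": "\x1b[B", "right": "\x1b[C", "left": "\x1b[D",
	"home": "\x1b[H", "end": "\x1b[F",
	"page-up": "\x1b[5~", "page-down": "\x1b[6~",
}

type Input struct {
	Kind   string // "text", "paste", or "key"
	Text   string
	Key    string
	Submit bool
}

// encode turns one send_input request into the bytes written to the PTY.
// bracketedPaste reflects whether the target has enabled bracketed paste.
func encode(in Input, bracketedPaste bool) (string, error) {
	switch in.Kind {
	case "text":
		if in.Submit {
			return in.Text + "\r", nil
		}
		return in.Text, nil
	case "paste":
		if bracketedPaste {
			return "\x1b[200~" + in.Text + "\x1b[201~", nil
		}
		return in.Text, nil // fallback: plain write, no trailing newline
	case "key":
		seq, ok := legacyKeys[in.Key]
		if !ok {
			return "", fmt.Errorf("unsupported key %q", in.Key)
		}
		return seq, nil
	}
	return "", fmt.Errorf("unknown kind %q", in.Kind)
}

func main() {
	s, err := encode(Input{Kind: "text", Text: "ls", Submit: true}, false)
	fmt.Printf("%q %v\n", s, err)
}
```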

### Coordination

#### `send_message`
- **Args:** `target_process_id`, `message` (string).
- **Behaviour:** Delivers a tagged message into the target's PTY. Direction is inferred from the relationship between caller and target:
  - parent → child: prepended with `[orchestrator] `.
  - child → parent: prepended with `[sub-agent:<caller_name>] `.
- **Returns:** `ok`.
- **Errors:** `not_related` if the target is neither the caller's parent nor a child of the caller (siblings must route through the parent in v1).
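
Direction inference needs only the two entries' IDs and parent IDs. A minimal sketch, with `Proc` and `tagMessage` as hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

type Proc struct {
	ID       string
	Name     string
	ParentID string // empty for a session root
}

// tagMessage prefixes msg according to the caller/target relationship and
// rejects anything that is not a direct parent-child pair.
func tagMessage(caller, target Proc, msg string) (string, error) {
	switch {
	case target.ParentID != "" && target.ParentID == caller.ID:
		return "[orchestrator] " + msg, nil
	case caller.ParentID != "" && caller.ParentID == target.ID:
		return "[sub-agent:" + caller.Name + "] " + msg, nil
	}
	return "", errors.New("not_related")
}

func main() {
	parent := Proc{ID: "p_1", Name: "codex-1"}
	child := Proc{ID: "p_2", Name: "claude-2", ParentID: "p_1"}
	down, _ := tagMessage(parent, child, "run the tests")
	up, _ := tagMessage(child, parent, "tests pass")
	fmt.Println(down)
	fmt.Println(up)
}
```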

#### `request_human_attention`
- **Args:** `process_id`, `reason` (string).
- **Behaviour:** Notification in the TUI, blinks the sidebar entry, optionally auto-focuses per user setting. The escape hatch when the orchestrator can't safely decide.
- **Returns:** `ok`.

#### `timer_wait`
- **Args:** `seconds`, `label?`.
- **Behaviour:** Returns a `timer_id` immediately. After `seconds`, injects `[system] Your timer [<label>] has completed.\n` into the caller's pane.
- **Returns:** `{ timer_id }`.
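
The non-blocking shape of `timer_wait` maps naturally onto `time.AfterFunc`. In this sketch `inject` stands in for "write into the caller's pane"; `startTimer` and `timerMessage` are hypothetical names and `timer_id` allocation is left out.

```go
package main

import (
	"fmt"
	"time"
)

// timerMessage formats the line injected when the timer fires.
func timerMessage(label string) string {
	return fmt.Sprintf("[system] Your timer [%s] has completed.\n", label)
}

// startTimer returns immediately; inject is called once after d elapses.
func startTimer(d time.Duration, label string, inject func(string)) *time.Timer {
	return time.AfterFunc(d, func() { inject(timerMessage(label)) })
}

func main() {
	done := make(chan string, 1)
	startTimer(10*time.Millisecond, "build", func(s string) { done <- s })
	fmt.Print(<-done)
}
```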

### Scratchpads

All scratchpad reads return a `revision` token (an opaque short hash of the file contents at read time). Writes may optionally supply `expected_revision` as an optimistic-concurrency check: a mismatch returns `{ ok: false, current_revision }` without writing, so the caller can re-read and merge. Writes without `expected_revision` are last-write-wins.
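
The revision check behaves like a compare-and-swap on file contents. The sketch below keeps scratchpads in memory to stay short. `Store` and its methods are hypothetical, and a 12-character SHA-256 prefix is an assumed format for the opaque token.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// revision derives the opaque token from the content (assumed format).
func revision(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])[:12]
}

// Store is an in-memory stand-in for the scratchpad directory.
type Store struct {
	mu   sync.Mutex
	pads map[string]string
}

func NewStore() *Store {
	return &Store{pads: map[string]string{}}
}

// Read returns the content and its current revision.
func (s *Store) Read(name string) (content, rev string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.pads[name], revision(s.pads[name])
}

// Write stores content. When expected is non-empty and stale, nothing is
// written and the current revision is returned with ok == false.
func (s *Store) Write(name, content, expected string) (ok bool, rev string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur := revision(s.pads[name])
	if expected != "" && expected != cur {
		return false, cur
	}
	s.pads[name] = content
	return true, revision(content)
}

func main() {
	s := NewStore()
	_, r1 := s.Write("todos.md", "- ship v1\n", "")
	ok, cur := s.Write("todos.md", "- other\n", "stale-revision")
	fmt.Println(ok, cur == r1)
}
```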

#### `scratchpad_list`
- **Returns:** `[{ name, size, modified_at, revision }]`.

#### `scratchpad_read`
- **Args:** `name`.
- **Returns:** `{ content, revision }`.

#### `scratchpad_write`
- **Args:** `name`, `content`, `expected_revision?`.
- **Returns:** `{ ok: true, revision } | { ok: false, current_revision }`.

#### `scratchpad_append`
- **Args:** `name`, `content`.
- **Returns:** `{ ok: true, revision }`. Appends are unconditional — concurrent appends interleave at write time but never lose data.

### Meta

#### `whoami`
- **Args:** none.
- **Returns:** `{ process_id, name, role, parent_process_id?, project: { path, key }, available_tools: [string] }`. The `available_tools` field is the authoritative answer to "what can I call from here" — agents should consult it rather than guessing from their training distribution.

#### `help`
- **Args:** `topic?` (string).
- **Behaviour:** Returns topic-specific guidance for the caller's role. With no argument, returns the list of topics plus a one-line orientation. Topics in v1: `spawning`, `inspection`, `io`, `coordination`, `scratchpads`, `timers`, `readiness`, `permissions`, `conventions`, `topics`. The `conventions` topic documents the `[orchestrator]` / `[sub-agent:<name>]` / `[system]` tag protocol so a sub-agent that wasn't briefed by its parent can still learn it.
- **Returns:** `{ topic, content, related_tools: [string] }`.

---

## 8. Conversation protocol

patterm does **not** inject any framing or system-prompt text into spawned agents. Whatever an agent sees in its input is exactly what the user typed or what an orchestrator chose to send. The orchestrator (or the human launching it) is responsible for telling a spawned agent what its role is, what tools it has, and what conventions to expect.

That said, when patterm relays messages programmatically between agents or surfaces lifecycle events, it tags them so the receiving agent can distinguish sources. These tags are the patterm convention; agents will encounter them in their input and are expected to recognize them from context (or because their parent explained them in the initial prompt).

- `[orchestrator] <msg>` — prepended when `send_message` delivers a message from a parent to a child.
- `[sub-agent:<name>] <msg>` — prepended when `send_message` delivers a message from a child to its parent.
- `[system] <msg>` — patterm itself (timer fires, child exited, etc.).
- Direct user typing is **not** prefixed. The user sees the pane and types normally; the agent receives the keystrokes as-is.

Agents that weren't briefed by their parent can self-discover their role, parent, project, and the tag conventions by calling `whoami` and `help('conventions')` (§7). This is the supported substitute for system-prompt injection: the conventions live in the tool surface, not in an injected preamble.

No "ready" handshake. patterm treats the agent as ready once its PTY hits the preset's `ready_signal` (default: 1s idle after launch — see §10). The very first thing the agent receives after that point is whatever the caller passed as `agent_instructions`.

Two-level tree only. Sub-agents cannot call `spawn_agent` — the call returns a `role_forbidden` error that explains the rule and points at vendor-native subagent tooling.

---

## 9. Permissions flow

Sub-agents are launched with vendor permissions **on** — the orchestrator drives their confirmation prompts.

Loop:

1. Orchestrator sends a message to a sub-agent via `send_message`.
2. Sub-agent runs, eventually hits a tool-use confirmation in its TUI ("Allow Bash(rm -rf foo)? [y/N]").
3. Sub-agent goes idle (cursor stops animating, no byte writes for 1s — exposed as `idle_ms` on `get_process_status` / `list_processes`).
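4. Orchestrator's loop calls `get_process_output(process_id, mode="grid")`, sees the prompt, decides, and answers with `send_input(process_id, kind="text", text="y", submit=true)` (or `text="n"`). `kind="key"` only accepts the named keys listed in §7, so a literal `y` / `n` must be sent as text; use `submit=false` if the prompt reacts to a single keypress.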
5. If the orchestrator can't safely decide, it calls `request_human_attention(process_id, "Sub-agent wants to run X, looks destructive, need your call")`. The orchestrator then waits (using `wait_for_pattern` or repeated reads) until the prompt is no longer on screen.

Risks acknowledged: the orchestrator's reading of the prompt is a vision/parsing problem on rendered text. We trust a SOTA model to handle this correctly. The `request_human_attention` punt is the safety valve.

---

## 10. Presets

Presets are user-editable JSON files that describe how to launch something. patterm itself has no hard-coded agent or process types — every spawnable thing is a preset. Two flavours:

### Agent presets

`$XDG_CONFIG_HOME/patterm/presets/agents/<name>.json`. Launches a vendor LLM CLI with MCP wired up. No prompt text is injected beyond the caller's initial prompt (§8).

| Field | Purpose |
|---|---|
| `name` | Display name shown in the palette (e.g. "claude", "codex haiku", "opencode-experimental") |
| `argv` | Full launch argv (e.g. `["claude"]`, `["codex", "--no-tui-banner"]`) |
| `env` | Env vars to set (merged over inherited env) |
| `working_dir` | Defaults to the project root |
| `mcp_injection` | How to point this CLI at patterm's stdio proxy. One of: `{ "kind": "flag", "flag": "--mcp-config", "config_path": "..." }`, `{ "kind": "config_file", "path": "~/.codex/config.toml", "merge_key": "mcp_servers" }`, `{ "kind": "env_var", "var": "MCP_CONFIG_PATH" }` |
| `ready_signal` | How to detect the TUI is ready (default: 1s idle after launch). Override per-CLI if needed. |
| `chrome_trim_hints` | Optional regexes / row ranges for stripping vendor chrome in grid reads |

Default presets shipped: `claude`, `codex`, `opencode`. Authoring these is per-vendor research — each CLI has its own MCP config conventions, ready states, and TUI chrome. Users can copy and edit them, or add new ones (e.g. a second `claude` preset that launches with a specific model or system prompt file).

MCP config flow: at spawn time, for each agent launch, patterm renders a small JSON pointing at its own `mcp-stdio` proxy subcommand (`patterm mcp-stdio --socket <pid-sock> --identity <token>`) into a per-launch temp file, so every spawned agent carries its own identity token (§7). The launch then uses the preset's `mcp_injection` strategy to hand that path to the CLI. The user's global vendor config is never mutated.
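
Rendering the config can be sketched as below. The `mcpServers` / `command` / `args` layout is an assumption: it is one common shape for MCP client configs, and each vendor CLI may expect something different. `renderMCPConfig` is a hypothetical name; writing the result to a temp file is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverEntry and mcpConfig model an assumed config layout; the real
// shape depends on the vendor CLI being launched.
type serverEntry struct {
	Command string   `json:"command"`
	Args    []string `json:"args"`
}

type mcpConfig struct {
	MCPServers map[string]serverEntry `json:"mcpServers"`
}

// renderMCPConfig points the CLI at patterm's stdio proxy subcommand.
func renderMCPConfig(socket, identity string) (string, error) {
	cfg := mcpConfig{MCPServers: map[string]serverEntry{
		"patterm": {
			Command: "patterm",
			Args:    []string{"mcp-stdio", "--socket", socket, "--identity", identity},
		},
	}}
	b, err := json.Marshal(cfg)
	return string(b), err
}

func main() {
	out, err := renderMCPConfig("/tmp/patterm-42.sock", "tok")
	fmt.Println(out, err)
}
```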

### Process presets

`$XDG_CONFIG_HOME/patterm/presets/processes/<name>.json`. Launches a raw command in a PTY, with no MCP wiring and no injected input.

| Field | Purpose |
|---|---|
| `name` | Display name shown in the palette (e.g. "bun run dev") |
| `argv` | Launch argv (e.g. `["bun", "run", "dev"]`) |
| `shell` | If `true`, argv is interpreted via `sh -lc`. Default `false`. |
| `env` | Env vars to set |
| `working_dir` | Defaults to the project root |

Process presets are intentionally thin: they're shortcuts for commands the user runs often. Anything more exotic — pipelines, redirections — uses `shell: true`, or the orchestrator can call `spawn_process` with freeform argv.
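
As an illustration, a process preset could be loaded and turned into a launch argv roughly like this. The struct mirrors the table above; `launchArgv` is a hypothetical helper, and joining `argv` with spaces before passing it to `sh -lc` is an assumption, since the spec only says argv is "interpreted via `sh -lc`".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ProcessPreset mirrors the fields of a process preset file.
type ProcessPreset struct {
	Name       string            `json:"name"`
	Argv       []string          `json:"argv"`
	Shell      bool              `json:"shell"`
	Env        map[string]string `json:"env"`
	WorkingDir string            `json:"working_dir"`
}

// launchArgv returns the argv to exec. With shell=true the preset's argv is
// joined with spaces and handed to `sh -lc` (the joining rule is assumed).
func launchArgv(p ProcessPreset) ([]string, error) {
	if len(p.Argv) == 0 {
		return nil, fmt.Errorf("preset %q: empty argv", p.Name)
	}
	if p.Shell {
		return []string{"sh", "-lc", strings.Join(p.Argv, " ")}, nil
	}
	return p.Argv, nil
}

func main() {
	raw := []byte(`{"name":"dev log","argv":["bun run dev | tee /tmp/dev.log"],"shell":true}`)
	var p ProcessPreset
	if err := json.Unmarshal(raw, &p); err != nil {
		panic(err)
	}
	argv, err := launchArgv(p)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", argv)
}
```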

---

## 11. Done-signal heuristic

A pane is considered "idle" when its child has produced no output (no bytes read from its PTY's master end) for **1000 ms**.

Rationale: every supported vendor TUI animates a spinner while busy (during LLM streaming and during tool execution). A genuinely idle pane stops animating.

Caveats and mitigations:

- LLM provider hiccups can cause >1s gaps mid-stream. Per-agent tuning of the idle threshold is allowed in the preset.
- Orchestrators should treat idle as a signal to *read*, not as a guarantee of completion. If the read returns something ambiguous, they can `wait_for_pattern` with a known terminal marker (e.g. the agent's input prompt) for stronger evidence.
- The tool exposes idle state via `list_processes` / `get_process_status` so orchestrators don't need to poll byte streams directly.

---

## 12. Failure modes

| Failure | Behaviour |
|---|---|
| Sub-agent process exits unexpectedly | Sidebar marks child as exited, exit code preserved. Orchestrator's next `get_process_output` returns final grid + exit metadata. |
| Vendor CLI hangs without exiting | Looks idle. Orchestrator must use `wait_for_pattern` or `request_human_attention` to escape. |
| Tool process crashes | All PTY-backed children run in the tool's process group, so the OS tears them down (process-group SIGHUP on terminal close, PTY master close, parent-death signal on Linux). On macOS treat cleanup as best-effort; scratchpads on disk survive. |
| User closes the terminal window / SSH drops | Process receives SIGHUP, cascades SIGTERM → SIGKILL to every child, exits. Everything inside the tool dies with it. This is the intended model. |
| Disk full on scratchpad write | Tool returns error to caller. |
| LLM provider network blip | Pane idles, may trigger false "done" — orchestrator should sanity-check responses. |
| User kills the orchestrator pane | Tool detects PTY close, cascades SIGTERM to that session's children. |
| Concurrent input | Bytes interleave on PTY stdin. Toast warns. User's call. |
| Vt emulator bug on exotic ANSI | Grid rendering corrupts for that pane. Orchestrator's read will be noisy; degrade gracefully, don't crash. |

---

## 13. Out of scope for v1

- Cross-project orchestration.
- Sub-agents spawning sub-agents (trees deeper than 2).
- Daemonized / detachable sessions surviving the terminal window. The tool is intentionally bound to the user's foreground process.
- Multi-client attach to a single session.
- Native ACP support (PTY scraping only).
- Hosting any LLM internally.
- Auth beyond OS-level file permissions on the IPC socket and state dir.
- Web / API control surface.
- Recording / replay of sessions.

---

## 14. Open questions

- **Vt emulator library.** Resolved in the closing note — `libghostty-vt` is the bet, with `vt10x` / `charmbracelet/x/vt` as fallback only.
- **MCP transport.** Resolved — in-process MCP core with a `mcp-stdio` proxy subcommand for spawned children (see §7 and §10). Streamable HTTP can be added later.
- **Scratchpad concurrency.** Resolved: `scratchpad_read` / `scratchpad_write` carry an opaque `revision` token. A write that supplies `expected_revision` is rejected if the scratchpad has changed since that revision (optimistic concurrency); a write without it is plain last-write-wins (see §7). Appends are unconditional.
- **Cross-restart trust persistence for command presets.** Resolved — trust state is persisted to disk (see §3) so the user doesn't re-confirm every patterm run. Open: whether trust should be tied to the preset *contents* (hash) so editing a trusted preset re-triggers confirmation. v1 keys trust by preset name; v2 may upgrade to content-hashed trust.
- **Default presets that ship in the box.** claude / codex / opencode is the working set; trimming to two for the first cut is fine since presets are user-editable anyway.
- **Per-project preset overrides.** v1 has a single global preset directory. Whether `./.patterm/presets/` should override per-project is a v2 question.

---

## 15. Suggested build order

1. Single-process skeleton: TUI bootstraps, owns the terminal, handles SIGWINCH / SIGHUP / SIGTERM, exits cleanly.
2. Single PTY per session + vt emulator + tab bar UI + basic input/render.
3. Multi-session, multi-child (sidebar) with raw process spawning, process groups, kill cascade on exit (no MCP yet).
4. In-process MCP server + `mcp-stdio` proxy subcommand + per-PID unix socket + `spawn_process` / `get_process_output` / `send_input` / `stop_process` / `wait_for_pattern` / `list_processes` / `whoami` / `help`.
5. `spawn_agent` for one agent (probably claude), conversation tag conventions, `agent_instructions` injection (typed into the TUI input after ready).
6. Scratchpads (with revisions), `timer_wait`, `request_human_attention`, `send_message`.
7. Second and third agent presets, chrome-trim heuristics.
8. Polish: command palette, status indicators, error UX.


---

Yes — use `libghostty-vt` for the terminal emulation layer. Not full Ghostty, and not as a UI renderer. Use it as a headless VT state machine inside the tool process, wrapped behind your own Go interface.

`libghostty-vt` is basically aimed at exactly your load-bearing problem: it is a C library extracted from Ghostty that handles VT parsing, terminal state, scrollback, line wrapping, resize reflow, input event encoding, and related terminal internals. The current docs also warn that the API is still unstable, so this should be a pinned dependency, not something you casually track at HEAD. ([libghostty.tip.ghostty.org][1])

The right move is:

```go
type Emulator interface {
    WritePTYOutput([]byte)
    Resize(cols, rows uint16)
    PlainText() string
    Cell(x, y int) Cell
    Cursor() Cursor
    ActiveScreen() Screen
}
```

Then implement `GhosttyEmulator` behind that. Keep `vt10x` or `charmbracelet/x/vt` as experimental/fallback only. `vt10x` is pure Go and convenient, but its own package docs describe it as “in development”; Charm’s `x` repo is explicitly experimental with no backwards-compatibility promise. For this project, terminal fidelity is not a nice-to-have; it is the product. ([Go Packages][2])

The best part: `libghostty-vt` already has formatter support for producing plain text from the active screen, which maps cleanly to your `read_output(mode="grid")`; it also exposes key and mouse encoding, which matters once you stop only typing ASCII strings and start needing arrows, Ctrl-C, Tab, Escape, mouse passthrough, and Kitty keyboard protocol support. ([libghostty.tip.ghostty.org][3])

The catch: cgo/build packaging becomes real. Pin a commit, vendor or checksum the library, and put all C ABI calls in one internal package. Do not scatter cgo across the codebase.

Big spec changes I’d make before building:

First, change MCP transport strategy. Implement the in-process MCP core once, then expose it via a tiny stdio proxy subcommand:

```sh
patterm mcp-stdio --socket "$SOCK" --identity "$TOKEN"
```

Each spawned agent gets an MCP config pointing at that command. The vendor CLI thinks it is launching a normal stdio MCP server; the proxy forwards JSON-RPC to the running tool process over its per-PID Unix socket. This avoids relying on every CLI supporting HTTP over Unix sockets, gives you clean per-agent identity, and keeps the tool process as the single owner of state.

Still support Streamable HTTP later, but stdio-proxy-first is more robust for local CLIs. MCP currently defines stdio and Streamable HTTP as standard transports, and Claude Code, Codex, and OpenCode all expose MCP configuration paths that can work with local or HTTP-style servers. ([Model Context Protocol][4])

Second, remove the generic `MCP_CONFIG_PATH` assumption. Each preset needs real vendor-specific MCP config handling. Claude Code supports `--mcp-config` and `--strict-mcp-config`. ([Claude][5]) Codex config uses `~/.codex/config.toml` / project `.codex/config.toml`, with `mcp_servers.<id>.command` for stdio and `mcp_servers.<id>.url` for HTTP. ([OpenAI Developers][6]) OpenCode exposes MCP through its `mcp` config option and `opencode mcp add`, so that preset needs its own path too. ([OpenCode][7])

Third, add a child-to-parent MCP tool. Your conversation protocol mentions `[sub-agent:<name>]` messages reporting back, but the tool surface does not currently include a way for a sub-agent to send one. Add:

```text
report_to_parent(message: string) -> ok
```

Then the tool injects:

```text
[sub-agent:codex-2] <message>
```

into the parent orchestrator pane. Without this, the orchestrator has to scrape the child forever, which is workable but worse.

Fourth, change `spawn_process(command: string)` to an argv form:

```json
{
  "argv": ["bun", "run", "dev"],
  "working_dir": ".",
  "env": {},
  "shell": false
}
```

Let agents explicitly request shell mode:

```json
{
  "argv": ["bun run dev | tee /tmp/dev.log"],
  "shell": true
}
```

A raw command string is quoting hell and makes policy inspection harder.

Fifth, make permission handling more conservative. The orchestrator reading a rendered confirmation prompt is useful, but it is not a safety boundary. A malicious repo or child process can print misleading prompt-like text. Default policy should be: auto-answer only boring, allowlisted prompts; punt writes, deletes, network exfiltration, credential access, `sudo`, package install scripts, and broad shell commands to the human. OpenCode’s own docs say operations are allowed by default unless permissions are configured, so per-agent preset permissions need to be deliberate rather than assumed safe. ([OpenCode][7])

Sixth, child cleanup on tool exit must be real. There is no daemon to keep PTYs alive — but the OS will not magically reap children either. Put every spawned PTY in the tool's process group (or a dedicated sub-group), set Linux `PR_SET_PDEATHSIG` on children, close PTY masters on exit, and install a SIGHUP/SIGTERM handler that runs the SIGTERM→grace→SIGKILL cascade before the process actually exits. On macOS, parent-death signals don't exist; rely on process-group SIGHUP and PTY master close, and treat any straggler cleanup as best-effort. A stale-process sweep on next startup is unnecessary now that there is no daemon to outlive its children.

Seventh, revise `send_input`. Text plus `append_newline` is too weak. You need:

```text
{
  "kind": "text" | "paste" | "key",
  "text": "...",
  "key": "enter|tab|escape|ctrl-c|left|right|up|down",
  "submit": true
}
```

Use bracketed paste for multi-line prompt injection where the target TUI supports it. Otherwise multi-line prompts can accidentally submit partial content.
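
To make that concrete, here is a sketch of turning such a request into PTY bytes. The key table hard-codes the common xterm sequences in normal cursor mode, which is a simplification: once `libghostty-vt` is wired in you would use its key encoder instead. `Input` and `encodeInput` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// Input mirrors the proposed send_input payload.
type Input struct {
	Kind   string `json:"kind"` // "text", "paste", or "key"
	Text   string `json:"text,omitempty"`
	Key    string `json:"key,omitempty"`
	Submit bool   `json:"submit,omitempty"`
}

const (
	pasteStart = "\x1b[200~"
	pasteEnd   = "\x1b[201~"
)

// keyBytes maps key names to bytes (normal cursor mode assumed).
var keyBytes = map[string]string{
	"enter": "\r", "tab": "\t", "escape": "\x1b", "ctrl-c": "\x03",
	"up": "\x1b[A", "down": "\x1b[B", "right": "\x1b[C", "left": "\x1b[D",
}

// encodeInput returns the bytes to write to the PTY master.
func encodeInput(in Input) ([]byte, error) {
	var out string
	switch in.Kind {
	case "text":
		out = in.Text
	case "paste":
		// Strip any embedded end marker so the payload cannot break out
		// of bracketed-paste mode early.
		out = pasteStart + strings.ReplaceAll(in.Text, pasteEnd, "") + pasteEnd
	case "key":
		seq, ok := keyBytes[in.Key]
		if !ok {
			return nil, fmt.Errorf("unknown key %q", in.Key)
		}
		out = seq
	default:
		return nil, fmt.Errorf("unknown kind %q", in.Kind)
	}
	if in.Submit {
		out += "\r"
	}
	return []byte(out), nil
}

func main() {
	b, err := encodeInput(Input{Kind: "paste", Text: "line one\nline two", Submit: true})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", b)
}
```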

Eighth, expose more metadata in `read_output`. Return row numbers, active screen, cursor position, idle state, process status, and maybe a `screen_version`.

```json
{
  "content": "...",
  "mode": "grid",
  "active_screen": "alternate",
  "rows": 38,
  "cols": 120,
  "cursor": {"x": 4, "y": 37},
  "idle_ms": 1420,
  "screen_version": 9182,
  "status": "running"
}
```

Models are better at parsing when you give them stable structure.

For `libghostty-vt`, the implementation detail that matters most is effects. The docs say VT processing handles terminal state by default, but side-effect sequences such as bell, title changes, device queries, and write-back responses need configured callbacks; those callbacks are synchronous and should not block. Wire at least `WRITE_PTY`, bell, title, size/query responses, and active-screen tracking early. ([libghostty.tip.ghostty.org][8])

Recommended revised build order:

1. PTY + `libghostty-vt` spike before any UI work. Spawn `bash`, `vim`, `htop`, Claude/Codex/OpenCode if installed, feed output into Ghostty, dump plain grid. This either validates the core bet or kills it early.

2. Single-process TUI with one PTY session. SIGWINCH-driven resize from the tool's own terminal. No MCP yet.

3. Raw child process spawning, sidebar, process groups, kill cascade on exit/SIGHUP, idle detection.

4. MCP stdio proxy subcommand and core tools: `spawn_process`, `read_output`, `send_input`, `kill`, `list_children`.

5. One orchestrator preset, probably Claude first because it has useful CLI flags for MCP config. Use `--mcp-config` and `--strict-mcp-config` so the user's global Claude config isn't mutated. ([Claude][5])

6. `spawn_agent`, `report_to_parent`, `send_message_to`, and timer injection.

7. Scratchpads with revision IDs. Last-write-wins is okay for v1, but return a revision so agents can avoid blind overwrites:

```text
scratchpad_read -> { "content": "...", "revision": "abc123" }
scratchpad_write -> { "content": "...", "expected_revision": "abc123" }
```

8. Second and third presets. Keep preset files declarative, but expect custom Go code for each vendor.

9. Chrome trimming heuristics and golden tests using recorded VT byte streams from each supported CLI.

One more practical point: put scratchpads under XDG data, not config. Something like:

```text
$XDG_DATA_HOME/patterm/projects/<key>/scratchpads/
```

Keep presets and config under:

```text
$XDG_CONFIG_HOME/patterm/
```

Scratchpads are user data, not configuration. Not fatal, but fixing it now avoids awkward migration later.

Overall: the concept is buildable, but the hard parts are not MCP or the TUI chrome. The hard parts are terminal fidelity, process lifecycle, vendor preset drift, and permission safety. `libghostty-vt` is the right core bet, provided you isolate it behind an interface and treat its unstable API as a vendored implementation detail.

[1]: https://libghostty.tip.ghostty.org/ "libghostty: libghostty-vt - Virtual Terminal Emulator Library"
[2]: https://pkg.go.dev/github.com/micro-editor/terminal "terminal package - github.com/micro-editor/terminal - Go Packages"
[3]: https://libghostty.tip.ghostty.org/group__formatter.html "libghostty: Formatter"
[4]: https://modelcontextprotocol.io/specification/2025-11-25/basic/transports "Transports - Model Context Protocol"
[5]: https://code.claude.com/docs/en/cli-reference "CLI reference - Claude Code Docs"
[6]: https://developers.openai.com/codex/config-reference "Configuration Reference – Codex | OpenAI Developers"
[7]: https://opencode.ai/docs/config/ "Config | OpenCode"
[8]: https://libghostty.tip.ghostty.org/group__terminal.html "libghostty: Terminal"