
Harry Bayliss 8f396706e5 Rework §7 MCP tool surface

Replaces the original child_id-keyed tool set with a soloterm-inspired
process-entry model: opaque process_ids, three kinds (agent/terminal/command),
session-persistent command entries with disk-persisted trust grants, and a
single bidirectional send_message in place of the send_message_to /
report_to_parent split. Adds whoami, help, get_project_status, rename_process,
select_process, close_process, search_output, get_process_ports, and a richer
send_input with key/paste support and optional wait_ms tail. Updates §3
(trust.json), §8 (self-discovery via whoami/help), §9 (renamed tools in the
permissions loop), §11 (idle exposed via list_processes), §14 (resolves the
scratchpad-revision and trust-persistence questions, opens content-hashed
trust), and §15 (build-order tool names).

2026-05-14 13:57:31 +01:00


This is a spec for a terminal project I have.

I think we could probably use libghostty for the terminal emulation, as the Go ecosystem seems quite sparse.

patterm — v1 Spec

Working title: patterm. Used throughout this document.

1. Overview

A terminal-based agent orchestration shell. The user opens patterm in a project directory (e.g. ~/Dev/foo). patterm presents a multi-tab TUI where each top tab is a session — a long-running PTY launched from a user-defined preset. Presets come in two flavours: agent presets (e.g. claude, codex, opencode — vendor LLM CLIs with patterm's MCP wired up) and process presets (e.g. bun run dev, vitest --watch — raw commands with no MCP). Each session has a sidebar of children: more presets spawned by that session, again either agents or processes, all PTY-backed. The right rail also surfaces project-scoped scratchpads (markdown files) for human readability.

An MCP server, in-process, exposes tools that let orchestrator agents spawn and drive children, run processes, set timers, message peers, and read/write scratchpads. The orchestrator is a real LLM CLI driving another LLM CLI as if it were a human user — keystroke injection in, rendered-grid scraping out. The orchestrator fully owns the content of what it sends; patterm only handles the plumbing.

Goal: Let one SOTA agent orchestrate other agents of different types (claude → codex, codex → opencode, …) without subagent APIs, while keeping the whole thing steerable and observable by a human at any moment.

Non-goal: Hosting any LLM. patterm only manages CLIs the user already has installed. patterm also doesn't ship hard-coded knowledge of any specific vendor CLI — agent presets are user-editable JSON; the three common ones (claude, codex, opencode) ship as defaults.

2. Architecture and lifecycle

Single foreground process. No daemon, no detach.

The tool is one Go process that owns: the TUI, all PTYs, vt-emulated grids, session state, child state, scratchpad files, and an in-process MCP server. Killing the process kills everything inside it. There is no attach/detach, no project-keyed singleton, no socket-based reattachment.

Lifecycle:

User runs patterm in a project directory.
The process starts the TUI as a blank canvas — no sessions, no children, no scratchpad preview. Just the empty frame with the palette hint in the status line. The in-process MCP server initializes (bound to a per-PID unix socket for spawned children — see §10) and scratchpad metadata is loaded from disk, but nothing is rendered until the user opens a preset.
The user opens the palette (Ctrl-K), selects a preset, and the first session/process is launched. Subsequent sessions and children are spawned the same way (or by orchestrators via MCP).
On exit (Ctrl-D, :quit, terminal window close, SIGTERM, SIGHUP): the process sends SIGTERM to every child PTY with a short grace window, then SIGKILL, then exits. Only on-disk state survives: scratchpads and command-preset trust grants (§3).

Multiple invocations: Running patterm twice in the same project starts two independent processes. They share the on-disk project state (scratchpads and trust.json) but nothing else. If this turns out to be a footgun in practice, a per-project lockfile can be added later; that is out of scope for v1.

Implications: Closing the terminal window (or SSH dropping) ends the session and tears down every child. This is the deliberate trade — no orphan daemons, no socket discovery, no stale-state recovery, no multi-client coordination. The user's terminal window is the lifetime boundary.

3. Project state layout

Project state (scratchpads, trust grants, and project metadata) lives under $XDG_DATA_HOME; presets and config live under $XDG_CONFIG_HOME.

$XDG_DATA_HOME/patterm/
└── projects/
    └── <project-key>/
        ├── meta.json          # project path, last-opened, version
        ├── trust.json         # persisted command-preset trust grants (§7)
        └── scratchpads/
            ├── notes.md
            ├── todos.md
            └── <agent-written>.md

$XDG_CONFIG_HOME/patterm/
├── config.json                # global settings (theme, default keymap, etc.)
└── presets/
    ├── agents/
    │   ├── claude.json        # ships as default
    │   ├── codex.json         # ships as default
    │   ├── opencode.json      # ships as default
    │   └── <user-defined>.json
    └── processes/
        ├── dev.json           # e.g. { "name": "bun run dev", "argv": ["bun", "run", "dev"] }
        ├── test.json
        └── <user-defined>.json

Both preset directories are scanned at startup; every file found becomes a palette entry ("Spawn agent: claude", "Run process: bun run dev", …). Presets are project-agnostic in v1 — the same set is available in every project. Per-project overrides can be added later.

Project key = sha256(realpath(project_dir))[:16]. Used only as the name of the project state directory (meta.json, trust.json, scratchpads/); there is no daemon to look up.
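A minimal Go sketch of the key derivation above. It assumes the [:16] slice is taken from the hex digest (16 hex characters, i.e. 8 bytes) and that realpath means an absolute, symlink-resolved path; the function name projectKey is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// projectKey returns the first 16 hex characters of the SHA-256 of the
// absolute, symlink-resolved project path.
// Assumption: the spec's [:16] applies to the hex string, not the raw bytes.
func projectKey(projectDir string) (string, error) {
	abs, err := filepath.Abs(projectDir)
	if err != nil {
		return "", err
	}
	resolved, err := filepath.EvalSymlinks(abs)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256([]byte(resolved))
	return hex.EncodeToString(sum[:])[:16], nil
}

func main() {
	key, err := projectKey(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "project key:", err)
		os.Exit(1)
	}
	fmt.Println(key)
}
```

Resolving symlinks first means two paths that point at the same directory map to the same state directory.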

Internal MCP socket (for spawned children to talk to the running process): $XDG_RUNTIME_DIR/patterm/<pid>.sock, falling back to /tmp/patterm-<pid>.sock if XDG_RUNTIME_DIR is unset. Created on startup, removed on exit. Per-PID, not per-project — it is a private IPC channel, not a discovery point.
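The fallback rule above can be sketched as a small pure function. The name socketPath is hypothetical; the two path shapes are taken directly from the paragraph above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// socketPath picks the per-PID socket location: under the XDG runtime dir
// when it is set, otherwise under /tmp.
func socketPath(runtimeDir string, pid int) string {
	if runtimeDir == "" {
		return fmt.Sprintf("/tmp/patterm-%d.sock", pid)
	}
	return filepath.Join(runtimeDir, "patterm", fmt.Sprintf("%d.sock", pid))
}

func main() {
	fmt.Println(socketPath(os.Getenv("XDG_RUNTIME_DIR"), os.Getpid()))
}
```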

Scratchpads and command-preset trust grants persist across runs. Sessions and child processes do not — every patterm run starts with an empty process tree.

4. UI / Client

┌────────────────────────────────────────────────────────┬──────────────────┐
│ [codex-1] [codex-2] [claude-1]                       + │  Session tree    │
├────────────────────────────────────────────────────────┤  ──────          │
│                                                        │  ▶ codex-1       │
│                                                        │  │               │
│                                                        │  ├─ ◉ claude-2   │
│                                                        │  ├─ ◉ claude-3   │
│              (focused pane's PTY)                      │  ├─ ◉ claude-4   │
│                                                        │  └─ ◉ bun-dev    │
│                                                        │                  │
│                                                        │  Scratchpads     │
│                                                        │  ──────          │
│                                                        │  todos.md        │
│                                                        │  notes.md        │
│                                                        │  api-plan.md     │
│                                                        │                  │
│                                                        │  ┌────────────┐  │
│                                                        │  │ todos.md   │  │
│                                                        │  │ preview…   │  │
│                                                        │  └────────────┘  │
├────────────────────────────────────────────────────────┴──────────────────┤
│ [orchestrator driving]                          Ctrl-K  command palette  │
└───────────────────────────────────────────────────────────────────────────┘

Top tab bar: one per top-level session. + opens the palette pre-filtered to "Spawn…" entries.
Main area: the focused pane's PTY, rendered identically to viewing it in a regular terminal. The focused pane is either the orchestrator (root of the active session's tree) or one of its children, whichever the user last selected from the sidebar.
Right rail, top half — session tree: the active session's process hierarchy, drawn as an indented tree with box-drawing connectors (├─, └─). The orchestrator is the root (▶); each child appears one level deeper with a status glyph (◉ running, ✓ exited cleanly, ✗ errored). Selecting an entry (palette, arrow keys, or click) makes it the focused pane. v1 only has two levels because of the §8 two-level-tree rule, but the renderer should be tree-shaped from day one so a future depth bump doesn't require UI surgery.
Right rail, bottom half: scratchpad list and a preview of the selected scratchpad.
Status line: input-ownership toast ("orchestrator driving" / "you have control") on the left, palette hint on the right.

Empty state: Until the user spawns their first preset, the top tab bar, main area, and sidebar all sit empty with a centred hint ("Press Ctrl-K to spawn an agent or process"). No "default session" is created.

Switching: Clicking a top tab (or selecting one via the palette) switches the active session — the sidebar tree swaps to that session's hierarchy. Clicking a sidebar entry switches the focused pane within the current session.

Command palette (v1 input model):

Almost all application functions are driven through a single command palette opened with Ctrl-K. The palette is a fuzzy-searchable list of commands, scoped to whatever makes sense for the current focus. Two kinds of entries appear:

Built-in commands — "Switch to session…", "Focus pane…", "Take input control", "Release control to orchestrator", "Open scratchpad…", "Kill child…", "Quit", etc.
Preset commands — one entry per file under $XDG_CONFIG_HOME/patterm/presets/. Agent presets surface as "Spawn agent: codex" / "Spawn agent: claude" / …; process presets surface as "Run process: bun run dev" / "Run process: vitest" / …. The label comes from the preset's name field; the action is "launch this preset into a new pane."

Selecting a preset either launches it immediately (no required args) or opens a sub-palette for optional args — namely an initial prompt (agent presets only), which patterm injects into the spawned PTY's input after the agent is ready (§8). The orchestrator equivalent of this — spawn_agent / spawn_process MCP tools — uses the exact same machinery: pick a preset by name, optionally supply an initial prompt, patterm handles the rest.

Rationale: the keybinding surface for sessions + children + scratchpads + control transfer + spawning gets large fast. A palette lets us ship the full feature set without committing to a key map yet, and gives the user a discoverable index of every action. Dedicated keybindings can be layered on top later for the few actions a user does often enough to memorize — they should be configured by binding to palette command IDs, not by re-implementing the action.

Only two keybindings are reserved at the application level in v1:

Action	Binding
Open command palette	`Ctrl-K`
Pass-through prefix (everything else after this goes to the focused PTY untouched, e.g. for nested tmux/Ctrl-K-using TUIs)	`Ctrl-K Ctrl-K`

Everything else — session switching, child cycling, control transfer, quitting — lives in the palette for v1.

5. PTY layer

One PTY per session orchestrator and one per child. For each PTY the tool maintains:

The underlying process (pid, status, exit code on death).
A raw byte ring buffer (default 1 MiB) for stream-mode reads.
A vt-emulated character grid representing current visible state.
Alt-screen flag (whether the process is in alternate-buffer mode, i.e. a TUI).
Last-write timestamp (used for the idle heuristic).
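One possible shape for the raw byte ring buffer, sketched under the assumption that stream offsets are absolute byte counts since the PTY started (which is what makes since_offset / new_offset in §7 workable). The type and method names are hypothetical.

```go
package main

import "fmt"

// ring keeps the most recent len(buf) bytes and remembers how many bytes
// have ever been written, so readers can use absolute offsets.
type ring struct {
	buf  []byte
	head int64 // total bytes written so far (the write head)
}

func newRing(size int) *ring { return &ring{buf: make([]byte, size)} }

// Write appends p, overwriting the oldest bytes once the buffer is full.
func (r *ring) Write(p []byte) (int, error) {
	for _, b := range p {
		r.buf[r.head%int64(len(r.buf))] = b
		r.head++
	}
	return len(p), nil
}

// ReadSince returns the bytes from the absolute offset since up to the
// write head. Offsets older than the retained window are clamped forward.
func (r *ring) ReadSince(since int64) ([]byte, int64) {
	oldest := r.head - int64(len(r.buf))
	if oldest < 0 {
		oldest = 0
	}
	if since < oldest {
		since = oldest
	}
	if since > r.head {
		since = r.head
	}
	out := make([]byte, 0, r.head-since)
	for i := since; i < r.head; i++ {
		out = append(out, r.buf[i%int64(len(r.buf))])
	}
	return out, r.head
}

func main() {
	r := newRing(4)
	r.Write([]byte("abcdef"))
	data, off := r.ReadSince(0)
	fmt.Printf("%q %d\n", data, off) // prints "cdef" 6
}
```

A real implementation would guard the buffer with a mutex, since the PTY reader and MCP handlers run concurrently.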

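Terminal emulator: Go has limited native options. The current bet is libghostty-vt, used as a headless VT state machine behind a Go interface, with vt10x (or a maintained fork) or charmbracelet/x/vt as fallback (see §14 and the closing note). Budget real time: this is the load-bearing component for grid-mode get_process_output. The emulator must handle: SGR colours (then strip them on read), cursor movement, alt-screen entry/exit, scroll regions, basic mouse passthrough where needed.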

Resize: On startup and on SIGWINCH, the tool reads its own terminal dimensions, computes per-pane winsize (accounting for tab bar, sidebar, status line), and ioctl(TIOCSWINSZ) each PTY. Children get SIGWINCH automatically. One process, one viewport — no multi-client resize negotiation.
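The per-pane winsize arithmetic might look like the following sketch. The chrome sizes (one row each for tab bar and status line, a fixed sidebar width) are assumptions for illustration, not values the spec fixes, and the ioctl(TIOCSWINSZ) call itself is left out.

```go
package main

import "fmt"

// winsize is a rows-by-cols terminal size.
type winsize struct{ rows, cols int }

// chrome describes how much of the terminal the UI frame consumes.
// All three values are hypothetical layout parameters.
type chrome struct{ tabBarRows, statusRows, sidebarCols int }

// paneSize subtracts the UI frame from the terminal size and clamps the
// result to at least 1x1 so a tiny window never yields a zero-sized PTY.
func paneSize(term winsize, c chrome) winsize {
	rows := term.rows - c.tabBarRows - c.statusRows
	cols := term.cols - c.sidebarCols
	if rows < 1 {
		rows = 1
	}
	if cols < 1 {
		cols = 1
	}
	return winsize{rows: rows, cols: cols}
}

func main() {
	term := winsize{rows: 40, cols: 120}
	fmt.Println(paneSize(term, chrome{tabBarRows: 1, statusRows: 1, sidebarCols: 20}))
}
```

The returned size is what would be applied to each PTY on startup and on every SIGWINCH.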

6. Input ownership

Each pane has an owner flag: user or orchestrator. A toast / status-line glyph reflects current owner.

When the orchestrator spawns a child, that child defaults to orchestrator-owned.
When the user focuses a pane and presses any key, ownership flips to user. The orchestrator can still write — bytes interleave. A warning toast appears: "Orchestrator is also driving this pane."
The user explicitly returns ownership with the "Release control to orchestrator" palette command (§4); v1 reserves no dedicated release key.

No locking. The user's call if they collide. The visual indicator is the only protection.

7. MCP tool surface

The tool embeds an MCP server in-process. Each spawned agent gets an MCP config injected at spawn time (see §10) pointing at a stdio proxy subcommand of the same binary, which forwards JSON-RPC over the per-PID unix socket to the running process. Tool calls carry an implicit caller identity (which session / which process) derived from the connection.

Concepts shared by all tools

Process IDs. Every spawnable thing — agents, terminals, commands — is addressed by an opaque short token (e.g. p_a1b2c3), not by OS PID. IDs are stable for the lifetime of the entry: they survive stop/restart for stored command entries; they are released when an agent or terminal exits and is close_process'd. Each entry also has a human-readable display name (default <kind>-<n>, settable via rename_process or the name arg on spawn).
Process kinds.
- agent — a vendor LLM CLI launched from an agent preset (§10). MCP-wired. Ephemeral: lost when the underlying PTY exits.
- terminal — a bare interactive shell. Defaults to $SHELL -i. Ephemeral.
- command — a process preset (e.g. bun run dev, vitest --watch) or freeform argv. Session-persistent: a command entry survives PTY exit so it can be restart_process'd, and is removed only when close_process is called or patterm exits.
Trust gating. Command entries that were authored as presets are not trusted by default. The first time an agent attempts to spawn_process(kind: 'command', preset: …), start_process, or restart_process against an untrusted command preset, the call returns a needs_trust error and patterm surfaces a UI confirmation in the focused tab. Once the user confirms, the trust grant is persisted to disk ($XDG_DATA_HOME/patterm/projects/<key>/trust.json, see §3), so the user only confirms each preset once per project — not once per patterm run. Trust is keyed by (project, preset name) in v1; content-hashed trust (re-confirming on edit) is a v2 question (§14). Freeform-argv command entries are trusted implicitly at spawn time because the agent already had to compose the argv, and they are not written to the trust file.
Caller role. Every connection has a role: orchestrator (root of a session tree), sub-agent (an agent spawned by an orchestrator), or process (commands/terminals — these don't talk MCP, but they appear as targets). Role gates which tools the caller may invoke. Calls disallowed by role return a structured error explaining why, so the agent can adapt rather than silently fail.
Idle / readiness. Every PTY-backed entry tracks idle_ms (ms since last write to its master). Tools that read state surface this so callers can decide when a target is "done" without polling raw bytes themselves (§11).
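The trust-gating rule can be sketched as a small store keyed by preset name. The JSON shape of trust.json is not defined in this spec, so the {"presets": {...}} layout below is an assumption, as are the type and function names; only the needs_trust error name comes from the text above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errNeedsTrust mirrors the needs_trust error returned to the calling agent.
var errNeedsTrust = errors.New("needs_trust")

// trustStore holds per-project grants keyed by preset name (v1 keying).
// The JSON layout is a hypothetical shape for trust.json.
type trustStore struct {
	Presets map[string]bool `json:"presets"`
}

func newTrustStore() *trustStore {
	return &trustStore{Presets: map[string]bool{}}
}

// parseTrust decodes the bytes of a trust file.
func parseTrust(data []byte) (*trustStore, error) {
	s := newTrustStore()
	if err := json.Unmarshal(data, s); err != nil {
		return nil, err
	}
	if s.Presets == nil {
		s.Presets = map[string]bool{}
	}
	return s, nil
}

// Check returns errNeedsTrust until the user has confirmed the preset.
func (s *trustStore) Check(preset string) error {
	if !s.Presets[preset] {
		return errNeedsTrust
	}
	return nil
}

// Grant records a user confirmation for the preset.
func (s *trustStore) Grant(preset string) { s.Presets[preset] = true }

// Marshal produces the bytes that would be written back to disk.
func (s *trustStore) Marshal() ([]byte, error) { return json.Marshal(s) }

func main() {
	s := newTrustStore()
	fmt.Println(s.Check("dev")) // needs_trust
	s.Grant("dev")
	data, _ := s.Marshal()
	fmt.Println(string(data))
}
```

Freeform-argv entries would bypass Check entirely, matching the implicit-trust rule above.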

Lifecycle and spawning

`spawn_agent` — orchestrator-only

Args: agent (preset name under presets/agents/), agent_instructions (string — the first turn typed into the agent's TUI after ready), name? (display name).
Behaviour: Launches the agent preset in a new PTY as a child of the calling session. Wires MCP per the preset's injection strategy (§10). Waits for the preset's ready_signal (default: 1s idle), then types agent_instructions into the input box and submits. patterm injects nothing else — the spawned agent learns its role and conventions either from agent_instructions or by calling whoami / help itself.
Returns: { process_id, name }.
Errors: unknown_agent if the preset is missing; role_forbidden if a sub-agent calls it (with a message pointing the caller at its parent or at vendor-native subagent tooling).

`spawn_process`

Args: kind (terminal | command), one of preset (name under presets/processes/) or argv (string array), name?, working_dir? (default project root), env?, shell? (only valid with argv; default false).
- For kind: terminal, argv is optional — defaults to $SHELL -i.
- For kind: command, exactly one of preset or argv must be supplied.
Behaviour: Creates a process entry, attached as a child of the calling agent's session, and starts it. No MCP injection (these aren't agents). command entries are persisted to the session for later restart_process / start_process; terminal entries are ephemeral.
Returns: { process_id, name }.
Errors: needs_trust if kind: command references an untrusted preset.

`start_process`

Args: process_id.
Behaviour: Starts a stored command entry that is currently in stopped or exited state. No-op on a running entry (returns the existing state).
Returns: { process_id, status }.
Errors: not_found, wrong_kind (only command entries are start-able post-creation), needs_trust.

`restart_process`

Args: process_id, signal? (default SIGTERM for the stop phase).
Behaviour: Stops the entry if running (grace window then SIGKILL), then starts it again with the same argv/env/working_dir. Valid for command entries; valid for agent and terminal entries only while their PTY is still live (since they have no stored definition to rehydrate from).
Returns: { process_id, status }.
Errors: not_found, needs_trust (command presets), wrong_kind (trying to restart an exited agent/terminal).

`stop_process`

Args: process_id, signal? (default SIGTERM).
Behaviour: Sends the signal to the entry's PTY, with the standard grace window before SIGKILL.
Returns: { process_id, status }.

`close_process`

Args: process_id.
Behaviour: Removes the entry from the session entirely. If still running, stops it first. Used to clear stored command entries the orchestrator no longer needs, and to clean up exited agent/terminal ghosts from the sidebar.
Returns: ok.

`rename_process`

Args: process_id, name.
Returns: ok. Updates the display name in the sidebar and tab bar.

`select_process`

Args: process_id.
Behaviour: Asks the UI to focus the given pane (switches session tab if needed). Non-blocking, advisory — distinct from request_human_attention, which raises a notification and expects a human decision.
Returns: ok.

Inspection

`list_processes`

Args: kind? (filter by agent | terminal | command).
Returns: Array of { process_id, name, kind, status, parent_process_id, exit_code?, idle_ms?, trusted? } for the caller's session. status ∈ { starting, running, stopped, exited, errored }.

`get_process_status`

Args: process_id.
Returns: { process_id, name, kind, status, parent_process_id, working_dir, argv?, exit_code?, started_at, idle_ms, active_screen: "main" | "alternate", rows, cols, cursor: { x, y }, trusted?, screen_version }.

`get_project_status`

Args: none.
Returns: { project: { path, key }, caller: { process_id, role, name, parent_process_id?, available_tools: [string] }, processes: [<list_processes entry>], scratchpads: [{ name, size, modified_at, revision }] }. Everything an agent needs to orient itself in one call.

`get_process_output`

Args: process_id, mode (grid | stream), since_offset? (stream mode only).
Behaviour: grid returns the current visible pane as plain text, ANSI stripped, with best-effort vendor-chrome trim per preset hints (§10). stream returns ANSI-stripped bytes from since_offset to the current write head.
Returns: { content, mode, new_offset?, active_screen, rows, cols, cursor, idle_ms, status, screen_version }.
Tool-description note (shown to the calling agent): "The grid result is the entire visible pane. You are responsible for locating the response to your last prompt within it. Use search_output if you have a specific marker to find."

`get_process_raw_output`

Args: process_id, since_offset?.
Behaviour: Returns raw bytes from since_offset, escape sequences preserved. Used when the agent needs to inspect control codes (rare).
Returns: { content, new_offset, status }.

`search_output`

Args: process_id, pattern (regex), kind (rendered | raw), limit? (default 20).
Returns: { matches: [{ line_no, text }], truncated: bool }. Searches scrollback (not just the visible grid).

`wait_for_pattern`

Args: process_id, pattern (regex), timeout_seconds, scope? (grid | scrollback, default grid).
Behaviour: Blocks the calling agent until the chosen surface matches the regex, or the timeout expires. Polls at ~50ms.
Returns: { matched: bool, snippet?: string }. Used in the §9 permissions-prompt-clear flow.
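A rough sketch of the polling loop behind wait_for_pattern. The read callback stands in for "render the chosen surface as text" and is hypothetical, as is the function signature; the spec only fixes the regex match, the timeout, and the roughly 50 ms poll interval.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// waitForPattern polls read() until the pattern matches or the timeout
// expires. It returns whether a match was seen and the matched snippet.
func waitForPattern(read func() string, pattern string, timeout, poll time.Duration) (bool, string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, "", err
	}
	deadline := time.Now().Add(timeout)
	for {
		text := read()
		if loc := re.FindStringIndex(text); loc != nil {
			return true, text[loc[0]:loc[1]], nil
		}
		if time.Now().After(deadline) {
			return false, "", nil
		}
		time.Sleep(poll)
	}
}

func main() {
	start := time.Now()
	// Simulated pane: busy for a moment, then showing a confirmation prompt.
	read := func() string {
		if time.Since(start) < 120*time.Millisecond {
			return "thinking..."
		}
		return "Allow Bash(ls)? [y/N]"
	}
	matched, snippet, err := waitForPattern(read, `\[y/N\]`, 2*time.Second, 50*time.Millisecond)
	fmt.Println(matched, snippet, err)
}
```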

`get_process_ports`

Args: process_id.
Returns: { ports: [{ port, url?, first_seen_at }] }. Best-effort: patterm watches the stream for :NNNN and http://… patterns and reports what it has seen. No probing.
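The stream watcher could be approximated with two regular expressions, as in this sketch. The exact patterns are assumptions (the spec only says ":NNNN" and "http://…"), bare ports are limited to four or five digits to avoid matching timestamps, and first_seen_at is omitted for brevity.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// portHit is one observed port, with the URL it appeared in if any.
type portHit struct {
	Port int
	URL  string
}

var (
	// URL with an explicit port, e.g. http://localhost:5173/
	urlRe = regexp.MustCompile(`https?://[^\s/:]+:(\d{1,5})\S*`)
	// Bare port mention, e.g. "listening on :8080"
	portRe = regexp.MustCompile(`:(\d{4,5})\b`)
)

// scanPorts records every port seen in chunk into seen. URL sightings take
// precedence over bare-port sightings of the same port.
func scanPorts(chunk string, seen map[int]portHit) {
	for _, m := range urlRe.FindAllStringSubmatch(chunk, -1) {
		record(seen, m[1], m[0])
	}
	for _, m := range portRe.FindAllStringSubmatch(chunk, -1) {
		record(seen, m[1], "")
	}
}

func record(seen map[int]portHit, digits, url string) {
	p, err := strconv.Atoi(digits)
	if err != nil || p < 1 || p > 65535 {
		return
	}
	if prev, ok := seen[p]; ok && (prev.URL != "" || url == "") {
		return // keep the earlier, more informative entry
	}
	seen[p] = portHit{Port: p, URL: url}
}

func main() {
	seen := map[int]portHit{}
	scanPorts("  Local:   http://localhost:5173/  ready", seen)
	scanPorts("api listening on :8080", seen)
	fmt.Println(seen)
}
```

This stays purely observational: nothing is probed, so a port that was printed but never bound would still be reported.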

I/O

`send_input`

Args: process_id, kind (text | paste | key), and:
- For text: text (string), submit? (default true — appends Enter).
- For paste: text (string). Sent via bracketed paste (\e[200~ … \e[201~) when the target's emulator state indicates support; otherwise falls back to chunked text writes without trailing newline.
- For key: key (one of enter, tab, escape, backspace, ctrl-c, ctrl-d, up, down, left, right, home, end, page-up, page-down, f1…f12). Encoded via the emulator's key-encoding (Kitty keyboard protocol where negotiated, legacy escapes otherwise).
Optional tail: wait_ms? (default 0), tail_mode? (none | stream | grid, default stream when wait_ms > 0). When wait_ms > 0, the call blocks for that many milliseconds after sending and then returns the tail in the chosen mode.
Returns: { ok: true, tail?: { content, mode, new_offset?, active_screen, idle_ms, screen_version } }.
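A minimal sketch of how the three send_input kinds might be turned into PTY bytes. It covers only the legacy (non-Kitty) encoding for a subset of the named keys, uses 0x7f for backspace (a common default, not universal), and skips the chunking of the non-bracketed paste fallback. The function and table names are hypothetical.

```go
package main

import "fmt"

// legacyKeys maps a subset of the named keys to their legacy byte sequences.
var legacyKeys = map[string]string{
	"enter": "\r", "tab": "\t", "escape": "\x1b", "backspace": "\x7f",
	"ctrl-c": "\x03", "ctrl-d": "\x04",
	"up": "\x1b[A", "down": "\x1b[B", "right": "\x1b[C", "left": "\x1b[D",
}

// encodeInput returns the bytes to write to the PTY master.
// bracketed reports whether the target has enabled bracketed paste.
func encodeInput(kind, payload string, submit, bracketed bool) ([]byte, error) {
	switch kind {
	case "text":
		if submit {
			payload += "\r" // Enter
		}
		return []byte(payload), nil
	case "paste":
		if bracketed {
			return []byte("\x1b[200~" + payload + "\x1b[201~"), nil
		}
		return []byte(payload), nil // fallback: plain write, no trailing newline
	case "key":
		seq, ok := legacyKeys[payload]
		if !ok {
			return nil, fmt.Errorf("unknown key %q", payload)
		}
		return []byte(seq), nil
	}
	return nil, fmt.Errorf("unknown kind %q", kind)
}

func main() {
	b, _ := encodeInput("text", "ls", true, false)
	fmt.Printf("%q\n", b) // prints "ls\r"
	_, err := encodeInput("key", "y", false, false)
	fmt.Println(err) // printable characters are not named keys
}
```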

Coordination

`send_message`

Args: target_process_id, message (string).
Behaviour: Delivers a tagged message into the target's PTY. Direction is inferred from the relationship between caller and target:
- parent → child: prefixed with [orchestrator] and a single space.
- child → parent: prefixed with [sub-agent:<caller_name>] and a single space.
Returns: ok.
Errors: not_related if the target is neither the caller's parent nor a child of the caller (siblings must route through the parent in v1).
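The direction inference is small enough to sketch directly. The proc struct is a hypothetical stand-in for a process entry; the tags and the not_related error name come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// proc is a minimal stand-in for a process entry.
type proc struct {
	ID, Name, ParentID string
}

var errNotRelated = errors.New("not_related")

// tagMessage prefixes msg according to the caller/target relationship.
// Siblings and unrelated processes are rejected.
func tagMessage(caller, target proc, msg string) (string, error) {
	switch {
	case target.ParentID == caller.ID:
		return "[orchestrator] " + msg, nil
	case caller.ParentID == target.ID:
		return "[sub-agent:" + caller.Name + "] " + msg, nil
	}
	return "", errNotRelated
}

func main() {
	root := proc{ID: "p_root", Name: "codex-1"}
	child := proc{ID: "p_a1b2c3", Name: "claude-2", ParentID: "p_root"}
	down, _ := tagMessage(root, child, "run the tests")
	up, _ := tagMessage(child, root, "tests pass")
	fmt.Println(down)
	fmt.Println(up)
}
```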

`request_human_attention`

Args: process_id, reason (string).
Behaviour: Notification in the TUI, blinks the sidebar entry, optionally auto-focuses per user setting. The escape hatch when the orchestrator can't safely decide.
Returns: ok.

`timer_wait`

Args: seconds, label?.
Behaviour: Returns a timer_id immediately. After seconds, injects [system] Your timer [<label>] has completed.\n into the caller's pane.
Returns: { timer_id }.
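timer_wait maps naturally onto time.AfterFunc, as in this sketch. The inject callback stands in for "write into the caller's pane", the t_<n> id format is an assumption, and the behaviour when label is omitted is not specified above, so the label is formatted as given.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

var timerSeq int64 // monotonically increasing source for timer ids

// timerWait returns a timer id immediately and calls inject with the
// [system] completion message once d has elapsed.
func timerWait(d time.Duration, label string, inject func(string)) string {
	id := fmt.Sprintf("t_%d", atomic.AddInt64(&timerSeq, 1))
	time.AfterFunc(d, func() {
		inject(fmt.Sprintf("[system] Your timer [%s] has completed.\n", label))
	})
	return id
}

func main() {
	done := make(chan string, 1)
	id := timerWait(50*time.Millisecond, "build", func(s string) { done <- s })
	fmt.Println("timer started:", id)
	fmt.Print(<-done)
}
```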

Scratchpads

All scratchpad reads return a revision token (an opaque short hash of the file contents at read time). Writes are last-write-wins by default. A write may supply expected_revision to make it conditional: on a mismatch the call returns { ok: false, current_revision } without writing, so the caller can re-read and merge.

`scratchpad_list`

Returns: [{ name, size, modified_at, revision }].

`scratchpad_read`

Args: name.
Returns: { content, revision }.

`scratchpad_write`

Args: name, content, expected_revision?.
Returns: { ok: true, revision } | { ok: false, current_revision }.

`scratchpad_append`

Args: name, content.
Returns: { ok: true, revision }. Appends are unconditional — concurrent appends interleave at write time but never lose data.
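The revision scheme can be illustrated with an in-memory scratchpad. This is a sketch only: the real store is file-backed, the 12-character SHA-256 prefix is an assumed revision format, and the pad type is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// pad is an in-memory stand-in for one scratchpad file.
type pad struct {
	mu      sync.Mutex
	content string
}

// revision derives a short opaque token from the content.
func revision(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])[:12]
}

// Read returns the content and its current revision.
func (p *pad) Read() (string, string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.content, revision(p.content)
}

// Write replaces the content. A non-empty expected revision makes the
// write conditional; on mismatch nothing is written and the current
// revision is returned.
func (p *pad) Write(content, expected string) (bool, string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	cur := revision(p.content)
	if expected != "" && expected != cur {
		return false, cur
	}
	p.content = content
	return true, revision(content)
}

// Append is unconditional and returns the new revision.
func (p *pad) Append(content string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.content += content
	return revision(p.content)
}

func main() {
	p := &pad{}
	_, r0 := p.Read()
	ok, r1 := p.Write("- fix tests\n", r0)
	fmt.Println(ok, r1)
	ok, cur := p.Write("- stale edit\n", r0) // r0 is now out of date
	fmt.Println(ok, cur == r1)
}
```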

8. Conversation protocol

patterm does not inject any framing or system-prompt text into spawned agents. Whatever an agent sees in its input is exactly what the user typed or what an orchestrator chose to send. The orchestrator (or the human launching it) is responsible for telling a spawned agent what its role is, what tools it has, and what conventions to expect.

That said, when patterm relays messages programmatically between agents or surfaces lifecycle events, it tags them so the receiving agent can distinguish sources. These tags are the patterm convention; agents will encounter them in their input and are expected to recognize them from context (or because their parent explained them in the initial prompt).

[orchestrator] <msg> — prepended when send_message delivers a message from a parent to a child.
[sub-agent:<name>] <msg> — prepended when send_message delivers a message from a child to its parent.
[system] <msg> — patterm itself (timer fires, child exited, etc.).
Direct user typing is not prefixed. The user sees the pane and types normally; the agent receives the keystrokes as-is.

Agents that weren't briefed by their parent can self-discover their role, parent, project, and the tag conventions by calling whoami and help('conventions') (§7). This is the supported substitute for system-prompt injection, which patterm deliberately does not do: the conventions live in the tool surface, not in an injected preamble.

No "ready" handshake. patterm treats the agent as ready once its PTY hits the preset's ready_signal (default: 1s idle after launch — see §10). The very first thing the agent receives after that point is whatever the caller passed as agent_instructions.

Two-level tree only. Sub-agents cannot call spawn_agent — the call returns a role_forbidden error that explains the rule and points at vendor-native subagent tooling.

9. Permissions flow

Sub-agents are launched with vendor permissions on — the orchestrator drives their confirmation prompts.

Loop:

Orchestrator sends a message to a sub-agent via send_message.
Sub-agent runs, eventually hits a tool-use confirmation in its TUI ("Allow Bash(rm -rf foo)? [y/N]").
Sub-agent goes idle (cursor stops animating, no byte writes for 1s — exposed as idle_ms on get_process_status / list_processes).
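Orchestrator's loop calls get_process_output(process_id, mode="grid"), sees the prompt, decides, and answers with send_input(process_id, kind="text", text="y") or text="n". submit defaults to true, which appends Enter; pass submit=false if the prompt reacts to a single keypress. (kind="key" only accepts the named keys listed in §7, so it cannot send a bare "y".)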
If the orchestrator can't safely decide, it calls request_human_attention(process_id, "Sub-agent wants to run X, looks destructive, need your call"). The orchestrator then waits (using wait_for_pattern or repeated reads) until the prompt is no longer on screen.

Risks acknowledged: the orchestrator's reading of the prompt is a vision/parsing problem on rendered text. We trust a SOTA model to handle this correctly. The request_human_attention punt is the safety valve.

10. Presets

Presets are user-editable JSON files that describe how to launch something. patterm itself has no hard-coded agent or process types — every spawnable thing is a preset. Two flavours:

Agent presets

$XDG_CONFIG_HOME/patterm/presets/agents/<name>.json. Launches a vendor LLM CLI with MCP wired up. No prompt text is injected beyond the caller's agent_instructions (§8).

Field	Purpose
`name`	Display name shown in the palette (e.g. "claude", "codex haiku", "opencode-experimental")
`argv`	Full launch argv (e.g. `["claude"]`, `["codex", "--no-tui-banner"]`)
`env`	Env vars to set (merged over inherited env)
`working_dir`	Defaults to the project root
`mcp_injection`	How to point this CLI at patterm's stdio proxy. One of: `{ "kind": "flag", "flag": "--mcp-config", "config_path": "..." }`, `{ "kind": "config_file", "path": "~/.codex/config.toml", "merge_key": "mcp_servers" }`, `{ "kind": "env_var", "var": "MCP_CONFIG_PATH" }`
`ready_signal`	How to detect the TUI is ready (default: 1s idle after launch). Override per-CLI if needed.
`chrome_trim_hints`	Optional regexes / row ranges for stripping vendor chrome in grid reads

Default presets shipped: claude, codex, opencode. Authoring these is per-vendor research — each CLI has its own MCP config conventions, ready states, and TUI chrome. Users can copy and edit them, or add new ones (e.g. a second claude preset that launches with a specific model or system prompt file).

MCP config flow: at startup, for each agent preset, patterm renders a small JSON pointing at its own mcp-stdio proxy subcommand (patterm mcp-stdio --socket <pid-sock> --identity <token>) into a per-preset temp file. The launch then uses the preset's mcp_injection strategy to hand that path to the CLI. The user's global vendor config is never mutated.

Process presets

$XDG_CONFIG_HOME/patterm/presets/processes/<name>.json. Launches a raw command in a PTY, with no MCP wiring and no injected prompt text.

Field	Purpose
`name`	Display name shown in the palette (e.g. "bun run dev")
`argv`	Launch argv (e.g. `["bun", "run", "dev"]`)
`shell`	If `true`, argv is interpreted via `sh -lc`. Default `false`.
`env`	Env vars to set
`working_dir`	Defaults to the project root

Process presets are intentionally thin: they're shortcuts for commands the user runs often. Anything more exotic — pipelines, redirections — uses shell: true, or the orchestrator can call spawn_process with freeform argv.
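Loading a process preset could look like the sketch below. The field names follow the table above; treating env as a string-to-string map and joining argv with spaces for the sh -lc case are assumptions, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// processPreset mirrors the process preset fields.
type processPreset struct {
	Name       string            `json:"name"`
	Argv       []string          `json:"argv"`
	Shell      bool              `json:"shell"`
	Env        map[string]string `json:"env"` // assumed shape
	WorkingDir string            `json:"working_dir"`
}

// parseProcessPreset decodes and minimally validates one preset file.
func parseProcessPreset(data []byte) (processPreset, error) {
	var p processPreset
	if err := json.Unmarshal(data, &p); err != nil {
		return p, err
	}
	if p.Name == "" || len(p.Argv) == 0 {
		return p, errors.New("preset needs a name and a non-empty argv")
	}
	return p, nil
}

// command returns the argv to exec, wrapping it in sh -lc when shell is true.
func (p processPreset) command() []string {
	if p.Shell {
		return []string{"sh", "-lc", strings.Join(p.Argv, " ")}
	}
	return p.Argv
}

func main() {
	p, err := parseProcessPreset([]byte(`{"name": "bun run dev", "argv": ["bun", "run", "dev"]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Name, p.command())
}
```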

11. Done-signal heuristic

A pane is considered "idle" when no bytes have been written to its PTY's master end for 1000 ms.

Rationale: every supported vendor TUI animates a spinner while busy (during LLM streaming and during tool execution). A genuinely idle pane stops animating.

Caveats and mitigations:

LLM provider hiccups can cause >1s gaps mid-stream. Per-agent tuning of the idle threshold is allowed in the preset.
Orchestrators should treat idle as a signal to read, not as a guarantee of completion. If the read returns something ambiguous, they can wait_for_pattern with a known terminal marker (e.g. the agent's input prompt) for stronger evidence.
The tool exposes idle state via list_processes / get_process_status so orchestrators don't need to poll byte streams directly.
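The heuristic reduces to tracking the last write time per PTY. A minimal sketch, using millisecond timestamps so the threshold can be tuned per preset; the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// idleTracker records the last time bytes were written to a PTY master.
type idleTracker struct {
	mu          sync.Mutex
	lastWriteMs int64
	thresholdMs int64 // 1000 by default; tunable per agent preset
}

func newIdleTracker(thresholdMs int64) *idleTracker {
	return &idleTracker{thresholdMs: thresholdMs}
}

// Touch is called whenever the PTY produces output.
func (t *idleTracker) Touch(nowMs int64) {
	t.mu.Lock()
	t.lastWriteMs = nowMs
	t.mu.Unlock()
}

// IdleMs is the value surfaced as idle_ms.
func (t *idleTracker) IdleMs(nowMs int64) int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	return nowMs - t.lastWriteMs
}

// Idle reports whether the pane has been quiet for at least the threshold.
func (t *idleTracker) Idle(nowMs int64) bool {
	return t.IdleMs(nowMs) >= t.thresholdMs
}

func main() {
	tr := newIdleTracker(1000)
	tr.Touch(time.Now().UnixMilli())
	time.Sleep(20 * time.Millisecond)
	now := time.Now().UnixMilli()
	fmt.Println(tr.IdleMs(now), tr.Idle(now))
}
```

Idle here is only a hint to read the pane, in line with the caveats above.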

12. Failure modes

Failure	Behaviour
Sub-agent process exits unexpectedly	Sidebar marks child as exited, exit code preserved. Orchestrator's next `get_process_output` returns final grid + exit metadata.
Vendor CLI hangs without exiting	Looks idle. Orchestrator must use `wait_for_pattern` or `request_human_attention` to escape.
Tool process crashes	All PTYs are children of the tool's process group; OS cleans them up (process-group SIGHUP on terminal close, PTY master close, parent-death signal on Linux). On macOS treat cleanup as best-effort; scratchpads on disk survive.
User closes the terminal window / SSH drops	Process receives SIGHUP, cascades SIGTERM → SIGKILL to every child, exits. Everything inside the tool dies with it. This is the intended model.
Disk full on scratchpad write	Tool returns error to caller.
LLM provider network blip	Pane idles, may trigger false "done" — orchestrator should sanity-check responses.
User kills the orchestrator pane	Tool detects PTY close, cascades SIGTERM to that session's children.
Concurrent input	Bytes interleave on PTY stdin. Toast warns. User's call.
Vt emulator bug on exotic ANSI	Grid rendering corrupts for that pane. Orchestrator's read will be noisy; degrade gracefully, don't crash.

13. Out of scope for v1

Cross-project orchestration.
Sub-agents spawning sub-agents (trees deeper than 2).
Daemonized / detachable sessions surviving the terminal window. The tool is intentionally bound to the user's foreground process.
Multi-client attach to a single session.
Native ACP support (PTY scraping only).
Hosting any LLM internally.
Auth beyond OS-level file permissions on the IPC socket and state dir.
Web / API control surface.
Recording / replay of sessions.

14. Open questions

Vt emulator library. Resolved in the closing note: libghostty-vt is the bet, with vt10x / charmbracelet/x/vt as fallback only.
MCP transport. Resolved: in-process MCP core with an mcp-stdio proxy subcommand for spawned children (see §7 and §10). Streamable HTTP can be added later.
Scratchpad concurrency. Resolved: scratchpad_read / scratchpad_write carry an opaque revision token. A write that supplies expected_revision is checked optimistically against the current revision; a write without it is plain last-write-wins (see §7). Appends are unconditional.
Cross-restart trust persistence for command presets. Resolved: trust state is persisted to disk (see §3) so the user doesn't re-confirm every patterm run. Open: whether trust should be tied to the preset contents (hash) so editing a trusted preset re-triggers confirmation. v1 keys trust by preset name; v2 may upgrade to content-hashed trust.
Default presets that ship in the box. claude / codex / opencode is the working set; trimming to two for the first cut is fine since presets are user-editable anyway.
Per-project preset overrides. v1 has a single global preset directory. Whether ./.patterm/presets/ should override per-project is a v2 question.
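The content-hashed trust idea left open above could look roughly like this. It is a sketch under assumptions: `TrustStore`, `Grant`, and `IsTrusted` are hypothetical names, SHA-256 is one reasonable choice of hash, and nothing here reflects the actual on-disk trust format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// TrustStore maps a preset name to the hex SHA-256 of the preset contents
// at the moment the user approved it. Hypothetical structure.
type TrustStore map[string]string

// hashPreset fingerprints the raw preset bytes.
func hashPreset(contents []byte) string {
	sum := sha256.Sum256(contents)
	return hex.EncodeToString(sum[:])
}

// Grant records that the user approved this exact version of the preset.
func (s TrustStore) Grant(name string, contents []byte) {
	s[name] = hashPreset(contents)
}

// IsTrusted is true only if the preset was approved and has not been edited since.
func (s TrustStore) IsTrusted(name string, contents []byte) bool {
	granted, ok := s[name]
	return ok && granted == hashPreset(contents)
}
```

Under this scheme, editing a trusted preset changes its hash, so the next run would fall back to asking the user, which is the behaviour the open question describes.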

15. Suggested build order

Single-process skeleton: TUI bootstraps, owns the terminal, handles SIGWINCH / SIGHUP / SIGTERM, exits cleanly.
Single PTY per session + vt emulator + tab bar UI + basic input/render.
Multi-session, multi-child (sidebar) with raw process spawning, process groups, kill cascade on exit (no MCP yet).
In-process MCP server + mcp-stdio proxy subcommand + per-PID unix socket + spawn_process / get_process_output / send_input / stop_process / wait_for_pattern / list_processes / whoami / help.
spawn_agent for one agent (probably claude), conversation tag conventions, agent_instructions injection (typed into the TUI input after ready).
Scratchpads (with revisions), timer_wait, request_human_attention, send_message.
Second and third agent presets, chrome-trim heuristics.
Polish: command palette, status indicators, error UX.

Yes — use libghostty-vt for the terminal emulation layer. Not full Ghostty, and not as a UI renderer. Use it as a headless VT state machine inside the tool process, wrapped behind your own Go interface.

libghostty-vt is basically aimed at exactly your load-bearing problem: it is a C library extracted from Ghostty that handles VT parsing, terminal state, scrollback, line wrapping, resize reflow, input event encoding, and related terminal internals. The current docs also warn that the API is still unstable, so this should be a pinned dependency, not something you casually track at HEAD. (libghostty.tip.ghostty.org)

The right move is:

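// Emulator is the narrow seam between patterm and whichever VT library sits behind it.
// Cell, Cursor, and Screen are patterm-side types to be defined alongside this interface.
type Emulator interface {
    // WritePTYOutput feeds bytes read from the PTY master into the VT parser.
    WritePTYOutput([]byte)
    // Resize changes the grid dimensions.
    Resize(cols, rows uint16)
    // PlainText returns the rendered grid as plain text.
    PlainText() string
    // Cell returns the cell at column x, row y.
    Cell(x, y int) Cell
    Cursor() Cursor
    // ActiveScreen reports which screen (primary or alternate) is showing.
    ActiveScreen() Screen
}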

Then implement GhosttyEmulator behind that. Keep vt10x or charmbracelet/x/vt as experimental/fallback only. vt10x is pure Go and convenient, but its own package docs describe it as “in development”; Charm’s x repo is explicitly experimental with no backwards-compatibility promise. For this project, terminal fidelity is not a nice-to-have; it is the product. (Go Packages)

The best part: libghostty-vt already has formatter support for producing plain text from the active screen, which maps cleanly to your read_output(mode="grid"); it also exposes key and mouse encoding, which matters once you stop only typing ASCII strings and start needing arrows, Ctrl-C, Tab, Escape, mouse passthrough, and Kitty keyboard protocol support. (libghostty.tip.ghostty.org)

The catch: cgo/build packaging becomes real. Pin a commit, vendor or checksum the library, and put all C ABI calls in one internal package. Do not scatter cgo across the codebase.

Big spec changes I’d make before building:

First, change MCP transport strategy. Implement the in-process MCP core once, then expose it via a tiny stdio proxy subcommand:

patterm mcp-stdio --socket "$SOCK" --identity "$TOKEN"

Each spawned agent gets an MCP config pointing at that command. The vendor CLI thinks it is launching a normal stdio MCP server; the proxy forwards JSON-RPC to the running tool process over its per-PID Unix socket. This avoids relying on every CLI supporting HTTP over Unix sockets, gives you clean per-agent identity, and keeps the tool process as the single owner of state.
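A rough sketch of the proxy's core loop, assuming it does nothing more than shuttle bytes between the vendor CLI's stdio and the tool's socket. The function names are hypothetical, and the identity handshake is left out because the spec does not define its wire format here.

```go
package main

import (
	"bytes"
	"io"
	"net"
	"strings"
)

// halfCloser is implemented by *net.UnixConn and *net.TCPConn.
type halfCloser interface{ CloseWrite() error }

// proxy copies requests from stdin to the tool and replies from the tool to stdout.
// It returns once stdin has ended and the tool has closed its side.
func proxy(conn net.Conn, stdin io.Reader, stdout io.Writer) error {
	defer conn.Close()
	down := make(chan error, 1)
	go func() {
		_, err := io.Copy(stdout, conn) // tool -> vendor CLI
		down <- err
	}()
	if _, err := io.Copy(conn, stdin); err != nil { // vendor CLI -> tool
		return err
	}
	// stdin hit EOF: signal end of requests but keep draining replies.
	if hc, ok := conn.(halfCloser); ok {
		_ = hc.CloseWrite()
	}
	return <-down
}

// selfCheck runs proxy against an in-memory peer that upper-cases one message.
func selfCheck(msg string) string {
	client, server := net.Pipe()
	go func() {
		defer server.Close()
		buf := make([]byte, len(msg))
		if _, err := io.ReadFull(server, buf); err != nil {
			return
		}
		server.Write(bytes.ToUpper(buf))
	}()
	var out bytes.Buffer
	if err := proxy(client, strings.NewReader(msg), &out); err != nil {
		return "error: " + err.Error()
	}
	return out.String()
}
```

In the subcommand itself, `conn` would come from `net.Dial("unix", socketPath)` and the other two arguments would be `os.Stdin` and `os.Stdout`. Because the proxy never parses JSON-RPC, all state stays in the tool process.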

Still support Streamable HTTP later, but stdio-proxy-first is more robust for local CLIs. MCP currently defines stdio and Streamable HTTP as standard transports, and Claude Code, Codex, and OpenCode all expose MCP configuration paths that can work with local or HTTP-style servers. (Model Context Protocol)

Second, remove the generic MCP_CONFIG_PATH assumption. Each preset needs real vendor-specific MCP config handling. Claude Code supports --mcp-config and --strict-mcp-config. (Claude) Codex config uses ~/.codex/config.toml / project .codex/config.toml, with mcp_servers.<id>.command for stdio and mcp_servers.<id>.url for HTTP. (OpenAI Developers) OpenCode exposes MCP through its mcp config option and opencode mcp add, so that preset needs its own path too. (OpenCode)
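For the Claude preset, generating the per-agent config might look like the sketch below. It assumes the `mcpServers` JSON layout that Claude Code reads from MCP config files; the server name `patterm` and both helper functions are illustrative. The Codex and OpenCode presets would need their own equivalents.

```go
package main

import "encoding/json"

// mcpServer describes one stdio MCP server entry.
type mcpServer struct {
	Command string   `json:"command"`
	Args    []string `json:"args"`
}

// mcpConfig is the assumed top-level shape of the file passed to --mcp-config.
type mcpConfig struct {
	Servers map[string]mcpServer `json:"mcpServers"`
}

// claudeMCPConfig renders a config that launches the stdio proxy for one agent.
func claudeMCPConfig(socket, token string) ([]byte, error) {
	cfg := mcpConfig{Servers: map[string]mcpServer{
		"patterm": {
			Command: "patterm",
			Args:    []string{"mcp-stdio", "--socket", socket, "--identity", token},
		},
	}}
	return json.Marshal(cfg)
}

// claudeArgs returns the extra CLI flags for a Claude session using that file.
func claudeArgs(configPath string) []string {
	return []string{"--mcp-config", configPath, "--strict-mcp-config"}
}
```

Writing the JSON to a per-session temporary file and passing its path through `claudeArgs` keeps the user's global Claude configuration untouched.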

Third, add a child-to-parent MCP tool. Your conversation protocol mentions [sub-agent:<name>] messages reporting back, but the tool surface does not currently include a way for a sub-agent to send one. Add:

report_to_parent(message: string) -> ok

Then the tool injects:

[sub-agent:codex-2] <message>

into the parent orchestrator pane. Without this, the orchestrator has to scrape the child forever, which is workable but worse.

Fourth, change spawn_process(command: string) to an argv form:

{
  "argv": ["bun", "run", "dev"],
  "working_dir": ".",
  "env": {},
  "shell": false
}

Let agents explicitly request shell mode:

{
  "argv": ["sh", "-lc", "bun run dev | tee /tmp/dev.log"],
  "shell": true
}

A raw command string is quoting hell and makes policy inspection harder.
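On the Go side, the argv form might be decoded and validated along these lines. The struct mirrors the JSON fields above. The rule that a shell interpreter in argv[0] requires "shell": true is a hypothetical policy, and the list of shells is illustrative.

```go
package main

import (
	"encoding/json"
	"errors"
	"os"
	"os/exec"
	"path/filepath"
)

// SpawnRequest mirrors the argv-form payload.
type SpawnRequest struct {
	Argv       []string          `json:"argv"`
	WorkingDir string            `json:"working_dir"`
	Env        map[string]string `json:"env"`
	Shell      bool              `json:"shell"`
}

// shells is an illustrative list, not an exhaustive one.
var shells = map[string]bool{"sh": true, "bash": true, "zsh": true, "dash": true, "fish": true}

// parseSpawn decodes a request and rejects empty argv or an undeclared shell.
func parseSpawn(raw []byte) (*SpawnRequest, error) {
	var req SpawnRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return nil, err
	}
	if len(req.Argv) == 0 || req.Argv[0] == "" {
		return nil, errors.New("argv must name a program")
	}
	if shells[filepath.Base(req.Argv[0])] && !req.Shell {
		return nil, errors.New(`shell invocation requires "shell": true`)
	}
	return &req, nil
}

// Command builds the exec.Cmd; argv is passed verbatim, with no shell parsing.
func (r *SpawnRequest) Command() *exec.Cmd {
	cmd := exec.Command(r.Argv[0], r.Argv[1:]...)
	cmd.Dir = r.WorkingDir
	cmd.Env = os.Environ()
	for k, v := range r.Env {
		cmd.Env = append(cmd.Env, k+"="+v)
	}
	return cmd
}
```

Because the arguments arrive as a list, a policy layer can inspect argv[0] directly instead of trying to parse a quoted command string.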

Fifth, make permission handling more conservative. The orchestrator reading a rendered confirmation prompt is useful, but it is not a safety boundary. A malicious repo or child process can print misleading prompt-like text. Default policy should be: auto-answer only boring, allowlisted prompts; punt writes, deletes, network exfiltration, credential access, sudo, package install scripts, and broad shell commands to the human. OpenCode’s own docs say operations are allowed by default unless permissions are configured, so per-agent recipe permissions need to be deliberate rather than assumed safe. (OpenCode)
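One way to express "auto-answer only boring, allowlisted prompts" is a deny-by-default classifier like the sketch below. The prompt wordings and patterns are invented for illustration; real vendor prompts differ and would need per-preset patterns. As the paragraph above notes, matching rendered text is a convenience and not a safety boundary.

```go
package main

import "regexp"

// Decision is what the orchestrator-side policy does with a confirmation prompt.
type Decision int

const (
	AskHuman    Decision = iota // default: punt to the user
	AutoApprove                 // only for allowlisted, low-risk prompts
)

// dangerous flags words that always force a human decision (illustrative list).
var dangerous = regexp.MustCompile(`(?i)\b(rm|sudo|curl|wget|ssh|delete|write|install|token|secret|password)\b`)

// allowlist holds hypothetical prompt shapes considered boring enough to auto-answer.
var allowlist = []*regexp.Regexp{
	regexp.MustCompile(`(?i)^allow read of \S+\?$`),
	regexp.MustCompile(`(?i)^allow listing directory \S+\?$`),
}

// decide returns AutoApprove only when the prompt matches the allowlist
// and contains nothing from the dangerous set; everything else goes to the human.
func decide(prompt string) Decision {
	if dangerous.MatchString(prompt) {
		return AskHuman
	}
	for _, re := range allowlist {
		if re.MatchString(prompt) {
			return AutoApprove
		}
	}
	return AskHuman
}
```

The useful property is the default: an unrecognised prompt never gets auto-answered.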

Sixth, child cleanup on tool exit must be real. There is no daemon to keep PTYs alive — but the OS will not magically reap children either. Put every spawned PTY in the tool's process group (or a dedicated sub-group), set Linux PR_SET_PDEATHSIG on children, close PTY masters on exit, and install a SIGHUP/SIGTERM handler that runs the SIGTERM→grace→SIGKILL cascade before the process actually exits. On macOS, parent-death signals don't exist; rely on process-group SIGHUP and PTY master close, and treat any straggler cleanup as best-effort. A stale-process sweep on next startup is unnecessary now that there is no daemon to outlive its children.
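The SIGTERM → grace → SIGKILL cascade for one child could be sketched as follows, using a dedicated process group per child. This is a Unix-only sketch with hypothetical function names. A PTY-backed child would more likely use Setsid with a controlling terminal than plain Setpgid, and the Linux-only parent-death signal is left out so the code also builds on macOS.

```go
package main

import (
	"os/exec"
	"syscall"
	"time"
)

// startInGroup starts cmd as the leader of its own process group,
// so the whole subtree can be signalled at once.
func startInGroup(cmd *exec.Cmd) error {
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	return cmd.Start()
}

// terminateGroup sends SIGTERM to the group, waits up to grace, then sends SIGKILL.
// It reports whether SIGKILL was needed.
func terminateGroup(cmd *exec.Cmd, grace time.Duration) (killed bool) {
	pgid := cmd.Process.Pid // equals the group id because of Setpgid
	_ = syscall.Kill(-pgid, syscall.SIGTERM)
	done := make(chan struct{})
	go func() {
		_ = cmd.Wait() // reaps the child so it does not linger as a zombie
		close(done)
	}()
	select {
	case <-done:
		return false
	case <-time.After(grace):
		_ = syscall.Kill(-pgid, syscall.SIGKILL)
		<-done
		return true
	}
}

// demoCascade is a self-check helper: start a command, then tear it down.
func demoCascade(graceMs int, name string, args ...string) (bool, error) {
	cmd := exec.Command(name, args...)
	if err := startInGroup(cmd); err != nil {
		return false, err
	}
	return terminateGroup(cmd, time.Duration(graceMs)*time.Millisecond), nil
}
```

A SIGHUP/SIGTERM handler in the tool would run `terminateGroup` for every live child before exiting. Note that grandchildren which move themselves into a new group escape this signal, which is one reason cleanup stays best-effort.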

Seventh, revise send_input. Text plus append_newline is too weak. You need:

{
  "kind": "text" | "paste" | "key",
  "text": "...",
  "key": "enter|tab|escape|ctrl-c|left|right|up|down",
  "submit": true
}

Use bracketed paste for multi-line prompt injection where the target TUI supports it. Otherwise multi-line prompts can accidentally submit partial content.
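A sketch of how the three input kinds might be turned into PTY bytes. The key table uses the common normal-mode sequences; a TUI that has enabled application cursor keys expects different arrow sequences, and bracketed paste markers only make sense when the target has enabled that mode. The `Input` type and `encode` function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Input mirrors the send_input payload sketched above.
type Input struct {
	Kind   string `json:"kind"`
	Text   string `json:"text"`
	Key    string `json:"key"`
	Submit bool   `json:"submit"`
}

// keyBytes maps key names to the bytes a terminal sends in normal cursor mode.
var keyBytes = map[string]string{
	"enter": "\r", "tab": "\t", "escape": "\x1b", "ctrl-c": "\x03",
	"up": "\x1b[A", "down": "\x1b[B", "right": "\x1b[C", "left": "\x1b[D",
}

// Bracketed paste start and end markers.
const pasteStart, pasteEnd = "\x1b[200~", "\x1b[201~"

// encode converts one Input into the bytes to write to the PTY master.
func encode(in Input) (string, error) {
	var out string
	switch in.Kind {
	case "text":
		out = in.Text
	case "paste":
		// Strip embedded end markers so pasted content cannot end the paste early.
		out = pasteStart + strings.ReplaceAll(in.Text, pasteEnd, "") + pasteEnd
	case "key":
		b, ok := keyBytes[in.Key]
		if !ok {
			return "", fmt.Errorf("unknown key %q", in.Key)
		}
		out = b
	default:
		return "", fmt.Errorf("unknown kind %q", in.Kind)
	}
	if in.Submit {
		out += "\r" // submit is sent as a carriage return after the payload
	}
	return out, nil
}
```

With paste framing, the newline inside a multi-line prompt arrives as content, and only the trailing carriage return submits.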

Eighth, expose more metadata in read_output. Return row numbers, active screen, cursor position, idle state, process status, and maybe a screen_version.

{
  "content": "...",
  "mode": "grid",
  "active_screen": "alternate",
  "rows": 38,
  "cols": 120,
  "cursor": {"x": 4, "y": 37},
  "idle_ms": 1420,
  "screen_version": 9182,
  "status": "running"
}

Models are better at parsing when you give them stable structure.
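A matching Go type keeps that structure stable on the server side. The field names follow the JSON above; the `Changed` helper is a hypothetical convenience showing one use for screen_version.

```go
package main

import "encoding/json"

// CursorPos is the cursor position within the grid.
type CursorPos struct {
	X int `json:"x"`
	Y int `json:"y"`
}

// OutputSnapshot mirrors the enriched read result sketched above.
type OutputSnapshot struct {
	Content       string    `json:"content"`
	Mode          string    `json:"mode"`
	ActiveScreen  string    `json:"active_screen"`
	Rows          int       `json:"rows"`
	Cols          int       `json:"cols"`
	Cursor        CursorPos `json:"cursor"`
	IdleMs        int64     `json:"idle_ms"`
	ScreenVersion uint64    `json:"screen_version"`
	Status        string    `json:"status"`
}

// decodeSnapshot parses a JSON snapshot, for example in a test or a client.
func decodeSnapshot(raw string) (OutputSnapshot, error) {
	var s OutputSnapshot
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

// Changed reports whether the screen differs from a previously seen version,
// letting a caller skip re-reading content that has not moved.
func (s OutputSnapshot) Changed(lastSeen uint64) bool {
	return s.ScreenVersion != lastSeen
}
```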

For libghostty-vt, the implementation detail that matters most is effects. The docs say VT processing handles terminal state by default, but side-effect sequences such as bell, title changes, device queries, and write-back responses need configured callbacks; those callbacks are synchronous and should not block. Wire at least WRITE_PTY, bell, title, size/query responses, and active-screen tracking early. (libghostty.tip.ghostty.org)

Recommended revised build order:

PTY + libghostty-vt spike before any UI work. Spawn bash, vim, htop, Claude/Codex/OpenCode if installed, feed output into Ghostty, dump plain grid. This either validates the core bet or kills it early.
Single-process TUI with one PTY session. SIGWINCH-driven resize from the tool's own terminal. No MCP yet.
Raw child process spawning, sidebar, process groups, kill cascade on exit/SIGHUP, idle detection.
MCP stdio proxy subcommand and core tools: spawn_process, read_output, send_input, kill, list_children.
One orchestrator preset, probably Claude first because it has useful CLI flags for MCP config. Use --mcp-config and --strict-mcp-config so the user's global Claude config isn't mutated. (Claude)
spawn_agent, report_to_parent, send_message_to, and timer injection.
Scratchpads with revision IDs. Last-write-wins is okay for v1, but return a revision so agents can avoid blind overwrites:

scratchpad_read -> { "content": "...", "revision": "abc123" }
scratchpad_write -> { "content": "...", "expected_revision": "abc123" }

Second and third recipes. Keep recipe files declarative, but expect custom Go code for each vendor.
Chrome trimming heuristics and golden tests using recorded VT byte streams from each supported CLI.
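The scratchpad revision scheme from the build order can be sketched in memory as follows. Deriving the revision from a content hash is an assumption made for brevity (a counter would work equally well, since the token is opaque), and `Scratchpad`, `ErrStale`, and the method names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sync"
)

// ErrStale is returned when expected_revision no longer matches.
var ErrStale = errors.New("scratchpad: revision mismatch")

// Scratchpad is an in-memory stand-in for one markdown file.
type Scratchpad struct {
	mu      sync.Mutex
	content string
}

// revisionOf derives an opaque token from the content.
func revisionOf(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:8])
}

// Read returns the content together with its current revision.
func (s *Scratchpad) Read() (content, revision string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.content, revisionOf(s.content)
}

// Write replaces the content. An empty expectedRevision means last-write-wins;
// otherwise the write is rejected if the scratchpad changed in the meantime.
func (s *Scratchpad) Write(content, expectedRevision string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if expectedRevision != "" && expectedRevision != revisionOf(s.content) {
		return "", ErrStale
	}
	s.content = content
	return revisionOf(content), nil
}

// Append adds text unconditionally and returns the new revision.
func (s *Scratchpad) Append(text string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.content += text
	return revisionOf(s.content)
}
```

An agent that gets `ErrStale` can re-read, merge, and retry, while agents that do not care simply omit the expected revision.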

One more practical point: put scratchpads under XDG data, not config. Something like:

$XDG_DATA_HOME/patterm/projects/<key>/scratchpads/

Keep spawn recipes/config under:

$XDG_CONFIG_HOME/patterm/

Scratchpads are user data, not configuration. Not fatal, but fixing it now avoids awkward migration later.
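Resolving those two locations might look like this, following the XDG Base Directory defaults of ~/.local/share and ~/.config when the variables are unset or not absolute. The function names are hypothetical, and how the project `<key>` is derived is left to the caller because the spec does not say.

```go
package main

import "path/filepath"

// xdgDir returns the value of envVar if it is an absolute path,
// otherwise the given fallback relative to HOME.
func xdgDir(getenv func(string) string, envVar, fallback string) string {
	if v := getenv(envVar); v != "" && filepath.IsAbs(v) {
		return v
	}
	return filepath.Join(getenv("HOME"), fallback)
}

// scratchpadDir is where a project's scratchpads would live (user data).
func scratchpadDir(getenv func(string) string, projectKey string) string {
	base := xdgDir(getenv, "XDG_DATA_HOME", ".local/share")
	return filepath.Join(base, "patterm", "projects", projectKey, "scratchpads")
}

// configDir is where spawn recipes and other configuration would live.
func configDir(getenv func(string) string) string {
	return filepath.Join(xdgDir(getenv, "XDG_CONFIG_HOME", ".config"), "patterm")
}
```

In the tool itself the first argument would be `os.Getenv`; taking it as a parameter just makes the lookup easy to test.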

Overall: the concept is buildable, but the hard parts are not MCP or the TUI chrome. The hard parts are terminal fidelity, process lifecycle, vendor recipe drift, and permission safety. libghostty-vt is the right core bet, provided you isolate it behind an interface and treat its unstable API as a vendored implementation detail.
