harry/patterm

Harry Bayliss 69ef09aac4 Initial patterm project

2026-05-14 13:37:20 +01:00

This is a spec for a terminal project I have.

I think we could probably use libghostty for the terminal emulation, as the Go ecosystem for terminal emulators seems quite sparse.

patterm — v1 Spec

Working title: patterm. Used throughout this document.

1. Overview

A terminal-based agent orchestration shell. The user opens patterm in a project directory (e.g. ~/Dev/foo). patterm presents a multi-tab TUI where each top tab is a session — a long-running PTY launched from a user-defined preset. Presets come in two flavours: agent presets (e.g. claude, codex, opencode — vendor LLM CLIs with patterm's MCP wired up) and process presets (e.g. bun run dev, vitest --watch — raw commands with no MCP). Each session has a sidebar of children: more presets spawned by that session, again either agents or processes, all PTY-backed. The right rail also surfaces project-scoped scratchpads (markdown files) for human readability.

An MCP server, in-process, exposes tools that let orchestrator agents spawn and drive children, run processes, set timers, message peers, and read/write scratchpads. The orchestrator is a real LLM CLI driving another LLM CLI as if it were a human user — keystroke injection in, rendered-grid scraping out. The orchestrator fully owns the content of what it sends; patterm only handles the plumbing.

Goal: Let one SOTA agent orchestrate other agents of different types (claude → codex, codex → opencode, …) without subagent APIs, while keeping the whole thing steerable and observable by a human at any moment.

Non-goal: Hosting any LLM. patterm only manages CLIs the user already has installed. patterm also doesn't ship hard-coded knowledge of any specific vendor CLI — agent presets are user-editable JSON; the three common ones (claude, codex, opencode) ship as defaults.

2. Architecture and lifecycle

Single foreground process. No daemon, no detach.

The tool is one Go process that owns: the TUI, all PTYs, vt-emulated grids, session state, child state, scratchpad files, and an in-process MCP server. Killing the process kills everything inside it. There is no attach/detach, no project-keyed singleton, no socket-based reattachment.

Lifecycle:

User runs patterm in a project directory.
The process starts the TUI as a blank canvas — no sessions, no children, no scratchpad preview. Just the empty frame with the palette hint in the status line. The in-process MCP server initializes (bound to a per-PID unix socket for spawned children — see §10) and scratchpad metadata is loaded from disk, but nothing is rendered until the user opens a preset.
The user opens the palette (Ctrl-K), selects a preset, and the first session/process is launched. Subsequent sessions and children are spawned the same way (or by orchestrators via MCP).
On exit (the palette's Quit command, terminal window close, SIGTERM, SIGHUP): the process sends SIGTERM to every child PTY with a short grace window, then SIGKILL, then exits. Scratchpads on disk are the only thing that survives.

Multiple invocations: Running patterm twice in the same project starts two independent processes. They share scratchpad files on disk but nothing else. If this turns out to be a footgun in practice, a per-project lockfile can be added later — out of scope for v1.

Implications: Closing the terminal window (or SSH dropping) ends the session and tears down every child. This is the deliberate trade — no orphan daemons, no socket discovery, no stale-state recovery, no multi-client coordination. The user's terminal window is the lifetime boundary.

3. Project state layout

Scratchpads (user data) live under $XDG_DATA_HOME; presets and config live under $XDG_CONFIG_HOME.

$XDG_DATA_HOME/patterm/
└── projects/
    └── <project-key>/
        ├── meta.json          # project path, last-opened, version
        └── scratchpads/
            ├── notes.md
            ├── todos.md
            └── <agent-written>.md

$XDG_CONFIG_HOME/patterm/
├── config.json                # global settings (theme, default keymap, etc.)
└── presets/
    ├── agents/
    │   ├── claude.json        # ships as default
    │   ├── codex.json         # ships as default
    │   ├── opencode.json      # ships as default
    │   └── <user-defined>.json
    └── processes/
        ├── dev.json           # e.g. { "name": "bun run dev", "argv": ["bun", "run", "dev"] }
        ├── test.json
        └── <user-defined>.json

Both preset directories are scanned at startup; every file found becomes a palette entry ("Spawn agent: claude", "Run process: bun run dev", …). Presets are project-agnostic in v1 — the same set is available in every project. Per-project overrides can be added later.

Project key = sha256(realpath(project_dir))[:16]. Used only as a scratchpad directory name — there is no daemon to look up.
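
A minimal sketch of the project-key derivation, under one assumption the spec does not settle: `[:16]` is read here as the first 16 hex characters of the digest, not 16 raw bytes. The function name `projectKey` is illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// projectKey resolves dir to an absolute, symlink-free path and returns the
// first 16 hex characters of its SHA-256 digest.
// Assumption: "[:16]" means 16 hex characters (64 bits of the hash).
func projectKey(dir string) (string, error) {
	abs, err := filepath.Abs(dir)
	if err != nil {
		return "", err
	}
	resolved, err := filepath.EvalSymlinks(abs)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256([]byte(resolved))
	return hex.EncodeToString(sum[:])[:16], nil
}

func main() {
	key, err := projectKey(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "resolve project dir:", err)
		os.Exit(1)
	}
	fmt.Println(key)
}
```

Resolving symlinks before hashing means two paths to the same directory map to the same scratchpad folder.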

Internal MCP socket (for spawned children to talk to the running process): $XDG_RUNTIME_DIR/patterm/<pid>.sock, falling back to /tmp/patterm-<pid>.sock if XDG_RUNTIME_DIR is unset. Created on startup, removed on exit. Per-PID, not per-project — it is a private IPC channel, not a discovery point.
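
The socket path rule could be sketched as a small pure function. `socketPath` is a hypothetical helper name; creating the `patterm` subdirectory and removing the socket on exit are left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// socketPath returns the per-PID socket location described above.
// runtimeDir is the value of XDG_RUNTIME_DIR ("" when unset).
func socketPath(runtimeDir string, pid int) string {
	if runtimeDir == "" {
		return fmt.Sprintf("/tmp/patterm-%d.sock", pid)
	}
	return filepath.Join(runtimeDir, "patterm", fmt.Sprintf("%d.sock", pid))
}

func main() {
	fmt.Println(socketPath(os.Getenv("XDG_RUNTIME_DIR"), os.Getpid()))
}
```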

Scratchpads persist across runs. Sessions and child processes do not.

4. UI / Client

┌────────────────────────────────────────────────────────┬──────────────────┐
│ [codex-1] [codex-2] [claude-1]                       + │  Session tree    │
├────────────────────────────────────────────────────────┤  ──────          │
│                                                        │  ▶ codex-1       │
│                                                        │  │               │
│                                                        │  ├─ ◉ claude-2   │
│                                                        │  ├─ ◉ claude-3   │
│              (focused pane's PTY)                      │  ├─ ◉ claude-4   │
│                                                        │  └─ ◉ bun-dev    │
│                                                        │                  │
│                                                        │  Scratchpads     │
│                                                        │  ──────          │
│                                                        │  todos.md        │
│                                                        │  notes.md        │
│                                                        │  api-plan.md     │
│                                                        │                  │
│                                                        │  ┌────────────┐  │
│                                                        │  │ todos.md   │  │
│                                                        │  │ preview…   │  │
│                                                        │  └────────────┘  │
├────────────────────────────────────────────────────────┴──────────────────┤
│ [orchestrator driving]                          Ctrl-K  command palette  │
└───────────────────────────────────────────────────────────────────────────┘

Top tab bar: one per top-level session. + opens the palette pre-filtered to "Spawn…" entries.
Main area: the focused pane's PTY, rendered identically to viewing it in a regular terminal. The focused pane is either the orchestrator (root of the active session's tree) or one of its children, whichever the user last selected from the sidebar.
Right rail, top half — session tree: the active session's process hierarchy, drawn as an indented tree with box-drawing connectors (├─, └─). The orchestrator is the root (▶); each child appears one level deeper with a status glyph (◉ running, ✓ exited cleanly, ✗ errored). Selecting an entry (palette, arrow keys, or click) makes it the focused pane. v1 only has two levels because of the §8 two-level-tree rule, but the renderer should be tree-shaped from day one so a future depth bump doesn't require UI surgery.
Right rail, bottom half: scratchpad list and a preview of the selected scratchpad.
Status line: input-ownership toast ("orchestrator driving" / "you have control") on the left, palette hint on the right.

Empty state: Until the user spawns their first preset, the top tab bar, main area, and sidebar all sit empty with a centred hint ("Press Ctrl-K to spawn an agent or process"). No "default session" is created.

Switching: Clicking a top tab (or selecting one via the palette) switches the active session — the sidebar tree swaps to that session's hierarchy. Clicking a sidebar entry switches the focused pane within the current session.

Command palette (v1 input model):

Almost all application functions are driven through a single command palette opened with Ctrl-K. The palette is a fuzzy-searchable list of commands, scoped to whatever makes sense for the current focus. Two kinds of entries appear:

Built-in commands — "Switch to session…", "Focus pane…", "Take input control", "Release control to orchestrator", "Open scratchpad…", "Kill child…", "Quit", etc.
Preset commands — one entry per file under $XDG_CONFIG_HOME/patterm/presets/. Agent presets surface as "Spawn agent: codex" / "Spawn agent: claude" / …; process presets surface as "Run process: bun run dev" / "Run process: vitest" / …. The label comes from the preset's name field; the action is "launch this preset into a new pane."

Selecting a preset either launches it immediately (no required args) or opens a sub-palette for optional args — namely an initial prompt (agent presets only), which patterm injects into the spawned PTY's input after the agent is ready (§8). The orchestrator equivalent of this — spawn_agent / spawn_process MCP tools — uses the exact same machinery: pick a preset by name, optionally supply an initial prompt, patterm handles the rest.

Rationale: the keybinding surface for sessions + children + scratchpads + control transfer + spawning gets large fast. A palette lets us ship the full feature set without committing to a key map yet, and gives the user a discoverable index of every action. Dedicated keybindings can be layered on top later for the few actions a user does often enough to memorize — they should be configured by binding to palette command IDs, not by re-implementing the action.

Only two keybindings are reserved at the application level in v1:

| Action | Binding |
| --- | --- |
| Open command palette | `Ctrl-K` |
| Pass-through prefix (everything else after this goes to the focused PTY untouched, e.g. for nested tmux/Ctrl-K-using TUIs) | `Ctrl-K Ctrl-K` |

Everything else — session switching, child cycling, control transfer, quitting — lives in the palette for v1.

5. PTY layer

One PTY per session orchestrator and one per child. For each PTY the tool maintains:

The underlying process (pid, status, exit code on death).
A raw byte ring buffer (default 1 MiB) for stream-mode reads.
A vt-emulated character grid representing current visible state.
Alt-screen flag (whether the process is in alternate-buffer mode, i.e. a TUI).
Last-write timestamp (used for the idle heuristic).
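
One way the raw byte ring buffer could support stream-mode reads with a `since_offset` is to address it by absolute stream offsets that never reset. The sketch below is an assumption about the design, not something the spec prescribes; the `ring` type is illustrative and is not goroutine-safe as written.

```go
package main

import "fmt"

// ring is a fixed-capacity byte buffer addressed by absolute stream offsets.
// Offsets never reset, so a reader can pass back the last offset it saw.
type ring struct {
	buf   []byte
	total int64 // bytes ever written; also the offset of the write head
}

func newRing(capacity int) *ring { return &ring{buf: make([]byte, capacity)} }

// Write appends p, overwriting the oldest bytes once the buffer is full.
func (r *ring) Write(p []byte) {
	for _, b := range p {
		r.buf[r.total%int64(len(r.buf))] = b
		r.total++
	}
}

// ReadSince returns the bytes from offset since up to the write head, plus
// the new offset. If since was already overwritten, reading starts at the
// oldest byte still held.
func (r *ring) ReadSince(since int64) ([]byte, int64) {
	oldest := r.total - int64(len(r.buf))
	if oldest < 0 {
		oldest = 0
	}
	if since < oldest {
		since = oldest
	}
	if since > r.total {
		since = r.total
	}
	out := make([]byte, 0, r.total-since)
	for off := since; off < r.total; off++ {
		out = append(out, r.buf[off%int64(len(r.buf))])
	}
	return out, r.total
}

func main() {
	r := newRing(8) // the spec's default is 1 MiB; 8 keeps the demo readable
	r.Write([]byte("hello "))
	_, off := r.ReadSince(0)
	r.Write([]byte("world"))
	chunk, next := r.ReadSince(off)
	fmt.Printf("%q %d\n", chunk, next)
}
```

A reader that falls more than one buffer behind silently loses the overwritten bytes; a real implementation might want to report that gap to the caller.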

Terminal emulator: Go has limited options. The current bet is libghostty-vt wrapped behind a Go interface, with vt10x or charmbracelet/x/vt as fallback only (see the closing note). Budget real time: this is the load-bearing component for grid-mode read_output. The emulator must handle SGR colours (stripped on read), cursor movement, alt-screen entry/exit, scroll regions, and basic mouse passthrough where needed.

Resize: On startup and on SIGWINCH, the tool reads its own terminal dimensions, computes per-pane winsize (accounting for tab bar, sidebar, status line), and ioctl(TIOCSWINSZ) each PTY. Children get SIGWINCH automatically. One process, one viewport — no multi-client resize negotiation.
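
The per-pane winsize computation might look like the following sketch. The `layout` values are hypothetical (the spec does not fix a sidebar width or border sizes), and applying the result with ioctl(TIOCSWINSZ) is left to whichever PTY package the tool ends up using.

```go
package main

import "fmt"

// layout holds the fixed chrome around the focused pane.
// All values are hypothetical; the spec does not fix them.
type layout struct {
	tabBarRows, statusRows, sidebarCols int
}

// paneSize returns the winsize to apply to each PTY for a given terminal
// size. Dimensions are clamped to 1 so a tiny terminal never yields a
// zero-sized pane.
func paneSize(termCols, termRows int, l layout) (cols, rows uint16) {
	c := termCols - l.sidebarCols
	r := termRows - l.tabBarRows - l.statusRows
	if c < 1 {
		c = 1
	}
	if r < 1 {
		r = 1
	}
	return uint16(c), uint16(r)
}

func main() {
	cols, rows := paneSize(120, 40, layout{tabBarRows: 1, statusRows: 1, sidebarCols: 20})
	fmt.Println(cols, rows)
}
```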

6. Input ownership

Each pane has an owner flag: user or orchestrator. A toast / status-line glyph reflects current owner.

When the orchestrator spawns a child, that child defaults to orchestrator-owned.
When the user focuses a pane and presses any key, ownership flips to user. The orchestrator can still write — bytes interleave. A warning toast appears: "Orchestrator is also driving this pane."
The user explicitly returns ownership with the palette's "Release control to orchestrator" command.

No locking. The user's call if they collide. The visual indicator is the only protection.

7. MCP tool surface

The tool embeds an MCP server in-process. Each spawned agent gets an MCP config injected at spawn time (see §10) pointing at a stdio proxy subcommand of the same binary, which forwards JSON-RPC over the per-PID unix socket to the running process. Tool calls carry an implicit caller identity (which session / which child) derived from the connection.

Tools available to orchestrators only

`spawn_agent`

Args: preset (string — name of an agent preset under $XDG_CONFIG_HOME/patterm/presets/agents/), initial_prompt (string), name? (display name, defaults to <preset>-<n>)
Behaviour: Launches the agent preset in a new PTY as a child of the calling session. Wires MCP per the preset's injection strategy (§10). Waits for the preset's ready signal (default: 1s idle). Then types initial_prompt into the TUI input box and submits. patterm does not inject any other text — the caller's initial_prompt is the agent's first turn. If the caller wants the agent to know about the message-tag conventions (§8), tool availability, or its orchestrator role, the caller must say so in initial_prompt.
Returns: child_id.
Error: Returns an error if preset isn't a known agent preset. patterm has no built-in knowledge of vendor CLIs — everything is preset-driven.

`send_message_to`

Args: target (child_id), message (string)
Behaviour: Types [orchestrator] <message>\n into the target child's PTY.
Returns: ok.

`request_human_attention`

Args: child_id, reason (string)
Behaviour: Surfaces a notification in the TUI, blinks the sidebar entry for the child, optionally auto-focuses if the user setting allows it. Used by orchestrator when it wants to punt a decision (e.g. ambiguous permission prompt) to the human.
Returns: ok.

Tools available to all agents

`spawn_process`

Args: One of:
- preset (string — name of a process preset under $XDG_CONFIG_HOME/patterm/presets/processes/), plus optional working_dir? / env? overrides; or
- argv (array of strings — freeform launch), with optional working_dir?, env?, and shell? (default false; when true, argv is interpreted as ["sh", "-lc", argv[0]]-style).
Behaviour: Launches the command in a new PTY, attached as a child of the calling agent's session. Presets are the preferred path; freeform argv is the escape hatch for one-offs the user hasn't pre-configured. No MCP injection (process children aren't agents).
Returns: child_id.

`read_output`

Args: child_id, mode (grid | stream), since_offset? (stream mode only)
Behaviour:
- grid mode: returns the current rendered visible grid as plain text, ANSI stripped, with best-effort trimming of detectable vendor chrome (top banner, bottom input box, status line) per agent-type heuristics. Use for TUI children.
- stream mode: returns raw byte content from since_offset to current write head, ANSI stripped. Use for line-mode processes.
Returns: { content: string, new_offset: int, mode: "grid" | "stream" }.
Note in tool description (visible to the calling agent): "The grid result is the entire visible pane. You are responsible for locating the response to your last prompt within it."

`send_input`

Args: child_id, input (string), append_newline? (default true)
Behaviour: Writes bytes to the child PTY's stdin. Used both for free-form input and for single-key confirmations (y, n).
Returns: ok.

`kill`

Args: child_id, signal? (default SIGTERM)
Returns: ok.

`wait_for_pattern`

Args: child_id, pattern (regex), timeout_seconds
Behaviour: Blocks the calling agent until the rendered grid matches the regex or the timeout expires. Polls the grid at ~50ms intervals.
Returns: { matched: bool, snippet?: string }.
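
A minimal sketch of the polling loop behind wait_for_pattern. The `grid` callback is a hypothetical stand-in for "render the child's visible grid as plain text"; the real tool would read from the emulator instead.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// waitForPattern polls grid() every interval until pattern matches or the
// timeout elapses. It returns the matched text as the snippet.
func waitForPattern(grid func() string, pattern string, timeout, interval time.Duration) (matched bool, snippet string, err error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, "", err
	}
	deadline := time.Now().Add(timeout)
	for {
		text := grid()
		if loc := re.FindStringIndex(text); loc != nil {
			return true, text[loc[0]:loc[1]], nil
		}
		if !time.Now().Before(deadline) {
			return false, "", nil
		}
		time.Sleep(interval)
	}
}

func main() {
	polls := 0
	grid := func() string { // fake child: shows a prompt on the third poll
		polls++
		if polls < 3 {
			return "working..."
		}
		return "Allow Bash(ls)? [y/N]"
	}
	ok, snippet, err := waitForPattern(grid, `\[y/N\]`, time.Second, 50*time.Millisecond)
	fmt.Println(ok, snippet, err)
}
```

Compiling the regex up front lets a malformed pattern fail fast as a tool error rather than as a silent timeout.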

`timer_wait`

Args: seconds, label? (default auto-generated)
Behaviour: Returns immediately with a timer_id. After seconds, the tool injects [system] Your timer [<label>] has completed.\n into the calling agent's pane.
Returns: { timer_id: string }.
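
The timer behaviour maps naturally onto `time.AfterFunc`. In this sketch `inject` is a hypothetical callback standing in for "type this line into the calling agent's pane", and timer_id bookkeeping is omitted.

```go
package main

import (
	"fmt"
	"time"
)

// startTimer returns immediately; after d it calls inject with the [system]
// line from the spec.
func startTimer(label string, d time.Duration, inject func(string)) *time.Timer {
	return time.AfterFunc(d, func() {
		inject(fmt.Sprintf("[system] Your timer [%s] has completed.\n", label))
	})
}

func main() {
	done := make(chan string, 1)
	startTimer("build-check", 20*time.Millisecond, func(s string) { done <- s })
	fmt.Print(<-done) // blocks until the timer fires
}
```

Keeping the returned `*time.Timer` would also allow a cancel tool later, though the spec does not define one.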

`list_children`

Args: none
Returns: Array of { child_id, name, type, status, exit_code? } for the calling agent's session.

`scratchpad_list`

Returns: Array of { name, size, modified_at }.

`scratchpad_read`

Args: name
Returns: { content: string }.

`scratchpad_write`

Args: name, content (full replacement)
Returns: ok.

`scratchpad_append`

Args: name, content
Returns: ok.

8. Conversation protocol

patterm does not inject any framing or system-prompt text into spawned agents. Whatever an agent sees in its input is exactly what the user typed or what an orchestrator chose to send. The orchestrator (or the human launching it) is responsible for telling a spawned agent what its role is, what tools it has, and what conventions to expect.

That said, when patterm relays messages programmatically between agents or surfaces lifecycle events, it tags them so the receiving agent can distinguish sources. These tags are the patterm convention; agents will encounter them in their input and are expected to recognize them from context (or because their parent explained them in the initial prompt).

[orchestrator] <msg> — prepended when send_message_to delivers a message from a parent to a child.
[sub-agent:<name>] <msg> — prepended when report_to_parent delivers a message from a child to its parent.
[system] <msg> — patterm itself (timer fires, child exited, etc.).
Direct user typing is not prefixed. The user sees the pane and types normally; the agent receives the keystrokes as-is.
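
The tagging convention above is small enough to capture in one function. The `source` type and constant names are illustrative, not part of the spec.

```go
package main

import "fmt"

type source int

const (
	fromUser source = iota
	fromOrchestrator
	fromSubAgent
	fromSystem
)

// tagMessage applies the prefix conventions listed above. name is only used
// for sub-agent messages.
func tagMessage(src source, name, msg string) string {
	switch src {
	case fromOrchestrator:
		return "[orchestrator] " + msg
	case fromSubAgent:
		return fmt.Sprintf("[sub-agent:%s] %s", name, msg)
	case fromSystem:
		return "[system] " + msg
	default:
		return msg // direct user typing is passed through unprefixed
	}
}

func main() {
	fmt.Println(tagMessage(fromSubAgent, "codex-2", "tests pass"))
	fmt.Println(tagMessage(fromUser, "", "ls -la"))
}
```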

No "ready" handshake. patterm treats the agent as ready once its PTY hits the preset's ready_signal (default: 1s idle after launch — see §10). The very first thing the agent receives after that point is whatever the caller passed as initial_prompt.

Two-level tree only. Sub-agents cannot call spawn_agent.

9. Permissions flow

Sub-agents are launched with vendor permissions on — the orchestrator drives their confirmation prompts.

Loop:

Orchestrator sends a message to a sub-agent via send_message_to.
Sub-agent runs, eventually hits a tool-use confirmation in its TUI ("Allow Bash(rm -rf foo)? [y/N]").
Sub-agent goes idle (cursor stops animating, no byte writes for 1s).
Orchestrator's loop calls read_output(child_id, mode="grid"), sees the prompt, decides, and calls send_input(child_id, "y") or "n".
If the orchestrator can't safely decide, it calls request_human_attention(child_id, "Sub-agent wants to run X, looks destructive, need your call"). The orchestrator then waits (using wait_for_pattern or repeated reads) until the prompt is no longer on screen.

Risks acknowledged: the orchestrator's reading of the prompt is a vision/parsing problem on rendered text. We trust a SOTA model to handle this correctly. The request_human_attention punt is the safety valve.

10. Presets

Presets are user-editable JSON files that describe how to launch something. patterm itself has no hard-coded agent or process types — every spawnable thing is a preset. Two flavours:

Agent presets

$XDG_CONFIG_HOME/patterm/presets/agents/<name>.json. Launches a vendor LLM CLI with MCP wired up. patterm injects no framing or system-prompt text (§8); the only text typed into the agent is the caller's initial_prompt.

| Field | Purpose |
| --- | --- |
| `name` | Display name shown in the palette (e.g. "claude", "codex haiku", "opencode-experimental") |
| `argv` | Full launch argv (e.g. `["claude"]`, `["codex", "--no-tui-banner"]`) |
| `env` | Env vars to set (merged over inherited env) |
| `working_dir` | Defaults to the project root |
| `mcp_injection` | How to point this CLI at patterm's stdio proxy. One of: `{ "kind": "flag", "flag": "--mcp-config", "config_path": "..." }`, `{ "kind": "config_file", "path": "~/.codex/config.toml", "merge_key": "mcp_servers" }`, `{ "kind": "env_var", "var": "MCP_CONFIG_PATH" }` |
| `ready_signal` | How to detect the TUI is ready (default: 1s idle after launch). Override per-CLI if needed. |
| `chrome_trim_hints` | Optional regexes / row ranges for stripping vendor chrome in grid reads |

Default presets shipped: claude, codex, opencode. Authoring these is per-vendor research — each CLI has its own MCP config conventions, ready states, and TUI chrome. Users can copy and edit them, or add new ones (e.g. a second claude preset that launches with a specific model or system prompt file).

MCP config flow: at spawn time (matching §7), for each launched agent, patterm renders a small JSON config pointing at its own mcp-stdio proxy subcommand (patterm mcp-stdio --socket <pid-sock> --identity <token>) into a per-child temp file, so each child carries its own identity token. The launch then uses the preset's mcp_injection strategy to hand that path to the CLI. The user's global vendor config is never mutated.
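
As an illustration of the rendered config, the sketch below assumes the `mcpServers` JSON shape used by Claude-style MCP config files. That shape is an assumption here: Codex reads TOML and OpenCode has its own config, so each would need a separate renderer. The server name "patterm" and the function name are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mcpServer describes one stdio MCP server entry.
type mcpServer struct {
	Command string   `json:"command"`
	Args    []string `json:"args"`
}

// mcpConfig assumes a Claude-style top-level "mcpServers" map.
type mcpConfig struct {
	MCPServers map[string]mcpServer `json:"mcpServers"`
}

// renderMCPConfig builds the config that points a vendor CLI at the
// mcp-stdio proxy subcommand of the running binary.
func renderMCPConfig(binary, socket, token string) ([]byte, error) {
	cfg := mcpConfig{MCPServers: map[string]mcpServer{
		"patterm": {
			Command: binary,
			Args:    []string{"mcp-stdio", "--socket", socket, "--identity", token},
		},
	}}
	return json.Marshal(cfg)
}

func main() {
	out, err := renderMCPConfig("patterm", "/tmp/patterm-1234.sock", "example-token")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```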

Process presets

$XDG_CONFIG_HOME/patterm/presets/processes/<name>.json. Launches a raw command in a PTY with no MCP wiring and no initial prompt.

| Field | Purpose |
| --- | --- |
| `name` | Display name shown in the palette (e.g. "bun run dev") |
| `argv` | Launch argv (e.g. `["bun", "run", "dev"]`) |
| `shell` | If `true`, argv is interpreted via `sh -lc`. Default `false`. |
| `env` | Env vars to set |
| `working_dir` | Defaults to the project root |

Process presets are intentionally thin: they're shortcuts for commands the user runs often. Anything more exotic — pipelines, redirections — uses shell: true, or the orchestrator can call spawn_process with freeform argv.
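
A sketch of how a process preset file might be parsed and turned into a launch argv. The struct mirrors the fields in the table above; the `launchArgv` behaviour for `shell: true` follows the `["sh", "-lc", argv[0]]` reading from §7, which is an interpretation of the spec, not a settled rule.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// processPreset mirrors the process-preset fields.
type processPreset struct {
	Name       string            `json:"name"`
	Argv       []string          `json:"argv"`
	Shell      bool              `json:"shell"`
	Env        map[string]string `json:"env"`
	WorkingDir string            `json:"working_dir"`
}

// parseProcessPreset decodes one preset file's contents.
func parseProcessPreset(data []byte) (processPreset, error) {
	var p processPreset
	err := json.Unmarshal(data, &p)
	return p, err
}

// launchArgv returns the argv to exec. Assumption: with shell=true the first
// argv element is the command line handed to `sh -lc`.
func (p processPreset) launchArgv() ([]string, error) {
	if len(p.Argv) == 0 {
		return nil, fmt.Errorf("preset %q: argv is empty", p.Name)
	}
	if p.Shell {
		return []string{"sh", "-lc", p.Argv[0]}, nil
	}
	return p.Argv, nil
}

func main() {
	p, err := parseProcessPreset([]byte(`{"name": "bun run dev", "argv": ["bun", "run", "dev"]}`))
	if err != nil {
		panic(err)
	}
	argv, err := p.launchArgv()
	fmt.Println(p.Name, argv, err)
}
```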

11. Done-signal heuristic

A pane is considered "idle" when the tool has read no bytes from its PTY's master end (that is, the child has produced no output) for 1000 ms.

Rationale: every supported vendor TUI animates a spinner while busy (during LLM streaming and during tool execution). A genuinely idle pane stops animating.

Caveats and mitigations:

LLM provider hiccups can cause >1s gaps mid-stream. Per-agent tuning of the idle threshold is allowed in the preset.
Orchestrators should treat idle as a signal to read, not as a guarantee of completion. If the read returns something ambiguous, they can wait_for_pattern with a known terminal marker (e.g. the agent's input prompt) for stronger evidence.
The tool exposes idle state via list_children so orchestrators don't need to poll byte streams directly.
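
The idle heuristic reduces to comparing the last-output timestamp against a threshold. A minimal sketch, with illustrative names and the clock passed in explicitly so the logic can be exercised without sleeping:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// idleTracker records when a pane last produced output.
type idleTracker struct {
	mu        sync.Mutex
	last      time.Time
	threshold time.Duration // 1000 ms by default; tunable per preset
}

func newIdleTracker(threshold time.Duration, now time.Time) *idleTracker {
	return &idleTracker{last: now, threshold: threshold}
}

// Touch is called whenever bytes are read from the PTY master.
func (t *idleTracker) Touch(now time.Time) {
	t.mu.Lock()
	t.last = now
	t.mu.Unlock()
}

// Idle reports whether the pane has been quiet for at least the threshold.
func (t *idleTracker) Idle(now time.Time) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return now.Sub(t.last) >= t.threshold
}

// demoIdle simulates output at t=0 and checks idleness 400 ms and 1200 ms later.
func demoIdle() (early, late bool) {
	start := time.Now()
	tr := newIdleTracker(time.Second, start)
	tr.Touch(start)
	return tr.Idle(start.Add(400 * time.Millisecond)), tr.Idle(start.Add(1200 * time.Millisecond))
}

func main() {
	early, late := demoIdle()
	fmt.Println(early, late)
}
```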

12. Failure modes

| Failure | Behaviour |
| --- | --- |
| Sub-agent process exits unexpectedly | Sidebar marks child as exited, exit code preserved. Orchestrator's next `read_output` returns final grid + exit metadata. |
| Vendor CLI hangs without exiting | Looks idle. Orchestrator must use `wait_for_pattern` or `request_human_attention` to escape. |
| Tool process crashes | All PTYs are children of the tool's process group; OS cleans them up (process-group SIGHUP on terminal close, PTY master close, parent-death signal on Linux). On macOS treat cleanup as best-effort; scratchpads on disk survive. |
| User closes the terminal window / SSH drops | Process receives SIGHUP, cascades SIGTERM → SIGKILL to every child, exits. Everything inside the tool dies with it. This is the intended model. |
| Disk full on scratchpad write | Tool returns error to caller. |
| LLM provider network blip | Pane idles, may trigger false "done"; orchestrator should sanity-check responses. |
| User kills the orchestrator pane | Tool detects PTY close, cascades SIGTERM to that session's children. |
| Concurrent input | Bytes interleave on PTY stdin. Toast warns. User's call. |
| Vt emulator bug on exotic ANSI | Grid rendering corrupts for that pane. Orchestrator's read will be noisy; degrade gracefully, don't crash. |

13. Out of scope for v1

Cross-project orchestration.
Sub-agents spawning sub-agents (trees deeper than 2).
Daemonized / detachable sessions surviving the terminal window. The tool is intentionally bound to the user's foreground process.
Multi-client attach to a single session.
Native ACP support (PTY scraping only).
Hosting any LLM internally.
Auth beyond OS-level file permissions on the IPC socket and state dir.
Web / API control surface.
Recording / replay of sessions.

14. Open questions

Vt emulator library. Resolved in the closing note — libghostty-vt is the bet, with vt10x / charmbracelet/x/vt as fallback only.
MCP transport. Resolved — in-process MCP core with a mcp-stdio proxy subcommand for spawned children (see §7 and §10). Streamable HTTP can be added later.
Scratchpad concurrency. Two agents writing the same scratchpad: last-write-wins with a revision token (see item 7 of the revised build order in the closing note). Agents are expected to coordinate.
Default presets that ship in the box. claude / codex / opencode is the working set; trimming to two for the first cut is fine since presets are user-editable anyway.
Per-project preset overrides. v1 has a single global preset directory. Whether ./.patterm/presets/ should override per-project is a v2 question.

15. Suggested build order

Single-process skeleton: TUI bootstraps, owns the terminal, handles SIGWINCH / SIGHUP / SIGTERM, exits cleanly.
Single PTY per session + vt emulator + tab bar UI + basic input/render.
Multi-session, multi-child (sidebar) with raw process spawning, process groups, kill cascade on exit (no MCP yet).
In-process MCP server + mcp-stdio proxy subcommand + per-PID unix socket + spawn_process / read_output / send_input / kill / wait_for_pattern.
spawn_agent preset for one agent (probably claude), conversation tag conventions, initial_prompt injection (typed into the TUI input after ready).
Scratchpads, timer_wait, request_human_attention, send_message_to, report_to_parent.
Second and third agent presets, chrome-trim heuristics.
Polish: command palette, status indicators, error UX.

Yes — use libghostty-vt for the terminal emulation layer. Not full Ghostty, and not as a UI renderer. Use it as a headless VT state machine inside the tool process, wrapped behind your own Go interface.

libghostty-vt is basically aimed at exactly your load-bearing problem: it is a C library extracted from Ghostty that handles VT parsing, terminal state, scrollback, line wrapping, resize reflow, input event encoding, and related terminal internals. The current docs also warn that the API is still unstable, so this should be a pinned dependency, not something you casually track at HEAD. (libghostty.tip.ghostty.org)

The right move is:

type Emulator interface {
    WritePTYOutput([]byte)
    Resize(cols, rows uint16)
    PlainText() string
    Cell(x, y int) Cell
    Cursor() Cursor
    ActiveScreen() Screen
}

Then implement GhosttyEmulator behind that. Keep vt10x or charmbracelet/x/vt as experimental/fallback only. vt10x is pure Go and convenient, but its own package docs describe it as “in development”; Charm’s x repo is explicitly experimental with no backwards-compatibility promise. For this project, terminal fidelity is not a nice-to-have; it is the product. (Go Packages)

The best part: libghostty-vt already has formatter support for producing plain text from the active screen, which maps cleanly to your read_output(mode="grid"); it also exposes key and mouse encoding, which matters once you stop only typing ASCII strings and start needing arrows, Ctrl-C, Tab, Escape, mouse passthrough, and Kitty keyboard protocol support. (libghostty.tip.ghostty.org)

The catch: cgo/build packaging becomes real. Pin a commit, vendor or checksum the library, and put all C ABI calls in one internal package. Do not scatter cgo across the codebase.

Big spec changes I’d make before building:

First, change MCP transport strategy. Implement the in-process MCP core once, then expose it via a tiny stdio proxy subcommand:

patterm mcp-stdio --socket "$SOCK" --identity "$TOKEN"

Each spawned agent gets an MCP config pointing at that command. The vendor CLI thinks it is launching a normal stdio MCP server; the proxy forwards JSON-RPC to the running tool process over its per-PID Unix socket. This avoids relying on every CLI supporting HTTP over Unix sockets, gives you clean per-agent identity, and keeps the tool process as the single owner of state.

Still support Streamable HTTP later, but stdio-proxy-first is more robust for local CLIs. MCP currently defines stdio and Streamable HTTP as standard transports, and Claude Code, Codex, and OpenCode all expose MCP configuration paths that can work with local or HTTP-style servers. (Model Context Protocol)

Second, remove the generic MCP_CONFIG_PATH assumption. Each preset needs real vendor-specific MCP config handling. Claude Code supports --mcp-config and --strict-mcp-config. (Claude) Codex config uses ~/.codex/config.toml / project .codex/config.toml, with mcp_servers.<id>.command for stdio and mcp_servers.<id>.url for HTTP. (OpenAI Developers) OpenCode exposes MCP through its mcp config option and opencode mcp add, so that preset needs its own path too. (OpenCode)

Third, add a child-to-parent MCP tool. Your conversation protocol mentions [sub-agent:<name>] messages reporting back, but the tool surface does not currently include a way for a sub-agent to send one. Add:

report_to_parent(message: string) -> ok

Then the tool injects:

[sub-agent:codex-2] <message>

into the parent orchestrator pane. Without this, the orchestrator has to scrape the child forever, which is workable but worse.

Fourth, change spawn_process(command: string) to an argv form:

{
  "argv": ["bun", "run", "dev"],
  "working_dir": ".",
  "env": {},
  "shell": false
}

Let agents explicitly request shell mode:

{
  "argv": ["bun run dev | tee /tmp/dev.log"],
  "shell": true
}

A raw command string is quoting hell and makes policy inspection harder.

Fifth, make permission handling more conservative. The orchestrator reading a rendered confirmation prompt is useful, but it is not a safety boundary. A malicious repo or child process can print misleading prompt-like text. Default policy should be: auto-answer only boring, allowlisted prompts; punt writes, deletes, network exfiltration, credential access, sudo, package install scripts, and broad shell commands to the human. OpenCode’s own docs say operations are allowed by default unless permissions are configured, so per-agent preset permissions need to be deliberate rather than assumed safe. (OpenCode)

Sixth, child cleanup on tool exit must be real. There is no daemon to keep PTYs alive — but the OS will not magically reap children either. Put every spawned PTY in the tool's process group (or a dedicated sub-group), set Linux PR_SET_PDEATHSIG on children, close PTY masters on exit, and install a SIGHUP/SIGTERM handler that runs the SIGTERM→grace→SIGKILL cascade before the process actually exits. On macOS, parent-death signals don't exist; rely on process-group SIGHUP and PTY master close, and treat any straggler cleanup as best-effort. A stale-process sweep on next startup is unnecessary now that there is no daemon to outlive its children.
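
A minimal sketch of the SIGTERM→grace→SIGKILL cascade for one child, Unix only. It uses a plain `os/exec` child in its own process group as a stand-in; a real PTY-backed child would more likely be started as a session leader, and Linux-only `PR_SET_PDEATHSIG` is left out so the sketch also compiles on macOS. Function names are illustrative.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// startInGroup launches a command in its own process group so the whole
// group can be signalled at once.
func startInGroup(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	return cmd, cmd.Start()
}

// terminate sends SIGTERM to the child's process group, waits up to grace,
// then sends SIGKILL. It reports whether SIGKILL was needed.
func terminate(cmd *exec.Cmd, grace time.Duration) bool {
	pgid := cmd.Process.Pid // equals the group ID because of Setpgid
	done := make(chan struct{})
	go func() {
		cmd.Wait() // reap the child so it does not linger as a zombie
		close(done)
	}()

	syscall.Kill(-pgid, syscall.SIGTERM) // negative PID targets the group
	select {
	case <-done:
		return false
	case <-time.After(grace):
		syscall.Kill(-pgid, syscall.SIGKILL)
		<-done
		return true
	}
}

func main() {
	cmd, err := startInGroup("sleep", "30")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("needed SIGKILL:", terminate(cmd, 500*time.Millisecond))
}
```

On exit the tool would run this for every child, ideally in parallel so one stubborn process does not stretch the total grace window.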

Seventh, revise send_input. Text plus append_newline is too weak. You need:

{
  "kind": "text" | "paste" | "key",
  "text": "...",
  "key": "enter|tab|escape|ctrl-c|left|right|up|down",
  "submit": true
}

Use bracketed paste for multi-line prompt injection where the target TUI supports it. Otherwise multi-line prompts can accidentally submit partial content.
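
Bracketed paste wraps the payload in `ESC[200~` and `ESC[201~`. The sketch below assumes the target TUI has enabled bracketed paste mode; `encodePaste` is an illustrative name, and sending carriage return as the submit key is an assumption that may not hold for every vendor CLI.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	pasteStart = "\x1b[200~"
	pasteEnd   = "\x1b[201~"
)

// encodePaste wraps text in bracketed-paste markers so embedded newlines are
// treated as pasted content rather than as Enter. It strips any paste-end
// marker inside text so the payload cannot terminate the paste early.
func encodePaste(text string, submit bool) []byte {
	clean := strings.ReplaceAll(text, pasteEnd, "")
	out := pasteStart + clean + pasteEnd
	if submit {
		out += "\r" // Enter, sent outside the paste
	}
	return []byte(out)
}

func main() {
	fmt.Printf("%q\n", encodePaste("line one\nline two", true))
}
```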

Eighth, expose more metadata in read_output. Return row numbers, active screen, cursor position, idle state, process status, and maybe a screen_version.

{
  "content": "...",
  "mode": "grid",
  "active_screen": "alternate",
  "rows": 38,
  "cols": 120,
  "cursor": {"x": 4, "y": 37},
  "idle_ms": 1420,
  "screen_version": 9182,
  "status": "running"
}

Models are better at parsing when you give them stable structure.

For libghostty-vt, the implementation detail that matters most is effects. The docs say VT processing handles terminal state by default, but side-effect sequences such as bell, title changes, device queries, and write-back responses need configured callbacks; those callbacks are synchronous and should not block. Wire at least WRITE_PTY, bell, title, size/query responses, and active-screen tracking early. (libghostty.tip.ghostty.org)

Recommended revised build order:

PTY + libghostty-vt spike before any UI work. Spawn bash, vim, htop, Claude/Codex/OpenCode if installed, feed output into Ghostty, dump plain grid. This either validates the core bet or kills it early.
Single-process TUI with one PTY session. SIGWINCH-driven resize from the tool's own terminal. No MCP yet.
Raw child process spawning, sidebar, process groups, kill cascade on exit/SIGHUP, idle detection.
MCP stdio proxy subcommand and core tools: spawn_process, read_output, send_input, kill, list_children.
One orchestrator preset, probably Claude first because it has useful CLI flags for MCP config. Use --mcp-config and --strict-mcp-config so the user's global Claude config isn't mutated. (Claude)
spawn_agent, report_to_parent, send_message_to, and timer injection.
Scratchpads with revision IDs. Last-write-wins is okay for v1, but return a revision so agents can avoid blind overwrites:

scratchpad_read -> { "content": "...", "revision": "abc123" }
scratchpad_write -> { "content": "...", "expected_revision": "abc123" }

Second and third presets. Keep preset files declarative, but expect custom Go code for each vendor.
Chrome trimming heuristics and golden tests using recorded VT byte streams from each supported CLI.
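
The scratchpad revision check from the build order above could be sketched as a compare-and-write. This version keeps content in memory for brevity (the spec stores markdown files on disk), and deriving the revision from a truncated SHA-256 of the content is a hypothetical scheme, not something the spec defines.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

var errStale = errors.New("scratchpad changed since expected_revision")

// pads is an in-memory stand-in for the on-disk scratchpad directory.
type pads struct {
	mu   sync.Mutex
	data map[string]string
}

func newPads() *pads { return &pads{data: map[string]string{}} }

// revisionOf derives a revision from content (hypothetical scheme).
func revisionOf(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])[:12]
}

func (p *pads) Read(name string) (content, revision string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	c := p.data[name]
	return c, revisionOf(c)
}

// Write replaces the scratchpad. An empty expected means a blind
// last-write-wins overwrite; otherwise the write is rejected if the stored
// revision no longer matches.
func (p *pads) Write(name, content, expected string) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if expected != "" && revisionOf(p.data[name]) != expected {
		return "", errStale
	}
	p.data[name] = content
	return revisionOf(content), nil
}

func main() {
	p := newPads()
	p.Write("todos.md", "- ship v1\n", "")
	_, rev := p.Read("todos.md")
	p.Write("todos.md", "- ship v1\n- write tests\n", "") // another agent writes
	_, err := p.Write("todos.md", "- ship v2\n", rev)       // stale revision
	fmt.Println(err)
}
```

Making `expected_revision` optional keeps plain last-write-wins available for agents that do not care, while letting careful agents detect a concurrent edit.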

One more practical point: put scratchpads under XDG data, not config. Something like:

$XDG_DATA_HOME/patterm/projects/<key>/scratchpads/

Keep spawn presets/config under:

$XDG_CONFIG_HOME/patterm/

Scratchpads are user data, not configuration. Not fatal, but fixing it now avoids awkward migration later.

Overall: the concept is buildable, but the hard parts are not MCP or the TUI chrome. The hard parts are terminal fidelity, process lifecycle, vendor preset drift, and permission safety. libghostty-vt is the right core bet, provided you isolate it behind an interface and treat its unstable API as a vendored implementation detail.
