diff --git a/SPEC.md b/SPEC.md
index fbf7ca3..462e11e 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -46,6 +46,7 @@ $XDG_DATA_HOME/patterm/
 └── projects/
     └── /
         ├── meta.json  # project path, last-opened, version
+        ├── trust.json  # persisted command-preset trust grants (§7)
         └── scratchpads/
             ├── notes.md
             ├── todos.md
@@ -71,7 +72,7 @@ Project key = `sha256(realpath(project_dir))[:16]`. Used only as a scratchpad di
 
 Internal MCP socket (for spawned children to talk to the running process): `$XDG_RUNTIME_DIR/patterm/.sock`, falling back to `/tmp/patterm-.sock` if `XDG_RUNTIME_DIR` is unset. Created on startup, removed on exit. Per-PID, not per-project — it is a private IPC channel, not a discovery point.
 
-Scratchpads persist across runs. Sessions and child processes do not.
+Scratchpads and command-preset trust grants persist across runs. Sessions and child processes do not — every patterm run starts with an empty process tree.
 
 ---
 
@@ -165,80 +166,163 @@ No locking. The user's call if they collide. The visual indicator is the only pr
 
 ## 7. MCP tool surface
 
-The tool embeds an MCP server in-process. Each spawned agent gets an MCP config injected at spawn time (see §10) pointing at a stdio proxy subcommand of the same binary, which forwards JSON-RPC over the per-PID unix socket to the running process. Tool calls carry an implicit caller identity (which session / which child) derived from the connection.
+The tool embeds an MCP server in-process. Each spawned agent gets an MCP config injected at spawn time (see §10) pointing at a stdio proxy subcommand of the same binary, which forwards JSON-RPC over the per-PID unix socket to the running process. Tool calls carry an implicit caller identity (which session / which process) derived from the connection.
 
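The project-key rule quoted in the second hunk header above, `sha256(realpath(project_dir))[:16]`, can be sketched as follows. This is a minimal illustration, not the project's implementation: the spec does not say how the path is encoded or how the digest is rendered, so UTF-8 encoding and a lowercase hex digest are assumptions here, and the function name `project_key` is hypothetical.

```python
import hashlib
import os


def project_key(project_dir: str) -> str:
    """Derive a 16-character key from the resolved project path.

    Assumptions (not stated in the spec): the path is encoded as UTF-8
    and the digest is rendered as lowercase hex before truncation.
    """
    # realpath resolves symlinks and relative segments, so two spellings
    # of the same directory map to the same key.
    resolved = os.path.realpath(project_dir)
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()
    return digest[:16]


if __name__ == "__main__":
    # "." and the absolute working directory resolve to the same path,
    # so both calls print the same key.
    print(project_key("."))
    print(project_key(os.getcwd()))
```

Resolving the path before hashing is what makes the key stable when the same project is opened through a symlink or a relative path.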
-### Tools available to orchestrators only
+### Concepts shared by all tools
 
-#### `spawn_agent`
-- **Args:** `preset` (string — name of an agent preset under `$XDG_CONFIG_HOME/patterm/presets/agents/`), `initial_prompt` (string), `name?` (display name, defaults to `-`)
-- **Behaviour:** Launches the agent preset in a new PTY as a child of the calling session. Wires MCP per the preset's injection strategy (§10). Waits for the preset's ready signal (default: 1s idle). Then types `initial_prompt` into the TUI input box and submits. patterm does not inject any other text — the caller's `initial_prompt` is the agent's first turn. If the caller wants the agent to know about the message-tag conventions (§8), tool availability, or its orchestrator role, the caller must say so in `initial_prompt`.
-- **Returns:** `child_id`.
-- **Error:** Returns an error if `preset` isn't a known agent preset. patterm has no built-in knowledge of vendor CLIs — everything is preset-driven.
+- **Process IDs.** Every spawnable thing — agents, terminals, commands — is addressed by an opaque short token (e.g. `p_a1b2c3`), not by OS PID. IDs are stable for the lifetime of the entry: they survive stop/restart for stored command entries; they are released when an agent or terminal exits and is `close_process`'d. Each entry also has a human-readable display name (default `-`, settable via `rename_process` or the `name` arg on spawn).
+- **Process kinds.**
+  - `agent` — a vendor LLM CLI launched from an agent preset (§10). MCP-wired. Ephemeral: lost when the underlying PTY exits.
+  - `terminal` — a bare interactive shell. Defaults to `$SHELL -i`. Ephemeral.
+  - `command` — a process preset (e.g. `bun run dev`, `vitest --watch`) or freeform argv. **Session-persistent**: a command entry survives PTY exit so it can be `restart_process`'d, and is removed only when `close_process` is called or patterm exits.
+- **Trust gating.** Command entries that were authored as presets are *not* trusted by default. The first time an agent attempts to `spawn_process(kind: 'command', preset: …)`, `start_process`, or `restart_process` against an untrusted command preset, the call returns a `needs_trust` error and patterm surfaces a UI confirmation in the focused tab. Once the user confirms, the trust grant is **persisted to disk** (`$XDG_DATA_HOME/patterm/projects//trust.json`, see §3), so the user only confirms each preset once per project — not once per patterm run. Trust is keyed by `(project, preset name)` in v1; content-hashed trust (re-confirming on edit) is a v2 question (§14). Freeform-argv command entries are trusted implicitly at spawn time because the agent already had to compose the argv, and they are not written to the trust file.
+- **Caller role.** Every connection has a role: `orchestrator` (root of a session tree), `sub-agent` (an agent spawned by an orchestrator), or `process` (commands/terminals — these don't talk MCP, but they appear as targets). Role gates which tools the caller may invoke. Calls disallowed by role return a structured error explaining why, so the agent can adapt rather than silently fail.
+- **Idle / readiness.** Every PTY-backed entry tracks `idle_ms` (ms since last write to its master). Tools that read state surface this so callers can decide when a target is "done" without polling raw bytes themselves (§11).
 
-#### `send_message_to`
-- **Args:** `target` (child_id), `message` (string)
-- **Behaviour:** Types `[orchestrator] \n` into the target child's PTY.
-- **Returns:** `ok`.
+### Lifecycle and spawning
 
-#### `request_human_attention`
-- **Args:** `child_id`, `reason` (string)
-- **Behaviour:** Surfaces a notification in the TUI, blinks the sidebar entry for the child, optionally auto-focuses if the user setting allows it. Used by orchestrator when it wants to punt a decision (e.g. ambiguous permission prompt) to the human.
-- **Returns:** `ok`.
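The trust-gating rule under "Concepts shared by all tools" could be sketched roughly as below. This is an illustration under stated assumptions, not the spec's design: the JSON layout of `trust.json` (a single `trusted_presets` list) is invented for the example, and `TrustStore`, `NeedsTrust`, `require`, and `grant` are hypothetical names. Because the file is per-project, keying on the preset name inside it stands in for the `(project, preset name)` key.

```python
import json
from pathlib import Path
from typing import Optional, Set


class NeedsTrust(Exception):
    """Stands in for the `needs_trust` tool error (hypothetical class)."""


class TrustStore:
    """Per-project trust grants, persisted as JSON.

    The {"trusted_presets": [...]} layout is an assumption made for
    this sketch; the spec does not define the file format.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self.granted: Set[str] = set()
        if path.exists():
            data = json.loads(path.read_text(encoding="utf-8"))
            self.granted = set(data.get("trusted_presets", []))

    def require(self, preset: Optional[str]) -> None:
        # Freeform-argv entries carry no preset name: they are trusted
        # implicitly and never touch the trust file.
        if preset is None:
            return
        if preset not in self.granted:
            raise NeedsTrust(preset)

    def grant(self, preset: str) -> None:
        # Called after the user confirms in the UI; written to disk so
        # the grant survives across runs.
        self.granted.add(preset)
        payload = {"trusted_presets": sorted(self.granted)}
        self.path.write_text(json.dumps(payload), encoding="utf-8")
```

In this shape, `spawn_process`, `start_process`, and `restart_process` would each call `require` before launching, and only the UI confirmation path would call `grant`.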
-
-### Tools available to all agents
+#### `spawn_agent` — orchestrator-only
+- **Args:** `agent` (preset name under `presets/agents/`), `agent_instructions` (string — the first turn typed into the agent's TUI after ready), `name?` (display name).
+- **Behaviour:** Launches the agent preset in a new PTY as a child of the calling session. Wires MCP per the preset's injection strategy (§10). Waits for the preset's `ready_signal` (default: 1s idle), then types `agent_instructions` into the input box and submits. patterm injects nothing else — the spawned agent learns its role and conventions either from `agent_instructions` or by calling `whoami` / `help` itself.
+- **Returns:** `{ process_id, name }`.
+- **Errors:** `unknown_agent` if the preset is missing; `role_forbidden` if a sub-agent calls it (with a message pointing the caller at its parent or at vendor-native subagent tooling).
 
 #### `spawn_process`
-- **Args:** One of:
-  - `preset` (string — name of a process preset under `$XDG_CONFIG_HOME/patterm/presets/processes/`), plus optional `working_dir?` / `env?` overrides; **or**
-  - `argv` (array of strings — freeform launch), with optional `working_dir?`, `env?`, and `shell?` (default `false`; when `true`, `argv` is interpreted as `["sh", "-lc", argv[0]]`-style).
-- **Behaviour:** Launches the command in a new PTY, attached as a child of the calling agent's session. Presets are the preferred path; freeform `argv` is the escape hatch for one-offs the user hasn't pre-configured. No MCP injection (process children aren't agents).
-- **Returns:** `child_id`.
+- **Args:** `kind` (`terminal` | `command`), one of `preset` (name under `presets/processes/`) or `argv` (string array), `name?`, `working_dir?` (default project root), `env?`, `shell?` (only valid with `argv`; default `false`).
+  - For `kind: terminal`, `argv` is optional — defaults to `$SHELL -i`.
+  - For `kind: command`, exactly one of `preset` or `argv` must be supplied.
+- **Behaviour:** Creates a process entry, attached as a child of the calling agent's session, and starts it. No MCP injection (these aren't agents). `command` entries are persisted to the session for later `restart_process` / `start_process`; `terminal` entries are ephemeral.
+- **Returns:** `{ process_id, name }`.
+- **Errors:** `needs_trust` if `kind: command` references an untrusted preset.
 
-#### `read_output`
-- **Args:** `child_id`, `mode` (`grid` | `stream`), `since_offset?` (stream mode only)
-- **Behaviour:**
-  - `grid` mode: returns the current rendered visible grid as plain text, ANSI stripped, with best-effort trimming of detectable vendor chrome (top banner, bottom input box, status line) per agent-type heuristics. Use for TUI children.
-  - `stream` mode: returns raw byte content from `since_offset` to current write head, ANSI stripped. Use for line-mode processes.
-- **Returns:** `{ content: string, new_offset: int, mode: "grid" | "stream" }`.
-- **Note in tool description (visible to the calling agent):** "The grid result is the entire visible pane. You are responsible for locating the response to your last prompt within it."
+#### `start_process`
+- **Args:** `process_id`.
+- **Behaviour:** Starts a stored `command` entry that is currently in `stopped` or `exited` state. No-op on a running entry (returns the existing state).
+- **Returns:** `{ process_id, status }`.
+- **Errors:** `not_found`, `wrong_kind` (only command entries are start-able post-creation), `needs_trust`.
 
-#### `send_input`
-- **Args:** `child_id`, `input` (string), `append_newline?` (default `true`)
-- **Behaviour:** Writes bytes to the child PTY's stdin. Used both for free-form input and for single-key confirmations (`y`, `n`).
+#### `restart_process`
+- **Args:** `process_id`, `signal?` (default `SIGTERM` for the stop phase).
+- **Behaviour:** Stops the entry if running (grace window then SIGKILL), then starts it again with the same argv/env/working_dir. Valid for `command` entries; valid for `agent` and `terminal` entries only while their PTY is still live (since they have no stored definition to rehydrate from).
+- **Returns:** `{ process_id, status }`.
+- **Errors:** `not_found`, `needs_trust` (command presets), `wrong_kind` (trying to restart an exited agent/terminal).
+
+#### `stop_process`
+- **Args:** `process_id`, `signal?` (default `SIGTERM`).
+- **Behaviour:** Sends the signal to the entry's PTY, with the standard grace window before SIGKILL.
+- **Returns:** `{ process_id, status }`.
+
+#### `close_process`
+- **Args:** `process_id`.
+- **Behaviour:** Removes the entry from the session entirely. If still running, stops it first. Used to clear stored command entries the orchestrator no longer needs, and to clean up exited agent/terminal ghosts from the sidebar.
 - **Returns:** `ok`.
 
-#### `kill`
-- **Args:** `child_id`, `signal?` (default `SIGTERM`)
+#### `rename_process`
+- **Args:** `process_id`, `name`.
+- **Returns:** `ok`. Updates the display name in the sidebar and tab bar.
+
+#### `select_process`
+- **Args:** `process_id`.
+- **Behaviour:** Asks the UI to focus the given pane (switches session tab if needed). Non-blocking, advisory — distinct from `request_human_attention`, which raises a notification and expects a human decision.
 - **Returns:** `ok`.
 
+### Inspection
+
+#### `list_processes`
+- **Args:** `kind?` (filter by `agent` | `terminal` | `command`).
+- **Returns:** Array of `{ process_id, name, kind, status, parent_process_id, exit_code?, idle_ms?, trusted? }` for the caller's session. `status ∈ { starting, running, stopped, exited, errored }`.
+
+#### `get_process_status`
+- **Args:** `process_id`.
+- **Returns:** `{ process_id, name, kind, status, parent_process_id, working_dir, argv?, exit_code?, started_at, idle_ms, active_screen: "main" | "alternate", rows, cols, cursor: { x, y }, trusted?, screen_version }`.
+
+#### `get_project_status`
+- **Args:** none.
+- **Returns:** `{ project: { path, key }, caller: { process_id, role, name, parent_process_id?, available_tools: [string] }, processes: [], scratchpads: [{ name, size, modified_at, revision }] }`. Everything an agent needs to orient itself in one call.
+
+#### `get_process_output`
+- **Args:** `process_id`, `mode` (`grid` | `stream`), `since_offset?` (stream mode only).
+- **Behaviour:** `grid` returns the current visible pane as plain text, ANSI stripped, with best-effort vendor-chrome trim per preset hints (§10). `stream` returns ANSI-stripped bytes from `since_offset` to the current write head.
+- **Returns:** `{ content, mode, new_offset?, active_screen, rows, cols, cursor, idle_ms, status, screen_version }`.
+- **Tool-description note (shown to the calling agent):** "The grid result is the entire visible pane. You are responsible for locating the response to your last prompt within it. Use `search_output` if you have a specific marker to find."
+
+#### `get_process_raw_output`
+- **Args:** `process_id`, `since_offset?`.
+- **Behaviour:** Returns raw bytes from `since_offset`, escape sequences preserved. Used when the agent needs to inspect control codes (rare).
+- **Returns:** `{ content, new_offset, status }`.
+
+#### `search_output`
+- **Args:** `process_id`, `pattern` (regex), `kind` (`rendered` | `raw`), `limit?` (default 20).
+- **Returns:** `{ matches: [{ line_no, text }], truncated: bool }`. Searches scrollback (not just the visible grid).
+
 #### `wait_for_pattern`
-- **Args:** `child_id`, `pattern` (regex), `timeout_seconds`
-- **Behaviour:** Blocks the calling agent until the rendered grid matches the regex or the timeout expires. Polls the grid at ~50ms intervals.
-- **Returns:** `{ matched: bool, snippet?: string }`.
+- **Args:** `process_id`, `pattern` (regex), `timeout_seconds`, `scope?` (`grid` | `scrollback`, default `grid`).
+- **Behaviour:** Blocks the calling agent until the chosen surface matches the regex, or the timeout expires. Polls at ~50ms.
+- **Returns:** `{ matched: bool, snippet?: string }`. Used in the §9 permissions-prompt-clear flow.
+
+#### `get_process_ports`
+- **Args:** `process_id`.
+- **Returns:** `{ ports: [{ port, url?, first_seen_at }] }`. Best-effort: patterm watches the stream for `:NNNN` and `http://…` patterns and reports what it has seen. No probing.
+
+### I/O
+
+#### `send_input`
+- **Args:** `process_id`, `kind` (`text` | `paste` | `key`), and:
+  - For `text`: `text` (string), `submit?` (default `true` — appends Enter).
+  - For `paste`: `text` (string). Sent via bracketed paste (`\e[200~ … \e[201~`) when the target's emulator state indicates support; otherwise falls back to chunked text writes without trailing newline.
+  - For `key`: `key` (one of `enter`, `tab`, `escape`, `backspace`, `ctrl-c`, `ctrl-d`, `up`, `down`, `left`, `right`, `home`, `end`, `page-up`, `page-down`, `f1`…`f12`). Encoded via the emulator's key-encoding (Kitty keyboard protocol where negotiated, legacy escapes otherwise).
+- **Optional tail:** `wait_ms?` (default `0`), `tail_mode?` (`none` | `stream` | `grid`, default `stream` when `wait_ms > 0`). When `wait_ms > 0`, the call blocks for that many milliseconds after sending and then returns the tail in the chosen mode.
+- **Returns:** `{ ok: true, tail?: { content, mode, new_offset?, active_screen, idle_ms, screen_version } }`.
+
+### Coordination
+
+#### `send_message`
+- **Args:** `target_process_id`, `message` (string).
+- **Behaviour:** Delivers a tagged message into the target's PTY. Direction is inferred from the relationship between caller and target:
+  - parent → child: prepended with `[orchestrator] `.
+  - child → parent: prepended with `[sub-agent:] `.
+- **Returns:** `ok`.
+- **Errors:** `not_related` if the target is neither the caller's parent nor a child of the caller (siblings must route through the parent in v1).
+
+#### `request_human_attention`
+- **Args:** `process_id`, `reason` (string).
+- **Behaviour:** Notification in the TUI, blinks the sidebar entry, optionally auto-focuses per user setting. The escape hatch when the orchestrator can't safely decide.
+- **Returns:** `ok`.
 
 #### `timer_wait`
-- **Args:** `seconds`, `label?` (default auto-generated)
-- **Behaviour:** Returns immediately with a `timer_id`. After `seconds`, the tool injects `[system] Your timer [