2 Commits

Author SHA1 Message Date
0c960fa859 Clarify sub-agent reply routing in MCP tool descriptions
A sub-agent's reply to send_message lands in the caller's own pane
tagged [sub-agent:<name>], not in the sub-agent's output. The
descriptions for wait_for_pattern, send_message, both
timer_fire_when_idle_*, and the server-instructions preamble now
spell this out, along with the canonical send_message →
timer_fire_when_idle_any → read-own-pane pattern. help('readiness')
and help('coordination') updated to match. Previously agents reached
for wait_for_pattern on the sub-agent and deadlocked until timeout
because the reply had already been delivered to their own pane.
2026-05-15 16:08:07 +01:00
b05065a601 Sync TODO.md perf-audit review pass
Removed low/marginal items from the original sweep; remaining items
have measured or workflow evidence to justify action.
2026-05-15 16:07:58 +01:00
4 changed files with 98 additions and 148 deletions

View File

@@ -20,6 +20,19 @@ loosely follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
child agent, even though you were still within that thread. child agent, even though you were still within that thread.
### Changed ### Changed
- MCP tool descriptions and `help('coordination')` /
`help('readiness')` now spell out that a sub-agent's reply to
`send_message` lands in the caller's own pane (tagged
`[sub-agent:<name>]`), not in the sub-agent's output. The canonical
wait-for-reply pattern — `send_message``timer_fire_when_idle_any`
on the sub-agent → read your own pane — is now called out on
`send_message`, `wait_for_pattern`, both `timer_fire_when_idle_*`,
the help topics, and the server-instructions preamble every agent
reads at startup. Previously `wait_for_pattern` was the obvious
blocking primitive in the catalog, and agents routinely called it
against the sub-agent for a reply that had already arrived in their
own pane, deadlocking until the wait timed out. No behaviour
changes; descriptions only.
- Agent-initiated `spawn_agent` and `spawn_process` MCP calls no - Agent-initiated `spawn_agent` and `spawn_process` MCP calls no
longer steal viewport focus from the currently active tab. The longer steal viewport focus from the currently active tab. The
new child still appears in the sidebar and tab bar; switch to it new child still appears in the sidebar and tab bar; switch to it

205
TODO.md
View File

@@ -1,6 +1,7 @@
# Perf Audit (auto-generated 2026-05-15) # Perf Audit (reviewed 2026-05-15)
Findings from a codebase sweep — not user-reported, needs review before Findings that survived the 2026-05-15 review pass. Low and marginal
action. Each item names the anchor and a sketched fix. items from the original sweep were removed; remaining items have enough
measured or workflow evidence to justify action.
Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
Ryzen 7 7800X3D, libghostty-vt **ReleaseFast** after the Makefile Ryzen 7 7800X3D, libghostty-vt **ReleaseFast** after the Makefile
@@ -27,145 +28,75 @@ Emulator_Write_Stream_8Color_120fps 257 µs/frame 3890 fps_ceiling
Emulator_Write_Stream_TrueColor_120fps 488 µs/frame 2051 fps_ceiling Emulator_Write_Stream_TrueColor_120fps 488 µs/frame 2051 fps_ceiling
``` ```
Result of the fix below: 27-32× pipeline speedup, 60× emulator The current pipeline still has large 120 fps headroom. The remaining
speedup. Pipeline hits 930-2030 fps end-to-end — 7-16× headroom renderer concern is multi-MiB styled replay latency and allocation
over the 120 fps target on the heaviest workload (truecolor churn, not normal steady-state frame budget.
full-screen redraws).
- [ ] **viewport renderer allocates ~1 alloc per 4 input bytes on SGR/CSI-heavy chunks.** [MEDIUM] - [ ] **viewport renderer allocates heavily on SGR/CSI-heavy chunks.** [MEDIUM]
- `internal/app/viewport_renderer.go` — the styled-lines and - Review evidence: five benchmark reps confirmed
ratatui benchmarks show 4-17k allocs per chunk. The hot `ViewportRenderer_StyledLines` at about 4,325 allocs per 16 KiB
contributors are likely (a) `string(vr.buf)` / `string(params)` chunk (~91.5 KB/op, roughly 1 alloc per 3.8 input bytes), and
conversions in `emitCSI` for every escape sequence, (b) the `ViewportRenderer_RatatuiBurst` at about 17,306 allocs per chunk
`pending strings.Builder` resizing as fragments arrive, and (c) (~365 KB/op). A 5 MiB styled resume benchmark allocated about
`vr.shifter.Shift(vr.buf)` returning a fresh slice per CSI. 31 MB across 1.38M objects.
- Fix direction: switch CSI param parsing to byte-slice - Likely hot paths: generic CSI/SGR output in
comparison (no string conversion); reuse `vr.buf` and `internal/app/viewport_renderer.go` sends many sequences through
`vr.pending` backing arrays across `Render` calls by `vr.shifter.Shift(vr.buf)`, while `internal/app/cursorshift.go`
pre-growing in `newViewportRenderer`; have `cursorShifter.Shift` returns a fresh `[]byte` via `pending.String()` on every
return into a caller-owned buffer instead of allocating. `Shift` call and parses CSI params through `string(raw)` /
Profile-guided: run the styled-lines bench, point pprof at the `strings.Split`. The mode-helper `string(params)` conversions
allocs profile, fix the top three call sites. are real, but probably not the main SGR-heavy cost.
- Fix direction: make `cursorShifter` write into caller-owned
scratch output or directly into the viewport renderer's pending
builder; parse CSI params from byte slices; pre-grow/reuse
renderer and shifter buffers. Re-run styled-lines, ratatui, and
5 MiB resume benchmarks; use pprof when available to confirm the
top allocation sites.
- [ ] **viewport renderer throughput (~90 MB/s styled) limits codex steady-state.** [MEDIUM] - [ ] **large styled resume/replay dumps spend visible time in viewport rendering.** [MEDIUM]
- The styled-lines and ratatui benchmarks come in at 89 MB/s and - Review evidence: `BenchmarkSessionResume_5MiBStyled` measured
40 MB/s respectively. A 100 KB/s codex burst is far under that about 58 ms median and 63 ms p95 over five reps. The plain 5 MiB
limit, but a session-resume dump of a 5 MiB chat history takes benchmark was about 23-24 ms with only 21 allocs. The live path
50-130 ms of pure renderer time at those rates — enough to be renders focused PTY chunks through `renderer.Render`, then still
user-visible at the start of a long resume. pays emulator writes, ring writes, event dispatch, stdout writes,
- Fix direction: same as the alloc fix above; once the per-call and real terminal paint.
allocation cost drops, the throughput ceiling rises with it. - Scope: this is not a Codex steady-state throughput limit. A
Worth re-running the benches after fixing the allocs and only 100 KB/s stream is far below the styled renderer's ~80-90 MB/s
investing further if the styled-lines bench is still under ceiling. It matters for multi-MiB burst replay, resume/startup
~300 MB/s. dumps, and dense full-screen churn.
- Fix direction: do the allocation fix first, since it should also
improve throughput. After that, invest further only if styled
resume traces remain user-visible or the styled-lines benchmark
is still under roughly 300 MB/s.
- [ ] **Session.Children() allocates a fresh slice on every call.** [MEDIUM] - [ ] **wait_for_pattern re-scans the entire stream/grid while waiting.** [MEDIUM]
- `internal/app/session.go:530-541` walks `s.order` under `s.mu` and - `internal/app/host.go:476-493` (the `check` closure). On
builds a new `[]*Child` slice every time. Callers on hot paths: `scope="scrollback"` it calls `c.StreamRead(0)` followed by
`drawSidebar` calls it twice per frame `stripANSIBytes(nil, b)`, so each check can copy, strip, and
(`internal/app/sidebar.go:139` and `:171`); `drawTabBar` calls it search the full 1 MiB ring. On `scope="grid"` it calls
once per frame (`internal/app/tabbar.go:37`); the classifier `PlainText()` and runs the regex against the full grid string.
iterates it every 250 ms (`internal/app/classifier.go:38`); and - Caveat from review: the current chunk notifier coalesces bursts
palette/navigation hit it on every Ctrl-A/D/W/S keystroke. with a buffered channel and has a 500 ms fallback, so this is not
- Fix direction: store the snapshot in an `atomic.Pointer[[]*Child]` necessarily one full scan per PTY chunk. It is still meaningful
on `Session`, refresh it under `s.mu` only when `Spawn` / `delete` for active waits on chatty panes.
mutates the map. Readers get O(1) `Load()` with zero allocation — - Fix direction: for `scrollback`, track the last checked stream
same pattern already used for `listeners` (session.go:118-123). offset and search only new output plus a bounded overlap/scratch
buffer so matches spanning chunks are not missed. For `grid`,
dedupe on `ScreenVersion()` and skip work when the version has
not changed.
- [ ] **wait_for_pattern re-scans the entire stream/grid every iteration.** [MEDIUM] - [ ] **search_output rebuilds and searches whole scrollback on every call.** [MEDIUM]
- `internal/app/host.go:476-493` (the `check` closure). On `scope = - `internal/app/host.go:428-437` compiles a fresh regex, reads the
"scrollback"` it calls `c.StreamRead(0)` followed by stream from offset 0, strips ANSI for `kind="rendered"`, converts
`stripANSIBytes(nil, b)` over the entire ring on every wake — a the full buffer to a string, and splits it into lines before
full O(ring size) walk per chunk arrival. On `grid` it goes applying `limit`. This is meaningful when agents poll the same
through PlainText (one CGO call) plus a regex match against the pattern; it is low impact for ad hoc searches.
full grid string. For an agent waiting on a marker in a chatty - Fix direction: cache compiled regexes by pattern; cache stripped
pane, every PTY chunk fires `check()`. rendered output by child id and stream end offset; avoid
- Fix direction: for `scrollback`, track the offset of the last `strings.Split` over the whole ring when only the first `limit`
check and run the regex only over the new tail, reusing a matches are needed. Prefer an incremental search shape if this
per-call scratch buffer for ANSI stripping. For `grid`, dedupe becomes the standard "watch for marker" path.
on `ScreenVersion()` — skip when version hasn't changed.
- [ ] **search_output compiles regex + strips ANSI on every call.** [MEDIUM]
- `internal/app/host.go:428` compiles a fresh `regexp.Regexp` per
invocation; `:434` strips ANSI over the entire ring buffer when
`kind="rendered"`. Agents that poll `search_output` with the same
pattern (the typical "watch for marker" loop) repay both costs on
every call.
- Fix direction: small LRU of compiled regexes keyed by pattern
string (cap maybe 32) on `toolHost`. Cache the stripped-ANSI
buffer keyed by `c.ScreenVersion()` so consecutive searches over
an unchanged ring reuse the strip.
- [ ] **GetProcessOutput grid mode acquires the emulator twice.** [MEDIUM]
- `internal/app/host.go:375-391` does `em := c.Emulator()` for
ActiveScreen / Cursor / Size, then at line 387 re-fetches
`em := c.Emulator()` for PlainText. Each `Emulator()` call goes
through `ptyMu` and inspects the live PTY pointer. Under a
chatty agent polling `get_process_output` every 100 ms this is
a redundant lock and pointer chase per call.
- Fix direction: hold the emulator reference from the first
lookup; reuse it for PlainText. The check `if em == nil` still
runs cleanly because the variable is captured.
- [ ] **FindChildByIdentity is O(N) under the session lock.** [LOW]
- `internal/app/session.go:553-565` scans the children map looking
for a matching `Identity` token on every new mcp-stdio
connection. Not a steady-state hot path — only fires once per
child spawn — but with many short-lived sub-agents it adds up
and contends with everyone else taking `s.mu`.
- Fix direction: maintain an `identityIndex map[string]string`
(identity → child id) updated alongside spawn / exit, give the
lookup an O(1) read.
- [ ] **Per-promoter regex matches in the idle classifier.** [LOW]
- `internal/app/idle.go:175-182` (`matchAny`) walks each compiled
pattern and runs the DFA over the same 4 KiB tail. A preset with
five permission patterns + five error patterns is ten DFA
invocations per child per 250 ms tick.
- Fix direction: at preset load time, compile each `_patterns`
list into a single alternation regex (`(?:p1)|(?:p2)|…`). The
classifier then makes one Match call per category per tick.
- [ ] **Port-detection dedup is O(N²) over c.ports.** [LOW]
- `internal/app/child.go:461-467`: for each fresh URL match the
code linearly scans the existing port list. The list rarely
grows past a handful, but a dev server that lists "all open
ports" in one log line interacts badly: M new matches × N
existing entries.
- Fix direction: keep a `seenPorts map[int]struct{}` next to
`c.ports`, rebuilt on prune (none today). O(1) per match.
- [ ] **Port-sighting string allocations happen before the dedup check.** [LOW]
- `internal/app/child.go:455-456` allocates `urlForm` and `portStr`
before line 461's `seen` walk. Both strings are wasted when the
port is already in `c.ports`. Inside `c.portsMu` for the whole
loop body too, blocking the `Ports()` reader path.
- Fix direction: bind the port int first (cheap parse from
`m[1]`), do the seen check, only then allocate the URL string
for the surviving sighting.
- [ ] **classifier `time.Now()` syscall per child per tick.** [LOW]
- `internal/app/classifier.go:54` (and the `IdleMS` /
`TitleIdleMS` helpers it transitively calls in
`internal/app/child.go:343-374`) each call `time.Now()`.
Reading time on Linux is fast (vDSO) but with N children × 4
`time.Now()` per tick × 4 ticks/sec it's wasted work that can
be batched.
- Fix direction: capture `now := time.Now().UnixNano()` once at
the top of `classifyAll` and thread it into `classifyOne` and
the helpers as a parameter.
- [ ] **wait_for_pattern subscribes a listener for every call.** [LOW]
- `internal/app/host.go:472-474`: each invocation calls
`Session.Subscribe(wake)` which clones the listener slice and
swaps the atomic pointer; the `defer Unsubscribe` does the same
on exit. Two allocations per `wait_for_pattern`. The agent
pattern of looping on `wait_for_pattern` after every tool call
pays this churn on the steady-state path.
- Fix direction: a per-child `chunkBroadcaster` registered once
at child spawn that hands out lightweight subscriber tokens,
rather than going through the full session listener machinery.
# On Hold # On Hold
- [ ] There's a unicode <?> being displayed in opencode [ON HOLD] - [ ] There's a unicode <?> being displayed in opencode [ON HOLD]

View File

@@ -1135,8 +1135,9 @@ func helpFor(topic string) mcp.HelpResponse {
case "coordination": case "coordination":
return mcp.HelpResponse{ return mcp.HelpResponse{
Topic: "coordination", Topic: "coordination",
Content: "send_message tags the message with the caller's role (parent → [orchestrator], child → [sub-agent:<name>]). Siblings must route through their parent. request_human_attention raises a UI notification when you can't safely decide.", Content: "send_message tags the message with the caller's role (parent → [orchestrator], child → [sub-agent:<name>]). Siblings must route through their parent. request_human_attention raises a UI notification when you can't safely decide.\n\n" +
RelatedTools: []string{"send_message", "request_human_attention"}, "Reply routing: a sub-agent's reply to your send_message lands in YOUR pane tagged `[sub-agent:<name>]`, not in the sub-agent's output. Anti-pattern: `wait_for_pattern(sub_agent, …)` to wait for a reply — the sub-agent is already idle, its output won't change, and the call spins to timeout. Pattern: send_message → timer_fire_when_idle_any([sub_agent_id], body=\"[system] sub-agent finished\") → when the timer fires, the reply is already queued as your next user turn (or visible via get_process_output on your own pane).",
RelatedTools: []string{"send_message", "request_human_attention", "timer_fire_when_idle_any", "timer_fire_when_idle_all"},
} }
case "scratchpads": case "scratchpads":
return mcp.HelpResponse{ return mcp.HelpResponse{
@@ -1162,8 +1163,13 @@ func helpFor(topic string) mcp.HelpResponse {
case "readiness": case "readiness":
return mcp.HelpResponse{ return mcp.HelpResponse{
Topic: "readiness", Topic: "readiness",
Content: "A pane is 'idle' once nothing has been written to its PTY for ~1s (SPEC §11). Treat idle as a signal to read, not a guarantee of completion. wait_for_pattern lets you wait on a known terminal marker for stronger evidence.", Content: "A pane is 'idle' once nothing has been written to its PTY for ~1s (SPEC §11). Treat idle as a signal to read, not a guarantee of completion.\n\n" +
RelatedTools: []string{"wait_for_pattern", "get_process_status"}, "Waiting for a sub-agent's reply (canonical pattern):\n" +
" 1. send_message(sub_agent_id, request)\n" +
" 2. timer_fire_when_idle_any(watched=[sub_agent_id], body=\"[system] sub-agent done\")\n" +
" 3. When the timer fires you re-enter as a fresh user turn; the sub-agent's reply is already in your own pane tagged `[sub-agent:<name>]` (read via get_process_output on yourself if you need it explicitly).\n\n" +
"wait_for_pattern is for waiting on text a process emits in its OWN output (a shell prompt, a build's \"tests passed\" line). It does NOT see send_message replies, because those land in the caller's pane, not the target's — calling wait_for_pattern on a sub-agent to wait for its reply deadlocks until timeout.",
RelatedTools: []string{"wait_for_pattern", "get_process_status", "timer_fire_when_idle_any", "send_message"},
} }
case "permissions": case "permissions":
return mcp.HelpResponse{ return mcp.HelpResponse{

View File

@@ -43,7 +43,7 @@ var serverInfo = map[string]any{
// up as sub-agents and won't be tied into the patterm lifecycle. // up as sub-agents and won't be tied into the patterm lifecycle.
// //
// Keep this short — clients vary in how much they surface to the LLM. // Keep this short — clients vary in how much they surface to the LLM.
const serverInstructions = "You are already running INSIDE patterm; the `patterm` MCP server is connected over the same stdio MCP transport you use for any other MCP server. Use the MCP tools you see in tools/list — do NOT (a) try to launch `patterm` or `patterm mcp-stdio` yourself, (b) poke the Unix socket through perl / nc / socat / curl, or (c) shell out to `claude` / `codex` / `opencode` to start a peer. Any of those bypasses caller-identity and the new agent will land as a stray top-level tab instead of a child under you. Start with `whoami` for your role and the full tool list, then `help('topics')` for orientation. `spawn_agent` is the only correct way to start a sub-agent; `spawn_process` is for non-LLM commands; `list_processes` / `get_process_output` inspect them; `send_input` / `send_message` drive them. Whatever you spawn is yours to `close_process` when done." const serverInstructions = "You are already running INSIDE patterm; the `patterm` MCP server is connected over the same stdio MCP transport you use for any other MCP server. Use the MCP tools you see in tools/list — do NOT (a) try to launch `patterm` or `patterm mcp-stdio` yourself, (b) poke the Unix socket through perl / nc / socat / curl, or (c) shell out to `claude` / `codex` / `opencode` to start a peer. Any of those bypasses caller-identity and the new agent will land as a stray top-level tab instead of a child under you. Start with `whoami` for your role and the full tool list, then `help('topics')` for orientation. `spawn_agent` is the only correct way to start a sub-agent; `spawn_process` is for non-LLM commands; `list_processes` / `get_process_output` inspect them; `send_input` / `send_message` drive them. Whatever you spawn is yours to `close_process` when done. When you `send_message` a sub-agent, its reply comes back into YOUR pane as `[sub-agent:<name>] …`, not into the sub-agent's output — to wait for it, use `timer_fire_when_idle_any([sub_agent])` and then read your own pane; do NOT `wait_for_pattern` on the sub-agent, that will deadlock until timeout."
// toolDescriptor is the shape returned by `tools/list`. inputSchema is // toolDescriptor is the shape returned by `tools/list`. inputSchema is
// a JSON Schema object — we provide a minimal `{type: "object"}` schema // a JSON Schema object — we provide a minimal `{type: "object"}` schema
@@ -219,7 +219,7 @@ func toolCatalog() []toolDescriptor {
}, },
{ {
Name: "wait_for_pattern", Name: "wait_for_pattern",
Description: "Block until pattern appears in process output or timeout elapses.", Description: "Block until pattern appears in the TARGET process's own output, or timeout elapses. Use this for waiting on text the target itself will emit (a shell prompt, a build's \"tests passed\" line, etc.). Anti-pattern: do NOT use this to wait for a sub-agent's reply to send_message — replies are routed into the CALLER's pane tagged `[sub-agent:<name>]`, not into the sub-agent's output, so this call will spin to timeout. For sub-agent coordination use `timer_fire_when_idle_any` and then read your own pane.",
InputSchema: objectSchema(map[string]any{ InputSchema: objectSchema(map[string]any{
"process_id": stringProp("Target process id."), "process_id": stringProp("Target process id."),
"pattern": stringProp("Regex pattern."), "pattern": stringProp("Regex pattern."),
@@ -249,7 +249,7 @@ func toolCatalog() []toolDescriptor {
}, },
{ {
Name: "send_message", Name: "send_message",
Description: "Deliver a text message to another process as orchestrator-owned input.", Description: "Deliver a text message to another process as orchestrator-owned input. Fire-and-forget: returns immediately, without waiting for the recipient to read or act. If the recipient replies via send_message, that reply arrives in YOUR pane tagged `[sub-agent:<name>]` (child→parent) or `[orchestrator]` (parent→child) — NOT in the recipient's output. To wait for a sub-agent's reply, schedule `timer_fire_when_idle_any([sub_agent_id], body=…)` and then read your own pane when the timer fires. Do not `wait_for_pattern` on the recipient for a reply; it will deadlock.",
InputSchema: objectSchema(map[string]any{ InputSchema: objectSchema(map[string]any{
"target_process_id": stringProp("Recipient process id."), "target_process_id": stringProp("Recipient process id."),
"message": stringProp("Message body."), "message": stringProp("Message body."),
@@ -283,7 +283,7 @@ func toolCatalog() []toolDescriptor {
}, },
{ {
Name: "timer_fire_when_idle_any", Name: "timer_fire_when_idle_any",
Description: "Schedule a timer that fires when any watched process enters idle (already-idle entries excluded), or when max_wait_seconds elapses.", Description: "Canonical way to wait for a sub-agent to finish working: send_message the sub-agent, then schedule this with watched=[sub_agent_id]; when it fires, the reply is already sitting in your own pane tagged `[sub-agent:<name>]`. Schedules a timer that fires when any watched process enters idle (already-idle entries excluded), or when max_wait_seconds elapses.",
InputSchema: objectSchema(map[string]any{ InputSchema: objectSchema(map[string]any{
"watched": arrayOfStringsProp("Process ids to watch."), "watched": arrayOfStringsProp("Process ids to watch."),
"body": stringProp("Message delivered verbatim to the owning agent when the timer fires."), "body": stringProp("Message delivered verbatim to the owning agent when the timer fires."),
@@ -294,7 +294,7 @@ func toolCatalog() []toolDescriptor {
}, },
{ {
Name: "timer_fire_when_idle_all", Name: "timer_fire_when_idle_all",
Description: "Schedule a timer that fires when all watched processes are idle (already-idle entries count as satisfied), or when max_wait_seconds elapses.", Description: "Canonical way to wait for several sub-agents to finish working in parallel: send_message each one, then schedule this with watched=[…ids]; when it fires, each reply is in your own pane tagged `[sub-agent:<name>]`. Schedules a timer that fires when all watched processes are idle (already-idle entries count as satisfied), or when max_wait_seconds elapses.",
InputSchema: objectSchema(map[string]any{ InputSchema: objectSchema(map[string]any{
"watched": arrayOfStringsProp("Process ids to watch."), "watched": arrayOfStringsProp("Process ids to watch."),
"body": stringProp("Message delivered verbatim to the owning agent when the timer fires."), "body": stringProp("Message delivered verbatim to the owning agent when the timer fires."),