# Perf Audit (auto-generated 2026-05-15)
Findings from a codebase sweep — not user-reported, needs review before
action. Each item names the anchor and a sketched fix.

Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
Ryzen 7 7800X3D):

```
ViewportRenderer_PlainASCII       229 MB/s     1.3 KB/op    6 allocs/op
ViewportRenderer_StyledLines       89 MB/s    91   KB/op  4325 allocs/op
ViewportRenderer_RatatuiBurst      40 MB/s   365   KB/op 17306 allocs/op
RendererThroughput_ReuseInstance   90 MB/s   316   KB/op 17380 allocs/op
ContainsOSC_NoOSC                3050 MB/s     0   B/op     0 allocs/op
```

- [ ] **viewport renderer allocates ~1 alloc per 4 input bytes on SGR/CSI-heavy chunks.** [MEDIUM]
  - `internal/app/viewport_renderer.go` — the styled-lines and
    ratatui benchmarks show 4-17k allocs per chunk. The hot
    contributors are likely (a) `string(vr.buf)` / `string(params)`
    conversions in `emitCSI` for every escape sequence, (b) the
    `pending strings.Builder` resizing as fragments arrive, and (c)
    `vr.shifter.Shift(vr.buf)` returning a fresh slice per CSI.
  - Fix direction: switch CSI param parsing to byte-slice
    comparison (no string conversion); reuse `vr.buf` and
    `vr.pending` backing arrays across `Render` calls by
    pre-growing in `newViewportRenderer`; have `cursorShifter.Shift`
    return into a caller-owned buffer instead of allocating.
    Profile-guided: run the styled-lines bench, point pprof at the
    allocs profile, fix the top three call sites.

- [ ] **viewport renderer throughput (~90 MB/s styled) limits codex steady-state.** [MEDIUM]
  - The styled-lines and ratatui benchmarks come in at 89 MB/s and
    40 MB/s respectively. A 100 KB/s codex burst is far under that
    limit, but a session-resume dump of a 5 MiB chat history takes
    50-130 ms of pure renderer time at those rates — enough to be
    user-visible at the start of a long resume.
  - Fix direction: same as the alloc fix above; once the per-call
    allocation cost drops, the throughput ceiling rises with it.
    Worth re-running the benches after fixing the allocs and only
    investing further if the styled-lines bench is still under
    ~300 MB/s.

- [ ] **Session.Children() allocates a fresh slice on every call.** [MEDIUM]
  - `internal/app/session.go:530-541` walks `s.order` under `s.mu` and
    builds a new `[]*Child` slice every time. Callers on hot paths:
    `drawSidebar` calls it twice per frame
    (`internal/app/sidebar.go:139` and `:171`); `drawTabBar` calls it
    once per frame (`internal/app/tabbar.go:37`); the classifier
    iterates it every 250 ms (`internal/app/classifier.go:38`); and
    palette/navigation hit it on every Ctrl-A/D/W/S keystroke.
  - Fix direction: store the snapshot in an `atomic.Pointer[[]*Child]`
    on `Session`, refresh it under `s.mu` only when `Spawn` / `delete`
    mutates the map. Readers get O(1) `Load()` with zero allocation —
    same pattern already used for `listeners` (session.go:118-123).

- [ ] **wait_for_pattern re-scans the entire stream/grid every iteration.** [MEDIUM]
  - `internal/app/host.go:476-493` (the `check` closure). On `scope =
    "scrollback"` it calls `c.StreamRead(0)` followed by
    `stripANSIBytes(nil, b)` over the entire ring on every wake — a
    full O(ring size) walk per chunk arrival. On `grid` it goes
    through PlainText (one CGO call) plus a regex match against the
    full grid string. For an agent waiting on a marker in a chatty
    pane, every PTY chunk fires `check()`.
  - Fix direction: for `scrollback`, track the offset of the last
    check and run the regex only over the new tail, reusing a
    per-call scratch buffer for ANSI stripping. For `grid`, dedupe
    on `ScreenVersion()` — skip when version hasn't changed.

- [ ] **search_output compiles regex + strips ANSI on every call.** [MEDIUM]
  - `internal/app/host.go:428` compiles a fresh `regexp.Regexp` per
    invocation; `:434` strips ANSI over the entire ring buffer when
    `kind="rendered"`. Agents that poll `search_output` with the same
    pattern (the typical "watch for marker" loop) repay both costs on
    every call.
  - Fix direction: small LRU of compiled regexes keyed by pattern
    string (cap maybe 32) on `toolHost`. Cache the stripped-ANSI
    buffer keyed by `c.ScreenVersion()` so consecutive searches over
    an unchanged ring reuse the strip.

- [ ] **GetProcessOutput grid mode acquires the emulator twice.** [MEDIUM]
  - `internal/app/host.go:375-391` does `em := c.Emulator()` for
    ActiveScreen / Cursor / Size, then at line 387 re-fetches
    `em := c.Emulator()` for PlainText. Each `Emulator()` call goes
    through `ptyMu` and inspects the live PTY pointer. Under a
    chatty agent polling `get_process_output` every 100 ms this is
    a redundant lock and pointer chase per call.
  - Fix direction: hold the emulator reference from the first
    lookup; reuse it for PlainText. The check `if em == nil` still
    runs cleanly because the variable is captured.

- [ ] **FindChildByIdentity is O(N) under the session lock.** [LOW]
  - `internal/app/session.go:553-565` scans the children map looking
    for a matching `Identity` token on every new mcp-stdio
    connection. Not a steady-state hot path — only fires once per
    child spawn — but with many short-lived sub-agents it adds up
    and contends with everyone else taking `s.mu`.
  - Fix direction: maintain an `identityIndex map[string]string`
    (identity → child id) updated alongside spawn / exit, give the
    lookup an O(1) read.

- [ ] **Per-promoter regex matches in the idle classifier.** [LOW]
  - `internal/app/idle.go:175-182` (`matchAny`) walks each compiled
    pattern and runs the DFA over the same 4 KiB tail. A preset with
    five permission patterns + five error patterns is ten DFA
    invocations per child per 250 ms tick.
  - Fix direction: at preset load time, compile each `_patterns`
    list into a single alternation regex (`(?:p1)|(?:p2)|…`). The
    classifier then makes one Match call per category per tick.

- [ ] **Port-detection dedup is O(N²) over c.ports.** [LOW]
  - `internal/app/child.go:461-467`: for each fresh URL match the
    code linearly scans the existing port list. The list rarely
    grows past a handful, but a dev server that lists "all open
    ports" in one log line interacts badly: M new matches × N
    existing entries.
  - Fix direction: keep a `seenPorts map[int]struct{}` next to
    `c.ports`, rebuilt on prune (none today). O(1) per match.

- [ ] **Port-sighting string allocations happen before the dedup check.** [LOW]
  - `internal/app/child.go:455-456` allocates `urlForm` and `portStr`
    before line 461's `seen` walk. Both strings are wasted when the
    port is already in `c.ports`. Inside `c.portsMu` for the whole
    loop body too, blocking the `Ports()` reader path.
  - Fix direction: bind the port int first (cheap parse from
    `m[1]`), do the seen check, only then allocate the URL string
    for the surviving sighting.

- [ ] **classifier `time.Now()` syscall per child per tick.** [LOW]
  - `internal/app/classifier.go:54` (and the `IdleMS` /
    `TitleIdleMS` helpers it transitively calls in
    `internal/app/child.go:343-374`) each call `time.Now()`.
    Reading time on Linux is fast (vDSO) but with N children × 4
    `time.Now()` per tick × 4 ticks/sec it's wasted work that can
    be batched.
  - Fix direction: capture `now := time.Now().UnixNano()` once at
    the top of `classifyAll` and thread it into `classifyOne` and
    the helpers as a parameter.

- [ ] **wait_for_pattern subscribes a listener for every call.** [LOW]
  - `internal/app/host.go:472-474`: each invocation calls
    `Session.Subscribe(wake)` which clones the listener slice and
    swaps the atomic pointer; the `defer Unsubscribe` does the same
    on exit. Two allocations per `wait_for_pattern`. The agent
    pattern of looping on `wait_for_pattern` after every tool call
    pays this churn on the steady-state path.
  - Fix direction: a per-child `chunkBroadcaster` registered once
    at child spawn that hands out lightweight subscriber tokens,
    rather than going through the full session listener machinery.

# On Hold
- [ ] There's a unicode <?> being displayed in opencode [ON HOLD]
  - Investigated 2026-05-14: patterm passes ghostty grapheme codepoints
    through unchanged (vt/ghostty.go:452-462), so the `<?>` glyph is
    most likely the *host* terminal's font fallback for opencode's
    Nerd Font private-use codepoints, not a patterm substitution.
    Need a concrete reproduction (which codepoint, which host
    terminal/font) before changing rendering.
- [ ] After codex rips for like 15 minutes, the terminal becomes quite slow. [ON HOLD / VERIFYING]
  - 2026-05-14: Perf plan P1-P11 landed (see CHANGELOG). Needs a real
    long-running codex session to confirm whether the steady-state
    slowdown is gone or some hotspot remains. Capture a pprof if it
    still feels slow after ≥15 minutes — the structural drivers the
    audit named are all addressed, so a remaining symptom is a new
    one and probably wants fresh profiling.