Codebase sweep for perf issues outside the per-PTY-chunk path that recent CHANGELOG work already covered. Ten findings under a new "Perf Audit (auto-generated)" section in TODO.md — anchored to file:line, classified MEDIUM/LOW, with a sketched fix per entry. None landed as code changes; review pending.
Perf Audit (auto-generated 2026-05-15)
Findings from a codebase sweep — not user-reported, needs review before action. Each item names the anchor and a sketched fix.
- `Session.Children()` allocates a fresh slice on every call. [MEDIUM]
  `internal/app/session.go:530-541` walks `s.order` under `s.mu` and builds a new `[]*Child` slice every time. Callers on hot paths: `drawSidebar` calls it twice per frame (`internal/app/sidebar.go:139` and `:171`); `drawTabBar` calls it once per frame (`internal/app/tabbar.go:37`); the classifier iterates it every 250 ms (`internal/app/classifier.go:38`); and palette/navigation hit it on every Ctrl-A/D/W/S keystroke.
  - Fix direction: store the snapshot in an `atomic.Pointer[[]*Child]` on `Session` and refresh it under `s.mu` only when `Spawn`/`delete` mutates the map. Readers get an O(1) `Load()` with zero allocation, the same pattern already used for `listeners` (`session.go:118-123`).
- `wait_for_pattern` re-scans the entire stream/grid on every iteration. [MEDIUM]
  `internal/app/host.go:476-493` (the `check` closure). With `scope = "scrollback"` it calls `c.StreamRead(0)` followed by `stripANSIBytes(nil, b)` over the entire ring on every wake, a full O(ring size) walk per chunk arrival. With `grid` it goes through `PlainText` (one CGO call) plus a regex match against the full grid string. For an agent waiting on a marker in a chatty pane, every PTY chunk fires `check()`.
  - Fix direction: for `scrollback`, track the offset of the last check and run the regex only over the new tail, reusing a per-call scratch buffer for ANSI stripping. Resume slightly before that offset (for example, at the start of the last partial line) so a match split across two chunks is not missed. For `grid`, dedupe on `ScreenVersion()` and skip the check when the version hasn't changed.
- `search_output` compiles a regex and strips ANSI on every call. [MEDIUM]
  `internal/app/host.go:428` compiles a fresh `regexp.Regexp` per invocation; `:434` strips ANSI over the entire ring buffer when `kind="rendered"`. Agents that poll `search_output` with the same pattern (the typical "watch for marker" loop) pay both costs again on every call.
  - Fix direction: keep a small LRU of compiled regexes keyed by pattern string (cap maybe 32) on `toolHost`. Cache the stripped-ANSI buffer keyed by `c.ScreenVersion()` so consecutive searches over an unchanged ring reuse the strip.
- `GetProcessOutput` grid mode acquires the emulator twice. [MEDIUM]
  `internal/app/host.go:375-391` does `em := c.Emulator()` for `ActiveScreen` / `Cursor` / `Size`, then at line 387 re-fetches `em := c.Emulator()` for `PlainText`. Each `Emulator()` call goes through `ptyMu` and inspects the live PTY pointer. With a chatty agent polling `get_process_output` every 100 ms, this is a redundant lock and pointer chase per call.
  - Fix direction: hold the emulator reference from the first lookup and reuse it for `PlainText`. The `if em == nil` check still runs cleanly because the variable is already captured.
- `FindChildByIdentity` is O(N) under the session lock. [LOW]
  `internal/app/session.go:553-565` scans the children map looking for a matching `Identity` token on every new mcp-stdio connection. It is not a steady-state hot path, since it fires only once per child spawn. But with many short-lived sub-agents it adds up and contends with everyone else taking `s.mu`.
  - Fix direction: maintain an `identityIndex map[string]string` (identity → child id) updated alongside spawn/exit, giving the lookup an O(1) read.
- Per-promoter regex matches in the idle classifier. [LOW]
  `internal/app/idle.go:175-182` (`matchAny`) walks each compiled pattern and runs the matcher over the same 4 KiB tail. A preset with five permission patterns plus five error patterns means ten separate regex scans per child per 250 ms tick.
  - Fix direction: at preset load time, compile each `_patterns` list into a single alternation regex (`(?:p1)|(?:p2)|…`). The classifier then makes one `Match` call per category per tick.
- Port-detection dedup is O(N²) over `c.ports`. [LOW]
  `internal/app/child.go:461-467`: for each fresh URL match the code linearly scans the existing port list. The list rarely grows past a handful, but a dev server that lists "all open ports" in one log line interacts badly: M new matches × N existing entries.
  - Fix direction: keep a `seenPorts map[int]struct{}` next to `c.ports`, rebuilt on prune (none today). O(1) per match.
- Port-sighting string allocations happen before the dedup check. [LOW]
  `internal/app/child.go:455-456` allocates `urlForm` and `portStr` before line 461's `seen` walk. Both strings are wasted when the port is already in `c.ports`. The whole loop body also runs inside `c.portsMu`, blocking the `Ports()` reader path.
  - Fix direction: bind the port int first (a cheap parse from `m[1]`), do the seen check, and only then allocate the URL string for the surviving sighting.
- Classifier calls `time.Now()` per child per tick. [LOW]
  `internal/app/classifier.go:54` and the `IdleMS`/`TitleIdleMS` helpers it transitively calls (`internal/app/child.go:343-374`) each call `time.Now()`. Reading the clock on Linux is fast (vDSO). Still, N children × 4 `time.Now()` calls per tick × 4 ticks/sec is wasted work that can be batched.
  - Fix direction: capture `now := time.Now().UnixNano()` once at the top of `classifyAll` and thread it into `classifyOne` and the helpers as a parameter.
- `wait_for_pattern` subscribes a listener on every call. [LOW]
  `internal/app/host.go:472-474`: each invocation calls `Session.Subscribe(wake)`, which clones the listener slice and swaps the atomic pointer; the `defer Unsubscribe` does the same on exit. That is two allocations per `wait_for_pattern` call. The agent pattern of looping on `wait_for_pattern` after every tool call pays this churn on the steady-state path.
  - Fix direction: a per-child `chunkBroadcaster`, registered once at child spawn, that hands out lightweight subscriber tokens instead of going through the full session listener machinery.
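For the `Session.Children()` finding, here is a minimal sketch of the copy-on-write snapshot. It assumes a cut-down `Session` with only `mu`, a `children` map and an `order` slice; the real struct has more fields, and `NewSession`, `Spawn`, `Delete` and `rebuildLocked` are hypothetical names. Writers rebuild and publish a new slice under `s.mu`, and readers just `Load()` it.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Child stands in for the real child type; only an ID is needed here.
type Child struct{ ID string }

// Session is a hypothetical cut-down version: children/order are guarded by
// mu, and snap holds an immutable ordered snapshot for lock-free readers.
type Session struct {
	mu       sync.Mutex
	children map[string]*Child
	order    []string
	snap     atomic.Pointer[[]*Child]
}

func NewSession() *Session {
	s := &Session{children: map[string]*Child{}}
	s.snap.Store(&[]*Child{}) // never nil, so Children() can always dereference
	return s
}

// rebuildLocked publishes a freshly allocated snapshot. Caller holds s.mu.
func (s *Session) rebuildLocked() {
	out := make([]*Child, 0, len(s.order))
	for _, id := range s.order {
		if c, ok := s.children[id]; ok {
			out = append(out, c)
		}
	}
	s.snap.Store(&out)
}

func (s *Session) Spawn(c *Child) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.children[c.ID] = c
	s.order = append(s.order, c.ID)
	s.rebuildLocked()
}

func (s *Session) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.children, id)
	for i, oid := range s.order {
		if oid == id {
			s.order = append(s.order[:i], s.order[i+1:]...)
			break
		}
	}
	s.rebuildLocked()
}

// Children is O(1) and allocation-free. The returned slice is shared with
// every other reader and must be treated as read-only.
func (s *Session) Children() []*Child { return *s.snap.Load() }
```

The cost moves to the rare writers: each spawn or delete allocates one slice. Any caller that currently sorts or appends to the result would need to copy it first.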
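For the `search_output` finding, here is a sketch of the compiled-regex LRU built on `container/list`. `regexCache` and `newRegexCache` are hypothetical names, and the capacity is the TODO's own rough guess. A `*regexp.Regexp` is safe for concurrent use, so one cached value can serve every caller.

```go
package main

import (
	"container/list"
	"regexp"
	"sync"
)

// regexCache is a hypothetical LRU of compiled patterns keyed by pattern
// string, meant to hang off toolHost.
type regexCache struct {
	mu    sync.Mutex
	limit int
	ll    *list.List               // front = most recently used
	items map[string]*list.Element // pattern -> element holding a *cacheEntry
}

type cacheEntry struct {
	pattern string
	re      *regexp.Regexp
}

func newRegexCache(limit int) *regexCache {
	return &regexCache{limit: limit, ll: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the compiled form of pattern, compiling on a miss and evicting
// the least recently used entry past the limit. Compile errors are returned
// to the caller and never cached.
func (c *regexCache) Get(pattern string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[pattern]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*cacheEntry).re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	c.items[pattern] = c.ll.PushFront(&cacheEntry{pattern: pattern, re: re})
	if c.ll.Len() > c.limit {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).pattern)
	}
	return re, nil
}
```

Compiling while holding the lock keeps the sketch simple. A busier host could compile outside the lock and tolerate an occasional duplicate compile.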
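For the `FindChildByIdentity` finding, here is a sketch of the index kept in step with spawn and exit under the same lock. The cut-down `Session`, `Child`, `NewSession`, `spawn` and `exit` are hypothetical stand-ins for the real code.

```go
package main

import "sync"

type Child struct {
	ID       string
	Identity string
}

// Session keeps identityIndex consistent with children under mu.
type Session struct {
	mu            sync.Mutex
	children      map[string]*Child
	identityIndex map[string]string // identity token -> child id
}

func NewSession() *Session {
	return &Session{children: map[string]*Child{}, identityIndex: map[string]string{}}
}

func (s *Session) spawn(c *Child) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.children[c.ID] = c
	if c.Identity != "" {
		s.identityIndex[c.Identity] = c.ID
	}
}

func (s *Session) exit(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if c, ok := s.children[id]; ok {
		delete(s.identityIndex, c.Identity)
		delete(s.children, id)
	}
}

// FindChildByIdentity is two map lookups instead of a scan; it returns nil
// when no live child carries the token.
func (s *Session) FindChildByIdentity(identity string) *Child {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.children[s.identityIndex[identity]]
}
```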
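For the idle-classifier finding, here is a sketch of folding a preset's pattern list into one alternation at load time. `compileAlternation` is a hypothetical helper. Wrapping each pattern in `(?:…)` keeps its own `|` and any inline flags such as `(?i)` scoped to that pattern. The combined regexp matches exactly when at least one original pattern would, which is all a boolean `matchAny` needs. If a caller needs to know *which* pattern fired, this approach loses that information.

```go
package main

import (
	"regexp"
	"strings"
)

// compileAlternation folds patterns into a single regexp so the classifier
// makes one Match call per category. Returns nil for an empty list.
func compileAlternation(patterns []string) (*regexp.Regexp, error) {
	if len(patterns) == 0 {
		return nil, nil
	}
	parts := make([]string, len(patterns))
	for i, p := range patterns {
		// Validate individually first so an error points at the bad pattern.
		if _, err := regexp.Compile(p); err != nil {
			return nil, err
		}
		parts[i] = "(?:" + p + ")"
	}
	return regexp.Compile(strings.Join(parts, "|"))
}
```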
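For the two port-detection findings, here is a sketch that combines both fixes. A `seen` set sits beside the port list, and the port is parsed and checked before any URL string is built. The lock is held only around the map check and append. `childPorts`, `portSighting`, `newChildPorts`, `observe` and the URL regexp are hypothetical; the real pattern in `child.go` is not shown in this TODO.

```go
package main

import (
	"regexp"
	"strconv"
	"sync"
)

type portSighting struct {
	Port int
	URL  string
}

type childPorts struct {
	mu    sync.Mutex
	ports []portSighting
	seen  map[int]struct{} // mirrors ports; rebuild it if pruning is ever added
}

func newChildPorts() *childPorts {
	return &childPorts{seen: make(map[int]struct{})}
}

// Illustrative pattern only: captures the port of a localhost URL.
var urlRE = regexp.MustCompile(`https?://(?:localhost|127\.0\.0\.1):(\d{1,5})`)

// observe records ports mentioned in one output line. The regexp scan runs
// outside the lock, and duplicates never reach the URL formatting.
func (c *childPorts) observe(line string) {
	for _, m := range urlRE.FindAllStringSubmatch(line, -1) {
		port, err := strconv.Atoi(m[1])
		if err != nil || port < 1 || port > 65535 {
			continue
		}
		c.mu.Lock()
		if _, dup := c.seen[port]; !dup {
			c.seen[port] = struct{}{}
			c.ports = append(c.ports, portSighting{Port: port, URL: "http://localhost:" + strconv.Itoa(port)})
		}
		c.mu.Unlock()
	}
}
```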
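For the `wait_for_pattern` listener-churn finding, here is one possible shape for the per-child `chunkBroadcaster`. Subscribers borrow a slot whose buffered wake channel is reused, so once the slots exist a steady-state Subscribe/Unsubscribe loop allocates nothing. All names are hypothetical. A caller must stop reading its channel after `Unsubscribe`, because the slot and its channel may be handed to the next subscriber. Each token must also be unsubscribed exactly once.

```go
package main

import "sync"

// chunkBroadcaster is created once per child at spawn.
type chunkBroadcaster struct {
	mu    sync.Mutex
	slots []slot
	free  []int // indexes of unused slots
}

type slot struct {
	ch     chan struct{} // buffered(1): a wake that arrives before the wait is kept
	active bool
}

// Subscribe returns a token and the channel that receives wakes for it.
func (b *chunkBroadcaster) Subscribe() (int, <-chan struct{}) {
	b.mu.Lock()
	defer b.mu.Unlock()
	var tok int
	if n := len(b.free); n > 0 {
		tok = b.free[n-1]
		b.free = b.free[:n-1]
	} else {
		b.slots = append(b.slots, slot{ch: make(chan struct{}, 1)})
		tok = len(b.slots) - 1
	}
	s := &b.slots[tok]
	select { // drop any wake left over from the slot's previous owner
	case <-s.ch:
	default:
	}
	s.active = true
	return tok, s.ch
}

// Unsubscribe returns the slot to the free list for reuse.
func (b *chunkBroadcaster) Unsubscribe(tok int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.slots[tok].active = false
	b.free = append(b.free, tok)
}

// Notify is called from the PTY chunk path; it never blocks.
func (b *chunkBroadcaster) Notify() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for i := range b.slots {
		if !b.slots[i].active {
			continue
		}
		select {
		case b.slots[i].ch <- struct{}{}:
		default: // a wake is already pending; coalesce
		}
	}
}
```

This trades the existing lock-free atomic listener slice for a short mutex on the chunk path. With a handful of waiters per child that should be cheap, but it is worth confirming with a benchmark.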
On Hold
- There's a stray Unicode glyph being displayed in opencode. [ON HOLD]
  - Investigated 2026-05-14: patterm passes ghostty grapheme codepoints through unchanged (`vt/ghostty.go:452-462`), so the `<?>` glyph is most likely the host terminal's font fallback for opencode's Nerd Font private-use codepoints, not a patterm substitution. Need a concrete reproduction (which codepoint, which host terminal/font) before changing rendering.
- After codex rips for like 15 minutes, the terminal becomes quite slow. [ON HOLD / VERIFYING]
  - 2026-05-14: Perf plan P1-P11 landed (see CHANGELOG). Needs a real long-running codex session to confirm whether the steady-state slowdown is gone or some hotspot remains. Capture a pprof if it still feels slow after ≥15 minutes. The structural drivers the audit named are all addressed, so a remaining symptom is a new one and probably wants fresh profiling.