From e6f5a94fae1c1cd4555395ded523dabb9c651fcb Mon Sep 17 00:00:00 2001 From: Harry Bayliss Date: Fri, 15 May 2026 19:53:51 +0100 Subject: [PATCH] Trim actioned perf-audit items; add palette polish TODO Removes the 2026-05-15 perf audit findings that have either shipped (see CHANGELOG) or are tracked elsewhere, and replaces them with the remaining palette-refinement notes: generic labels for focused actions ("Close current agent") and a higher-level concern that the palette has grown cluttered as features were added. --- TODO.md | 118 ++------------------------------------------------------ 1 file changed, 3 insertions(+), 115 deletions(-) diff --git a/TODO.md b/TODO.md index 761dec0..343b903 100644 --- a/TODO.md +++ b/TODO.md @@ -1,115 +1,3 @@ -# Perf Audit (reviewed 2026-05-15) -Findings that survived the 2026-05-15 review pass. Low and marginal -items from the original sweep were removed; remaining items have enough -measured or workflow evidence to justify action. - -Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD -Ryzen 7 7800X3D, libghostty-vt **ReleaseFast** after the Makefile -fix landed): - -``` -# Renderer alone -ViewportRenderer_PlainASCII 229 MB/s 1.3 KB/op 6 allocs/op -ViewportRenderer_StyledLines 89 MB/s 91 KB/op 4325 allocs/op -ViewportRenderer_RatatuiBurst 40 MB/s 365 KB/op 17306 allocs/op -RendererThroughput_ReuseInstance 90 MB/s 316 KB/op 17380 allocs/op -ContainsOSC_NoOSC 3050 MB/s 0 B/op 0 allocs/op - -# ASCII-video stream (renderer only — 3 sec at the target fps) -ASCIIVideo_Stream_8Color_120fps 260 µs/frame 3845 fps_ceiling 3.1% budget -ASCIIVideo_Stream_TrueColor_120fps 576 µs/frame 1735 fps_ceiling 6.9% budget - -# Full pipeline (em.Write + renderer + io.Discard write) -Pipeline_ASCIIVideo_8Color_120fps 493 µs/frame 2030 fps_ceiling 5.9% budget -Pipeline_ASCIIVideo_TrueColor_120fps 1075 µs/frame 931 fps_ceiling 12.9% budget - -# Emulator alone (libghostty-vt CSI/SGR parser) -Emulator_Write_Stream_8Color_120fps 257 µs/frame 3890 fps_ceiling -Emulator_Write_Stream_TrueColor_120fps 488 µs/frame 2051 fps_ceiling -``` - -The current pipeline still has large 120 fps headroom. The remaining -renderer concern is multi-MiB styled replay latency and allocation -churn, not normal steady-state frame budget. - - -- [ ] **viewport renderer allocates heavily on SGR/CSI-heavy chunks.** [MEDIUM] - - Review evidence: five benchmark reps confirmed - `ViewportRenderer_StyledLines` at about 4,325 allocs per 16 KiB - chunk (~91.5 KB/op, roughly 1 alloc per 3.8 input bytes), and - `ViewportRenderer_RatatuiBurst` at about 17,306 allocs per chunk - (~365 KB/op). A 5 MiB styled resume benchmark allocated about - 31 MB across 1.38M objects. - - Likely hot paths: generic CSI/SGR output in - `internal/app/viewport_renderer.go` sends many sequences through - `vr.shifter.Shift(vr.buf)`, while `internal/app/cursorshift.go` - returns a fresh `[]byte` via `pending.String()` on every - `Shift` call and parses CSI params through `string(raw)` / - `strings.Split`. The mode-helper `string(params)` conversions - are real, but probably not the main SGR-heavy cost. - - Fix direction: make `cursorShifter` write into caller-owned - scratch output or directly into the viewport renderer's pending - builder; parse CSI params from byte slices; pre-grow/reuse - renderer and shifter buffers. Re-run styled-lines, ratatui, and - 5 MiB resume benchmarks; use pprof when available to confirm the - top allocation sites. - -- [ ] **large styled resume/replay dumps spend visible time in viewport rendering.** [MEDIUM] - - Review evidence: `BenchmarkSessionResume_5MiBStyled` measured - about 58 ms median and 63 ms p95 over five reps. The plain 5 MiB - benchmark was about 23-24 ms with only 21 allocs. The live path - renders focused PTY chunks through `renderer.Render`, then still - pays emulator writes, ring writes, event dispatch, stdout writes, - and real terminal paint. - - Scope: this is not a Codex steady-state throughput limit. A - 100 KB/s stream is far below the styled renderer's ~80-90 MB/s - ceiling. It matters for multi-MiB burst replay, resume/startup - dumps, and dense full-screen churn. - - Fix direction: do the allocation fix first, since it should also - improve throughput. After that, invest further only if styled - resume traces remain user-visible or the styled-lines benchmark - is still under roughly 300 MB/s. - -- [ ] **wait_for_pattern re-scans the entire stream/grid while waiting.** [MEDIUM] - - `internal/app/host.go:476-493` (the `check` closure). On - `scope="scrollback"` it calls `c.StreamRead(0)` followed by - `stripANSIBytes(nil, b)`, so each check can copy, strip, and - search the full 1 MiB ring. On `scope="grid"` it calls - `PlainText()` and runs the regex against the full grid string. - - Caveat from review: the current chunk notifier coalesces bursts - with a buffered channel and has a 500 ms fallback, so this is not - necessarily one full scan per PTY chunk. It is still meaningful - for active waits on chatty panes. - - Fix direction: for `scrollback`, track the last checked stream - offset and search only new output plus a bounded overlap/scratch - buffer so matches spanning chunks are not missed. For `grid`, - dedupe on `ScreenVersion()` and skip work when the version has - not changed. - -- [ ] **search_output rebuilds and searches whole scrollback on every call.** [MEDIUM] - - `internal/app/host.go:428-437` compiles a fresh regex, reads the - stream from offset 0, strips ANSI for `kind="rendered"`, converts - the full buffer to a string, and splits it into lines before - applying `limit`. This is meaningful when agents poll the same - pattern; it is low impact for ad hoc searches. - - Fix direction: cache compiled regexes by pattern; cache stripped - rendered output by child id and stream end offset; avoid - `strings.Split` over the whole ring when only the first `limit` - matches are needed. Prefer an incremental search shape if this - becomes the standard "watch for marker" path. - -# On Hold -- [ ] There's a unicode being displayed in opencode [ON HOLD] - - Investigated 2026-05-14: patterm passes ghostty grapheme codepoints - through unchanged (vt/ghostty.go:452-462), so the `` glyph is - most likely the *host* terminal's font fallback for opencode's - Nerd Font private-use codepoints, not a patterm substitution. - Need a concrete reproduction (which codepoint, which host - terminal/font) before changing rendering. -- [ ] After codex rips for like 15 minutes, the terminal becomes quite slow. [ON HOLD / VERIFYING] - - 2026-05-14: Perf plan P1-P11 landed (see CHANGELOG). Needs a real - long-running codex session to confirm whether the steady-state - slowdown is gone or some hotspot remains. Capture a pprof if it - still feels slow after ≥15 minutes — the structural drivers the - audit named are all addressed, so a remaining symptom is a new - one and probably wants fresh profiling. +The close action in the command palette should just be "Close current agent" rather than "Close codex" +Same with the other "focused" parts. It seems a bit clunky right now. "Close current agent" +In general I think while the feature set has grown, the actual refinement of it isn't great, it feels a bit cluttered.