Trim actioned perf-audit items; add palette polish TODO

Removes the 2026-05-15 perf audit findings that have either shipped (see CHANGELOG) or are tracked elsewhere, and replaces them with the remaining palette-refinement notes: generic labels for focused actions ("Close current agent") and a higher-level concern that the palette has grown cluttered as features were added.
2026-05-15 19:53:51 +01:00
parent c1ecba0624
commit e6f5a94fae
1 changed file with 3 additions and 115 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -1,115 +1,3 @@
-# Perf Audit (reviewed 2026-05-15)
-Findings that survived the 2026-05-15 review pass. Low and marginal
-items from the original sweep were removed; remaining items have enough
-measured or workflow evidence to justify action.
-
-Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
-Ryzen 7 7800X3D, libghostty-vt **ReleaseFast** after the Makefile
-fix landed):
-
-```
-# Renderer alone
-ViewportRenderer_PlainASCII       229 MB/s     1.3 KB/op    6 allocs/op
-ViewportRenderer_StyledLines       89 MB/s    91   KB/op  4325 allocs/op
-ViewportRenderer_RatatuiBurst      40 MB/s   365   KB/op 17306 allocs/op
-RendererThroughput_ReuseInstance   90 MB/s   316   KB/op 17380 allocs/op
-ContainsOSC_NoOSC                3050 MB/s     0   B/op     0 allocs/op
-
-# ASCII-video stream (renderer only — 3 sec at the target fps)
-ASCIIVideo_Stream_8Color_120fps     260 µs/frame  3845 fps_ceiling   3.1% budget
-ASCIIVideo_Stream_TrueColor_120fps  576 µs/frame  1735 fps_ceiling   6.9% budget
-
-# Full pipeline (em.Write + renderer + io.Discard write)
-Pipeline_ASCIIVideo_8Color_120fps     493 µs/frame  2030 fps_ceiling   5.9% budget
-Pipeline_ASCIIVideo_TrueColor_120fps 1075 µs/frame   931 fps_ceiling  12.9% budget
-
-# Emulator alone (libghostty-vt CSI/SGR parser)
-Emulator_Write_Stream_8Color_120fps    257 µs/frame  3890 fps_ceiling
-Emulator_Write_Stream_TrueColor_120fps 488 µs/frame  2051 fps_ceiling
-```
-
-The current pipeline still has large 120 fps headroom. The remaining
-renderer concern is multi-MiB styled replay latency and allocation
-churn, not normal steady-state frame budget.
-
-
- [ ] **viewport renderer allocates heavily on SGR/CSI-heavy chunks.** [MEDIUM]
-  - Review evidence: five benchmark reps confirmed
-    `ViewportRenderer_StyledLines` at about 4,325 allocs per 16 KiB
-    chunk (~91.5 KB/op, roughly 1 alloc per 3.8 input bytes), and
-    `ViewportRenderer_RatatuiBurst` at about 17,306 allocs per chunk
-    (~365 KB/op). A 5 MiB styled resume benchmark allocated about
-    31 MB across 1.38M objects.
-  - Likely hot paths: generic CSI/SGR output in
-    `internal/app/viewport_renderer.go` sends many sequences through
-    `vr.shifter.Shift(vr.buf)`, while `internal/app/cursorshift.go`
-    returns a fresh `[]byte` via `pending.String()` on every
-    `Shift` call and parses CSI params through `string(raw)` /
-    `strings.Split`. The mode-helper `string(params)` conversions
-    are real, but probably not the main SGR-heavy cost.
-  - Fix direction: make `cursorShifter` write into caller-owned
-    scratch output or directly into the viewport renderer's pending
-    builder; parse CSI params from byte slices; pre-grow/reuse
-    renderer and shifter buffers. Re-run styled-lines, ratatui, and
-    5 MiB resume benchmarks; use pprof when available to confirm the
-    top allocation sites.
-
- [ ] **large styled resume/replay dumps spend visible time in viewport rendering.** [MEDIUM]
-  - Review evidence: `BenchmarkSessionResume_5MiBStyled` measured
-    about 58 ms median and 63 ms p95 over five reps. The plain 5 MiB
-    benchmark was about 23-24 ms with only 21 allocs. The live path
-    renders focused PTY chunks through `renderer.Render`, then still
-    pays emulator writes, ring writes, event dispatch, stdout writes,
-    and real terminal paint.
-  - Scope: this is not a Codex steady-state throughput limit. A
-    100 KB/s stream is far below the styled renderer's ~80-90 MB/s
-    ceiling. It matters for multi-MiB burst replay, resume/startup
-    dumps, and dense full-screen churn.
-  - Fix direction: do the allocation fix first, since it should also
-    improve throughput. After that, invest further only if styled
-    resume traces remain user-visible or the styled-lines benchmark
-    is still under roughly 300 MB/s.
-
- [ ] **wait_for_pattern re-scans the entire stream/grid while waiting.** [MEDIUM]
-  - `internal/app/host.go:476-493` (the `check` closure). On
-    `scope="scrollback"` it calls `c.StreamRead(0)` followed by
-    `stripANSIBytes(nil, b)`, so each check can copy, strip, and
-    search the full 1 MiB ring. On `scope="grid"` it calls
-    `PlainText()` and runs the regex against the full grid string.
-  - Caveat from review: the current chunk notifier coalesces bursts
-    with a buffered channel and has a 500 ms fallback, so this is not
-    necessarily one full scan per PTY chunk. It is still meaningful
-    for active waits on chatty panes.
-  - Fix direction: for `scrollback`, track the last checked stream
-    offset and search only new output plus a bounded overlap/scratch
-    buffer so matches spanning chunks are not missed. For `grid`,
-    dedupe on `ScreenVersion()` and skip work when the version has
-    not changed.
-
- [ ] **search_output rebuilds and searches whole scrollback on every call.** [MEDIUM]
-  - `internal/app/host.go:428-437` compiles a fresh regex, reads the
-    stream from offset 0, strips ANSI for `kind="rendered"`, converts
-    the full buffer to a string, and splits it into lines before
-    applying `limit`. This is meaningful when agents poll the same
-    pattern; it is low impact for ad hoc searches.
-  - Fix direction: cache compiled regexes by pattern; cache stripped
-    rendered output by child id and stream end offset; avoid
-    `strings.Split` over the whole ring when only the first `limit`
-    matches are needed. Prefer an incremental search shape if this
-    becomes the standard "watch for marker" path.
-
-# On Hold
- [ ] There's a unicode <?> being displayed in opencode [ON HOLD]
-  - Investigated 2026-05-14: patterm passes ghostty grapheme codepoints
-    through unchanged (vt/ghostty.go:452-462), so the `<?>` glyph is
-    most likely the *host* terminal's font fallback for opencode's
-    Nerd Font private-use codepoints, not a patterm substitution.
-    Need a concrete reproduction (which codepoint, which host
-    terminal/font) before changing rendering.
- [ ] After codex rips for like 15 minutes, the terminal becomes quite slow. [ON HOLD / VERIFYING]
-  - 2026-05-14: Perf plan P1-P11 landed (see CHANGELOG). Needs a real
-    long-running codex session to confirm whether the steady-state
-    slowdown is gone or some hotspot remains. Capture a pprof if it
-    still feels slow after ≥15 minutes — the structural drivers the
-    audit named are all addressed, so a remaining symptom is a new
-    one and probably wants fresh profiling.
+The close action in the command palette should just be "Close current agent" rather than "Close codex"
+Same with the other "focused" parts. It seems a bit clunky right now. "Close current agent"
+In general I think while the feature set has grown, the actual refinement of it isn't great, it feels a bit cluttered.