Trim actioned perf-audit items; add palette polish TODO

Removes the 2026-05-15 perf audit findings that have either shipped
(see CHANGELOG) or are tracked elsewhere, and replaces them with the
remaining palette-refinement notes: generic labels for focused
actions ("Close current agent") and a higher-level concern that the
palette has grown cluttered as features were added.
This commit is contained in:
2026-05-15 19:53:51 +01:00
parent c1ecba0624
commit e6f5a94fae

118
TODO.md
View File

@@ -1,115 +1,3 @@
# Perf Audit (reviewed 2026-05-15)
Findings that survived the 2026-05-15 review pass. Low and marginal
items from the original sweep were removed; remaining items have enough
measured or workflow evidence to justify action.
Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
Ryzen 7 7800X3D, libghostty-vt **ReleaseFast** after the Makefile
fix landed):
```
# Renderer alone
ViewportRenderer_PlainASCII 229 MB/s 1.3 KB/op 6 allocs/op
ViewportRenderer_StyledLines 89 MB/s 91 KB/op 4325 allocs/op
ViewportRenderer_RatatuiBurst 40 MB/s 365 KB/op 17306 allocs/op
RendererThroughput_ReuseInstance 90 MB/s 316 KB/op 17380 allocs/op
ContainsOSC_NoOSC 3050 MB/s 0 B/op 0 allocs/op
# ASCII-video stream (renderer only — 3 sec at the target fps)
ASCIIVideo_Stream_8Color_120fps 260 µs/frame 3845 fps_ceiling 3.1% budget
ASCIIVideo_Stream_TrueColor_120fps 576 µs/frame 1735 fps_ceiling 6.9% budget
# Full pipeline (em.Write + renderer + io.Discard write)
Pipeline_ASCIIVideo_8Color_120fps 493 µs/frame 2030 fps_ceiling 5.9% budget
Pipeline_ASCIIVideo_TrueColor_120fps 1075 µs/frame 931 fps_ceiling 12.9% budget
# Emulator alone (libghostty-vt CSI/SGR parser)
Emulator_Write_Stream_8Color_120fps 257 µs/frame 3890 fps_ceiling
Emulator_Write_Stream_TrueColor_120fps 488 µs/frame 2051 fps_ceiling
```
The current pipeline still has large 120 fps headroom. The remaining
renderer concern is multi-MiB styled replay latency and allocation
churn, not normal steady-state frame budget.
- [ ] **viewport renderer allocates heavily on SGR/CSI-heavy chunks.** [MEDIUM]
- Review evidence: five benchmark reps confirmed
`ViewportRenderer_StyledLines` at about 4,325 allocs per 16 KiB
chunk (~91.5 KB/op, roughly 1 alloc per 3.8 input bytes), and
`ViewportRenderer_RatatuiBurst` at about 17,306 allocs per chunk
(~365 KB/op). A 5 MiB styled resume benchmark allocated about
31 MB across 1.38M objects.
- Likely hot paths: generic CSI/SGR output in
`internal/app/viewport_renderer.go` sends many sequences through
`vr.shifter.Shift(vr.buf)`, while `internal/app/cursorshift.go`
returns a fresh `[]byte` via `pending.String()` on every
`Shift` call and parses CSI params through `string(raw)` /
`strings.Split`. The mode-helper `string(params)` conversions
are real, but probably not the main SGR-heavy cost.
- Fix direction: make `cursorShifter` write into caller-owned
scratch output or directly into the viewport renderer's pending
builder; parse CSI params from byte slices; pre-grow/reuse
renderer and shifter buffers. Re-run styled-lines, ratatui, and
5 MiB resume benchmarks; use pprof when available to confirm the
top allocation sites.
- [ ] **large styled resume/replay dumps spend visible time in viewport rendering.** [MEDIUM]
- Review evidence: `BenchmarkSessionResume_5MiBStyled` measured
about 58 ms median and 63 ms p95 over five reps. The plain 5 MiB
benchmark was about 23-24 ms with only 21 allocs. The live path
renders focused PTY chunks through `renderer.Render`, then still
pays emulator writes, ring writes, event dispatch, stdout writes,
and real terminal paint.
- Scope: this is not a Codex steady-state throughput limit. A
100 KB/s stream is far below the styled renderer's ~80-90 MB/s
ceiling. It matters for multi-MiB burst replay, resume/startup
dumps, and dense full-screen churn.
- Fix direction: do the allocation fix first, since it should also
improve throughput. After that, invest further only if styled
resume traces remain user-visible or the styled-lines benchmark
is still under roughly 300 MB/s.
- [ ] **wait_for_pattern re-scans the entire stream/grid while waiting.** [MEDIUM]
- `internal/app/host.go:476-493` (the `check` closure). On
`scope="scrollback"` it calls `c.StreamRead(0)` followed by
`stripANSIBytes(nil, b)`, so each check can copy, strip, and
search the full 1 MiB ring. On `scope="grid"` it calls
`PlainText()` and runs the regex against the full grid string.
- Caveat from review: the current chunk notifier coalesces bursts
with a buffered channel and has a 500 ms fallback, so this is not
necessarily one full scan per PTY chunk. It is still meaningful
for active waits on chatty panes.
- Fix direction: for `scrollback`, track the last checked stream
offset and search only new output plus a bounded overlap/scratch
buffer so matches spanning chunks are not missed. For `grid`,
dedupe on `ScreenVersion()` and skip work when the version has
not changed.
- [ ] **search_output rebuilds and searches whole scrollback on every call.** [MEDIUM]
- `internal/app/host.go:428-437` compiles a fresh regex, reads the
stream from offset 0, strips ANSI for `kind="rendered"`, converts
the full buffer to a string, and splits it into lines before
applying `limit`. This is meaningful when agents poll the same
pattern; it is low impact for ad hoc searches.
- Fix direction: cache compiled regexes by pattern; cache stripped
rendered output by child id and stream end offset; avoid
`strings.Split` over the whole ring when only the first `limit`
matches are needed. Prefer an incremental search shape if this
becomes the standard "watch for marker" path.
# On Hold
- [ ] There's a unicode <?> being displayed in opencode [ON HOLD]
- Investigated 2026-05-14: patterm passes ghostty grapheme codepoints
through unchanged (vt/ghostty.go:452-462), so the `<?>` glyph is
most likely the *host* terminal's font fallback for opencode's
Nerd Font private-use codepoints, not a patterm substitution.
Need a concrete reproduction (which codepoint, which host
terminal/font) before changing rendering.
- [ ] After codex rips for like 15 minutes, the terminal becomes quite slow. [ON HOLD / VERIFYING]
- 2026-05-14: Perf plan P1-P11 landed (see CHANGELOG). Needs a real
long-running codex session to confirm whether the steady-state
slowdown is gone or some hotspot remains. Capture a pprof if it
still feels slow after ≥15 minutes — the structural drivers the
audit named are all addressed, so a remaining symptom is a new
one and probably wants fresh profiling.
The close action in the command palette should just be "Close current agent" rather than "Close codex"
Same with the other "focused" parts. It seems a bit clunky right now. "Close current agent"
In general I think while the feature set has grown, the actual refinement of it isn't great, it feels a bit cluttered.