Harry Bayliss b05065a601 Sync TODO.md perf-audit review pass

Removed low/marginal items from the original sweep; remaining items
have measured or workflow evidence to justify action.

2026-05-15 16:07:58 +01:00

Perf Audit (reviewed 2026-05-15)

Findings that survived the 2026-05-15 review pass. Low and marginal items from the original sweep were removed; remaining items have enough measured or workflow evidence to justify action.

Baseline benchmark numbers (go test -bench=. ./internal/app/, AMD Ryzen 7 7800X3D, libghostty-vt ReleaseFast after the Makefile fix landed):

# Renderer alone
ViewportRenderer_PlainASCII       229 MB/s     1.3 KB/op    6 allocs/op
ViewportRenderer_StyledLines       89 MB/s    91   KB/op  4325 allocs/op
ViewportRenderer_RatatuiBurst      40 MB/s   365   KB/op 17306 allocs/op
RendererThroughput_ReuseInstance   90 MB/s   316   KB/op 17380 allocs/op
ContainsOSC_NoOSC                3050 MB/s     0   B/op     0 allocs/op

# ASCII-video stream (renderer only — 3 sec at the target fps)
ASCIIVideo_Stream_8Color_120fps     260 µs/frame  3845 fps_ceiling   3.1% budget
ASCIIVideo_Stream_TrueColor_120fps  576 µs/frame  1735 fps_ceiling   6.9% budget

# Full pipeline (em.Write + renderer + io.Discard write)
Pipeline_ASCIIVideo_8Color_120fps     493 µs/frame  2030 fps_ceiling   5.9% budget
Pipeline_ASCIIVideo_TrueColor_120fps 1075 µs/frame   931 fps_ceiling  12.9% budget

# Emulator alone (libghostty-vt CSI/SGR parser)
Emulator_Write_Stream_8Color_120fps    257 µs/frame  3890 fps_ceiling
Emulator_Write_Stream_TrueColor_120fps 488 µs/frame  2051 fps_ceiling

The current pipeline still has large 120 fps headroom. The remaining renderer concern is multi-MiB styled replay latency and allocation churn, not normal steady-state frame budget.
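
The fps_ceiling and budget columns follow directly from the per-frame cost. A minimal sketch of that arithmetic, assuming the budget is the 120 fps frame interval (about 8,333 µs); `frameStats` is a hypothetical helper, not a function in the repo:

```go
package main

import "fmt"

// frameStats derives the fps ceiling and the share of the per-frame budget
// consumed at targetFPS from a measured per-frame cost in microseconds.
// Hypothetical helper for illustration only.
func frameStats(usPerFrame, targetFPS float64) (fpsCeiling, budgetPct float64) {
	fpsCeiling = 1e6 / usPerFrame          // frames per second if nothing else ran
	budgetUs := 1e6 / targetFPS            // time available per frame at the target rate
	budgetPct = 100 * usPerFrame / budgetUs // fraction of that time actually used
	return fpsCeiling, budgetPct
}

func main() {
	// Pipeline_ASCIIVideo_TrueColor_120fps from the table above.
	ceiling, pct := frameStats(1075, 120)
	fmt.Printf("%.0f fps ceiling, %.1f%% budget\n", ceiling, pct)
}
```

Small differences from the table (for example 930 versus 931 fps) are expected if the benchmark computes the ceiling from the unrounded per-frame time.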

Viewport renderer allocates heavily on SGR/CSI-heavy chunks. [MEDIUM]
- Review evidence: five benchmark reps confirmed ViewportRenderer_StyledLines at about 4,325 allocs per 16 KiB chunk (~91.5 KB/op, roughly 1 alloc per 3.8 input bytes), and ViewportRenderer_RatatuiBurst at about 17,306 allocs per chunk (~365 KB/op). A 5 MiB styled resume benchmark allocated about 31 MB across 1.38M objects.
- Likely hot paths: generic CSI/SGR output in internal/app/viewport_renderer.go sends many sequences through vr.shifter.Shift(vr.buf), while internal/app/cursorshift.go returns a fresh []byte via pending.String() on every Shift call and parses CSI params through string(raw) / strings.Split. The mode-helper string(params) conversions are real, but probably not the main SGR-heavy cost.
- Fix direction: make cursorShifter write into caller-owned scratch output or directly into the viewport renderer's pending builder; parse CSI params from byte slices; pre-grow/reuse renderer and shifter buffers. Re-run styled-lines, ratatui, and 5 MiB resume benchmarks; use pprof when available to confirm the top allocation sites.
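
One possible shape for the "parse CSI params from byte slices" part of this fix is an append-style parser that writes into a caller-owned scratch slice. This is a sketch under assumptions: `appendCSIParams` is a hypothetical name, the real cursorShifter API is not shown here, and sub-parameters (`:`) are simply rejected rather than handled.

```go
package main

import "fmt"

// appendCSIParams parses semicolon-separated decimal CSI parameters from raw
// (for example "38;5;196") and appends them to dst. The caller can pass
// scratch[:0] on every call, so steady-state parsing needs no string(raw)
// conversion and no strings.Split. Empty parameters are treated as 0.
// Hypothetical helper; it returns false on any byte it does not understand.
func appendCSIParams(dst []int, raw []byte) ([]int, bool) {
	if len(raw) == 0 {
		return dst, true
	}
	start := len(dst)
	n := 0
	for _, c := range raw {
		switch {
		case c >= '0' && c <= '9':
			if n < 65535 { // stop accumulating once large, so hostile input cannot overflow
				n = n*10 + int(c-'0')
			}
		case c == ';':
			dst = append(dst, n)
			n = 0
		default:
			// Sub-parameters and private markers are out of scope for this sketch.
			return dst[:start], false
		}
	}
	return append(dst, n), true
}

func main() {
	scratch := make([]int, 0, 16) // reused across sequences
	for _, seq := range [][]byte{[]byte("38;5;196"), []byte("1;4")} {
		params, ok := appendCSIParams(scratch[:0], seq)
		fmt.Println(params, ok)
	}
}
```

The same append-into-caller-buffer convention could apply to the shifter output, so that Shift no longer has to return a fresh []byte per call.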
Large styled resume/replay dumps spend visible time in viewport rendering. [MEDIUM]
- Review evidence: BenchmarkSessionResume_5MiBStyled measured about 58 ms median and 63 ms p95 over five reps. The plain 5 MiB benchmark was about 23-24 ms with only 21 allocs. The live path renders focused PTY chunks through renderer.Render, then still pays emulator writes, ring writes, event dispatch, stdout writes, and real terminal paint.
- Scope: this is not a Codex steady-state throughput limit. A 100 KB/s stream is far below the styled renderer's ~80-90 MB/s ceiling. It matters for multi-MiB burst replay, resume/startup dumps, and dense full-screen churn.
- Fix direction: do the allocation fix first, since it should also improve throughput. After that, invest further only if styled resume traces remain user-visible or the styled-lines benchmark is still under roughly 300 MB/s.
wait_for_pattern re-scans the entire stream/grid while waiting. [MEDIUM]
- internal/app/host.go:476-493 (the check closure). On scope="scrollback" it calls c.StreamRead(0) followed by stripANSIBytes(nil, b), so each check can copy, strip, and search the full 1 MiB ring. On scope="grid" it calls PlainText() and runs the regex against the full grid string.
- Caveat from review: the current chunk notifier coalesces bursts with a buffered channel and has a 500 ms fallback, so this is not necessarily one full scan per PTY chunk. It is still meaningful for active waits on chatty panes.
- Fix direction: for scrollback, track the last checked stream offset and search only new output plus a bounded overlap/scratch buffer so matches spanning chunks are not missed. For grid, dedupe on ScreenVersion() and skip work when the version has not changed.
search_output rebuilds and searches whole scrollback on every call. [MEDIUM]
- internal/app/host.go:428-437 compiles a fresh regex, reads the stream from offset 0, strips ANSI for kind="rendered", converts the full buffer to a string, and splits it into lines before applying limit. This is meaningful when agents poll the same pattern; it is low impact for ad hoc searches.
- Fix direction: cache compiled regexes by pattern; cache stripped rendered output by child id and stream end offset; avoid strings.Split over the whole ring when only the first limit matches are needed. Prefer an incremental search shape if this becomes the standard "watch for marker" path.
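
A minimal sketch of two of these ideas: a compiled-regex cache keyed by pattern, and a line walk that stops after `limit` matches instead of splitting the whole buffer. `regexCache` and `firstMatches` are hypothetical names, the cache is unbounded (a real version would likely cap its size), and the exact semantics of `limit` in search_output are assumed.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"sync"
)

// regexCache memoizes compiled patterns so repeated polls with the same
// pattern skip regexp.Compile. The zero value is ready to use.
type regexCache struct {
	mu sync.Mutex
	m  map[string]*regexp.Regexp
}

// Get returns the cached regex for pattern, compiling it on first use.
func (c *regexCache) Get(pattern string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.m[pattern]; ok { // reading a nil map is safe in Go
		return re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	if c.m == nil {
		c.m = make(map[string]*regexp.Regexp)
	}
	c.m[pattern] = re
	return re, nil
}

// firstMatches walks buf line by line and stops after limit matching lines,
// converting only the matching lines to strings.
func firstMatches(re *regexp.Regexp, buf []byte, limit int) []string {
	var out []string
	for len(buf) > 0 && len(out) < limit {
		line := buf
		if i := bytes.IndexByte(buf, '\n'); i >= 0 {
			line, buf = buf[:i], buf[i+1:]
		} else {
			buf = nil
		}
		if re.Match(line) {
			out = append(out, string(line))
		}
	}
	return out
}

func main() {
	var cache regexCache
	re, err := cache.Get(`ERR\d+`)
	if err != nil {
		panic(err)
	}
	fmt.Println(firstMatches(re, []byte("ok\nERR1 a\nERR2 b\nERR3 c\n"), 2))
}
```

Caching the stripped rendered output by child id and stream end offset would sit in front of `firstMatches` and is not shown here.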

On Hold

An unexpected Unicode glyph is displayed in opencode [ON HOLD]
- Investigated 2026-05-14: patterm passes ghostty grapheme codepoints through unchanged (vt/ghostty.go:452-462), so the <?> glyph is most likely the host terminal's font fallback for opencode's Nerd Font private-use codepoints, not a patterm substitution. Need a concrete reproduction (which codepoint, which host terminal/font) before changing rendering.
After codex runs for about 15 minutes, the terminal becomes noticeably slow. [ON HOLD / VERIFYING]
- 2026-05-14: Perf plan P1-P11 landed (see CHANGELOG). Needs a real long-running codex session to confirm whether the steady-state slowdown is gone or some hotspot remains. Capture a pprof if it still feels slow after ≥15 minutes — the structural drivers the audit named are all addressed, so a remaining symptom is a new one and probably wants fresh profiling.
