Concrete perf metrics: live counters in --profile + benchmark suite

Live metrics (--profile):
- New metricsTracker instruments OnPTYOut, viewport renderer, stdout
  writes, libghostty-vt Write/Title CGO calls, sidebar / tabbar / status
  draws (with cache-hit accounting), snapshot replays, and the chrome
  ticker (so we can see ticker fires that did nothing).
- Writes metrics.jsonl (one snapshot per second) and metrics.json +
  summary.txt on exit, alongside the existing pprof files.
- All record* methods are nil-safe so disabled paths pay only a cheap
  nil check; counters are atomic so the per-PTY-chunk hot path stays
  lock-free.

Benchmark suite (go test -bench=.):
- Three workload fixtures (plain ASCII, SGR-styled lines, and a
  ratatui-style cursor-shuffling burst) plus a containsOSC
  microbenchmark. Reports ns/op, MB/s, allocs/op, B/op.
- Initial baseline numbers added to TODO under the perf-audit section,
  alongside two new findings (renderer allocs ~1 per 4 bytes on styled
  chunks; styled throughput tops out near 90 MB/s) those benchmarks
  surfaced.
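The nil-safe, lock-free counter pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the field names, `recordPTYOut`, and `snapshot` are hypothetical, and only the nil-receiver check and the atomic counters are taken from the message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// metricsTracker is a hypothetical stand-in for the tracker named in the
// commit message; the fields here are illustrative only.
type metricsTracker struct {
	ptyChunks atomic.Uint64 // number of PTY output chunks seen
	ptyBytes  atomic.Uint64 // total bytes across those chunks
}

// recordPTYOut may be called on a nil *metricsTracker. When profiling is
// disabled the caller holds a nil pointer and pays only this nil check.
func (m *metricsTracker) recordPTYOut(n int) {
	if m == nil {
		return
	}
	// Atomic adds keep the hot path free of mutexes.
	m.ptyChunks.Add(1)
	m.ptyBytes.Add(uint64(n))
}

// snapshot reads the counters; a nil tracker reports zeros.
func (m *metricsTracker) snapshot() (chunks, total uint64) {
	if m == nil {
		return 0, 0
	}
	return m.ptyChunks.Load(), m.ptyBytes.Load()
}

func main() {
	var disabled *metricsTracker // --profile not set
	disabled.recordPTYOut(4096)  // no-op, no panic

	enabled := &metricsTracker{}
	enabled.recordPTYOut(4096)
	enabled.recordPTYOut(512)

	chunks, total := enabled.snapshot()
	fmt.Printf("chunks=%d bytes=%d\n", chunks, total)
}
```

Because the two counters are loaded independently, a snapshot taken while writers are active may be slightly inconsistent between fields; for once-per-second reporting that is usually acceptable.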
38
TODO.md
@@ -2,6 +2,44 @@
Findings from a codebase sweep: not user-reported, needs review before
action. Each item names the anchor and a sketched fix.

Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
Ryzen 7 7800X3D):

```
ViewportRenderer_PlainASCII        229 MB/s   1.3 KB/op      6 allocs/op
ViewportRenderer_StyledLines        89 MB/s    91 KB/op   4325 allocs/op
ViewportRenderer_RatatuiBurst       40 MB/s   365 KB/op  17306 allocs/op
RendererThroughput_ReuseInstance    90 MB/s   316 KB/op  17380 allocs/op
ContainsOSC_NoOSC                 3050 MB/s     0 B/op       0 allocs/op
```

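For readers unfamiliar with how Go produces the MB/s and allocs/op columns, here is a minimal sketch of a benchmark in that shape. The `containsOSC` body and the fixture are assumptions made for illustration, not the project's code; `b.SetBytes` is what enables the MB/s figure and `b.ReportAllocs` the allocation columns.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// containsOSC is a hypothetical stand-in: it reports whether the chunk
// contains an OSC introducer (ESC followed by ']').
func containsOSC(chunk []byte) bool {
	return bytes.Contains(chunk, []byte("\x1b]"))
}

// benchmarkContainsOSCNoOSC scans a plain-ASCII fixture with no OSC in it.
func benchmarkContainsOSCNoOSC(b *testing.B) {
	chunk := bytes.Repeat([]byte("plain ascii line\r\n"), 256)
	b.SetBytes(int64(len(chunk))) // bytes processed per op, enables MB/s
	b.ReportAllocs()              // adds B/op and allocs/op to the result
	b.ResetTimer()                // exclude fixture setup from the timing
	for i := 0; i < b.N; i++ {
		if containsOSC(chunk) {
			b.Fatal("fixture unexpectedly contains an OSC sequence")
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(benchmarkContainsOSCNoOSC)
	fmt.Println(res.String(), res.MemString())
}
```

In a real suite the function would be named `BenchmarkXxx` in a `_test.go` file and run with `go test -bench=.`; `testing.Benchmark` is used here only so the sketch runs as a standalone program.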
- [ ] **viewport renderer allocates ~1 alloc per 4 input bytes on SGR/CSI-heavy chunks.** [MEDIUM]
  - `internal/app/viewport_renderer.go`: the styled-lines and
    ratatui benchmarks show 4-17k allocs per chunk. The hot
    contributors are likely (a) `string(vr.buf)` / `string(params)`
    conversions in `emitCSI` for every escape sequence, (b) the
    `pending strings.Builder` resizing as fragments arrive, and (c)
    `vr.shifter.Shift(vr.buf)` returning a fresh slice per CSI.
  - Fix direction: switch CSI param parsing to byte-slice
    comparison (no string conversion); reuse `vr.buf` and
    `vr.pending` backing arrays across `Render` calls by
    pre-growing in `newViewportRenderer`; have `cursorShifter.Shift`
    return into a caller-owned buffer instead of allocating.
    Profile-guided: run the styled-lines bench, point pprof at the
    allocs profile, fix the top three call sites.

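One possible shape for the "no string conversion, caller-owned buffer" direction is sketched below. `parseCSIParams` is a hypothetical helper, not a function from `viewport_renderer.go`; it assumes CSI parameters are semicolon-separated decimal numbers and treats an empty parameter as 0.

```go
package main

import "fmt"

// parseCSIParams parses parameter bytes such as "1;31;4" directly from the
// byte slice and appends the values into dst, which the caller owns and
// reuses across calls. No string conversion is performed, and nothing is
// allocated as long as dst has enough capacity.
func parseCSIParams(params []byte, dst []int) []int {
	dst = dst[:0] // reuse the caller's backing array
	n := 0
	for _, c := range params {
		switch {
		case c >= '0' && c <= '9':
			n = n*10 + int(c-'0')
		case c == ';':
			dst = append(dst, n) // an empty parameter is recorded as 0
			n = 0
		default:
			// Other bytes are ignored in this sketch.
		}
	}
	if len(params) > 0 {
		dst = append(dst, n) // final parameter after the last ';'
	}
	return dst
}

func main() {
	scratch := make([]int, 0, 16) // allocated once, e.g. at renderer setup
	ps := parseCSIParams([]byte("1;31"), scratch)
	fmt.Println(ps) // prints [1 31]
}
```

The same append-into-`dst` convention could be applied to a `Shift`-style method so that the renderer passes in its own scratch slice instead of receiving a fresh one per escape sequence.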
- [ ] **viewport renderer throughput (~90 MB/s styled) limits codex steady-state.** [MEDIUM]
  - The styled-lines and ratatui benchmarks come in at 89 MB/s and
    40 MB/s respectively. A 100 KB/s codex burst is far under that
    limit, but a session-resume dump of a 5 MiB chat history takes
    roughly 60-130 ms of pure renderer time at those rates, enough
    to be user-visible at the start of a long resume.
  - Fix direction: same as the alloc fix above; once the per-call
    allocation cost drops, the throughput ceiling rises with it.
    Worth re-running the benches after fixing the allocs and only
    investing further if the styled-lines bench is still under
    ~300 MB/s.

- [ ] **Session.Children() allocates a fresh slice on every call.** [MEDIUM]
  - `internal/app/session.go:530-541` walks `s.order` under `s.mu` and
    builds a new `[]*Child` slice every time. Callers on hot paths: