Added a full ASCII-video benchmark suite that hammers the renderer
with 30 KiB / 70 KiB full-screen frames at 30, 60, and 120 fps
targets — both renderer-only and full-pipeline (em.Write + renderer
+ stdout). Each stream benchmark reports µs/frame, fps_ceiling, and
percent of the per-frame budget consumed.
The pipeline benchmarks revealed we were missing 120 fps by a wide
margin (190%-350% of budget at 120fps, 60-90 fps ceiling). Isolating
em.Write confirmed libghostty-vt is the bottleneck — 16-29 ms per
truecolor frame, library file at 33 MiB.
Root cause: the Makefile invoked `zig build` with no
-Doptimize, and Zig's standardOptimizeOption defaults to Debug. So
the shipped libghostty-vt was unoptimised. Fixed by pinning
ReleaseFast in the Makefile (override via GHOSTTY_VT_OPTIMIZE for
debug builds of the upstream lib).
Existing checkouts need `make clean-deps && make deps` to pick up
the rebuild.
Live metrics (--profile):
- New metricsTracker instruments OnPTYOut, viewport renderer,
stdout writes, libghostty-vt Write/Title CGO calls, sidebar /
tabbar / status draws (with cache-hit accounting), snapshot
replays, and the chrome ticker (so we can see ticker fires that
did nothing).
- Writes metrics.jsonl (one snapshot per second) and metrics.json
+ summary.txt on exit, alongside the existing pprof files.
- All record* methods are nil-safe so disabled paths pay only a
cheap nil check; counters are atomic so the per-PTY-chunk hot
path stays lock-free.
Benchmark suite (go test -bench=.):
- Three workload fixtures — plain ASCII, SGR-styled lines, and a
ratatui-style cursor-shuffling burst — plus a containsOSC
microbenchmark. Reports ns/op, MB/s, allocs/op, B/op.
- Initial baseline numbers added to TODO under the perf-audit
section, alongside two new findings (renderer allocs ~1 per 4
bytes on styled chunks; styled throughput tops out near
90 MB/s) those benchmarks surfaced.