Stress-test ASCII video at 30/60/120 fps; fix libghostty-vt Debug build

Added a full ASCII-video benchmark suite that hammers the renderer
with 30 KiB / 70 KiB full-screen frames at 30, 60, and 120 fps
targets, covering both the renderer alone and the full pipeline
(em.Write + renderer + stdout). Each stream benchmark reports
µs/frame, fps_ceiling, and the percentage of the per-frame budget
consumed.

The pipeline benchmarks showed we were missing 120 fps by a wide
margin: 190%-350% of the per-frame budget at 120 fps, i.e. a 34-63 fps
ceiling. Isolating em.Write confirmed libghostty-vt as the bottleneck,
at 16-29 ms per frame (8-color to truecolor), with a 33 MiB static
library.

Root cause: the Makefile invoked `zig build` without `-Doptimize`,
and Zig's standardOptimizeOption defaults to Debug, so the shipped
libghostty-vt was unoptimised. Fixed by pinning ReleaseFast in the
Makefile (override via GHOSTTY_VT_OPTIMIZE for debug builds of the
upstream lib).

Existing checkouts need `make clean-deps && make deps` to pick up
the rebuild.
2026-05-15 13:43:31 +01:00
parent 1c590f8e32
commit 2f109a84fa
4 changed files with 451 additions and 3 deletions

TODO.md

Findings from a codebase sweep (not user-reported; needs review before
action). Each item names the anchor and a sketched fix.
Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
Ryzen 7 7800X3D, libghostty-vt in **Debug mode**; see the first item
below):
```
# Renderer alone
ViewportRenderer_PlainASCII 229 MB/s 1.3 KB/op 6 allocs/op
ViewportRenderer_StyledLines 89 MB/s 91 KB/op 4325 allocs/op
ViewportRenderer_RatatuiBurst 40 MB/s 365 KB/op 17306 allocs/op
RendererThroughput_ReuseInstance 90 MB/s 316 KB/op 17380 allocs/op
ContainsOSC_NoOSC 3050 MB/s 0 B/op 0 allocs/op
# ASCII-video stream (renderer only — 3 sec at the target fps)
ASCIIVideo_Stream_8Color_120fps 260 µs/frame 3845 fps_ceiling 3.1% budget
ASCIIVideo_Stream_TrueColor_120fps 576 µs/frame 1735 fps_ceiling 6.9% budget
# Full pipeline (em.Write + renderer + io.Discard write)
Pipeline_ASCIIVideo_8Color_120fps 15838 µs/frame 63 fps_ceiling 190% budget
Pipeline_ASCIIVideo_TrueColor_120fps 29224 µs/frame 34 fps_ceiling 350% budget
# Emulator alone (libghostty-vt CSI/SGR parser)
Emulator_Write_Stream_8Color_120fps 15930 µs/frame 63 fps_ceiling
Emulator_Write_Stream_TrueColor_120fps 29241 µs/frame 34 fps_ceiling
```
The renderer alone reaches a 1700-3800 fps ceiling, using only 3-7% of
the 120 fps budget. The full pipeline caps at 34-63 fps. **Essentially
the whole gap is libghostty-vt's em.Write: its parser ships as a Debug
build, which also produces a 33 MiB static library file (release
builds are a fraction of that).**
- [ ] **libghostty-vt was being built in Debug mode.** [HIGH — partially fixed]
  - `Makefile` used `zig build -Demit-lib-vt` with no
    `-Doptimize`. Zig's `standardOptimizeOption` defaults to
    `.Debug`, so the shipped static lib was unoptimised. Effect:
    the SGR/CSI parser spends 16-29 ms per 30-70 KiB full-screen
    frame, capping the entire patterm pipeline at 34-63 fps. The
    Makefile now defaults to `ReleaseFast` (override with
    `make deps GHOSTTY_VT_OPTIMIZE=Debug` when you need a debug
    build of the upstream lib to diagnose a bug in it).
  - To apply: run `make clean-deps && make deps`, then re-run
    `go test -bench=BenchmarkPipeline -benchmem ./internal/app/`
    and confirm that the truecolor 120 fps stream drops well under
    100% of its budget. Update the numbers in this section after
    rebuilding.
  - Severity is HIGH because this is the single biggest perf win on
    the table; the renderer optimisations below are second-order
    until it lands.
- [ ] **viewport renderer allocates ~1 alloc per 4 input bytes on SGR/CSI-heavy chunks.** [MEDIUM]
  - `internal/app/viewport_renderer.go`: the styled-lines and
    ratatui benchmarks show 4-17k allocs per chunk. The hot