Stress-test ASCII video at 30/60/120 fps; fix libghostty-vt Debug build

Added a full ASCII-video benchmark suite that hammers the renderer
with 30 KiB and 70 KiB full-screen frames at 30, 60, and 120 fps
targets, in both renderer-only and full-pipeline (em.Write + renderer
+ stdout) variants. Each stream benchmark reports µs/frame,
fps_ceiling, and budget_pct (the percentage of the per-frame budget
consumed).

The pipeline benchmarks revealed we were missing 120 fps by a wide
margin: 190%-350% of the 120 fps budget, i.e. a 34-63 fps ceiling.
Isolating em.Write confirmed libghostty-vt as the bottleneck at
16-29 ms per truecolor frame, and the static library weighed in at
33 MiB.
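Worked through for the 120 fps target, the em.Write timings alone already land on the budget figures quoted above (arithmetic only, using the 16 ms and 29 ms endpoints):

$$
\begin{aligned}
T_{\text{budget}} &= \frac{1}{120\ \text{fps}} \approx 8.33\ \text{ms} \\
\frac{16\ \text{ms}}{8.33\ \text{ms}} &\approx 192\%, \qquad \frac{1}{16\ \text{ms}} = 62.5\ \text{fps} \\
\frac{29\ \text{ms}}{8.33\ \text{ms}} &\approx 348\%, \qquad \frac{1}{29\ \text{ms}} \approx 34.5\ \text{fps}
\end{aligned}
$$

So libghostty-vt's cost by itself roughly accounts for the whole pipeline's shortfall.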

Root cause: the Makefile invoked `zig build` without `-Doptimize`,
and Zig's `standardOptimizeOption` defaults to Debug, so the shipped
libghostty-vt was unoptimised. Fixed by pinning ReleaseFast in the
Makefile (override via GHOSTTY_VT_OPTIMIZE to get a debug build of the
upstream lib).

Existing checkouts need `make clean-deps && make deps` to pick up
the rebuild.
2026-05-15 13:43:31 +01:00
parent 1c590f8e32
commit 2f109a84fa
4 changed files with 451 additions and 3 deletions


@@ -6,6 +6,29 @@ loosely follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Fixed
- `make deps` now builds libghostty-vt with `-Doptimize=ReleaseFast`
instead of Zig's silent `Debug` default. The default-Debug build
shipped an unoptimised CSI/SGR parser that ate 16-29 ms per
30-70 KiB full-screen frame in benchmarks, capping the entire
PTY-to-host pipeline at 34-63 fps no matter how fast the rest of
patterm got. The static library file size drops accordingly
(the Debug build was 33 MiB). Override with
`make deps GHOSTTY_VT_OPTIMIZE=Debug` only when debugging the
upstream library itself. Apply on existing checkouts with
`make clean-deps && make deps`.
### Added
- ASCII-video stress benchmarks (`internal/app/bench_test.go`):
per-frame and per-stream variants at 30 / 60 / 120 fps targets,
three workload fixtures (8-colour cells, 24-bit truecolor cells,
and a Bad-Apple-style 1-bit pattern). Each stream benchmark
reports `µs/frame`, an achievable `fps_ceiling`, and `budget_pct`
so you can read off "do we hit N fps?" directly. A matching
`Pipeline_ASCIIVideo_*` set adds libghostty-vt's `em.Write` cgo call
and an `io.Discard` stdout write, so the FPS claim reflects the
whole pipeline, not just the renderer.
### Fixed
- Long claude session resume (and codex steady-state rendering) is
noticeably faster. Two costs that scaled per-PTY-chunk are now