Stress-test ASCII video at 30/60/120 fps; fix libghostty-vt Debug build
Added a full ASCII-video benchmark suite that hammers the renderer with 30 KiB / 70 KiB full-screen frames at 30, 60, and 120 fps targets, both renderer-only and full-pipeline (em.Write + renderer + stdout). Each stream benchmark reports µs/frame, fps_ceiling, and percent of the per-frame budget consumed.

The pipeline benchmarks revealed we were missing 120 fps by a wide margin (190%-350% of budget at 120 fps, 60-90 fps ceiling). Isolating em.Write confirmed libghostty-vt is the bottleneck: 16-29 ms per truecolor frame, library file at 33 MiB.

Root cause: the Makefile invoked `zig build` with no -Doptimize, and Zig's standardOptimizeOption defaults to Debug, so the shipped libghostty-vt was unoptimised. Fixed by pinning ReleaseFast in the Makefile (override via GHOSTTY_VT_OPTIMIZE for debug builds of the upstream lib). Existing checkouts need `make clean-deps && make deps` to pick up the rebuild.
23
CHANGELOG.md
@@ -6,6 +6,29 @@ loosely follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
+### Fixed
+- `make deps` now builds libghostty-vt with `-Doptimize=ReleaseFast`
+instead of zig's silent `Debug` default. The default-Debug build
+shipped an unoptimised CSI/SGR parser that ate 16-29 ms per
+30-70 KiB full-screen frame in benchmarks, capping the entire
+PTY-to-host pipeline at 34-63 fps no matter how fast the rest of
+patterm got. The static library file size drops accordingly
+(the Debug build was 33 MiB). Override with
+`make deps GHOSTTY_VT_OPTIMIZE=Debug` only when debugging the
+upstream library itself. Apply on existing checkouts with
+`make clean-deps && make deps`.
+
+### Added
+- ASCII-video stress benchmarks (`internal/app/bench_test.go`):
+per-frame and per-stream variants at 30 / 60 / 120 fps targets,
+three workload fixtures (8-colour cells, 24-bit truecolor cells,
+and a Bad-Apple-style 1-bit pattern). Each stream benchmark
+reports `µs/frame`, an achievable `fps_ceiling`, and `budget_pct`
+so you can read off "do we hit N fps?" directly. A matching
+Pipeline_ASCIIVideo_* set includes libghostty-vt's em.Write CGO
+and an io.Discard stdout write so the FPS claim reflects the
+whole pipeline, not just the renderer.
+
 ### Fixed
 - Long claude session resume (and codex steady-state rendering) is
 noticeably faster. Two costs that scaled per-PTY-chunk are now