From bda799a3c66dd8a575ebefe94929547edffb0715 Mon Sep 17 00:00:00 2001
From: Harry Bayliss
Date: Fri, 15 May 2026 13:54:48 +0100
Subject: [PATCH] =?UTF-8?q?mise-pin=20zig=200.15.2;=20rebuild=20libghostty?=
 =?UTF-8?q?-vt=20ReleaseFast=20=E2=80=94=2027-32x=20pipeline=20speedup?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Added .mise.toml pinning zig = "0.15.2" (the minimum the vendored
Ghostty commit requires) and taught the Makefile to resolve zig
through mise when available, falling back to PATH. Contributors run
`mise install` once and `make deps` just works.

Re-ran the pipeline benchmarks after rebuilding libghostty-vt with
ReleaseFast (same hardware, AMD Ryzen 7 7800X3D):

                                Debug    ReleaseFast   speedup
  Pipeline 8-colour @120fps     63 fps   2030 fps      32x
  Pipeline truecolor @120fps    34 fps    931 fps      27x
  Emulator-only truecolor       34 fps   2051 fps      60x

7-16x headroom over 120 fps for the heaviest workload (truecolor
full-screen redraws). Static library size 33 MiB -> 13 MiB.

TODO.md baseline numbers updated to reflect post-fix throughput; the
"Debug-mode lib" finding is folded into the result it produced rather
than left as an open item.
---
 .mise.toml   |  8 ++++++++
 CHANGELOG.md | 26 +++++++++++++++++---------
 Makefile     | 15 ++++++++++++---
 TODO.md      | 37 ++++++++++---------------------------
 4 files changed, 47 insertions(+), 39 deletions(-)
 create mode 100644 .mise.toml

diff --git a/.mise.toml b/.mise.toml
new file mode 100644
index 0000000..998f68d
--- /dev/null
+++ b/.mise.toml
@@ -0,0 +1,8 @@
+# mise config — `mise install` provisions the tools `make deps` needs.
+#
+# libghostty-vt is built from a pinned upstream Ghostty commit; that
+# commit's build.zig.zon pins minimum_zig_version = 0.15.2. We match
+# it here so contributors don't have to puzzle out the version from
+# a deep upstream file.
+[tools]
+zig = "0.15.2"
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 21344e4..f484e0a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,17 +8,25 @@ loosely follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### Fixed
 - `make deps` now builds libghostty-vt with `-Doptimize=ReleaseFast`
-  instead of zig's silent `Debug` default. The default-Debug build
-  shipped an unoptimised CSI/SGR parser that ate 16-29 ms per
-  30-70 KiB full-screen frame in benchmarks, capping the entire
-  PTY-to-host pipeline at 34-63 fps no matter how fast the rest of
-  patterm got. The static library file size drops accordingly
-  (the Debug build was 33 MiB). Override with
-  `make deps GHOSTTY_VT_OPTIMIZE=Debug` only when debugging the
-  upstream library itself. Apply on existing checkouts with
-  `make clean-deps && make deps`.
+  instead of zig's silent `Debug` default, and resolves `zig`
+  through `mise` when a project `.mise.toml` pins it. The
+  default-Debug build shipped an unoptimised CSI/SGR parser that
+  ate 16-29 ms per 30-70 KiB full-screen frame in benchmarks,
+  capping the entire PTY-to-host pipeline at 34-63 fps. After the
+  rebuild the same pipeline runs at **930-2030 fps**: 27-32× the
+  prior throughput, and 7-16× margin over 120 fps for full-screen
+  truecolor ASCII video. Static library size drops from 33 MiB to
+  13 MiB. Override with `make deps GHOSTTY_VT_OPTIMIZE=Debug` only
+  when debugging the upstream library itself. Apply on existing
+  checkouts with `mise install && make clean-deps && make deps`.
 
 ### Added
+- `.mise.toml` pinning `zig = "0.15.2"` (the minimum version the
+  vendored Ghostty commit requires). Contributors run
+  `mise install` once; the Makefile picks up the resulting `zig`
+  binary automatically via `mise which zig` and falls back to
+  PATH when mise isn't available, so the existing build flow
+  still works.
 - ASCII-video stress benchmarks (`internal/app/bench_test.go`):
   per-frame and per-stream variants at 30 / 60 / 120 fps targets,
   three workload fixtures (8-colour cells, 24-bit truecolor cells,
diff --git a/Makefile b/Makefile
index fca9138..a40a4aa 100644
--- a/Makefile
+++ b/Makefile
@@ -31,10 +31,19 @@ deps-fetch: $(SOURCE)/.git/HEAD
 # debug build of the upstream lib.
 GHOSTTY_VT_OPTIMIZE ?= ReleaseFast
 
+# Resolve zig via the project's mise pin (.mise.toml) when available,
+# falling back to whatever's on PATH. mise keeps the zig version in
+# lockstep with what the pinned ghostty commit requires; without it,
+# contributors have to chase the version requirement themselves.
+ZIG := $(shell command -v mise >/dev/null && mise which zig 2>/dev/null || command -v zig 2>/dev/null)
+
 $(INSTALL)/lib/libghostty-vt.a: $(SOURCE)/.git/HEAD
-	@command -v zig >/dev/null || { echo "ERROR: zig not on PATH (need >=0.15.2 to build libghostty-vt)"; exit 1; }
-	@echo ">> building libghostty-vt with zig (optimize=$(GHOSTTY_VT_OPTIMIZE))"
-	@cd $(SOURCE) && zig build -Demit-lib-vt -Doptimize=$(GHOSTTY_VT_OPTIMIZE) --prefix $(INSTALL)
+	@if [ -z "$(ZIG)" ]; then \
+		echo "ERROR: zig not available. Run \`mise install\` (see .mise.toml — needs zig 0.15.2) or install zig manually."; \
+		exit 1; \
+	fi
+	@echo ">> building libghostty-vt with $(ZIG) (optimize=$(GHOSTTY_VT_OPTIMIZE))"
+	@cd $(SOURCE) && $(ZIG) build -Demit-lib-vt -Doptimize=$(GHOSTTY_VT_OPTIMIZE) --prefix $(INSTALL)
 	@test -f $(INSTALL)/lib/libghostty-vt.a || { echo "ERROR: expected static lib at $(INSTALL)/lib/libghostty-vt.a"; exit 1; }
 	@echo ">> libghostty-vt installed under $(INSTALL)"
 
diff --git a/TODO.md b/TODO.md
index f8d97fb..3604cf4 100644
--- a/TODO.md
+++ b/TODO.md
@@ -3,8 +3,8 @@ Findings from a codebase sweep — not user-reported, needs review
 before action. Each item names the anchor and a sketched fix.
 
 Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
-Ryzen 7 7800X3D, libghostty-vt **Debug-mode** — see the first item
-below):
+Ryzen 7 7800X3D, libghostty-vt **ReleaseFast** after the Makefile
+fix landed):
 
 ```
 # Renderer alone
@@ -19,35 +19,18 @@ ASCIIVideo_Stream_8Color_120fps 260 µs/frame 3845 fps_ceiling 3.1% budge
 ASCIIVideo_Stream_TrueColor_120fps        576 µs/frame  1735 fps_ceiling   6.9% budget
 
 # Full pipeline (em.Write + renderer + io.Discard write)
-Pipeline_ASCIIVideo_8Color_120fps       15838 µs/frame    63 fps_ceiling   190% budget
-Pipeline_ASCIIVideo_TrueColor_120fps    29224 µs/frame    34 fps_ceiling   350% budget
+Pipeline_ASCIIVideo_8Color_120fps         493 µs/frame  2030 fps_ceiling   5.9% budget
+Pipeline_ASCIIVideo_TrueColor_120fps     1075 µs/frame   931 fps_ceiling  12.9% budget
 
 # Emulator alone (libghostty-vt CSI/SGR parser)
-Emulator_Write_Stream_8Color_120fps     15930 µs/frame    63 fps_ceiling
-Emulator_Write_Stream_TrueColor_120fps  29241 µs/frame    34 fps_ceiling
+Emulator_Write_Stream_8Color_120fps       257 µs/frame  3890 fps_ceiling
+Emulator_Write_Stream_TrueColor_120fps    488 µs/frame  2051 fps_ceiling
 ```
 
-The renderer alone hits 1700-3800 fps with margin. The full pipeline
-caps at 34-63 fps. **The whole gap is libghostty-vt's em.Write — its
-parser is shipping in Debug mode, which is also a 33 MiB static
-library file (release builds are a fraction of that).**
-
-- [ ] **libghostty-vt was being built in Debug mode.** [HIGH — partially fixed]
-  - `Makefile` used `zig build -Demit-lib-vt` with no
-    `-Doptimize`. Zig's `standardOptimizeOption` defaults to
-    `.Debug`, so the shipped static lib was unoptimised. Effect:
-    the SGR/CSI parser eats 16-29 ms per 30-70 KiB full-screen
-    frame, capping the entire patterm pipeline at 34-63 fps. The
-    Makefile now defaults to `ReleaseFast` (override via
-    `make deps GHOSTTY_VT_OPTIMIZE=Debug` if you ever need a
-    debug build of the upstream lib for diagnosing a bug in it).
-  - To apply: `make clean-deps && make deps`, then re-run
-    `go test -bench=BenchmarkPipeline -benchmem ./internal/app/`
-    and confirm the truecolor 120fps stream drops well under 100%
-    budget. Update the numbers in this section after rebuilding.
-  - Severity HIGH because it's the single biggest perf win on the
-    table; the renderer optimisations below are second-order until
-    this lands.
+Result of the fix below: 27-32× pipeline speedup, 60× emulator
+speedup. Pipeline hits 930-2030 fps end-to-end — 7-16× headroom
+over the 120 fps target on the heaviest workload (truecolor
+full-screen redraws).
 
 - [ ] **viewport renderer allocates ~1 alloc per 4 input bytes on
   SGR/CSI-heavy chunks.** [MEDIUM]