Stress-test ASCII video at 30/60/120 fps; fix libghostty-vt Debug build
Added a full ASCII-video benchmark suite that hammers the renderer with 30 KiB / 70 KiB full-screen frames at 30, 60, and 120 fps targets, both renderer-only and full-pipeline (em.Write + renderer + stdout). Each stream benchmark reports µs/frame, fps_ceiling, and the percentage of the per-frame budget consumed.

The pipeline benchmarks revealed we were missing 120 fps by a wide margin: 190-350% of budget at 120 fps, i.e. a 34-63 fps ceiling. Isolating em.Write confirmed libghostty-vt is the bottleneck: 16-29 ms per full-screen frame (8-colour to truecolor), with the library file at 33 MiB.

Root cause: the Makefile invoked `zig build` with no -Doptimize, and Zig's standardOptimizeOption defaults to Debug, so the shipped libghostty-vt was unoptimised. Fixed by pinning ReleaseFast in the Makefile (override via GHOSTTY_VT_OPTIMIZE for debug builds of the upstream lib). Existing checkouts need `make clean-deps && make deps` to pick up the rebuild.
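The budget and ceiling figures in the message follow from two small formulas: the fps ceiling is one second divided by the per-frame cost, and the budget percentage is the per-frame cost divided by the time available per frame at the target rate. A minimal Go sketch, using hypothetical helper names `fpsCeiling` and `budgetPct` (they are not functions from this commit) and the Debug-build µs/frame values recorded in TODO.md:

```go
package main

import "fmt"

// fpsCeiling converts a per-frame cost in microseconds into the highest
// frame rate that cost could sustain.
func fpsCeiling(usPerFrame float64) float64 {
	return 1e6 / usPerFrame
}

// budgetPct reports how much of the per-frame time budget at targetFPS
// a frame costing usPerFrame consumes. Under 100 means the target is met.
func budgetPct(usPerFrame float64, targetFPS int) float64 {
	budgetUs := 1e6 / float64(targetFPS)
	return usPerFrame / budgetUs * 100
}

func main() {
	// Per-frame costs of the 8-colour and truecolor pipeline runs
	// against the Debug build of libghostty-vt.
	for _, us := range []float64{15838, 29224} {
		fmt.Printf("%.0f µs/frame: ceiling %.1f fps, %.1f%% of the 120 fps budget\n",
			us, fpsCeiling(us), budgetPct(us, 120))
	}
}
```

At 120 fps the budget is about 8.33 ms per frame, so a 15.8 ms frame consumes roughly 190% of it and a 29.2 ms frame roughly 350%, which is where the 34-63 fps ceiling comes from.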
23
CHANGELOG.md
@@ -6,6 +6,29 @@ loosely follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]

+### Fixed
+- `make deps` now builds libghostty-vt with `-Doptimize=ReleaseFast`
+  instead of zig's silent `Debug` default. The default-Debug build
+  shipped an unoptimised CSI/SGR parser that ate 16-29 ms per
+  30-70 KiB full-screen frame in benchmarks, capping the entire
+  PTY-to-host pipeline at 34-63 fps no matter how fast the rest of
+  patterm got. The static library file size drops accordingly
+  (the Debug build was 33 MiB). Override with
+  `make deps GHOSTTY_VT_OPTIMIZE=Debug` only when debugging the
+  upstream library itself. Apply on existing checkouts with
+  `make clean-deps && make deps`.
+
+### Added
+- ASCII-video stress benchmarks (`internal/app/bench_test.go`):
+  per-frame and per-stream variants at 30 / 60 / 120 fps targets,
+  three workload fixtures (8-colour cells, 24-bit truecolor cells,
+  and a Bad-Apple-style 1-bit pattern). Each stream benchmark
+  reports `µs/frame`, an achievable `fps_ceiling`, and `budget_pct`
+  so you can read off "do we hit N fps?" directly. A matching
+  Pipeline_ASCIIVideo_* set includes libghostty-vt's em.Write CGO
+  and an io.Discard stdout write so the FPS claim reflects the
+  whole pipeline, not just the renderer.
+
 ### Fixed
 - Long claude session resume (and codex steady-state rendering) is
   noticeably faster. Two costs that scaled per-PTY-chunk are now
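The three custom metrics named in the changelog entry map onto Go's `testing.B.ReportMetric`. The sketch below is a standalone illustration rather than the code from `bench_test.go`: `renderFrame` is a hypothetical stand-in workload, `streamBench` is an illustrative name, and the benchmark is driven through `testing.Benchmark` so it can run outside `go test`. It assumes Go 1.20 or later for `b.Elapsed`.

```go
package main

import (
	"fmt"
	"testing"
)

// renderFrame is a hypothetical stand-in for the real per-frame work.
func renderFrame(frame []byte) int {
	sum := 0
	for _, c := range frame {
		sum += int(c)
	}
	return sum
}

var sink int // keeps the compiler from discarding the workload

// streamBench returns a benchmark body that pushes 3 seconds' worth of
// frames at the target rate and reports the per-frame metrics.
func streamBench(frame []byte, fps int) func(b *testing.B) {
	return func(b *testing.B) {
		frames := fps * 3
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			for f := 0; f < frames; f++ {
				sink += renderFrame(frame)
			}
		}
		nsPerFrame := float64(b.Elapsed().Nanoseconds()) / float64(b.N*frames)
		budgetNs := 1e9 / float64(fps)
		b.ReportMetric(nsPerFrame/1000, "µs/frame")
		b.ReportMetric(1e9/nsPerFrame, "fps_ceiling")
		b.ReportMetric(nsPerFrame/budgetNs*100, "budget_pct")
	}
}

func main() {
	res := testing.Benchmark(streamBench(make([]byte, 30*1024), 120))
	for _, unit := range []string{"µs/frame", "fps_ceiling", "budget_pct"} {
		fmt.Printf("%-12s %.2f\n", unit, res.Extra[unit])
	}
}
```

Under `go test -bench`, the same three values appear as extra columns next to `ns/op`, so a `budget_pct` below 100 can be read straight off the output line.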
15
Makefile
@@ -20,10 +20,21 @@ $(SOURCE)/.git/HEAD:

 deps-fetch: $(SOURCE)/.git/HEAD

+# Zig's `standardOptimizeOption` defaults to .Debug when no
+# -Doptimize is passed, which makes libghostty-vt's CSI/SGR parser
+# an order of magnitude slower: truecolor full-screen frames spend
+# ~16-29 ms each in em.Write under Debug (see
+# internal/app/bench_test.go BenchmarkEmulator_Write_*), which caps
+# the full PTY-to-host pipeline at ~60 fps. ReleaseFast is the
+# right default for the shipped artefact. Override with
+# `make deps GHOSTTY_VT_OPTIMIZE=Debug` when you actually want a
+# debug build of the upstream lib.
+GHOSTTY_VT_OPTIMIZE ?= ReleaseFast
+
 $(INSTALL)/lib/libghostty-vt.a: $(SOURCE)/.git/HEAD
 	@command -v zig >/dev/null || { echo "ERROR: zig not on PATH (need >=0.15.2 to build libghostty-vt)"; exit 1; }
-	@echo ">> building libghostty-vt with zig"
-	@cd $(SOURCE) && zig build -Demit-lib-vt --prefix $(INSTALL)
+	@echo ">> building libghostty-vt with zig (optimize=$(GHOSTTY_VT_OPTIMIZE))"
+	@cd $(SOURCE) && zig build -Demit-lib-vt -Doptimize=$(GHOSTTY_VT_OPTIMIZE) --prefix $(INSTALL)
 	@test -f $(INSTALL)/lib/libghostty-vt.a || { echo "ERROR: expected static lib at $(INSTALL)/lib/libghostty-vt.a"; exit 1; }
 	@echo ">> libghostty-vt installed under $(INSTALL)"

39
TODO.md
@@ -3,16 +3,53 @@ Findings from a codebase sweep — not user-reported, needs review before
 action. Each item names the anchor and a sketched fix.

 Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
-Ryzen 7 7800X3D):
+Ryzen 7 7800X3D, libghostty-vt **Debug-mode**; see the first item
+below):

 ```
+# Renderer alone
 ViewportRenderer_PlainASCII               229 MB/s    1.3 KB/op      6 allocs/op
 ViewportRenderer_StyledLines               89 MB/s     91 KB/op   4325 allocs/op
 ViewportRenderer_RatatuiBurst              40 MB/s    365 KB/op  17306 allocs/op
 RendererThroughput_ReuseInstance           90 MB/s    316 KB/op  17380 allocs/op
 ContainsOSC_NoOSC                        3050 MB/s      0 B/op       0 allocs/op
+
+# ASCII-video stream (renderer only, 3 sec at the target fps)
+ASCIIVideo_Stream_8Color_120fps           260 µs/frame   3845 fps_ceiling   3.1% budget
+ASCIIVideo_Stream_TrueColor_120fps        576 µs/frame   1735 fps_ceiling   6.9% budget
+
+# Full pipeline (em.Write + renderer + io.Discard write)
+Pipeline_ASCIIVideo_8Color_120fps       15838 µs/frame     63 fps_ceiling   190% budget
+Pipeline_ASCIIVideo_TrueColor_120fps    29224 µs/frame     34 fps_ceiling   350% budget
+
+# Emulator alone (libghostty-vt CSI/SGR parser)
+Emulator_Write_Stream_8Color_120fps     15930 µs/frame     63 fps_ceiling
+Emulator_Write_Stream_TrueColor_120fps  29241 µs/frame     34 fps_ceiling
 ```

+The renderer alone hits 1700-3800 fps with margin. The full pipeline
+caps at 34-63 fps. **The whole gap is libghostty-vt's em.Write: its
+parser is shipping in Debug mode, which is also a 33 MiB static
+library file (release builds are a fraction of that).**
+
+- [ ] **libghostty-vt was being built in Debug mode.** [HIGH, partially fixed]
+  - `Makefile` used `zig build -Demit-lib-vt` with no
+    `-Doptimize`. Zig's `standardOptimizeOption` defaults to
+    `.Debug`, so the shipped static lib was unoptimised. Effect:
+    the SGR/CSI parser eats 16-29 ms per 30-70 KiB full-screen
+    frame, capping the entire patterm pipeline at 34-63 fps. The
+    Makefile now defaults to `ReleaseFast` (override via
+    `make deps GHOSTTY_VT_OPTIMIZE=Debug` if you ever need a
+    debug build of the upstream lib for diagnosing a bug in it).
+  - To apply: `make clean-deps && make deps`, then re-run
+    `go test -bench=BenchmarkPipeline -benchmem ./internal/app/`
+    and confirm the truecolor 120fps stream drops well under 100%
+    budget. Update the numbers in this section after rebuilding.
+  - Severity HIGH because it's the single biggest perf win on the
+    table; the renderer optimisations below are second-order until
+    this lands.
+
+
 - [ ] **viewport renderer allocates ~1 alloc per 4 input bytes on SGR/CSI-heavy chunks.** [MEDIUM]
   - `internal/app/viewport_renderer.go`: the styled-lines and
     ratatui benchmarks show 4-17k allocs per chunk. The hot
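To see how the baseline table attributes the per-frame cost, the renderer-only figure can be divided by the full-pipeline figure for the same workload. A small Go sketch using the Debug-build numbers from the table; the `stage` type and `rendererShare` helper are illustrative names, not part of patterm:

```go
package main

import "fmt"

// stage holds the measured per-frame costs for one workload at the
// 120 fps target, copied from the Debug-build baseline table.
type stage struct {
	name       string
	rendererUs float64 // renderer-only Stream benchmark, µs/frame
	pipelineUs float64 // full Pipeline benchmark, µs/frame
}

// rendererShare returns the renderer's percentage of full-pipeline time.
func rendererShare(s stage) float64 {
	return s.rendererUs / s.pipelineUs * 100
}

func main() {
	workloads := []stage{
		{"8Color", 260, 15838},
		{"TrueColor", 576, 29224},
	}
	for _, w := range workloads {
		fmt.Printf("%-10s renderer %.1f%% of pipeline, remainder %.0f µs/frame\n",
			w.name, rendererShare(w), w.pipelineUs-w.rendererUs)
	}
}
```

For both workloads the renderer accounts for under 2% of the per-frame time, which is consistent with the claim that em.Write is where the time goes. The emulator-alone figures are marginally higher than the pipeline figures (15930 vs 15838 µs), which is presumably run-to-run noise.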
377
internal/app/bench_test.go
@@ -2,8 +2,11 @@ package app

 import (
 	"fmt"
+	"io"
 	"strings"
 	"testing"
+
+	"github.com/hjbdev/patterm/internal/vt"
 )

 // Benchmarks for patterm's hot paths. Run with:
@@ -167,3 +170,377 @@ func BenchmarkRendererThroughput_ReuseInstance(b *testing.B) {
 		}
 	}
 }
+
+// Stress workloads: these model the worst things a real session
+// can throw at us. The headline target is "ASCII video": every cell
+// of an 80x40 viewport carries an SGR colour change and a printable
+// character, rendered as one chunk per frame. Real ASCII-video CLIs
+// (ascii-image-converter, asciinema-render, towel.blinkenlights, the
+// Bad Apple meme) hit patterm with exactly this pattern at 24-30 fps
+// for minutes at a time.
+//
+// We synthesise the workload rather than ship a captured corpus so
+// the benchmarks stay deterministic and the repo doesn't carry tens
+// of MiB of fixture data. The encoding is faithful to what those
+// tools actually emit.
+
+// buildASCIIVideoFrame builds a single full-viewport frame with
+// 8-colour SGR per cell (`\x1b[3Nm`). One frame ≈ 30 KiB for an
+// 80x40 viewport, which lines up with what ascii-video tools emit.
+func buildASCIIVideoFrame(cols, rows int) []byte {
+	var b strings.Builder
+	b.WriteString("\x1b[H") // home cursor before the frame starts
+	for r := 0; r < rows; r++ {
+		for c := 0; c < cols; c++ {
+			fmt.Fprintf(&b, "\x1b[3%dm%c", (r+c)%8, byte(' '+(r*c)%(0x7e-' ')))
+		}
+		b.WriteString("\x1b[0m\r\n")
+	}
+	return []byte(b.String())
+}
+
+// buildASCIIVideoFrameTrueColor builds the same frame but with
+// 24-bit RGB SGR (`\x1b[38;2;R;G;Bm`). Every cell is ~20 bytes of
+// escape + 1 byte glyph, so a frame is ≈ 70 KiB. This is what
+// chafa --colors=full and modern terminal video players emit, and
+// it's the heaviest SGR variant the renderer's CSI path sees.
+func buildASCIIVideoFrameTrueColor(cols, rows int) []byte {
+	var b strings.Builder
+	b.WriteString("\x1b[H")
+	for r := 0; r < rows; r++ {
+		for c := 0; c < cols; c++ {
+			rd := (r * 7) % 256
+			gd := (c * 11) % 256
+			bd := ((r + c) * 13) % 256
+			fmt.Fprintf(&b, "\x1b[38;2;%d;%d;%dm%c", rd, gd, bd, byte(' '+(r*c)%(0x7e-' ')))
+		}
+		b.WriteString("\x1b[0m\r\n")
+	}
+	return []byte(b.String())
+}
+
+// buildBadApplePattern builds the simplest possible ASCII video
+// frame: alternating black/white cells (the Bad Apple meme is
+// essentially a 1-bit silhouette video). This is the pattern that
+// stresses the SGR state-machine without exercising truecolor parse,
+// which is useful for isolating "is the cost in the colour parsing
+// or in the cell-by-cell switching?"
+func buildBadApplePattern(cols, rows int) []byte {
+	var b strings.Builder
+	b.WriteString("\x1b[H")
+	for r := 0; r < rows; r++ {
+		for c := 0; c < cols; c++ {
+			if (r+c)%2 == 0 {
+				b.WriteString("\x1b[37m█")
+			} else {
+				b.WriteString("\x1b[30m█")
+			}
+		}
+		b.WriteString("\x1b[0m\r\n")
+	}
+	return []byte(b.String())
+}
+
+// BenchmarkASCIIVideo_Frame_8Color renders a single full-screen
+// frame as one chunk. The headline number is MB/s: at 30 fps a
+// frame is one PTY chunk every ~33 ms, so this should comfortably
+// stay well under 1 ms.
+func BenchmarkASCIIVideo_Frame_8Color(b *testing.B) {
+	frame := buildASCIIVideoFrame(80, 40)
+	b.SetBytes(int64(len(frame)))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		vr := newViewportRenderer(newTerminalLayout(120, 40))
+		_ = vr.Render(frame)
+	}
+}
+
+// BenchmarkASCIIVideo_Frame_TrueColor renders a single truecolor
+// frame. ~70 KiB per frame. Compare this to the 8-colour number to
+// see how much extra cost the truecolor SGR parse imposes: the
+// `\x1b[38;2;R;G;Bm` form is the longest and most parameter-rich
+// CSI patterm sees in practice.
+func BenchmarkASCIIVideo_Frame_TrueColor(b *testing.B) {
+	frame := buildASCIIVideoFrameTrueColor(80, 40)
+	b.SetBytes(int64(len(frame)))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		vr := newViewportRenderer(newTerminalLayout(120, 40))
+		_ = vr.Render(frame)
+	}
+}
+
+// BenchmarkASCIIVideo_Frame_BadApple is the 1-bit pattern: simplest
+// SGR (two colours, alternating). Isolates the renderer's
+// cell-by-cell SGR cycling cost from the truecolor parse cost.
+func BenchmarkASCIIVideo_Frame_BadApple(b *testing.B) {
+	frame := buildBadApplePattern(80, 40)
+	b.SetBytes(int64(len(frame)))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		vr := newViewportRenderer(newTerminalLayout(120, 40))
+		_ = vr.Render(frame)
+	}
+}
+
+// runStreamBench is the shared body for the per-fps stream
+// benchmarks. It feeds a fixed frame N times through a single
+// renderer instance and reports µs/frame + an achievable-fps
+// ceiling alongside the standard ns/op + MB/s. The fps value in
+// the benchmark name is the *target*: the workload itself doesn't
+// rate-limit; we just decide how many frames make a benchmark op
+// (3 seconds' worth) so steady-state cost dominates warm-up.
+func runStreamBench(b *testing.B, frame []byte, fps int) {
+	frames := fps * 3 // 3 seconds at the target rate
+	totalBytes := int64(len(frame) * frames)
+	b.SetBytes(totalBytes)
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		vr := newViewportRenderer(newTerminalLayout(120, 40))
+		for f := 0; f < frames; f++ {
+			_ = vr.Render(frame)
+		}
+	}
+	nsPerFrame := float64(b.Elapsed().Nanoseconds()) / float64(b.N*frames)
+	b.ReportMetric(nsPerFrame/1000.0, "µs/frame")
+	b.ReportMetric(1e9/nsPerFrame, "fps_ceiling")
+	// budget_pct = how much of the per-frame budget at the target
+	// rate we burn. Under 100 means we can hit the target; over
+	// means we can't.
+	budgetNs := 1e9 / float64(fps)
+	b.ReportMetric(nsPerFrame/budgetNs*100, "budget_pct")
+}
+
+// BenchmarkASCIIVideo_Stream_8Color_30fps / _60fps / _120fps reuse
+// one renderer across (3 × fps) frames. The headline numbers are
+// µs/frame, fps_ceiling (= 1e9 / ns/frame), and budget_pct (=
+// percent of the per-frame budget at the target rate we consume).
+//
+// 30 fps is the typical ASCII-video baseline (towel, chafa, Bad
+// Apple ports). 60 is the "smooth playback" target. 120 is a
+// future-proofing stress level matching modern high-refresh
+// terminals.
+func BenchmarkASCIIVideo_Stream_8Color_30fps(b *testing.B) {
+	runStreamBench(b, buildASCIIVideoFrame(80, 40), 30)
+}
+func BenchmarkASCIIVideo_Stream_8Color_60fps(b *testing.B) {
+	runStreamBench(b, buildASCIIVideoFrame(80, 40), 60)
+}
+func BenchmarkASCIIVideo_Stream_8Color_120fps(b *testing.B) {
+	runStreamBench(b, buildASCIIVideoFrame(80, 40), 120)
+}
+
+// BenchmarkASCIIVideo_Stream_TrueColor_* same set but with the
+// truecolor frames. Compare against the 8-colour numbers to see
+// what the longer `\x1b[38;2;R;G;Bm` parse costs us.
+func BenchmarkASCIIVideo_Stream_TrueColor_30fps(b *testing.B) {
+	runStreamBench(b, buildASCIIVideoFrameTrueColor(80, 40), 30)
+}
+func BenchmarkASCIIVideo_Stream_TrueColor_60fps(b *testing.B) {
+	runStreamBench(b, buildASCIIVideoFrameTrueColor(80, 40), 60)
+}
+func BenchmarkASCIIVideo_Stream_TrueColor_120fps(b *testing.B) {
+	runStreamBench(b, buildASCIIVideoFrameTrueColor(80, 40), 120)
+}
+
+// BenchmarkASCIIVideo_Stream_BadApple_* tracks the 1-bit alternating
+// pattern. Isolates per-cell SGR cycling cost from the truecolor
+// parse cost above, useful when reading the diff between the two
+// stream variants.
+func BenchmarkASCIIVideo_Stream_BadApple_30fps(b *testing.B) {
+	runStreamBench(b, buildBadApplePattern(80, 40), 30)
+}
+func BenchmarkASCIIVideo_Stream_BadApple_60fps(b *testing.B) {
+	runStreamBench(b, buildBadApplePattern(80, 40), 60)
+}
+func BenchmarkASCIIVideo_Stream_BadApple_120fps(b *testing.B) {
+	runStreamBench(b, buildBadApplePattern(80, 40), 120)
+}
+
+// BenchmarkEmulator_Write_8Color / _TrueColor isolate the
+// libghostty-vt CGO cost: same frames the Pipeline benchmarks use,
+// but feeding only the emulator. The delta between this and
+// BenchmarkASCIIVideo_Stream_… is the renderer's share; the rest
+// is libghostty-vt.
+func BenchmarkEmulator_Write_8Color_Frame(b *testing.B) {
+	frame := buildASCIIVideoFrame(80, 40)
+	b.SetBytes(int64(len(frame)))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		em, err := vt.NewGhosttyEmulator(80, 40)
+		if err != nil {
+			b.Fatalf("emulator: %v", err)
+		}
+		if _, werr := em.Write(frame); werr != nil {
+			b.Fatalf("emulator.Write: %v", werr)
+		}
+		_ = em.Close()
+	}
+}
+
+func BenchmarkEmulator_Write_TrueColor_Frame(b *testing.B) {
+	frame := buildASCIIVideoFrameTrueColor(80, 40)
+	b.SetBytes(int64(len(frame)))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		em, err := vt.NewGhosttyEmulator(80, 40)
+		if err != nil {
+			b.Fatalf("emulator: %v", err)
+		}
+		if _, werr := em.Write(frame); werr != nil {
+			b.Fatalf("emulator.Write: %v", werr)
+		}
+		_ = em.Close()
+	}
+}
+
+// BenchmarkEmulator_Write_Stream_120fps reuses one emulator across
+// 360 frames (3 sec × 120 fps). This is the cleanest measurement
+// of em.Write steady-state cost.
+func BenchmarkEmulator_Write_Stream_8Color_120fps(b *testing.B) {
+	frame := buildASCIIVideoFrame(80, 40)
+	const frames = 360
+	b.SetBytes(int64(len(frame) * frames))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		em, err := vt.NewGhosttyEmulator(80, 40)
+		if err != nil {
+			b.Fatalf("emulator: %v", err)
+		}
+		for f := 0; f < frames; f++ {
+			if _, werr := em.Write(frame); werr != nil {
+				b.Fatalf("emulator.Write: %v", werr)
+			}
+		}
+		_ = em.Close()
+	}
+	nsPerFrame := float64(b.Elapsed().Nanoseconds()) / float64(b.N*frames)
+	b.ReportMetric(nsPerFrame/1000.0, "µs/frame")
+	b.ReportMetric(1e9/nsPerFrame, "fps_ceiling")
+}
+
+func BenchmarkEmulator_Write_Stream_TrueColor_120fps(b *testing.B) {
+	frame := buildASCIIVideoFrameTrueColor(80, 40)
+	const frames = 360
+	b.SetBytes(int64(len(frame) * frames))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		em, err := vt.NewGhosttyEmulator(80, 40)
+		if err != nil {
+			b.Fatalf("emulator: %v", err)
+		}
+		for f := 0; f < frames; f++ {
+			if _, werr := em.Write(frame); werr != nil {
+				b.Fatalf("emulator.Write: %v", werr)
+			}
+		}
+		_ = em.Close()
+	}
+	nsPerFrame := float64(b.Elapsed().Nanoseconds()) / float64(b.N*frames)
+	b.ReportMetric(nsPerFrame/1000.0, "µs/frame")
+	b.ReportMetric(1e9/nsPerFrame, "fps_ceiling")
+}
+
+// runPipelineStreamBench includes the libghostty-vt emulator.Write
+// CGO call and a stdout write to io.Discard alongside the renderer,
+// i.e. everything OnPTYOut does in production except the host
+// terminal's own paint time (which patterm doesn't control). This
+// is the honest "can we hit N fps end-to-end?" measurement.
+func runPipelineStreamBench(b *testing.B, frame []byte, fps int) {
+	frames := fps * 3
+	totalBytes := int64(len(frame) * frames)
+	b.SetBytes(totalBytes)
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		em, err := vt.NewGhosttyEmulator(80, 40)
+		if err != nil {
+			b.Fatalf("emulator: %v", err)
+		}
+		vr := newViewportRenderer(newTerminalLayout(120, 40))
+		for f := 0; f < frames; f++ {
+			if _, werr := em.Write(frame); werr != nil {
+				b.Fatalf("emulator.Write: %v", werr)
+			}
+			out := vr.Render(frame)
+			// Match OnPTYOut's autowrap prelude/postlude wrapping so
+			// the byte count is faithful.
+			_, _ = io.Discard.Write([]byte("\x1b[?7l"))
+			_, _ = io.Discard.Write(out)
+			_, _ = io.Discard.Write([]byte("\x1b[?7h"))
+		}
+		_ = em.Close()
+	}
+	nsPerFrame := float64(b.Elapsed().Nanoseconds()) / float64(b.N*frames)
+	b.ReportMetric(nsPerFrame/1000.0, "µs/frame")
+	b.ReportMetric(1e9/nsPerFrame, "fps_ceiling")
+	budgetNs := 1e9 / float64(fps)
+	b.ReportMetric(nsPerFrame/budgetNs*100, "budget_pct")
+}
+
+// BenchmarkPipeline_ASCIIVideo_*: the FULL OnPTYOut path
+// (emulator.Write CGO + viewport renderer + a stdout write to
+// io.Discard) running at 30/60/120 fps targets. These are the
+// numbers to trust when asking "can we sustain N fps?" The
+// renderer-only Stream benchmarks above isolate one stage and
+// understate the real cost.
+//
+// 120 fps is the explicit baseline: anything under 100% of the
+// per-frame budget here means we hit 120 fps with margin to spare.
+func BenchmarkPipeline_ASCIIVideo_8Color_30fps(b *testing.B) {
+	runPipelineStreamBench(b, buildASCIIVideoFrame(80, 40), 30)
+}
+func BenchmarkPipeline_ASCIIVideo_8Color_60fps(b *testing.B) {
+	runPipelineStreamBench(b, buildASCIIVideoFrame(80, 40), 60)
+}
+func BenchmarkPipeline_ASCIIVideo_8Color_120fps(b *testing.B) {
+	runPipelineStreamBench(b, buildASCIIVideoFrame(80, 40), 120)
+}
+
+func BenchmarkPipeline_ASCIIVideo_TrueColor_30fps(b *testing.B) {
+	runPipelineStreamBench(b, buildASCIIVideoFrameTrueColor(80, 40), 30)
+}
+func BenchmarkPipeline_ASCIIVideo_TrueColor_60fps(b *testing.B) {
+	runPipelineStreamBench(b, buildASCIIVideoFrameTrueColor(80, 40), 60)
+}
+func BenchmarkPipeline_ASCIIVideo_TrueColor_120fps(b *testing.B) {
+	runPipelineStreamBench(b, buildASCIIVideoFrameTrueColor(80, 40), 120)
+}
+
+// BenchmarkSessionResume_5MiBStyled simulates the user's
+// motivating case: claude resuming a long chat session and dumping
+// the whole history. 5 MiB of styled output as a single Render
+// call. Numbers here tell us how long the visible "scrolling
+// while resume loads" window will be.
+func BenchmarkSessionResume_5MiBStyled(b *testing.B) {
+	chunk := buildStyledLinesChunk(5 * 1024 * 1024)
+	b.SetBytes(int64(len(chunk)))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		vr := newViewportRenderer(newTerminalLayout(120, 40))
+		_ = vr.Render(chunk)
+	}
+}
+
+// BenchmarkSessionResume_5MiBPlain same as above but pure text.
+// Lower bound: what we'd hit if the resume content were
+// styling-free.
+func BenchmarkSessionResume_5MiBPlain(b *testing.B) {
+	chunk := buildPlainASCIIChunk(5 * 1024 * 1024)
+	b.SetBytes(int64(len(chunk)))
+	b.ReportAllocs()
+	b.ResetTimer()
+	for i := 0; i < b.N; i++ {
+		vr := newViewportRenderer(newTerminalLayout(120, 40))
+		_ = vr.Render(chunk)
+	}
+}
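The frame sizes quoted in the builder comments can be checked by arithmetic instead of by running the benchmarks. The sketch below copies the three frame builders from the diff into a standalone program and compares their output length against a per-cell formula. `cellFrameSize` is a hypothetical helper introduced here for illustration; it is not part of the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// The three builders are copied from the diff.
func buildASCIIVideoFrame(cols, rows int) []byte {
	var b strings.Builder
	b.WriteString("\x1b[H")
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			fmt.Fprintf(&b, "\x1b[3%dm%c", (r+c)%8, byte(' '+(r*c)%(0x7e-' ')))
		}
		b.WriteString("\x1b[0m\r\n")
	}
	return []byte(b.String())
}

func buildASCIIVideoFrameTrueColor(cols, rows int) []byte {
	var b strings.Builder
	b.WriteString("\x1b[H")
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			rd, gd, bd := (r*7)%256, (c*11)%256, ((r+c)*13)%256
			fmt.Fprintf(&b, "\x1b[38;2;%d;%d;%dm%c", rd, gd, bd, byte(' '+(r*c)%(0x7e-' ')))
		}
		b.WriteString("\x1b[0m\r\n")
	}
	return []byte(b.String())
}

func buildBadApplePattern(cols, rows int) []byte {
	var b strings.Builder
	b.WriteString("\x1b[H")
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			if (r+c)%2 == 0 {
				b.WriteString("\x1b[37m█")
			} else {
				b.WriteString("\x1b[30m█")
			}
		}
		b.WriteString("\x1b[0m\r\n")
	}
	return []byte(b.String())
}

// cellFrameSize predicts the byte length of a frame whose cells all
// cost bytesPerCell: 3 bytes for the cursor-home prefix, plus 6 bytes
// of reset and CRLF at the end of every row.
func cellFrameSize(cols, rows, bytesPerCell int) int {
	return 3 + rows*(cols*bytesPerCell+6)
}

func main() {
	report := func(name string, n int) {
		fmt.Printf("%-10s %6d bytes (%.1f KiB)\n", name, n, float64(n)/1024)
	}
	report("8-colour", len(buildASCIIVideoFrame(80, 40)))
	report("truecolor", len(buildASCIIVideoFrameTrueColor(80, 40)))
	report("bad-apple", len(buildBadApplePattern(80, 40)))
	// 8-colour cells are a 5-byte SGR plus a 1-byte glyph; Bad Apple
	// cells are a 5-byte SGR plus the 3-byte UTF-8 encoding of "█".
	fmt.Println("predicted:", cellFrameSize(80, 40, 6), cellFrameSize(80, 40, 8))
}
```

By that arithmetic the 80x40 8-colour frame is 19,443 bytes (about 19 KiB) and the Bad Apple frame is 25,843 bytes. The truecolor frame depends on the digit count of each colour component and comes to roughly 58 KiB, so the ≈ 30 KiB and ≈ 70 KiB figures in the comments are generous round numbers for the fixtures as written.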