Trim actioned perf-audit items; add palette polish TODO

Removes the 2026-05-15 perf-audit findings that have either shipped (see CHANGELOG) or are tracked elsewhere. In their place, TODO.md keeps only the remaining palette-refinement notes: generic labels for focused actions ("Close current agent" rather than "Close codex") and a higher-level concern that the palette has become cluttered as features were added.
Use mise to install zig + go in release CI; cut 0.0.4
2026-05-15 19:53:51 +01:00 · 2026-05-15 19:38:13 +01:00 · 2026-05-15 19:27:42 +01:00
6 changed files with 82 additions and 121 deletions
--- a/.gitea/workflows/release.yml
+++ b/.gitea/workflows/release.yml
@@ -11,14 +11,19 @@ jobs:
    steps:
      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v5
+      - uses: jdx/mise-action@v2
        with:
          go-version-file: go.mod
          cache: true
-      - uses: mlugg/setup-zig@v2
+      - name: Cache Go modules
        uses: actions/cache@v4
        with:
-          version: 0.15.2
+          path: |
            ~/.cache/go-build
            ~/go/pkg/mod
          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ runner.os }}-go-
      - name: Build libghostty-vt
        run: make deps
--- a/.mise.toml
+++ b/.mise.toml
@@ -3,6 +3,8 @@
 # libghostty-vt is built from a pinned upstream Ghostty commit; that
 # commit's build.zig.zon pins minimum_zig_version = 0.15.2. We match
 # it here so contributors don't have to puzzle out the version from
-# a deep upstream file.
+# a deep upstream file. The go pin matches go.mod so CI and local
+# builds use the same toolchain.
 [tools]
 zig = "0.15.2"
+go = "1.26.3"
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,24 @@ loosely follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
+## [0.0.4] - 2026-05-15
+### Changed
+- Release workflow (`.gitea/workflows/release.yml`) now provisions
+ Zig and Go through `jdx/mise-action@v2`, reading the versions from
+ `.mise.toml` (zig 0.15.2, go 1.26.3). Both toolchains were
+ previously installed via `mlugg/setup-zig` and `actions/setup-go`,
+ whose mirror chase / GitHub fetch combined for ~8 minutes per run
+ before any patterm code compiled. mise pulls each tool once and
+ caches the install dir, so subsequent runs hit the cache instead of
+ re-downloading. `make deps` still resolves zig via `mise which zig`
+ with a PATH fallback; `go.mod` already pinned `go 1.26.3`, so the
+ new `go` entry in `.mise.toml` just keeps CI and local builds on
+ the same toolchain.
+- A Go module/build cache step (`actions/cache@v4`, keyed on
+ `go.sum`) was added so `go build` doesn't re-download dependencies
+ on every tag push.
 ## [0.0.3] - 2026-05-15
 ### Added
@@ -70,6 +88,8 @@ loosely follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
  the command field.
 ### Fixed
+- Error/status flashes now restore the currently focused pane instead
+ of drawing the empty-state hint over a running agent or process.
 - Release workflow (`.gitea/workflows/release.yml`) now uses
  `mlugg/setup-zig@v2` instead of the deprecated `@v1`. v1 hard-coded
  the pre-0.14 tarball name (`zig-linux-x86_64-<ver>.tar.xz`), so
--- a/TODO.md
+++ b/TODO.md
@@ -1,115 +1,3 @@
-# Perf Audit (reviewed 2026-05-15)
-Findings that survived the 2026-05-15 review pass. Low and marginal
-items from the original sweep were removed; remaining items have enough
-measured or workflow evidence to justify action.
+The close action in the command palette should just be "Close current agent" rather than "Close codex"
+Same with the other "focused" parts. It seems a bit clunky right now. "Close current agent"
+In general I think while the feature set has grown, the actual refinement of it isn't great, it feels a bit cluttered.
 Baseline benchmark numbers (`go test -bench=. ./internal/app/`, AMD
 Ryzen 7 7800X3D, libghostty-vt **ReleaseFast** after the Makefile
 fix landed):
 ```
 # Renderer alone
 ViewportRenderer_PlainASCII       229 MB/s     1.3 KB/op    6 allocs/op
 ViewportRenderer_StyledLines       89 MB/s    91   KB/op  4325 allocs/op
 ViewportRenderer_RatatuiBurst      40 MB/s   365   KB/op 17306 allocs/op
 RendererThroughput_ReuseInstance   90 MB/s   316   KB/op 17380 allocs/op
 ContainsOSC_NoOSC                3050 MB/s     0   B/op     0 allocs/op
 # ASCII-video stream (renderer only — 3 sec at the target fps)
 ASCIIVideo_Stream_8Color_120fps     260 µs/frame  3845 fps_ceiling   3.1% budget
 ASCIIVideo_Stream_TrueColor_120fps  576 µs/frame  1735 fps_ceiling   6.9% budget
 # Full pipeline (em.Write + renderer + io.Discard write)
 Pipeline_ASCIIVideo_8Color_120fps     493 µs/frame  2030 fps_ceiling   5.9% budget
 Pipeline_ASCIIVideo_TrueColor_120fps 1075 µs/frame   931 fps_ceiling  12.9% budget
 # Emulator alone (libghostty-vt CSI/SGR parser)
 Emulator_Write_Stream_8Color_120fps    257 µs/frame  3890 fps_ceiling
 Emulator_Write_Stream_TrueColor_120fps 488 µs/frame  2051 fps_ceiling
 ```
 The current pipeline still has large 120 fps headroom. The remaining
 renderer concern is multi-MiB styled replay latency and allocation
 churn, not normal steady-state frame budget.
 - [ ] **viewport renderer allocates heavily on SGR/CSI-heavy chunks.** [MEDIUM]
  - Review evidence: five benchmark reps confirmed
    `ViewportRenderer_StyledLines` at about 4,325 allocs per 16 KiB
    chunk (~91.5 KB/op, roughly 1 alloc per 3.8 input bytes), and
    `ViewportRenderer_RatatuiBurst` at about 17,306 allocs per chunk
    (~365 KB/op). A 5 MiB styled resume benchmark allocated about
    31 MB across 1.38M objects.
  - Likely hot paths: generic CSI/SGR output in
    `internal/app/viewport_renderer.go` sends many sequences through
    `vr.shifter.Shift(vr.buf)`, while `internal/app/cursorshift.go`
    returns a fresh `[]byte` via `pending.String()` on every
    `Shift` call and parses CSI params through `string(raw)` /
    `strings.Split`. The mode-helper `string(params)` conversions
    are real, but probably not the main SGR-heavy cost.
  - Fix direction: make `cursorShifter` write into caller-owned
    scratch output or directly into the viewport renderer's pending
    builder; parse CSI params from byte slices; pre-grow/reuse
    renderer and shifter buffers. Re-run styled-lines, ratatui, and
    5 MiB resume benchmarks; use pprof when available to confirm the
    top allocation sites.
 - [ ] **large styled resume/replay dumps spend visible time in viewport rendering.** [MEDIUM]
  - Review evidence: `BenchmarkSessionResume_5MiBStyled` measured
    about 58 ms median and 63 ms p95 over five reps. The plain 5 MiB
    benchmark was about 23-24 ms with only 21 allocs. The live path
    renders focused PTY chunks through `renderer.Render`, then still
    pays emulator writes, ring writes, event dispatch, stdout writes,
    and real terminal paint.
  - Scope: this is not a Codex steady-state throughput limit. A
    100 KB/s stream is far below the styled renderer's ~80-90 MB/s
    ceiling. It matters for multi-MiB burst replay, resume/startup
    dumps, and dense full-screen churn.
  - Fix direction: do the allocation fix first, since it should also
    improve throughput. After that, invest further only if styled
    resume traces remain user-visible or the styled-lines benchmark
    is still under roughly 300 MB/s.
 - [ ] **wait_for_pattern re-scans the entire stream/grid while waiting.** [MEDIUM]
  - `internal/app/host.go:476-493` (the `check` closure). On
    `scope="scrollback"` it calls `c.StreamRead(0)` followed by
    `stripANSIBytes(nil, b)`, so each check can copy, strip, and
    search the full 1 MiB ring. On `scope="grid"` it calls
    `PlainText()` and runs the regex against the full grid string.
  - Caveat from review: the current chunk notifier coalesces bursts
    with a buffered channel and has a 500 ms fallback, so this is not
    necessarily one full scan per PTY chunk. It is still meaningful
    for active waits on chatty panes.
  - Fix direction: for `scrollback`, track the last checked stream
    offset and search only new output plus a bounded overlap/scratch
    buffer so matches spanning chunks are not missed. For `grid`,
    dedupe on `ScreenVersion()` and skip work when the version has
    not changed.
 - [ ] **search_output rebuilds and searches whole scrollback on every call.** [MEDIUM]
  - `internal/app/host.go:428-437` compiles a fresh regex, reads the
    stream from offset 0, strips ANSI for `kind="rendered"`, converts
    the full buffer to a string, and splits it into lines before
    applying `limit`. This is meaningful when agents poll the same
    pattern; it is low impact for ad hoc searches.
  - Fix direction: cache compiled regexes by pattern; cache stripped
    rendered output by child id and stream end offset; avoid
    `strings.Split` over the whole ring when only the first `limit`
    matches are needed. Prefer an incremental search shape if this
    becomes the standard "watch for marker" path.
 # On Hold
 - [ ] There's a unicode <?> being displayed in opencode [ON HOLD]
  - Investigated 2026-05-14: patterm passes ghostty grapheme codepoints
    through unchanged (vt/ghostty.go:452-462), so the `<?>` glyph is
    most likely the *host* terminal's font fallback for opencode's
    Nerd Font private-use codepoints, not a patterm substitution.
    Need a concrete reproduction (which codepoint, which host
    terminal/font) before changing rendering.
 - [ ] After codex rips for like 15 minutes, the terminal becomes quite slow. [ON HOLD / VERIFYING]
  - 2026-05-14: Perf plan P1-P11 landed (see CHANGELOG). Needs a real
    long-running codex session to confirm whether the steady-state
    slowdown is gone or some hotspot remains. Capture a pprof if it
    still feels slow after ≥15 minutes — the structural drivers the
    audit named are all addressed, so a remaining symptom is a new
    one and probably wants fresh profiling.
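The removed `wait_for_pattern` item proposed tracking the last checked stream offset and searching only new output plus a bounded overlap, so that matches spanning chunks are not missed. A minimal sketch of that idea follows. It assumes a hypothetical `incrementalMatcher` type (not a patterm API), ignores ANSI stripping, and assumes no match is longer than `overlap` plus the newest chunk.

```go
package main

import (
	"fmt"
	"regexp"
)

// incrementalMatcher is a hypothetical helper, not part of patterm.
// It keeps a bounded tail of output it has already seen so that each
// check scans only tail+chunk instead of the whole stream.
type incrementalMatcher struct {
	re      *regexp.Regexp
	overlap int    // max bytes of old output re-scanned with each chunk
	tail    []byte // the last <= overlap bytes already fed in
}

func newIncrementalMatcher(pattern string, overlap int) (*incrementalMatcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return &incrementalMatcher{re: re, overlap: overlap}, nil
}

// Feed searches the retained tail plus the new chunk and reports
// whether the pattern matched anywhere in that window.
func (m *incrementalMatcher) Feed(chunk []byte) bool {
	// Build a fresh window so it never aliases m.tail.
	window := append(append([]byte(nil), m.tail...), chunk...)
	matched := m.re.Match(window)

	// Keep only the last `overlap` bytes for the next call.
	if len(window) > m.overlap {
		window = window[len(window)-m.overlap:]
	}
	m.tail = append(m.tail[:0], window...)
	return matched
}

func main() {
	m, err := newIncrementalMatcher(`BUILD OK`, 16)
	if err != nil {
		panic(err)
	}
	// The marker is split across two chunks; the retained tail bridges it.
	fmt.Println(m.Feed([]byte("compiling... BUI"))) // false
	fmt.Println(m.Feed([]byte("LD OK\n")))          // true
}
```

A caller would normally stop waiting after the first `true`, since the matched text can remain in the tail and match again on the next call. Choosing `overlap` is the trade-off: it has to cover the longest expected match, but every extra byte is re-scanned on each chunk.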
--- a/internal/app/app.go
+++ b/internal/app/app.go
@@ -2268,8 +2268,17 @@ func (st *uiState) flashError(msg string) {
 	st.mu.Lock()
 	st.attentionText = msg
 	st.attentionAt = "" // shows on every focus until cleared
+	focusedPad := st.focusedPad
+	focusedID := st.focusedID
 	st.mu.Unlock()
-	st.renderEmptyState()
+	switch {
+	case focusedPad != "":
+		st.repaintFocusedPad()
+	case focusedID != "":
+		st.repaintFocused()
+	default:
+		st.renderEmptyState()
+	}
 	st.drawTabBar()
 	st.drawSidebar()
 	st.drawStatusLine()
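The precedence in the `flashError` hunk above can be looked at in isolation. The sketch below is not the patterm code: `repaintTarget` and `chooseRepaint` are hypothetical names that reproduce only the ordering shown in the hunk (a focused pad wins, then a focused child, and the empty state is drawn only when nothing is focused).

```go
package main

import "fmt"

// repaintTarget is a hypothetical enum naming the three outcomes of
// the switch in flashError.
type repaintTarget int

const (
	repaintEmptyState repaintTarget = iota
	repaintFocusedPad
	repaintFocusedChild
)

func (t repaintTarget) String() string {
	switch t {
	case repaintFocusedPad:
		return "focused pad"
	case repaintFocusedChild:
		return "focused child"
	default:
		return "empty state"
	}
}

// chooseRepaint mirrors the case ordering from the diff: the pad is
// checked first, then the focused ID, then the empty-state fallback.
func chooseRepaint(focusedPad, focusedID string) repaintTarget {
	switch {
	case focusedPad != "":
		return repaintFocusedPad
	case focusedID != "":
		return repaintFocusedChild
	default:
		return repaintEmptyState
	}
}

func main() {
	fmt.Println(chooseRepaint("notes", "child-1")) // focused pad
	fmt.Println(chooseRepaint("", "child-1"))      // focused child
	fmt.Println(chooseRepaint("", ""))             // empty state
}
```

Separating the decision from the drawing calls like this would make the ordering unit-testable without a terminal. The commit itself covers the behaviour end to end with the harness scenario instead.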
--- a/internal/harness/scenarios/error_flash_preserves_focused_pane.json
+++ b/internal/harness/scenarios/error_flash_preserves_focused_pane.json
@@ -0,0 +1,37 @@
 {
  "name": "error_flash_preserves_focused_pane",
  "presets": {
    "processes": [
      {
        "name": "steady",
        "argv": ["sh", "-lc", "printf 'STEADY READY\\n'; sleep 5"]
      }
    ]
  },
  "trust": ["steady"],
  "steps": [
    {
      "type": "mcp_call",
      "method": "spawn_process",
      "params": {"kind": "command", "preset": "steady", "name": "steady"},
      "save_as": "proc"
    },
    { "type": "wait_text", "contains": "STEADY READY", "timeout_ms": 5000 },
    { "type": "send_chord", "chord": "ctrl-k" },
    { "type": "send_text", "text": "Open Settings" },
    { "type": "send_chord", "chord": "enter" },
    { "type": "send_chord", "chord": "enter" },
    { "type": "send_chord", "chord": "ctrl-n" },
    { "type": "send_chord", "chord": "ctrl-n" },
    { "type": "send_chord", "chord": "ctrl-n" },
    { "type": "send_chord", "chord": "ctrl-n" },
    { "type": "send_chord", "chord": "ctrl-n" },
    { "type": "send_chord", "chord": "ctrl-n" },
    { "type": "send_chord", "chord": "ctrl-n" },
    { "type": "send_chord", "chord": "enter" },
    { "type": "wait_text", "contains": "no active top-level agent to summarize", "timeout_ms": 5000 },
    { "type": "wait_text", "contains": "STEADY READY", "timeout_ms": 5000 },
    { "type": "assert_contains", "contains": "STEADY READY" },
    { "type": "assert_not_contains", "contains": "Press Ctrl-K to spawn an agent or process" }
  ]
 }
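The scenario file is plain JSON, so its step list can be inspected with a few lines of Go. In the sketch below the JSON keys are copied from the file, but the `scenario` and `step` structs and the `countSteps` helper are assumptions made for illustration; the harness's real types are not shown in this diff. The embedded sample is a shortened copy of the scenario.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// step and scenario are hypothetical types; only the JSON keys are
// taken from the scenario file. Keys not listed here (for example
// "method" and "params" on mcp_call steps) are ignored when decoding.
type step struct {
	Type      string `json:"type"`
	Chord     string `json:"chord,omitempty"`
	Text      string `json:"text,omitempty"`
	Contains  string `json:"contains,omitempty"`
	TimeoutMS int    `json:"timeout_ms,omitempty"`
}

type scenario struct {
	Name  string   `json:"name"`
	Trust []string `json:"trust"`
	Steps []step   `json:"steps"`
}

// countSteps decodes a scenario and tallies its steps by type.
func countSteps(raw []byte) (map[string]int, error) {
	var sc scenario
	if err := json.Unmarshal(raw, &sc); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, s := range sc.Steps {
		counts[s.Type]++
	}
	return counts, nil
}

// sample is an abbreviated version of the scenario in this commit.
const sample = `{
  "name": "error_flash_preserves_focused_pane",
  "trust": ["steady"],
  "steps": [
    { "type": "wait_text", "contains": "STEADY READY", "timeout_ms": 5000 },
    { "type": "send_chord", "chord": "ctrl-k" },
    { "type": "send_text", "text": "Open Settings" },
    { "type": "send_chord", "chord": "enter" },
    { "type": "assert_contains", "contains": "STEADY READY" }
  ]
}`

func main() {
	counts, err := countSteps([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["send_chord"], counts["wait_text"]) // 2 1
}
```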