Cancel pending timers when a child is closed

Stale timer bodies were re-delivered to the orchestrator pane after
the parent had already processed the sub-agent's reply and called
close_process. The timer registry held no link to the child
lifecycle, so timers owned by or watching the closed child lingered
until something triggered a fire — e.g. a trailing classifier tick
for the now-removed child.

Add an OnChildClosed hook to ChildEventListener, emit it from
Session.Close (and the terminal-corpse path in reapChild), and have
the timer manager prune the registry: cancel timers owned by the
closed child; remove the closed child from each timer's watched
list (cancel the timer outright when watched empties).
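
The pruning rule above can be sketched in isolation. This is a minimal
illustration, not the code from this commit: the registry and timer
types are hypothetical stand-ins for the timer manager, the
ChildEventListener interface is trimmed to the one hook, and locking
and runtime-timer stopping are omitted.

```go
package main

// ChildEventListener is a hypothetical, trimmed-down listener interface;
// only the hook named in this commit is modelled.
type ChildEventListener interface {
	OnChildClosed(childID string)
}

// timer keeps just the two fields the pruning rule needs (assumed shape).
type timer struct {
	ownerID string
	watched []string
}

// registry stands in for the timer manager: no mutex, no runtime timers.
type registry struct {
	timers map[string]*timer
}

// OnChildClosed applies the two rules from the commit message:
// drop timers owned by childID, and drop childID from watched lists,
// cancelling a timer whose watched list becomes empty as a result.
func (r *registry) OnChildClosed(childID string) {
	for id, t := range r.timers {
		if t.ownerID == childID {
			delete(r.timers, id) // deleting during range is safe in Go
			continue
		}
		kept := make([]string, 0, len(t.watched))
		for _, w := range t.watched {
			if w != childID {
				kept = append(kept, w)
			}
		}
		if len(kept) == len(t.watched) {
			continue // this timer was not watching childID; leave it alone
		}
		t.watched = kept
		if len(t.watched) == 0 {
			delete(r.timers, id)
		}
	}
}
```

Note that a timer with an empty watched list that never referenced the
closed child is left untouched; only a list emptied by this close
cancels the timer.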

Natural exit deliberately does not route through this hook: the
classifier already emits an idle transition on exit, which fires
any legitimate "fire when sub-agent finishes" timer exactly once;
cancelling on exit would swallow that fire.
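
The difference between the two paths can be modelled with a toy
manager. This is a sketch under stated assumptions: manager,
watchTimer, onIdle, onClosed and fireLog are hypothetical names used
only to contrast "fire and delete" on exit with "delete without
firing" on close.

```go
package main

// watchTimer is a hypothetical timer that watches a single child.
type watchTimer struct {
	watched string
}

// manager is a toy registry; fireLog records which timers fired.
type manager struct {
	timers  map[string]*watchTimer
	fireLog []string
}

// onIdle models the final idle transition emitted on natural exit:
// each watching timer fires once and is then deleted.
func (m *manager) onIdle(childID string) {
	for id, t := range m.timers {
		if t.watched == childID {
			m.fireLog = append(m.fireLog, id)
			delete(m.timers, id)
		}
	}
}

// onClosed models the explicit-close hook: watching timers are
// deleted and nothing is fired.
func (m *manager) onClosed(childID string) {
	for id, t := range m.timers {
		if t.watched == childID {
			delete(m.timers, id)
		}
	}
}
```

In this model a second onIdle for the same child is a no-op because the
timer is already gone, and onClosed never appends to fireLog.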
2026-05-18 12:37:32 +01:00
parent de60b93bc6
commit 34b41be1df
7 changed files with 278 additions and 4 deletions

@@ -296,6 +296,65 @@ func (m *timerManager) onChildStateChanged(childID string, state IdleState) {
}
}
// onChildClosed drops pending timer references to childID. Called
// from Session.Close (and the terminal-corpse cleanup in reapChild)
// via the session listener bus — a deliberate signal from the host
// that childID is gone and the parent is not waiting on it anymore.
//
// Semantics:
//   - timers owned by childID are cancelled and deleted: their owner
//     is gone, so even if defaultFireFn's IsLive guard would no-op
//     the delivery, the entry has no business surviving a close.
//   - timers watching childID have childID pruned from t.watched
//     (and t.idleBaseline). If t.watched becomes empty the timer is
//     cancelled and deleted; we deliberately do NOT synthesise a
//     fire here. The parent already received any legitimate idle
//     transition before close_process — see allWatchedIdleLocked's
//     "treat as satisfied" comment, which only applies to a
//     concurrent re-evaluation, not to this explicit-removal hook.
//
// The natural-exit path (reapChild → emitExit for agent/command
// kinds) is NOT routed through here: the classifier emits a final
// idle transition on exit, which fires and deletes any watching
// timers exactly once. Cancelling on exit would swallow that
// legitimate fire and leave the parent never notified.
func (m *timerManager) onChildClosed(childID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for id, t := range m.timers {
		if t.ownerID == childID {
			if t.rt != nil {
				t.rt.Stop()
				t.rt = nil
			}
			t.status = timerStatusCanceled
			delete(m.timers, id)
			continue
		}
		if !contains(t.watched, childID) {
			continue
		}
		pruned := t.watched[:0]
		for _, w := range t.watched {
			if w != childID {
				pruned = append(pruned, w)
			}
		}
		t.watched = pruned
		if t.idleBaseline != nil {
			delete(t.idleBaseline, childID)
		}
		if len(t.watched) == 0 {
			if t.rt != nil {
				t.rt.Stop()
				t.rt = nil
			}
			t.status = timerStatusCanceled
			delete(m.timers, id)
		}
	}
}
// allWatchedIdleLocked reports whether every watched child is now
// idle. Called with m.mu held — uses live Child.IdleState() under the
// child's own atomic, not under m.mu.