Council Decision: Audit #9 Implementation Track — Ship Review & Next-Task Ordering

Date: 2026-06-12 Decision: Ship TSK-0042 step 1 (unified chat tool loop) as staged; the proposed next-task ordering (TSK-0042 step 2 → TSK-0189 → TSK-0283 → TSK-0284, TSK-0280 opportunistic) is correct given the confirmed step-2 split scope.

Scope of impact: long-lived architecture + chat/agent execution path (the tool loop is the write-path's engine). Seats: Source-Grounded Archivist, Skeptical Reviewer, Synthesizer (3 of 6 — architecture refactor, no schema/retrieval-ranking change).

Evidence Reviewed

Findings

Seat Recommendation Confidence Blocking Concern
Source-Grounded Archivist Ship as-is. Single-loop claim, streaming-surface preservation (prefix buffering, traces with context merge, run-control strings, iteration-limit message), message-append order, usage-aggregation order, and log event ids all verified against source. Stale items: audit §3 line citations; TSK-0042 status. 94% None
Skeptical Reviewer Ship; no blocker. Exception/cancellation paths verified safe through the iterator into AgentSessionService's TimeoutPhase guard; no source-inspection test reads ChatServices.cs. MEDIUM: parity-test gaps (run-control non-effect on SendAsync, multi-iteration usage parity, prefix-buffer path unexercised). LOW: ordering question — confirm step-2 split keeps provider-acquisition arms in ToolLoop.cs. 72% None
Synthesizer Ship now; close coverage gaps no later than step-2 PR; gate step 2 on the acquisition-arms check. 89% None

Synthesis

Shipped now (corrections applied same session, ahead of the synthesizer's deferral allowance): - The three coverage gaps were closed immediately: SendAsync_IgnoresRunControlStop_ByDesign (pins the documented asymmetry), a two-iteration reply-parity test, and Stream_BuffersToolCallEnvelope_AndFlushesPlainAnswerDeltas (multi-chunk ChunkedScriptProvider exercising the IsPotentialToolCallPrefix path). - TSK-0042 → InProgress (step 1 done); audit §3 citation note added.

Material finding from the corrections: writing the usage-parity assertion exposed a pre-existing divergence — on multi-iteration turns with no provider-reported usage, the streaming path constructs each iteration's response with finalChunk?.Usage ?? currentUsage, feeding the prior aggregate back into the estimator, while SendAsync estimates from scratch. Measured on an identical two-tool-call script: stream 17,756 input / 100 output tokens vs send 13,992 / 59. The unification faithfully preserves both old behaviors; fixing the estimator is TSK-0248's scope (evidence recorded there). A behavior-preserving refactor must not change it silently — this is the Branch B/D pattern: measurable divergence documented, fix deferred to the owning task with the measurement attached.

Deferred: - TSK-0042 step 2 (file split: protocol records → ChatProtocol.cs; each provider → own file). Gate below. - TSK-0189 (chat turn engine) after step 2; TSK-0283 (provider seam) after step 2; TSK-0284 thereafter; TSK-0280 anytime. - TSK-0248 now carries the usage-divergence evidence and owns the estimator fix.

Evidence gates: - Step-2 PR review MUST confirm the provider-acquisition arms (provider.StreamAsync accumulation and provider.CompleteAsync branches) remain in MemoryChatAgent.ToolLoop.cs. If a split design would move them, do TSK-0283 first (provider code then moves once). - TSK-0283 does not begin before step 2 is merged and CI-green.

Dissent

Archivist (94%) vs Skeptic (72%) — not resolved by consensus. The gap reflects risk tolerance around test-coverage gaps rather than any identified defect. The Skeptic's position is recorded: the parity suite as first written could not catch drift in run-control asymmetry, multi-iteration usage, or prefix buffering. The same-session corrections closed those gaps, and the usage assertion immediately found a real divergence — vindicating the dissent's premise that the gaps were load-bearing. The estimator divergence itself remains open (deferred to TSK-0248, not flattened into this ship decision).

Acceptance Criteria

Open Questions