Council Review: Audit Findings Validation and Sprint 46 Plan
Decision
Sprint 46 must shift from adding features to hardening observability, error-handling contracts, and runtime decomposition — the three audits collectively show that silent fallbacks and null-overloading are now the highest-leverage sources of brittleness, outweighing any feature gap.
Evidence Reviewed
Data/Pages/audits/sprint-45-audit-6-24.md— 21 findings across 6 scoring dimensions; runtime architecture scored 5/10, observability 4/10, error handling 4/10Data/Pages/audits/memorysmith_agent_deep_dive_audit_2026-06-24.md— 9 findings + 3 architectural opportunities; strongest on build-origin, null-overloading, and planner decompositionData/Pages/audits/memorysmith_agent_deep_dive_audit_addendum_2026-06-24.md— 6 supplementary findings; partial-origin acceptance at 95% confidence, auth fail-open at 97%Data/Pages/council/sprint35-llm-first-seat-synthesis-20260623.md— Sprint 45 Wave A/B plan- Source code:
BuildGoal.cs(lines 64-68,HasExplicitOriginOR logic),BuildFactKeys.cs(line 17, sentinel doc),HtnPlanner.cs(lines 169-196, catch→null + originalGoal unused),RestMemoryGatewayOptions.cs(lines 38-52, WorldKbUrl fallback),LlmChatInterpreter.cs(lines 330, 403, catch→null),WebSocketBridge.cs(line 378, catch→null), API provider files (Anthropic/Gemini/OpenAI catch→null),Program.cs(lines 392-400, stale version) - Core memories:
agent-runtime-lifecycle,agent-planner-architecture,agent-goal-types-catalog,agent-architecture-bounded-contexts,agent-phase-status-current - Task files: TSK-0083 through TSK-0099
- Council skill:
.github/skills/council/SKILL.md - Agent guidelines:
AGENTS.md
Methods
This review was conducted as a 6-seat subagent council with explicit user permission. Each seat was given the full evidence pack and asked to produce independent findings, risks, recommendations, assumptions, open questions, and confidence percentages. The Synthesizer merged all reviews into a unified plan.
Findings
| Seat | Recommendation | Confidence | Blocking concern |
|---|---|---|---|
| Source-Grounded Archivist | 13 of 14 validated claims are accurate; 8 are worse than originally claimed. The most actionable new finding is the 3-LLM-provider catch→null cluster (M-1) and the WebSocketBridge fire-and-forget receive loop (M-2). Fix those first. | 0.92 | The audit missed: ChatServices.cs 20+ bare catch blocks in the MemorySmith base repo. That's outside scope for the Agent repo but must be surfaced as a cross-repo request. |
| Data Model Architect | The null-overloading problem (11+ meanings) is the single highest-leverage schema fix. Introduce ReplanResult, ParseResult, and a BuildOrigin value object. These three schema changes eliminate entire classes of bugs without changing any behavioral logic. |
0.88 | BuildOrigin value object changes HasExplicitOrigin semantics from OR to AND — this is a breaking change requiring updates to all 4 origin-resolution touch points. Must be gated behind tests first. |
| Retrieval Specialist | World KB fallback (Validated Claim 7) is the only retrieval-affecting finding. The alias-drift (Claim 10) is a semantic retrieval concern — different canonical IDs produce different search results. Neither is P0. | 0.85 | The alias-drift (TSK-0099) has LOW priority but the semantic impact is real: "iron"→"raw_iron" vs "iron"→"iron_ore" means search returns different results depending on path. |
| Human Learning Advocate | The silent-failure culture hurts developers most. 5 distinct silent-failure patterns (catch→null, catch→return false, catch{}, fire-and-forget, ReceiveLoopAsync) mean every bug investigation starts with "did it actually fail?" not "what went wrong?" | 0.82 | Without structured error results, every bug requires reading debug logs. The 3 LLM providers' catch→null means LLM integration bugs are invisible until a user reports "the bot ignored me." |
| Skeptical Reviewer | The audit claims are well-supported but over-prioritize new architectural constructs. BuildOrigin value object and ReplanResult are good designs but building them doesn't fix the core problem — silent failures. Fix the catch→null blocks with logging first (5-minute fixes), then design new types. |
0.78 | Claim 14 (3 origin layers) is architecturally messy but has caused zero known bugs. Replan losing goal context (Claim 8) IS a real bug pattern. Merge claims 1, 2, 3, 14 into one "BuildOrigin consolidation" task. |
| Synthesizer | Sprint 46 must address ALL catch→null and fire-and-forget paths as P0, then BuildOrigin consolidation as P1, then runtime decomposition as P2. The 3-LLM-provider catch→null plus ChatServices catch blocks are the most impactful 10-minute fix in the entire backlog. | 0.90 | The Sprint 45 plan (Wave A/B) was well-executed based on the previous council report. These audits surface a deeper layer of issues that Wave A/B didn't touch. Sprint 46 needs a different theme. |
Synthesis
SPRINT 46 THEME: "Observability First" — fix all silent failure paths, then consolidate the build-origin model, then begin runtime decomposition planning.
Sprint 46 Plan
| Priority | Area | Description | Task ref |
|---|---|---|---|
| P0 | Observability | WebSocketBridge.ReceiveLoopAsync: wrap in try-catch with log + reconnect | TSK-0100 |
| P0 | Observability | All 7 catch→null paths: add LogWarning before returning null | TSK-0101 |
| P0 | Cross-repo | Document ChatServices.cs 20+ bare catch blocks as base-repo request | TSK-0102 |
| P1 | BuildOrigin | Consolidate BuildOrigin model: merge TSK-0098 + TSK-0095 into single value object | TSK-0103 |
| P1 | Planner | Introduce ReplanResult type + fix unused originalGoal parameter in ReplanAsync | TSK-0104 |
| P2 | Docs | Fix documentation drift: ReplanGovernor, HtnPlanner, phase-status memory | TSK-0105 |
| P2 | Aliases | Extract shared alias dictionaries (TSK-0099) — promote from P3 | TSK-0099 |
| P2 | Tests | Add error-path tests covering all catch→null and fire-and-forget locations | TSK-0106 |
| P3 | Tests | Complete remaining checkpoint tests (TSK-0083 remaining ~50%) | TSK-0083 |
| P3 | Architecture | Runtime decomposition planning: scope 6 interfaces, create extraction plan | TSK-0107 |
| P3 | Cleanup | State duplication (M-4): remove redundant _currentGoal field access | TSK-0108 |
Updated Existing Tasks
- TSK-0099 (alias extraction): PROMOTE from P3 to P2. The semantic drift (
"iron"→"raw_iron"vs"iron"→"iron_ore") has higher retrieval impact than previously assessed. Still defer to Sprint 46 Wave B after P0 items. - TSK-0098 (ReadOriginFact consolidation): ABSORB into TSK-0103 (BuildOrigin value object). Both implementations will be replaced by the shared resolver. Close TSK-0098 as "merged into TSK-0103."
- TSK-0083 (checkpoint tests): RETAIN at P3. The remaining ~50% of checkpoint tests are needed but not blocking anything. Move to Sprint 47.
- TSK-0084 (ApplySmeltComplete): RETAIN at P3. Defer until smelting functionality matures.
- TSK-0085 (HasFailed dead code): RETAIN at P3. Low priority — affects smelt and craft goals equally.
- TSK-0086 (stale doc comments): ABSORB by TSK-0105. Close as "merged into TSK-0105."
- TSK-0096 (mining double-counting): RETAIN at P1 (Wave B from Sprint 45). Not addressed by Sprint 46 theme. Move to Sprint 47.
- TSK-0093 (structured parse result): RETAIN at P1. The
ParseResult<T>type from TSK-0104 may absorb this — reassign after TSK-0104 implementation.
Dissent
Disagreement on BuildOrigin sentinel severity. The Source-Grounded Archivist and Data Model Architect rate the sentinel-overloading as HIGH because it violates the "no magic numbers" rule from AGENTS.md. The Skeptical Reviewer argues the collision probability is ~0.1% and the sentinel is documented. Resolution: Fix during BuildOrigin consolidation (P1) but do NOT prioritize separately.
Disagreement on ReplanResult priority. The Human Learning Advocate argues it should be P0 because "null overloading is the #1 debugging friction." The Skeptical Reviewer argues catch→log fixes are P0 while structural type changes are P1. Resolution: Two-phase approach. Catch→log is TSK-0101 (P0, 15 minutes). ReplanResult type is TSK-0104 (P1, 2 hours).
Disagreement on runtime decomposition timeline. The Data Model Architect wants decomposition to start immediately. The Skeptical Reviewer argues the God Object (~80KB) works and decomposing it prematurely could introduce bugs. Resolution: Planning (TSK-0107) starts Sprint 46. Extraction starts Sprint 47+.
Acceptance Criteria
- [ ] TSK-0100 (WebSocketBridge): Inject exception into ReceiveLoopAsync → log clearly states crash + reconnect attempts. No silent-death path remains.
- [ ] TSK-0101 (catch→log): All 7 locations produce
LogWarningoutput when exceptions occur. Mock logger verification in tests. - [ ] TSK-0102 (cross-repo request): Document created at
Data/Pages/MS-Requests/with specific file paths, line numbers, evidence, and fix recommendations. - [ ] TSK-0103 (BuildOrigin):
BuildOriginsealed record withSourceenum, nullable coordinates, andIsExplicit/IsResolvedproperties.HasExplicitOriginrequires all 3 axes. Sentinel(0,0,0)eliminated. 4 files changed. All existing tests pass. - [ ] TSK-0104 (ReplanResult):
ReplanResult<IPlan>replacesIPlan?return.originalGoalparameter is consumed. Typed goal data survives replanning. - [ ] TSK-0105 (docs): ReplanGovernor, HtnPlanner, and phase-status memory all reflect current behavior.
- [ ] TSK-0106 (error-path tests): 8+ new NUnit tests covering all catch→null and fire-and-forget paths.
Mock<ILogger<T>>verification. - [ ] Overall:
dotnet test MemorySmith.Agent.Testspasses with zero failures.dotnet buildpasses with zero warnings. - [ ] Overall: No P0 items remain open at Sprint 46 end. P1 items are either complete or have documented progress with clear next steps.
Open Questions
- WebSocketBridge ownership: Is
WebSocketBridgeowned byMinecraftAdapterorAgentBackgroundService? Need to verify the lifecycle before restructuring. - TSK-0102 handoff process: Who in the MemorySmith base repo owns ChatServices.cs? Suggest
@TheMasonXas initial reviewer. - LLM provider abstraction: The 3 LLM providers (Anthropic, Gemini, OpenAI) share the same catch→null pattern. Would a shared base class or middleware simplify all 3 fixes?
- Coverage collection: Should Sprint 46 include coverage instrumentation? The error-path tests (TSK-0106) would benefit from knowing which catch blocks are actually covered.
- Event loop health check: After fixing WebSocketBridge.ReceiveLoopAsync (TSK-0100), should we add a periodic health check ("PING" event every 30s, crash if no response in 60s)?
General Themes & Cross-Cutting Patterns
The three audits and this council review surfaced four major cross-cutting themes:
Theme 1: Silent Failure is the Dominant Risk Pattern (HIGHEST CONCENTRATION)
- 7 catch→null locations in 5 files
- 20+ bare catch blocks in ChatServices.cs (base repo)
- 11+ fire-and-forget calls in AgentBackgroundService
- WebSocketBridge entire receive loop fire-and-forget
- Pattern: The codebase is resilient-by-fallthrough. Every layer has a "don't crash" blanket catch, but the aggregate effect is opacity — when something fails, there's no breadcrumb trail.
Theme 2: Null is Overloaded Everywhere (11+ meanings)
- Parse failure, planner failure, LLM timeout, JSON error, network error, item not found, blueprint not found, not addressed, too far away, low confidence, unavailable
- Pattern: The codebase uses nullable return types as a universal error channel. Every nullable method has its own private semantics that callers must infer from context.
Theme 3: Architectural Migration Debt
- AgentBackgroundService is a documented God Object with 18+ responsibilities, 18 constructor parameters
- 3 separate BuildOrigin resolution paths in 3 layers
- HtnPlanner still has type-switch branches that docs claim were removed
- Dual state tracking (_currentGoal + facts, dual replan-throttle, dual ConcurrentDictionary)
- Pattern: The migration from deterministic-first to LLM-first architecture is incomplete. Old and new systems coexist with implicit priority rules.
Theme 4: Documentation Drift is a Trust Problem
/api/aboutreports Sprint 36 / v0.37.0 at Sprint 45+- ReplanGovernor doc says "60s timeout" — actual is graduated delay [10,20,30,60]s
- HtnPlanner doc says "type-switch removed" — code still has them
- Phase status memory says "Sprint 41" — codebase is at Sprint 45+
- Pattern: Documentation is treated as a one-time artifact, not a living resource. Each sprint adds drift.
Patterns That Were NOT Called Out in the Original Audits
This council discovered 8 missed patterns (M-1 through M-8):
| # | Pattern | Files | Severity | Found By |
|---|---|---|---|---|
| M-1 | LLM provider catch→null bare catches | Anthropic/Gemini/OpenAI providers | HIGH | Cross-reference LLM providers |
| M-2 | WebSocketBridge ReceiveLoopAsync fire-and-forget | WebSocketBridge.cs | CRITICAL | Trace event receive path |
| M-3 | int.MinValue sentinel | HtnPlanner + HtnTaskLibrary | MEDIUM | Search for sentinel values |
| M-4 | State duplication | AgentBackgroundService (multiple fields) | LOW-MEDIUM | Trace state ownership |
| M-5 | ReplanGovernor XML doc stale | ReplanGovernor.cs | LOW | Compare doc to code |
| M-6 | HtnPlanner XML doc misleading | HtnPlanner.cs | LOW | Compare doc to code |
| M-7 | ChatServices.cs bare catches (base repo) | MemorySmith.App/Services/ChatServices.cs | HIGH | Search for catch-all blocks |
| M-8 | Test coverage gaps | Tests project | MEDIUM | No tests for any catch→null path |
The common root cause across all missed patterns: the audits focused on what they set out to find (planner architecture, build origin, runtime design) and didn't systematically search for identical patterns in adjacent code (LLM providers, WebSocket bridge, base repo services).