Comprehensive Codebase Audit Report - Ultra Cross-Referenced Pass - 2026-05-24
Scope
- Included: full repository audit across docs, structured memories, task records, source, tests, CI, e2e, benchmarks, configuration, security/governance, chat/retrieval/MCP, observability, release scripts, and prior Agent Smith trackers.
- Excluded: full browser execution, live service deployment, external SaaS security testing, and real remote OAuth validation. Those are converted into validation gates and tasks where needed.
- Timebox: one ultra-high planning pass in discovery mode, with direct local reads plus read-only domain subagent reports.
Acceptance Criteria
- Findings are evidence-backed and include severity plus confidence.
- High-impact decisions include council-style review with dissent.
- Findings are cross-referenced to existing `/tasks` records before adding new tasks.
- Sprint plan uses `Data/Tasks` as the implementation surface and `Data/Pages/Tasks/Sprints` as narrative summary.
- Planned validation gates are executable or explicitly deferred.
Evidence Reviewed
- Product/source-of-truth docs: `README.md`, `.github/copilot-instructions.md`, `.github/agents/smith.agent.md`, `.github/skills/codebase-audit-sprint-planner/SKILL.md`, `.github/skills/council/SKILL.md` (slash command: `/council`).
- Project wiki memories: `Data/Memories/Core/project-wiki-active-architecture.json`, `project-wiki-validation-command.json`, `project-wiki-test-architecture.json`, `project-wiki-chat-agent-provider.json`, `project-wiki-source-link-security-boundaries.json`, `project-wiki-operational-diagnostics-dashboard.json`.
- Prior trackers/audits: `logs/agent-smith-20260524-codebase-audit-ci-testing.md`, `logs/agent-smith-20260523-codebase-audit-task-vetting.md`, `logs/agent-smith-20260524-source-governance-sprint.md`, `Audit_20260521_191625.md`.
- Current task tracker at audit time: 111 JSON task records parsed, status distribution `Backlog=95`, `Done=15`, `Archived=1`, and one duplicate key pair for `TSK-0060`. Follow-up on 2026-05-24 resolved this collision by renumbering the screenshot task to `TSK-0117` and adding `Scripts/Test-TaskRecords.ps1`.
- Source hotspot metrics: largest C# files include `ChatServices.cs` 2449 lines, `MaintenanceAgentServices.cs` 1834, `PagesAndChatTests.cs` 1370, `SqliteMemorySmithDatabase.cs` 1269, `MemoryApplicationService.cs` 1167, `TaskDomainService.cs` 930, `ChatToolCatalog.cs` 765.
- UI hotspot metrics: `Chat.razor` 2326 lines, `Admin.razor` 1033, `Tasks.razor` 955, `Pages.razor` 932, `MemoryViewer.razor` 827.
- Validation inventory: `dotnet test MemorySmith.Tests/MemorySmith.Tests.csproj --no-build --list-tests --verbosity normal` listed 292 tests.
- CI/release evidence: `.github/workflows/ci.yml`, `.github/workflows/docs-pages.yml`, `Scripts/Validate-Repo.ps1`, `e2e/tests/navigation-freeze.spec.ts`, `e2e/playwright.config.ts`, `Scripts/Redeploy-MemorySmithService.ps1`.
- Security/governance evidence: `Program.cs`, `MemorySmithRequestGuardMiddleware.cs`, `SecurityServices.cs`, `TasksController.cs`, `MaintenanceAgentServices.cs`, `ChatServices.cs`, `AdminSettingsService.cs`.
- Package/advisory evidence: `dotnet test --list-tests` restore/build emitted NU1902 warnings for `OpenTelemetry.Api` and `OpenTelemetry.Exporter.OpenTelemetryProtocol` 1.15.0.
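
The tracker figures above (record count, status distribution, duplicate keys) can be re-derived with a short script. The following is a minimal sketch and not the repository's `Scripts/Test-TaskRecords.ps1`. It assumes each task record is a single JSON object with `key` and `status` fields, a schema this report does not confirm; `summarize_task_records` and `load_task_records` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_task_records(records):
    """Return (status_counts, duplicate_keys) for a list of task dicts.

    Assumption: every record carries "key" and "status" fields
    (hypothetical schema, not confirmed by the audit).
    """
    status_counts = Counter(r.get("status", "<missing>") for r in records)
    key_counts = Counter(r.get("key") for r in records)
    # A key is a duplicate when more than one record claims it.
    duplicate_keys = sorted(
        k for k, n in key_counts.items() if k is not None and n > 1
    )
    return status_counts, duplicate_keys


def load_task_records(directory):
    """Load every *.json file under `directory` as one task record."""
    paths = sorted(Path(directory).rglob("*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


if __name__ == "__main__":
    # In-memory sample; real use would pass load_task_records("Data/Tasks").
    sample = [
        {"key": "TSK-0059", "status": "Done"},
        {"key": "TSK-0060", "status": "Backlog"},
        {"key": "TSK-0060", "status": "Backlog"},
    ]
    counts, dupes = summarize_task_records(sample)
    print(dict(counts))
    print(dupes)
```

A non-empty duplicate list is the condition that would make a key-based lookup ambiguous, so a validation gate could exit non-zero whenever it is returned.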
Findings
| ID | Domain | Severity | Confidence | Summary | Evidence | Task Mapping |
|---|---|---|---|---|---|---|
| F-001 | Architecture | High | 93% | Chat, maintenance, task, memory, and UI surfaces still have large files that increase review blast radius and regression risk. | Source metrics for ChatServices.cs, MaintenanceAgentServices.cs, MemoryApplicationService.cs, TaskDomainService.cs, Chat.razor, PagesAndChatTests.cs. | Existing: TSK-0042, TSK-0043, TSK-0044, TSK-0045, TSK-0047, TSK-0049, TSK-0050. |
| F-002 | Chat/Agent Governance | High | 92% | Chat Agent writes still lack separate chat write-root options; approval-submitted page changes flow through maintenance write-root validation, so safe page proposals can still be blocked by maintenance constraints. | ChatOptions has AgentWritesEnabled only; BuildPageProposalChangeAsync creates page file changes; FileMaintenanceProposalStore validates via MaintenanceWritePermissionService; TSK-0016 audit comments reproduce the failure. | Existing: TSK-0016, TSK-0021, TSK-0022, TSK-0019. |
| F-003 | Task Tracker Integrity | High | 97% | At audit time, the task tracker had duplicate key TSK-0060; follow-up resolved the collision and added task-record validation so future duplicates fail local/CI checks. | Task integrity command found duplicate keys for source governance and screenshot capture; TaskDomainService.FindByIdOrKey uses FirstOrDefault over loaded items. Follow-up validation: Scripts/Test-TaskRecords.ps1 passed with 114 records and unique ids/keys. | Completed: TSK-0114; related: TSK-0053, TSK-0029. |
| F-004 | CI/Browser Regression | High | 96% | Playwright navigation-freeze tests exist locally but are not run in GitHub Actions, so UI navigation/circuit regressions can merge without the browser gate. | Scripts/Validate-Repo.ps1 -IncludeE2E, e2e/tests/navigation-freeze.spec.ts, and e2e/playwright.config.ts exist; .github/workflows/ci.yml has no Playwright job. | Existing: TSK-0067, TSK-0068, TSK-0069, TSK-0070, TSK-0071. |
| F-005 | Remote Security Hardening | High | 89% | Local-first auth/RBAC is strong, but remote deployment safety remains partially warning-first: no HSTS, no explicit secure cookie policy, and no forwarded-header/proxy trust controls in startup. | MemorySmithRequestGuardMiddleware blocks remote when AllowRemoteApi=false and API keys use fixed-time comparison; Program.cs only shows UseHttpsRedirection, not HSTS, secure cookie, or forwarded headers. | Existing: TSK-0023, TSK-0037, TSK-0038, TSK-0039, TSK-0040, TSK-0041. |
| F-006 | Source Governance Drift | Medium | 90% | Source-read governance has partially landed in code and tests, but task/wiki surfaces lag: source read expansion and deny roots exist, while write-root separation remains deferred. | VarResolver supports context expansion, unrestricted read opt-in, and deny roots; AdminSettingsService exposes source-link controls; SecurityAndSourceLinkTests cover broad reads and deny roots; project-wiki-source-link-security-boundaries.json needed refresh. | Existing: TSK-0011, TSK-0022, source-read TSK-0060; new consistency task TSK-0114. |
| F-007 | Observability Retargeting | Medium | 86% | Observability improved after prior tasks, so some task evidence is stale: request logging and correlation headers now exist, but central exception handling, ProblemDetails correlation, admin log search, and trend views remain open. | Program.cs now uses UseSerilogRequestLogging and X-Correlation-Id; repo memory records OTel v1; grep still found no UseExceptionHandler or AddProblemDetails. | Existing: TSK-0105, TSK-0106, TSK-0107, TSK-0108; new: TSK-0115. |
| F-008 | Performance Measurement | Medium | 84% | Benchmark and search-quality tests exist, but no CI budget/regression comparison turns benchmark drift into an enforceable signal. | MemorySmith.Benchmarks/SearchBenchmarks.cs, [Category("Benchmark")] tests, README benchmark commands, and no benchmark job in .github/workflows/ci.yml. | New: TSK-0115; related: TSK-0108, TSK-0069. |
| F-009 | Dependency Advisory Tracking | High | 91% | Current restore/test emits moderate NU1902 advisories for OpenTelemetry packages with no first-class task record tracking the upgrade/acceptance gate. | dotnet test MemorySmith.slnx --list-tests --verbosity quiet emitted NU1902 advisories for OpenTelemetry.Api and OpenTelemetry.Exporter.OpenTelemetryProtocol 1.15.0. | New: TSK-0116; related: TSK-0113 post-implementation hardening. |
| F-010 | Historical Docs Noise | Medium | 82% | Older architecture/review docs still contain obsolete TODO/stub/security claims and are large enough to pollute search unless clearly classified. | MemorySmith.Core/Docs/Reviews/* and MemorySmith.Core/Docs/Plans/* include old claims; README and repo instructions already warn to verify against current code. | Existing: TSK-0071, TSK-0046; related risk register item R-006. |
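
F-005 cites two request-guard behaviors as evidence: remote requests are blocked when `AllowRemoteApi=false`, and API keys are compared in fixed time. The sketch below illustrates that decision order in Python. It is not the `MemorySmithRequestGuardMiddleware` code; `is_request_allowed` and its parameters are hypothetical names. Letting local callers through without a key and blocking remote callers when no key is configured are both assumptions made for illustration.

```python
import hmac


def is_request_allowed(is_remote, allow_remote_api, configured_key, presented_key):
    """Decide whether an API request may proceed (illustrative only)."""
    if not is_remote:
        # Assumption: local callers are not gated by the API key in this sketch.
        return True
    if not allow_remote_api:
        # Remote access is disabled outright.
        return False
    if not configured_key or presented_key is None:
        # Assumption: remote access without a configured key is blocked.
        return False
    # hmac.compare_digest takes time independent of where the inputs differ,
    # so response timing does not reveal how much of the key matched.
    return hmac.compare_digest(
        configured_key.encode("utf-8"), presented_key.encode("utf-8")
    )
```

A plain `==` comparison would return the same results but can stop at the first mismatching character, which is the timing side channel that a fixed-time comparison avoids.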
Existing Backlog Cross-Reference
| Area | Existing task set | Audit action |
|---|---|---|
| Chat write governance | TSK-0016, TSK-0017, TSK-0018, TSK-0019, TSK-0021, TSK-0022 | Keep active; prioritize TSK-0016 and TSK-0022 before feature expansion. |
| Remote hardening | TSK-0023, TSK-0037, TSK-0038, TSK-0039, TSK-0040, TSK-0041 | Keep active; make startup/transport/proxy decisions before remote-use docs are treated as safe. |
| Architecture decomposition | TSK-0042 through TSK-0051 | Keep active; sequence after governance/CI gates to prevent refactor drift. |
| Task contract safety | TSK-0052, TSK-0053, TSK-0054, TSK-0055, TSK-0056 | Add duplicate-key and stale-task consistency guardrail via TSK-0114. |
| Browser/CI validation | TSK-0067 through TSK-0071 | Keep active; TSK-0067 is the highest-ROI CI gap. |
| Markdown/runtime work | TSK-0075 through TSK-0090 | Defer behind stabilization unless a user-facing docs route blocks delivery. |
| Chat retrieval quality | TSK-0091 through TSK-0100 | Keep active; prioritize only after write governance and browser gate. |
| Logging/OTel | TSK-0104 through TSK-0113 | Retarget stale evidence and add package advisory task TSK-0116. |
Risk Register
- R-001: Duplicate task keys can make `/tasks/<key>` ambiguous. Impact high, likelihood reduced after `TSK-0114`; mitigation now includes `Scripts/Test-TaskRecords.ps1` in local validation, CI, and the pre-commit hook.
- R-002: Safe chat page approvals can fail through maintenance write-root coupling. Impact high, likelihood high, mitigation `TSK-0016` plus `TSK-0022`.
- R-003: Remote deployment can be configured into an unsafe posture through warning-only guardrails. Impact high, likelihood medium, mitigation `TSK-0023`, `TSK-0037`, `TSK-0038`, `TSK-0039`.
- R-004: Browser route/circuit regressions can merge without CI detection. Impact high, likelihood medium, mitigation `TSK-0067`.
- R-005: OTel package advisory warnings may normalize red builds/warnings if not tracked explicitly. Impact high, likelihood current, mitigation `TSK-0116`.
- R-006: Historical docs can mislead agents and humans when search surfaces stale review files before current code/wiki records. Impact medium, likelihood high, mitigation `TSK-0046`, `TSK-0071`, and source-of-truth notes in future docs.
- R-007: Large service/component files make stabilization harder because behavior, UI, and trust boundaries are reviewed in broad files. Impact medium-high, likelihood current, mitigation `TSK-0042`, `TSK-0043`, `TSK-0044`, `TSK-0047`.
Open Questions
- Q-001: Should duplicate task keys be blocked at load time, write time, CI validation time, or all three? Proposed owner: task-domain implementer. Gate: `TSK-0114` design review.
- Q-002: Should chat write-root separation live under `MemorySmith:Chat` only, or should the proposal workflow accept caller-specific write-policy scopes? Proposed owner: chat/governance implementer. Gate: `TSK-0022` council-lite design note.
- Q-003: Should `AllowRemoteApi=true` without `ApiKey` become startup-fatal or remain admin-visible warning with blocked API/MCP? Proposed owner: security hardening implementer. Gate: `TSK-0023`.
- Q-004: Should browser validation run on every PR or path-filtered UI/API changes only? Proposed owner: CI maintainer. Gate: `TSK-0069`.
- Q-005: What benchmark budgets are stable enough to enforce in CI versus report-only? Proposed owner: performance/observability implementer. Gate: `TSK-0115`.
- Q-006: Are current OpenTelemetry advisories exploitable in MemorySmith's local-first default configuration, or should they be accepted temporarily until fixed packages exist? Proposed owner: dependency hygiene implementer. Gate: `TSK-0116`.
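
Q-005 separates budgets that are stable enough to enforce from those that should only be reported. One way to model that split is sketched below, assuming benchmark results are available as a mapping from benchmark name to mean milliseconds. The `Budget` type, the 10% tolerance, and the function name are hypothetical and are not taken from `MemorySmith.Benchmarks`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    limit_ms: float        # maximum acceptable mean duration
    enforce: bool = False  # False = report-only, True = fail the gate


def evaluate_budgets(results_ms, budgets, tolerance=0.10):
    """Return (failures, warnings) as sorted lists of benchmark names.

    A result is over budget when it exceeds limit_ms * (1 + tolerance).
    Benchmarks with no measured result are skipped rather than failed.
    """
    failures, warnings = [], []
    for name, budget in budgets.items():
        measured = results_ms.get(name)
        if measured is None:
            continue
        if measured > budget.limit_ms * (1 + tolerance):
            # Enforced budgets fail the gate; the rest are only reported.
            (failures if budget.enforce else warnings).append(name)
    return sorted(failures), sorted(warnings)
```

Under this model a budget can start as report-only and be switched to `enforce=True` once its measurements prove stable, which keeps noisy benchmarks from blocking merges while the thresholds are still being tuned.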
Confidence
- Audit evidence confidence: 88%.
- Sprint sequencing confidence: 80%.
- Residual uncertainty: no live browser run, no full dependency scanner beyond restore/test advisories, and no remote/proxy deployment execution in this pass.