Claude MCP Benchmark Follow-up - 2026-05-26
This page reviews Claude MCP Benchmark Report - 2026-05-25 and Claude MCP Benchmark Report - 2026-05-26 against the current repository state as of 2026-05-26.
Summary
The reports were useful and specific, especially where they named exact memory IDs and stale source-link paths. Several findings were still actionable and were converted into task records; the public search-contract documentation item was completed the same day. Several others were already stale by the time of this review because the repo, docs, and tag policy had already moved forward.
I also fixed the concrete KB-health drift the report identified, wherever the repair was obvious and low-risk.
Disposition
| Claude finding | Disposition | Evidence | Follow-up |
|---|---|---|---|
| Edit-gated MCP write tools appear to hang for Viewer callers | Accepted for investigation | MemorySmith.App/Controllers/McpController.cs currently appears to fail fast for ChatToolRisk.Write, but there is no focused regression coverage for the unauthorized denial path, and loopback bootstrap compatibility can mask behavior differences. | TSK-0183 |
| Public docs do not explain MCP write permissions | Closed as stale | README.md and Data/Pages/features/api-and-mcp.md already document View, Edit, and Source bundle boundaries for the current tool set. | No new task |
| Per-tool MCP disable/enable controls are missing | Closed as stale | MemorySmith:EnabledTools and DisabledTools are already exposed in admin settings and documented. | No new task |
| format=envelope and the advanced search contract are undocumented for public users | Completed | The runtime already supported format=envelope, and the public docs were expanded on 2026-05-26 across README, guides/search-and-chat, features/api-and-mcp, features/search-system, and the structured wiki memory for MCP search tools. | TSK-0184 done |
| project-wiki and fixture-tag governance noise should be exempted or suppressed | Closed as stale | The current Data/Policies/tag-policy.json allowlists project-wiki and does not blocklist the fixture tags called out in the report. | No new task |
| External agents need a first-class MCP health/stats surface | Accepted | The runtime already exposes /health, /api/stats, and /api/diagnostics, but no equivalent MCP tool exists. | TSK-0185 |
| External agents need better server-side search ergonomics such as recency filters, thresholds, and paging | Accepted | Current search surfaces expose only the core filters and bounded limits; the ergonomics gap is real and unowned. | TSK-0186 |
| Broken memory references and stale source-link paths in structured wiki records | Fixed directly | The named records still had stale page paths and one working memory still referenced missing IDs. | Repaired during this audit |
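For TSK-0183, the property a regression test should pin down is that a caller holding only View permission gets an immediate denial when invoking a write-risk tool, rather than a hang. The Python sketch below illustrates that fail-fast shape; it is not the C# logic in MemorySmith.App/Controllers/McpController.cs. ToolRisk, Permission, gate_tool_call, and the -32001 code are hypothetical stand-ins, and reporting the denial as a JSON-RPC error (rather than as a tool result flagged as an error) is an assumption.

```python
from enum import Enum
from typing import Any, Collection, Dict, Optional


class ToolRisk(Enum):
    """Hypothetical mirror of the ChatToolRisk idea: how risky a tool is."""
    READ = "read"
    WRITE = "write"


class Permission(Enum):
    """Hypothetical permission levels, named after the documented View/Edit boundary."""
    VIEW = "View"
    EDIT = "Edit"


# Assumed mapping from tool risk to the permission a caller must hold.
REQUIRED_PERMISSION = {ToolRisk.READ: Permission.VIEW, ToolRisk.WRITE: Permission.EDIT}

# JSON-RPC 2.0 reserves -32000..-32099 for server errors; -32001 is an arbitrary choice here.
FORBIDDEN_CODE = -32001


def gate_tool_call(
    request_id: Any, risk: ToolRisk, granted: Collection[Permission]
) -> Optional[Dict[str, Any]]:
    """Return a JSON-RPC error response at once if permission is missing, else None."""
    required = REQUIRED_PERMISSION[risk]
    if required not in granted:
        # Deny synchronously: the caller receives a response, never a pending wait.
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "error": {
                "code": FORBIDDEN_CODE,
                "message": f"{required.value} permission is required for {risk.value} tools",
            },
        }
    return None  # caller may proceed to the actual tool handler


# Usage: a View-only caller invoking a write tool is denied immediately.
denial = gate_tool_call(7, ToolRisk.WRITE, {Permission.VIEW})
print(denial["error"]["message"])  # Edit permission is required for write tools
```

The design point being illustrated is that denial is a return value computed before any tool work starts, so there is nothing for the caller to wait on.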
KB Fixes Applied Directly
These were concrete wiki-data repairs, not code backlog items:
- Removed broken References entries from task-tracking-feature-20260523.
- Updated moved page-path source links in project-wiki-memory-status-classification-current.
- Updated moved page-path source links in project-wiki-markdown-pages.
- Updated moved page-path source links in ai-memory-suite-implementation-plan-20260520.
- Updated moved page-path source links in ai-memory-suite-governance-foundation-20260520.
- Updated moved page-path source links in memory-system-rfc-council-review-20260520.
Focused validation after those repairs: the edited memory JSON files parsed cleanly and each updated %MemorySmithRepo%... source-link target resolved to an existing file.
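That focused validation can be approximated with a small script that parses each edited memory file, flags references to unknown memory IDs, and checks that repo-relative source links point at real files. This is a sketch under assumptions: the field names references and sourceLinks, the separator style after the %MemorySmithRepo% token, and check_memory_files itself are hypothetical, not MemorySmith's actual memory schema or tooling.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set

# Token used at the start of repository-relative source links.
REPO_TOKEN = "%MemorySmithRepo%"


def check_memory_files(files: Iterable[Path], repo_root: Path, known_ids: Set[str]) -> List[str]:
    """Parse each memory file and report broken references and unresolved source links."""
    problems: List[str] = []
    for path in files:
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        # Assumed field name: "references" lists other memory IDs.
        for ref in record.get("references", []):
            if ref not in known_ids:
                problems.append(f"{path.name}: missing memory reference {ref}")
        # Assumed field name: "sourceLinks" lists strings that start with REPO_TOKEN.
        for link in record.get("sourceLinks", []):
            if not link.startswith(REPO_TOKEN):
                continue  # only repo-relative links are checked here
            # Accept either separator style after the token (assumption).
            relative = link[len(REPO_TOKEN):].replace("\\", "/").lstrip("/")
            if not (repo_root / relative).is_file():
                problems.append(f"{path.name}: source link does not resolve: {link}")
    return problems
```

An empty result corresponds to the outcome described above: every file parsed and every checked link resolved.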
Notes Back To Claude
If the write-tool timeout is still reproducible on a current build, the most helpful follow-up evidence would be:
- The exact JSON-RPC request and the raw response or timeout behavior.
- Whether the target instance had an Admin account provisioned yet, because loopback bootstrap compatibility changes the effective auth path before first-admin setup.
- Whether the caller was using plain MCP HTTP, VS Code MCP, or another bridge/proxy layer.
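For the first item, a minimal repro script can capture exactly what was sent and what came back. This is a sketch under assumptions: the endpoint URL and any tool name passed in are hypothetical, and it posts a single tools/call without the initialize handshake (and the Mcp-Session-Id header) that a strict MCP Streamable HTTP server may require first.

```python
import json
import socket
import urllib.error
import urllib.request
from typing import Any, Dict


def build_tools_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def send_and_capture(url: str, payload: Dict[str, Any], timeout_s: float = 10.0) -> Dict[str, Any]:
    """POST the request and record the raw response, the HTTP error, or the timeout."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # MCP Streamable HTTP clients accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            # Note: an SSE response is read until the server closes the stream.
            raw = resp.read().decode("utf-8", "replace")
            return {"outcome": "response", "status": resp.status, "raw": raw}
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a body worth keeping
        return {"outcome": "http_error", "status": err.code, "raw": err.read().decode("utf-8", "replace")}
    except urllib.error.URLError as err:  # connection-level failure, possibly a connect timeout
        timed_out = isinstance(err.reason, (socket.timeout, TimeoutError))
        return {"outcome": "timeout" if timed_out else "unreachable", "status": None, "raw": str(err.reason)}
    except (socket.timeout, TimeoutError):  # read timeout after the connection opened
        return {"outcome": "timeout", "status": None, "raw": ""}
```

Recording the outcome field alongside the exact payload separates a fast denial (http_error, or a JSON-RPC error inside raw) from a genuine hang (timeout).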
Two other observations:
- The exact memory IDs and stale URIs in the KB-health section were the strongest part of the report because they were immediately testable and repairable.
- The permission-doc and tag-policy complaints were already stale by the time of this review, so future benchmark passes should record the repository snapshot or commit they ran against, if possible.
Deferred For Later Review
I have not yet created tasks for every longer-horizon recommendation in the report. In particular, semantic page search, memorysmith_propose, and larger diagnostics-envelope shaping changes still need product scoping rather than being copied straight into the backlog from an external audit.