Implementation Plan: AI Memory Suite

Status: Council-reviewed implementation plan, 2026-05-20

Scope: Planning only. No code implementation is implied by this page.

Related: Core Memory System Improvements RFC, Council Workflow, Search and Chat, Deep Research Intake Notes

Decision

Build MemorySmith into a governed AI memory suite by adding a custom per-wiki tag policy, warning-first staleness and maintenance behavior, source-backed Agent write proposals, structured tool outputs, and evidence-gated schema/page retrieval upgrades. The plan rejects both extremes: unvalidated free-form conventions and immediate broad schema expansion.

Overall confidence: 86%. Confidence is not higher mainly because MemorySmith already has automatic maintenance paths that can move records to Deprecated based on score. That behavior conflicts with the warning-first design goal until it is audited and deliberately changed.

User Direction Incorporated

Evidence Reviewed

Council Findings

| Seat | Recommendation | Confidence | Blocking Concern |
| --- | --- | --- | --- |
| Source-Grounded Archivist | Accept convention-first only if every convention is backed by explicit docs, validators, and source-linked evidence. | 76% | Current plan must list all internal clients and output contracts before changing tool shapes. |
| Data Model Architect | Use a hybrid path: convention first, then schema promotion after thresholds prove machine parsing is needed. | 82% | Promotion gates must be quantified before Phase 1 ends. |
| Retrieval Specialist | Prioritize staleness warnings, tool metadata, and search probes before ranking or chunking changes. | 80% | Do not alter ranking or RRF without local measurements and rollback tests. |
| Human Learning Advocate | Make tag rules, staleness, source evidence, and Agent proposals visible in the UI before relying on them. | 86% | Free-text tag entry will undermine conventions unless chips/autocomplete/diagnostics are added. |
| Skeptical Reviewer | Conditional approval only after resolving silent deprecation, tag drift, source-link validation, and ranking-rule mismatches. | 71% | Current maintenance can silently move records to Deprecated by score, contradicting warning-first policy. |
| Synthesizer | Proceed with a phased AI memory suite plan: governance first, behavior changes later. | 84% | Phase gates must be enforced, not treated as aspirational prose. |

Ten Deliberation Rounds

Round 1: Scope And Freedom To Move

Because MemorySmith has no external clients outside the app, the plan can change internal APIs, JSON shapes, schemas, prompts, and UI workflows when doing so improves long-term usefulness. That freedom should not become casual churn. The active app, tests, Data/Memories fixture, Data/Pages wiki, chat prompt, MCP endpoint, and Blazor UI are all internal consumers that need migration as a unit.

Decision: allow broad internal changes, but version or gate every behavior that affects search ranking, status transitions, tool output shape, or Agent writes.

Round 2: Current Maintenance Contradiction

The most important finding is that warning-first staleness is not just future policy. Current maintenance can already mutate memory status automatically:

Decision: Phase 0 must audit and redesign maintenance before building staleness semantics. The suite should not silently bury important old Core knowledge.

Round 3: Tag Schema

Tags should remain human-readable, but they need a policy layer. The proper schema is not one global taxonomy. It is a per-wiki tag policy with a small reserved core plus custom topic vocabularies.

Recommended starting schema:

| Category | Form | Values Or Rule | Purpose |
| --- | --- | --- | --- |
| Plain topic tags | lower-kebab-case | Custom per wiki, allowlisted or suggestion-only | Domain discovery: search, chat, mcp, pages, ui, storage, tests. |
| Kind | kind:<value> | fact, rule, procedure, decision, plan, research, guide, concept, issue, example, index | Tells agents and humans what kind of knowledge this is. |
| Priority | priority:<value> | critical, high, normal, low | Review and retrieval hint, not ranking behavior until tested. |
| Audience | audience:<value> | agent, human, chat, developer, admin | Helps format context and docs. |
| Scope | scope:<value> | Custom per wiki | Local project boundary: app, storage, tests, docs, security, data, etc. |
| Review | review-after:YYYY-MM | Valid month only | Warn that the record needs review. |
| Expiration | expires:YYYY-MM | Valid month only | Warn that the record may be invalid after the date. |
| Stale risk | stale-risk:YYYY-MM | Valid month only | Warn that the topic tends to drift after the date. |
| Supersession | supersedes:<memory-id> | Valid memory ID | Temporary convention until typed relations are justified. |
| Superseded by | superseded-by:<memory-id> | Valid memory ID | Temporary convention until typed relations are justified. |
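
The schema above can be checked mechanically. The following is a minimal sketch in Python, used here only for illustration and not as the project's implementation language. The function name validate_tag and the warning wording are hypothetical. It checks tag shape only: whether a supersedes: or superseded-by: target actually exists is left to a separate relation check, and scope: values are accepted as-is because they are custom per wiki.

```python
import re

# Reserved namespaces with closed value sets, copied from the schema table.
ENUM_NAMESPACES = {
    "kind": {"fact", "rule", "procedure", "decision", "plan", "research",
             "guide", "concept", "issue", "example", "index"},
    "priority": {"critical", "high", "normal", "low"},
    "audience": {"agent", "human", "chat", "developer", "admin"},
}
MONTH_NAMESPACES = {"review-after", "expires", "stale-risk"}
# Namespaces whose values are custom or are IDs; only non-emptiness is checked here.
FREE_NAMESPACES = {"scope", "supersedes", "superseded-by"}

KEBAB = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
MONTH = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def validate_tag(tag):
    """Return a list of warnings; an empty list means the tag is well-formed."""
    namespace, separator, value = tag.partition(":")
    if not separator:
        # Plain topic tag: must be lower-kebab-case.
        return [] if KEBAB.fullmatch(tag) else [f"'{tag}' is not lower-kebab-case"]
    if namespace in ENUM_NAMESPACES:
        ok = value in ENUM_NAMESPACES[namespace]
        return [] if ok else [f"'{value}' is not an allowed {namespace} value"]
    if namespace in MONTH_NAMESPACES:
        ok = MONTH.fullmatch(value) is not None
        return [] if ok else [f"'{value}' is not a valid YYYY-MM month"]
    if namespace in FREE_NAMESPACES:
        return [] if value else [f"'{namespace}:' needs a value"]
    return [f"unknown namespace '{namespace}'"]
```

Returning warnings instead of raising keeps the check compatible with a warn-first policy mode: the caller decides whether a non-empty list blocks a save.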

Rules:

Round 4: Tag Policy, Allowlist, Blocklist, And Lexical Analysis

The tag policy should be a file-backed, per-wiki configuration managed by the app. Proposed storage:

{
  "schemaVersion": 1,
  "mode": "warn",
  "namespaces": [
    {
      "name": "kind",
      "cardinality": "single",
      "allowedValues": ["fact", "rule", "procedure", "decision", "plan", "research", "guide", "concept", "issue", "example", "index"]
    }
  ],
  "plainTags": {
    "mode": "allowWithSuggestions",
    "allowlist": ["project-wiki", "search", "chat", "mcp", "pages", "ui", "storage", "tests"],
    "blocklist": ["misc", "general", "important", "stuff"],
    "aliases": {
      "retrieval": "search",
      "semantic-searching": "semantic-search"
    }
  }
}

The exact path should be chosen during implementation. Prefer a data-root path such as Data/Policies/tag-policy.json so the policy travels with the wiki instance and can be copied in tests. Do not store canonical tag policy only in user settings or appsettings; that would separate the rules from the knowledge base.
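
As a rough illustration of how the plainTags section of that policy might be applied, the Python sketch below resolves aliases first and then checks the blocklist and allowlist. Python is used only for illustration. The function name check_plain_tag and the warning texts are hypothetical, and the policy is embedded as a string here so the example is self-contained; a real implementation would read it from the chosen data-root path.

```python
import json

# A trimmed copy of the proposed policy, inlined for the example.
POLICY = json.loads("""
{
  "schemaVersion": 1,
  "mode": "warn",
  "plainTags": {
    "mode": "allowWithSuggestions",
    "allowlist": ["project-wiki", "search", "chat", "mcp", "pages", "ui", "storage", "tests"],
    "blocklist": ["misc", "general", "important", "stuff"],
    "aliases": {"retrieval": "search", "semantic-searching": "semantic-search"}
  }
}
""")


def check_plain_tag(tag, policy):
    """Return (canonical_tag, warnings). Nothing is rejected: this is warn mode."""
    plain = policy["plainTags"]
    canonical = plain["aliases"].get(tag, tag)
    warnings = []
    if canonical != tag:
        warnings.append(f"'{tag}' is an alias; suggest '{canonical}'")
    if canonical in plain["blocklist"]:
        warnings.append(f"'{canonical}' is blocklisted")
    elif canonical not in plain["allowlist"]:
        warnings.append(f"'{canonical}' is not in the allowlist")
    return canonical, warnings
```

Note that the alias target semantic-search is not in the example allowlist, so under this sketch it yields both an alias suggestion and an allowlist warning. Whether alias targets should be implicitly allowed is a policy decision the plan leaves open.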

Lexical analysis should propose governance actions, not apply them automatically. Suggested analyses:

The Tag Manager should show suggested actions:

Round 5: Staleness, Expiration, And Supersession

Staleness is a warning and review workflow first, not a ranking formula. A memory can be old and still authoritative. A newly written record can be wrong. The memory suite needs explicit signals:

Phase 0 should stop unreviewed automatic deprecation from hiding records. Phase 1 should show stale/expired/superseded diagnostics in memory detail, search results, context packs, and chat references. Phase 2 should measure whether warnings are enough. Only Phase 3 may add filters or ranking changes, and only with tests and a clear override.
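
A warning-first staleness check can be sketched as a pure function over a record's tags that never changes status or ranking. The Python below is illustrative only. The function name staleness_warnings and the message wording are hypothetical, and it assumes that a date tag starts warning only after the tagged month has fully passed; the plan does not pin down that boundary.

```python
from datetime import date

# Human-readable labels for the date-bearing tag namespaces.
DATE_TAG_LABELS = {
    "review-after": "review due",
    "expires": "expired",
    "stale-risk": "stale risk",
}


def staleness_warnings(tags, today):
    """Return warning strings for a record's tags; never mutates the record."""
    warnings = []
    for tag in tags:
        namespace, _, value = tag.partition(":")
        if namespace == "superseded-by" and value:
            warnings.append(f"superseded by {value}")
        elif namespace in DATE_TAG_LABELS:
            try:
                year, month = (int(part) for part in value.split("-"))
                tagged = date(year, month, 1)  # also rejects months outside 1-12
            except ValueError:
                warnings.append(f"malformed date tag '{tag}'")
                continue
            # Assumption: warn only once the tagged month is over.
            if (today.year, today.month) > (tagged.year, tagged.month):
                warnings.append(f"{DATE_TAG_LABELS[namespace]} since {value}")
    return warnings
```

Because the function only reports, the same output can feed memory detail, search results, context packs, and chat references without any of them needing to agree on a ranking rule.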

Round 6: Schema Evolution And Relations

The current MemoryRecord model is simple enough to inspect and edit. Keep that advantage while adding governance around it. Schema promotion should happen only when a convention needs reliable machine parsing.

Promotion gates:

Likely promotion order if gates pass:

  1. ReviewAfter and ValidUntil date fields, because stale safety is central and dates are fragile as tags.
  2. Kind enum, if kind: tags become common and UI/search behavior depends on them.
  3. Priority enum, only after retrieval probes prove it helps.
  4. A typed Relations edge list, if supersession/dependency/conflict workflows become common.

Candidate typed relation model:

{
  "type": "Supersedes",
  "targetId": "project-wiki-old-record",
  "note": "Replaces the old deployment guidance after single-host consolidation."
}

Keep References and Conflicts until typed relations prove they can replace or derive those arrays without making records harder to read.
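
If typed relations are promoted, the existing tag conventions could be read into the candidate model without rewriting records. The Python sketch below is illustrative only. The function name relations_from_tags is hypothetical, and the SupersededBy type name is an assumption: the candidate model above shows only Supersedes.

```python
# Maps tag namespaces to typed relation names.
# "SupersededBy" is an assumed name; only "Supersedes" appears in the plan.
RELATION_TYPES = {
    "supersedes": "Supersedes",
    "superseded-by": "SupersededBy",
}


def relations_from_tags(tags):
    """Derive typed relation objects from supersession tags, leaving tags untouched."""
    relations = []
    for tag in tags:
        namespace, _, target = tag.partition(":")
        if namespace in RELATION_TYPES and target:
            relations.append({
                "type": RELATION_TYPES[namespace],
                "targetId": target,
                "note": "",  # tags carry no note; a reviewer can fill this in later
            })
    return relations
```

Deriving relations on read, instead of migrating eagerly, keeps old tags valid during a transition period.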

Round 7: Retrieval, Pages, And Chunking

Search is already a core strength: lexical, semantic, hybrid, context packs, source bundles, and tests exist. The next step is safer retrieval output, not cleverer ranking.

Near-term retrieval changes:

Page retrieval should become first-class but measured:

Chunking trigger proposal:

Round 8: Tool Outputs, MCP, And Chat

Agent-facing tools should return strict structured output with human-readable summaries. Markdown-only results are readable, but they force agents to parse prose.

Recommended tool envelope:

{
  "schemaVersion": "memorysmith.tool.v1",
  "query": "source links file references",
  "items": [],
  "warnings": [],
  "diagnostics": [],
  "summary": "Found 3 relevant memory records."
}
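
A small builder makes it harder for individual tools to drift from that envelope. The Python sketch below is illustrative only; the function name build_envelope is hypothetical, and the item shape is left opaque because the plan does not define it.

```python
import json


def build_envelope(query, items, warnings=(), diagnostics=()):
    """Wrap tool results in the versioned envelope with a human-readable summary."""
    count = len(items)
    noun = "record" if count == 1 else "records"
    return {
        "schemaVersion": "memorysmith.tool.v1",
        "query": query,
        "items": list(items),
        "warnings": list(warnings),
        "diagnostics": list(diagnostics),
        "summary": f"Found {count} relevant memory {noun}.",
    }


# Usage: the envelope serializes directly to JSON for an agent-facing tool result.
envelope = build_envelope(
    "source links file references",
    items=[{"id": "a"}, {"id": "b"}, {"id": "c"}],
    warnings=["1 result is past its review-after date"],
)
payload = json.dumps(envelope, indent=2)
```

Generating the summary from the item count keeps the prose and the structured data from disagreeing.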

Plan:

Because no external clients need preserving, the app can eventually make JSON the default for MemorySmith's own agent path. Do that only after local chat, tests, and docs are updated.

Round 9: Agent Write Governance

Current Agent memory proposals are too thin for a governed memory suite: they include ID, title, content, tags, status, and confidence, but not source links, page citations, rationale, alternatives, validation diagnostics, or risk level. Agent writes should be reviewable proposals, not trusted writes.

Proposed additions to the proposal model:

Approval policy:

Round 10: Validation, Rollout, And Governance

Implementation should proceed by gates, not enthusiasm. Each phase should have tests, doc updates, rollback notes, and a council trigger for high-impact changes.

Council triggers:

Non-council changes:

Final Architecture Target

MemorySmith should become a local-first memory suite with these layers:

  1. Record Layer - JSON memory records remain inspectable and source-linked.
  2. Policy Layer - per-wiki tag policy, validation mode, aliases, allowlist, blocklist, and governance settings.
  3. Diagnostics Layer - tag, source-link, relation, staleness, maintenance, and chunking diagnostics.
  4. Retrieval Layer - lexical, semantic, hybrid, page, chunk, context-pack, and source-bundle retrieval with structured warnings.
  5. Human Workbench Layer - /memories, /pages, /chat, /health, /variables, and future /admin/tags surfaces for review and learning.
  6. Agent Governance Layer - source-backed proposals, trace evidence, approval gates, and council workflows for high-impact changes.
  7. Measurement Layer - search quality probes, page corpus stats, stale-result metrics, tag drift metrics, and Agent write review outcomes.

Implementation Phases

Phase 0: Freeze Risky Assumptions And Measure Baseline

Goal: prevent current automation from undermining the plan.

Tasks:

  1. Audit maintenance status transitions in MemoryMaintenanceTasks, MemoryMaintenanceService, MemoryStateMachine, and MemoryScorer.
  2. Decide whether automatic deprecation remains allowed. Recommended default: recommendation-only until staleness and review diagnostics exist.
  3. Add a planning note to docs explaining current maintenance behavior and the intended warning-first replacement.
  4. Inventory existing tags across Data/Memories.
  5. Inventory SourceLinks and unresolved %VarName% references.
  6. Capture current search benchmark results and top known probes.
  7. Capture page corpus statistics: count, size distribution, heading distribution.
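
Tasks 4 and 5 amount to a simple pass over the records. The Python sketch below is illustrative only and works on already-loaded records. It assumes each record is a dictionary with a Tags list and a SourceLinks list of strings; the actual MemoryRecord field shapes are not specified in this plan, and the function name inventory is hypothetical.

```python
import re
from collections import Counter

# Matches %VarName% placeholders inside source links.
VARIABLE = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def inventory(records, known_variables):
    """Return (tag usage counts, sorted names of unresolved %VarName% references)."""
    tag_counts = Counter(tag for record in records for tag in record.get("Tags", []))
    unresolved = set()
    for record in records:
        for link in record.get("SourceLinks", []):
            for name in VARIABLE.findall(link):
                if name not in known_variables:
                    unresolved.add(name)
    return tag_counts, sorted(unresolved)
```

The tag counts double as the baseline for later tag-drift measurements, so it is worth storing the report and not just viewing it.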

Acceptance gates:

Phase 1: Policy And Diagnostics Foundation

Goal: create the governance model without changing ranking.

Tasks:

  1. Define TagPolicy and TagDiagnostic models.
  2. Choose file-backed policy storage under the wiki data root.
  3. Implement canonical tag validation: namespaces, dates, cardinality, aliases, blocked tags, and malformed tags.
  4. Implement source-link diagnostics: missing variables, missing files, invalid line ranges, disallowed roots, and oversized reads.
  5. Implement relationship diagnostics for References, Conflicts, supersedes, and superseded-by targets.
  6. Implement staleness diagnostics for review-after, expires, stale-risk, LastUpdated, and Deprecated status.
  7. Add diagnostics to memory save/update paths as warnings first.
  8. Add docs: memory writing guide, tag policy guide, source-link guide, and examples.
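
The relationship diagnostics in task 5 can be sketched for the supersession tags as follows. The Python is illustrative only, and the function name relation_diagnostics is hypothetical. The input is assumed to be a mapping from memory ID to that record's tags. The symmetry check (a supersedes tag should be mirrored by a superseded-by tag on the target) is an assumption about what the diagnostics should cover, not something the plan requires.

```python
INVERSE = {"supersedes": "superseded-by", "superseded-by": "supersedes"}


def relation_diagnostics(tags_by_id):
    """Report dangling supersession targets and one-sided supersession pairs."""
    diagnostics = []
    for record_id, tags in tags_by_id.items():
        for tag in tags:
            namespace, _, target = tag.partition(":")
            if namespace not in INVERSE:
                continue
            if target not in tags_by_id:
                diagnostics.append(
                    f"{record_id}: {namespace} target '{target}' does not exist")
                continue
            # Assumed rule: the target should carry the mirrored tag.
            mirrored = f"{INVERSE[namespace]}:{record_id}"
            if mirrored not in tags_by_id[target]:
                diagnostics.append(
                    f"{record_id}: '{target}' lacks the matching {mirrored} tag")
    return diagnostics
```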

Acceptance gates:

Phase 2: Human Workbench And Tag Manager

Goal: make governance visible and easy to use.

Tasks:

  1. Replace comma-only tag editing in /memories with tag chips and autocomplete.
  2. Show tag validation warnings inline before save.
  3. Add a Tag Manager surface for allowlist, blocklist, aliases, namespace values, usage counts, and suggested merges.
  4. Add lexical-analysis suggestions for blocklist/alias candidates.
  5. Add staleness and supersession badges to memory list/detail/search results.
  6. Add source-link health indicators in the memory editor.
  7. Add a diagnostics panel per memory with tag/source/relation/stale issues.
  8. Add admin controls for tag policy mode: observe, warn, block invalid namespace values, block all unknown tags.
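
For the lexical-analysis suggestions in task 4, one simple candidate is near-duplicate detection over tag names. The Python sketch below uses the standard library's difflib.SequenceMatcher and is illustrative only. The function name suggest_merges and the 0.8 similarity threshold are assumptions that would need tuning against the real tag inventory. It only proposes merges; nothing is applied automatically.

```python
from difflib import SequenceMatcher
from itertools import combinations


def suggest_merges(tag_counts, threshold=0.8):
    """Return (merge_from, merge_into, similarity) suggestions for similar tag names.

    The less-used tag is suggested for merging into the more-used one.
    """
    suggestions = []
    for first, second in combinations(sorted(tag_counts), 2):
        similarity = SequenceMatcher(None, first, second).ratio()
        if similarity >= threshold:
            if tag_counts[first] >= tag_counts[second]:
                keep, merge = first, second
            else:
                keep, merge = second, first
            suggestions.append((merge, keep, round(similarity, 2)))
    return suggestions
```

The pairwise comparison is quadratic in the number of distinct tags, which is acceptable for a per-wiki vocabulary but worth keeping in mind if vocabularies grow large.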

Acceptance gates:

Phase 3: Retrieval Output Safety

Goal: teach search, MCP, and chat to carry warnings and provenance.

Tasks:

  1. Add diagnostics to lexical, semantic, hybrid, unified, and page result DTOs.
  2. Add structured warnings to memorysmith_context_pack JSON output.
  3. Add schemaVersion and structured envelope support to MCP tool results.
  4. Expose ONNX vs token fallback in semantic match reasons or metadata.
  5. Add chat trace rendering for stale/source/tag/relation warnings.
  6. Add reference drawer chips for strict rules, stale records, expired records, and superseded records.
  7. Keep ranking unchanged until probes prove a change is helpful.

Acceptance gates:

Phase 4: Agent Write Governance

Goal: make Agent writes auditable and safe enough for a knowledge base.

Tasks:

  1. Expand Agent proposal models with evidence, citations, rationale, diagnostics, risk, and diff fields.
  2. Add proposal prevalidation before approval.
  3. Add approval checklist UI with tag/source/status/confidence/relation checks.
  4. Add RBAC rules: stricter approval for Core, strict-rule, critical, supersession, expired, or source-link-changing proposals.
  5. Add rejection reasons and reviewer notes.
  6. Log proposal outcomes for quality metrics.
  7. Update chat prompt and tool instructions so agents prefer source-backed proposals.
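
Tasks 2 and 4 can be pictured as two small checks that run before a reviewer sees a proposal. The Python sketch below is illustrative only. The Proposal fields, the function names, and the tier names high, normal, and low are hypothetical, and the risk rule covers only a subset of the triggers listed in task 4 (Core status, priority:critical, and supersession tags).

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    memory_id: str
    title: str
    status: str                      # e.g. "Core", "Working", "Unconsolidated"
    tags: list = field(default_factory=list)
    source_links: list = field(default_factory=list)
    rationale: str = ""


def prevalidate(proposal):
    """Return blocking issues a reviewer should see before approval."""
    issues = []
    if not proposal.source_links:
        issues.append("missing source links")
    if not proposal.rationale.strip():
        issues.append("missing rationale")
    return issues


def risk_tier(proposal):
    """Classify a proposal so stricter approval applies to high-impact writes."""
    high_risk_tag = any(
        tag == "priority:critical" or tag.startswith(("supersedes:", "superseded-by:"))
        for tag in proposal.tags
    )
    if proposal.status == "Core" or high_risk_tag:
        return "high"
    if proposal.status in ("Working", "Unconsolidated"):
        return "low"
    return "normal"
```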

Acceptance gates:

Phase 5: Measurement And Promotion Gates

Goal: decide whether to promote tags into schema or alter retrieval behavior.

Tasks:

  1. Measure tag drift: unknown tags, alias suggestions, blocked attempts, duplicate clusters.
  2. Measure stale-result impact: top-k stale rates, expired citations, user overrides, answer quality probes.
  3. Measure Agent proposal quality: rejection rate, missing-source rate, approval time.
  4. Measure page retrieval: page length distribution, miss rate, Recall@5/MRR on page queries.
  5. Decide whether specific conventions should become schema fields.
  6. Run a focused council review before each schema/ranking/chunking promotion.
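
The Recall@5 and MRR figures in task 4 are standard retrieval metrics and can be computed from probe results as sketched below. The Python is illustrative only, and the function names and probe representation (a ranked list of result IDs plus a set of relevant IDs per query) are assumptions.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant result, or 0.0 if none is returned."""
    for position, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(probes):
    """Average reciprocal rank over (ranked_ids, relevant_ids) probe pairs."""
    if not probes:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in probes) / len(probes)
```

Running the same probe set before and after any ranking or chunking change gives the local measurement that the promotion gates depend on.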

Acceptance gates:

Phase 6: Optional Schema Promotion

Goal: move proven conventions into stable model fields.

Candidate tasks:

  1. Promote ReviewAfter and ValidUntil to optional date fields if date tags prove fragile or widely used.
  2. Promote Kind to enum if kind: becomes central to retrieval/UI.
  3. Promote Priority only if quality probes prove it improves ranking or review workflow.
  4. Promote typed Relations only if supersession/dependency/conflict queries become common.
  5. Provide migration from tag conventions to fields, with compatibility reading old tags during transition.
  6. Add UI controls for new fields before enforcing them.

Acceptance gates:

Phase 7: Page Metadata, Chunking, And Page Embeddings

Goal: make long-form docs first-class retrieval sources when the corpus needs it.

Tasks:

  1. Add optional page frontmatter for tags, audience, related memories, review-after, and source links.
  2. Add page diagnostics parallel to memory diagnostics.
  3. Implement heading-based chunking behind a feature flag.
  4. Preserve slug, heading path, section ID, and source line range where feasible.
  5. Add page chunk search and optional embeddings only after page metrics trigger the need.
  6. Render cited page sections in chat references.
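
Heading-based chunking that preserves the slug, heading path, and source line range (tasks 3 and 4) might look like the sketch below. The Python is illustrative only. It assumes pages use Markdown-style # headings, which this plan does not state, and the function name and chunk field names are hypothetical. It does not skip fenced code blocks, so a # comment line inside a code sample would be misread as a heading.

```python
import re

# Assumed page format: Markdown ATX headings, one to six leading '#'.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def _new_chunk(slug, path, line_number):
    return {"slug": slug, "headingPath": list(path),
            "startLine": line_number, "endLine": line_number, "text": ""}


def chunk_by_heading(slug, text):
    """Split a page into one chunk per heading, tracking 1-based line ranges."""
    chunks, path, current = [], [], None
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Keep ancestors above this level, then append the new heading.
            path = path[: level - 1] + [match.group(2)]
            current = _new_chunk(slug, path, line_number)
            chunks.append(current)
            continue
        if current is None:
            if not line.strip():
                continue  # ignore blank lines before any content
            current = _new_chunk(slug, [], line_number)  # preamble before first heading
            chunks.append(current)
        current["text"] += line + "\n"
        current["endLine"] = line_number
    for chunk in chunks:
        chunk["text"] = chunk["text"].strip()
    return chunks
```

Keeping the line range on every chunk is what lets chat references cite a page section and not just the page.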

Acceptance gates:

Phase 8: Advanced Council And Learning Flows

Goal: use the memory suite to help humans learn and make better decisions, not just store facts.

Tasks:

  1. Add a guided council workflow in chat for high-impact decisions.
  2. Let chat assemble evidence packs from memories, pages, source bundles, and search benchmarks.
  3. Keep council seats independent and preserve dissent in generated reports.
  4. Add templates for decision records, implementation plans, and research intake notes.
  5. Measure whether councils improve outcomes: fewer missed risks, fewer stale citations, better acceptance-gate clarity.

Acceptance gates:

Phase 9: Sustained Operations

Goal: keep the memory suite healthy after features ship.

Tasks:

  1. Quarterly tag-policy review.
  2. Quarterly stale/expired memory review.
  3. Monthly broken source-link report.
  4. Search quality probe updates when new major wiki topics appear.
  5. Agent proposal quality report.
  6. Wiki docs cleanup: remove obsolete planning pages or mark supersession clearly.

Acceptance gates:

Test And Benchmark Plan

Use NUnit, consistent with project preference.

Recommended test additions:

Recommended metrics:

Risks And Mitigations

| Risk | Mitigation |
| --- | --- |
| Automatic maintenance hides authoritative records. | Phase 0 audit; switch to warning/recommendation mode before staleness logic depends on age. |
| Tag policy becomes too rigid. | Start in observe/warn mode; allow per-wiki custom allowlists and explicit overrides. |
| Tag policy stays too loose. | Use lexical analysis and metrics; move namespaces to blocking mode after drift proves harmful. |
| Lexical analysis creates noisy suggestions. | Require human approval and support ignore rules per wiki. |
| Schema promotion creates migration churn. | Promote one field at a time only after thresholds; keep old tags readable during transition. |
| JSON tool output breaks local chat/tests. | Add versioned envelope first, update internal consumers, then change defaults. |
| Page chunking adds complexity before need. | Gate on page metrics and retrieval probes. |
| Agent approvals become burdensome. | Risk-tier proposals; fast path only for low-risk Working/Unconsolidated writes. |
| Council workflow becomes bureaucracy. | Require council only for schema, ranking, maintenance, MCP default, Agent governance, or chunking changes. |

Open Questions

Immediate Next Planning Actions

  1. Decide and document maintenance behavior: apply status automatically, recommend only, or require human approval.
  2. Draft the first TagPolicy schema and UI mock for Tag Manager.
  3. Create the memory writing/tagging guide with examples.
  4. Create baseline tag/source/page/search reports.
  5. Add Phase 0 and Phase 1 tests before any ranking or schema work.
  6. Run a focused council review before implementing Phase 6 schema promotion or Phase 7 chunking.