Executive Summary

Recommendations by Research Question

| Question (Condensed) | Recommendation for MemorySmith | Evidence Level |
|---|---|---|
| 1. Namespaced tags & UI metadata | Use simple prefix conventions (e.g. type:, priority:) initially, but provide UI affordances before scale. Implement tag autocomplete/chips and bulk tag editing. Reserve formal schema fields only for very stable categories. Consider a lightweight tag validation plugin or a “tag library” that suggests existing tags. (Maintain as conventions until proven needed in prototypes.) | Weak / practice-based |
| 2. Expiration & staleness | Add optional “review-after” or “valid-until” date fields to memories and pages. Display warnings or badges on stale content in the UI. In search ranking, decay (demote) older docs but do not remove them silently. Provide filters (e.g. “only show non-deprecated”). Ensure agents include timestamps in citations. | Moderate (industry) |
| 3. Page chunking & embeddings | Start without chunking for small docs. Once pages exceed ~500–1000 tokens (or performance degrades), split by headings/sections with some overlap. Preserve headings and source links in each chunk’s metadata. Embed and index chunks, not only full pages. Evaluate changes with metrics (recall@k, nDCG) or user QA tests before/after. | Moderate (LLM guides) |
| 4. JSON vs Markdown output | Default agent-tool output to structured JSON (for example: records, warnings, errors, relationships). Separately generate or allow markdown summaries for UI display. In chat, present JSON-derived answers in a user-friendly way. This hybrid approach maximizes machine-readability without sacrificing human readability. | Moderate (tool docs) |
| 5. Relationship typing | For key relations (DependsOn, Supersedes, ConflictsWith, etc.), consider adding explicit schema fields (arrays of IDs). In the meantime, encourage convention tags/fields (like supersedes:<id> in a “References” list). Plan a migration: e.g. detect patterns in “References” text to auto-populate new fields. A full graph DB seems unnecessary unless very complex querying is needed; simple JSON linking is enough for now. | Weak (usage patterns) |
| 6. Agent write approval & governance | Treat AI-suggested changes as proposals: include full context, source links, and confidence, and allow approve/reject. For critical records (strict rules, core schema), require explicit human approval (possibly with an “Admin” or “Maintainer” role). Provide a review UI akin to code review (diff view, comments). Encourage AI to cite sources for every factual claim. (E.g. “sources”:[…] field in JSON output.) | Weak (best practice) |
| 7. Strict rules in markdown & extraction | Use Markdown conventions to flag rules: e.g. admonition boxes or a special YAML header (e.g. tags: [rule, strict]). Parse pages into a Markdown AST (e.g. with a CommonMark parser) to extract these reliably. In retrieval, treat rules as untrusted context: e.g. include them as plain text in context but have the agent re-verify or cite them rather than assume correctness. | Weak (engineering) |
| 8. Council-style review | Implement an optional “agent council” workflow for complex tasks: e.g. run two LLM chains with different prompts (validator vs critic) and merge their outputs. However, keep it simple: maybe just pair “expert” and “questioner” agents. Limit multi-agent chains to high-risk tasks to avoid cost and confusion. Encourage dissent by, for example, forcing each agent to find counterpoints. | Weak (research/experts) |

Evidence levels: Strong = replicated studies or formal docs; Moderate = industry reports/blogs with examples; Weak = practitioner opinion or analogous cases. Most findings above rely on documented practices and case studies, not controlled experiments.
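The chunking recommendation (item 3) can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Chunk` type, the `split_by_headings` function, and the `overlap_words` parameter are assumptions for illustration, not MemorySmith APIs. The sketch also counts overlap in whitespace-separated words rather than model tokens and does not handle `#` lines inside fenced code blocks, so a real implementation would likely swap in a tokenizer and a proper Markdown parser.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the chunk text
    source: str   # link or ID of the page the chunk came from
    text: str     # section body, prefixed with overlap from the previous section


def split_by_headings(markdown: str, source: str, overlap_words: int = 20) -> list[Chunk]:
    """Split a Markdown page at ATX headings (#, ##, ...).

    Each chunk keeps its heading and source link as metadata, and starts
    with the last `overlap_words` words of the previous section.
    """
    chunks: list[Chunk] = []
    heading = "(no heading)"
    body: list[str] = []
    prev_tail = ""

    def flush() -> None:
        nonlocal prev_tail
        # Collapse all whitespace so chunks are single-line strings.
        text = " ".join(" ".join(body).split())
        if text:
            combined = f"{prev_tail} {text}".strip()
            chunks.append(Chunk(heading, source, combined))
            words = text.split()
            prev_tail = " ".join(words[-overlap_words:]) if overlap_words > 0 else ""
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()  # close the section that just ended
            heading = match.group(1).strip()
        else:
            body.append(line)
    flush()  # close the final section
    return chunks
```

Each resulting chunk would then be embedded and indexed on its own, with `heading` and `source` stored alongside the vector so that retrieved chunks can still be cited back to their page.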

Key Findings and Examples

Risks and Anti-Patterns

Decision Gates: Convention → UI → Schema

Evaluation Metrics and Test Probes

Convention vs Schema vs Page Prose

Convention-first: Use lightweight prefix/tag conventions and format hints for flexible metadata (especially low-volume tags or fields). This keeps the system agile and local-first (no database changes). For example, continue using supersedes:<id> in text until the pattern is common enough to justify a dedicated field.
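As a rough illustration of how such a convention could later be migrated, the sketch below scans free text for relation tags and collects the referenced IDs. The tag spellings `depends-on` and `conflicts-with`, the allowed ID characters, and the function name `extract_relations` are assumptions made for this example; only the supersedes:<id> form appears in the recommendations above.

```python
import re

# Assumed convention: "<relation>:<id>", with IDs made of letters, digits, "_" and "-".
RELATION_PATTERN = re.compile(
    r"\b(supersedes|depends-on|conflicts-with):([A-Za-z0-9_-]+)",
    re.IGNORECASE,
)


def extract_relations(text: str) -> dict[str, list[str]]:
    """Collect relation tags found in free text, grouped by relation type.

    Relation names are lower-cased and duplicate IDs are dropped, so the
    result can be copied into array-of-ID schema fields during a migration.
    """
    relations: dict[str, list[str]] = {}
    for kind, target in RELATION_PATTERN.findall(text):
        ids = relations.setdefault(kind.lower(), [])
        if target not in ids:
            ids.append(target)
    return relations
```

Counting how often each relation type turns up across existing pages is one possible signal for deciding when a convention has become common enough to promote.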

Schema-first: Use explicit schema fields for core concepts that are critical and stable (e.g. a memory’s SourceLinks, LastUpdated, Status levels, or any field that many tools consume programmatically). Schemas add upfront maintenance but pay off when used heavily.
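A minimal sketch of what such explicit fields might look like follows. The class name `MemoryRecord`, the snake_case field names, the `review_after` field (taken from the staleness recommendation), and the default status value are all assumptions for illustration; the actual MemorySmith record format may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import Optional


@dataclass
class MemoryRecord:
    # Hypothetical record shape; field names are illustrative only.
    id: str
    summary: str
    status: str = "active"
    last_updated: Optional[date] = None
    review_after: Optional[date] = None
    source_links: list[str] = field(default_factory=list)
    supersedes: list[str] = field(default_factory=list)

    def is_stale(self, today: date) -> bool:
        """A record is stale once its optional review-after date has passed."""
        return self.review_after is not None and today > self.review_after

    def to_json(self) -> str:
        """Serialize to JSON, writing dates as ISO 8601 strings."""
        data = asdict(self)
        for key in ("last_updated", "review_after"):
            if data[key] is not None:
                data[key] = data[key].isoformat()
        return json.dumps(data, sort_keys=True)
```

Keeping dates and ID arrays as typed fields means a UI could badge stale records and tools could follow `supersedes` links without parsing prose.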

Page prose: Reserve free-form markdown for narrative context, examples, or documentation that isn’t directly machine-queried. Instructional content, verbose policy explanations, or detailed records should stay in pages. Only distill the actionable bits into the structured memory records or schema fields.

In summary, favor conventions and human-edited text until usage patterns emerge. Add UI guidance early, and add schema fields only when needed (e.g. when user confusion or query errors show that a convention is no longer enough). Keep most knowledge in markdown pages for readability, and use the JSON record format for the distilled metadata and facts that agents rely on.