OpenTelemetry Local-First V1 Plan (MemorySmith)
Summary
This plan delivers a local-first OpenTelemetry (OTel) implementation that complements the existing Serilog and Windows Event Log observability. It focuses on telemetry that is safe by default, with explicit controls over performance overhead and data exposure.
Goals
- Add OpenTelemetry tracing and metrics, with optional OTLP export to a local collector.
- Keep overhead bounded via sampling, path filters, and low-cardinality tags.
- Avoid sensitive data capture (no raw prompts, no attachment content, no API secrets, no full payload logs in OTel attributes).
- Make all telemetry controls configurable through the existing admin settings.
- Preserve current diagnostics and health workflows while adding OTel readiness visibility.
Non-Goals (V1)
- Full custom dashboard builder engine.
- External SaaS telemetry dependency.
- High-cardinality per-user/per-record telemetry dimensions.
Architecture (V1)
- App instrumentation:
  - ASP.NET Core request instrumentation
  - HttpClient instrumentation
  - Runtime metrics instrumentation
  - Custom ActivitySource + Meter for MemorySmith domain operations
- Export path:
  - Default: local-only instrumentation with the exporter disabled
  - Optional: OTLP exporter enabled, pointing at a local collector endpoint
- Existing observability:
  - Keep Serilog sinks and diagnostics log APIs as-is
  - Add telemetry config/health surfacing for operator visibility
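The export-path rule (exporter off by default, optional OTLP to a local collector) can be sketched as a small settings check. This is a language-neutral Python illustration, not the .NET wiring: the `TelemetrySettings` fields and the `resolve_otlp_endpoint` helper are hypothetical names, and treating "local" as "loopback host only" is an assumption about how strictly the local-first goal would be enforced. Port 4317 is the standard OTLP/gRPC port (OTLP over HTTP conventionally uses 4318).

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional
from urllib.parse import urlparse


# Hypothetical settings model; field names are illustrative, not MemorySmith's real config keys.
@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool = True                           # instrumentation on by default
    exporter_enabled: bool = False                 # OTLP export off by default
    otlp_endpoint: str = "http://localhost:4317"   # 4317 = standard OTLP/gRPC port
    otlp_protocol: str = "grpc"                    # or "http/protobuf" (usually port 4318)


def _is_loopback(host: str) -> bool:
    """True for 'localhost' or a loopback IP literal such as 127.0.0.1 or ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as non-local


def resolve_otlp_endpoint(settings: TelemetrySettings) -> Optional[str]:
    """Return the endpoint to export to, or None when export is switched off."""
    if not (settings.enabled and settings.exporter_enabled):
        return None
    if settings.otlp_protocol not in ("grpc", "http/protobuf"):
        raise ValueError(f"unsupported OTLP protocol: {settings.otlp_protocol!r}")
    host = urlparse(settings.otlp_endpoint).hostname or ""
    if not _is_loopback(host):
        # Assumption: local-first means refusing non-loopback collectors.
        raise ValueError(f"OTLP endpoint must be local, got host {host!r}")
    return settings.otlp_endpoint
```

Because the decision reads only settings, changing `exporter_enabled` in configuration is enough to switch export on or off; no code path has to change.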
Guardrails
- Performance:
  - Parent-based sampling with a configurable percentage (low by default)
  - Exclude low-value, noisy endpoints (health, diagnostics, static assets)
  - Exporter disabled by default
- Privacy:
  - Do not record query text, attachment content, auth headers, tokens, or API keys as attributes
  - Use operation-level tags only (operation name/category, success, slow-path)
  - Keep dimensions bounded
- Operability:
  - Admin-configurable toggles plus endpoint and protocol settings
  - Clear diagnostics visibility of the effective telemetry config
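The performance guardrails combine two checks: excluded paths are never recorded, and everything else goes through parent-based sampling. The sketch below shows that decision logic in Python under assumed names and defaults (`is_excluded_path`, `should_sample`, and the prefix list are hypothetical). It follows the idea behind OpenTelemetry's parent-based and trace-ID-ratio samplers but is not their exact algorithm; in the .NET SDK the equivalent would typically be a `ParentBasedSampler` wrapping a `TraceIdRatioBasedSampler`, plus the ASP.NET Core instrumentation's `Filter` option.

```python
from typing import Optional, Sequence

# Hypothetical defaults; the real prefixes would come from admin settings.
DEFAULT_EXCLUDED_PREFIXES = ("/health", "/diagnostics", "/static")
_UINT64_MASK = (1 << 64) - 1


def is_excluded_path(path: str, prefixes: Sequence[str] = DEFAULT_EXCLUDED_PREFIXES) -> bool:
    """Match a prefix exactly or as a whole path segment ('/health/ready' yes, '/healthcheck' no)."""
    p = path.lower()
    for prefix in prefixes:
        base = prefix.lower().rstrip("/")
        if p == base or p.startswith(base + "/"):
            return True
    return False


def should_sample(trace_id: int, ratio: float, parent_sampled: Optional[bool]) -> bool:
    """Parent-based sampling: follow the parent's decision when one exists;
    for root spans, decide deterministically from the low 64 bits of the trace id."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if parent_sampled is not None:
        return parent_sampled
    return (trace_id & _UINT64_MASK) < int(ratio * (1 << 64))


def should_record_request(path: str, trace_id: int, ratio: float,
                          parent_sampled: Optional[bool] = None) -> bool:
    """Excluded paths are dropped before sampling is even considered."""
    if is_excluded_path(path):
        return False
    return should_sample(trace_id, ratio, parent_sampled)
```

Deciding from the trace ID rather than a fresh random draw gives every component the same answer for the same trace at the same ratio, and honoring the parent's flag keeps child spans (for example outgoing HttpClient calls) consistent with their root.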
Deliverables
- Config model and appsettings defaults for telemetry.
- Admin settings descriptors for telemetry controls.
- OpenTelemetry package wiring and startup registration.
- MemoryApplicationService low-cardinality instrumentation.
- Validation and council-review remediation pass.
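For the `MemoryApplicationService` instrumentation, the privacy guardrail limits tags to operation name/category, success, and slow-path. A minimal Python sketch of that pattern follows, assuming a fixed enum of operation names and an arbitrary 1-second slow-path threshold (both hypothetical), with an in-memory list standing in for an OTel histogram. In .NET this would more likely be a histogram created from a `Meter` alongside an `ActivitySource` span.

```python
import time
from contextlib import contextmanager
from enum import Enum
from typing import Dict, Iterator, List, Tuple


# Hypothetical operation names; the real set would come from MemoryApplicationService.
class MemoryOperation(Enum):
    SEARCH = "memory.search"
    CREATE = "memory.create"
    UPDATE = "memory.update"
    DELETE = "memory.delete"


SLOW_THRESHOLD_SECONDS = 1.0  # assumed threshold for the slow-path tag

# Stand-in for an OTel histogram: each entry is (duration_seconds, tags).
recorded: List[Tuple[float, Dict[str, object]]] = []


@contextmanager
def instrument(operation: MemoryOperation) -> Iterator[None]:
    """Time the wrapped block and record it with fixed, low-cardinality tags only."""
    start = time.perf_counter()
    success = False
    try:
        yield
        success = True  # only reached if the block raised nothing
    finally:
        elapsed = time.perf_counter() - start
        tags = {
            "operation": operation.value,
            "success": success,
            "slow_path": elapsed >= SLOW_THRESHOLD_SECONDS,
        }
        recorded.append((elapsed, tags))  # exceptions still propagate to the caller
```

Usage is `with instrument(MemoryOperation.SEARCH): ...` around the service call. Because every tag comes from a closed set (an enum member and two booleans), the number of distinct time series stays fixed no matter how many users or records pass through.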
Acceptance Criteria
- Build and tests pass with the default telemetry configuration (telemetry enabled, exporter off).
- Telemetry exporter can be toggled without code changes.
- No raw sensitive request content is added to telemetry attributes.
- Request and core memory-operation latency/error telemetry is observable via OTel.
- Path filtering and sampling controls are effective and admin-editable.
Risks and Mitigations
- Risk: Cardinality explosion.
  - Mitigation: fixed operation tags, no freeform text attributes.
- Risk: Extra CPU/network overhead.
  - Mitigation: low default sampling, exporter off by default, filtered paths.
- Risk: Accidental sensitive data capture.
  - Mitigation: no payload attributes, no query text tags, no body capture.
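The cardinality and sensitive-data mitigations can both be enforced at one choke point: an allowlist that drops unknown attribute keys and collapses unexpected values into a single `other` bucket. The key names and allowed values below are hypothetical examples, and the Python code illustrates the policy rather than any OpenTelemetry API.

```python
from typing import Dict, Mapping

# Hypothetical allowlist: attribute key -> the closed set of values it may carry.
ALLOWED_ATTRIBUTES: Dict[str, frozenset] = {
    "operation": frozenset({"memory.search", "memory.create", "memory.update", "memory.delete"}),
    "category": frozenset({"read", "write"}),
    "success": frozenset({True, False}),
    "slow_path": frozenset({True, False}),
}


def _is_allowed_value(value: object, allowed: frozenset) -> bool:
    try:
        return value in allowed
    except TypeError:  # unhashable values (lists, dicts) are never allowed
        return False


def sanitize_attributes(attrs: Mapping[str, object]) -> Dict[str, object]:
    """Drop unknown keys and collapse out-of-set values to 'other'."""
    clean: Dict[str, object] = {}
    for key, value in attrs.items():
        allowed = ALLOWED_ATTRIBUTES.get(key)
        if allowed is None:
            continue  # e.g. query text, auth headers, tokens: never exported
        clean[key] = value if _is_allowed_value(value, allowed) else "other"
    return clean
```

An allowlist fails closed: a newly introduced attribute such as query text stays out of telemetry until someone deliberately adds it, which is safer than maintaining a denylist of known-sensitive keys.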
Confidence
88%
Open Questions
- Should V1 include a direct Prometheus scrape endpoint, or stay OTLP-to-collector only?
- Should chat provider/model be included as a bounded dimension for selected operation metrics?