MemorySmith Audit #6 — Training Harness, MCP Changes, and Branch State
Date: 2026-05-29
Branch: feature/code-search-high-roi-batch8 @ abd614f6a910fd1091c224f62f3a8ae53b75a80f
Base: master @ c4d7a28ade1a2878d270f1479bfb255f5058482b
Scope: All new code on the feature branch (~43 commits since master). Focus on the training harness, training workbench UI, MCP tool changes, and integration correctness.
Audit family: Companion to Audits #1–5. Cross-references inlined.
0. Executive summary
The branch introduces a functional training harness and workbench — a real, running pipeline from Blazor UI → C# process manager → Python PEFT LoRA loop → GPU. The MCP surface gains three new tools. The num_ctx bug from Audit #5 is fixed. The Modelfile has been migrated to ChatML (Path B, as recommended).
However, the training pipeline has two Critical-severity findings that will cause silent quality degradation or create a remote code execution surface. The most consequential: harness.py formats training data using a bare <role>\ncontent format while the Modelfile and Ollama inference path use ChatML (<|im_start|>role\ncontent<|im_end|>). Every training run executed under the current code fine-tunes the model on a template it never sees at inference time. This is the single highest-priority fix in the entire audit.
Severity rollup
| Severity | Count | Net-new vs prior audits |
|---|---|---|
| Critical | 2 | 2 new |
| High | 5 | 4 new, 1 carried (TRAIN-005 = new pattern of Audit #5 configurability gap) |
| Medium | 8 | 7 new, 1 reconfirmed (MCP protocol version) |
| Low | 3 | 3 new |
| Info | 3 | 3 new |
| Total | 21 | 19 net-new |
Closures from prior audits
| Prior finding | Status | How |
|---|---|---|
| num_ctx display-only bug (Audit #5 §4.4) | Closed | BuildOllamaRequestOptions() now sends num_ctx when OllamaContextWindowTokens > 0 |
| Chat template drift hazard (Audit #5 §7.1) | Partially closed | Modelfile uses ChatML. But harness.py does NOT use ChatML for training data — see TRAIN-001 |
| Zero server-side chat logging (Audit #5 §4.6) | Closed | ChatTranscriptWriter added with metadata-only default + optional content companion |
| Zero feedback mechanism (Audit #5 §4.7) | Closed | ChatFeedbackStore added (SQLite-backed, thumbs up/down) |
1. Branch overview and commit log
Default branch: master (protected, still at c4d7a28a)
Active feature branch: feature/code-search-high-roi-batch8 — 43+ commits, last commit 2026-05-29 17:09 UTC
Key commits (newest first)
| Date | SHA prefix | Message | Audit relevance |
|---|---|---|---|
| 05-29 17:09 | abd614f6 | Record API key test harness requirement | Config |
| 05-29 16:57 | b7e76c60 | UI: admin refresh, training workbench nav shortcut removal, distillation task page | UX |
| 05-29 16:09 | e458f3f3 | Training: fix workbench harness path resolution | Bug fix |
| 05-29 15:41 | a5fa5941 | Docs: fix ultra codebase audit markdown structure | Docs |
| 05-29 15:40 | aca7dcae | Skills: council rename, core inheritance, hooks, request pages | Feature |
| 05-29 15:13 | b313fb59 | Training: persist HF auth context in run status artifacts | Security |
| 05-29 15:11 | 2e54e207 | Training: surface HF auth presence in active run status | Telemetry |
| 05-29 15:09 | d2a49cb4 | Training: show HF auth in UI run launch | UX |
| 05-29 15:08 | 09c81607 | Training: wire optional HF token env for UI harness runs | Security |
| 05-29 15:05 | b14a5396 | Training: support optional HF token in harness runner | Security |
| 05-29 15:03 | 34a1d8e0 | Training: surface train progress in final status metrics | Telemetry |
| 05-29 15:00 | b97fdb76 | Training: add completedEpochs to LoRA telemetry | Telemetry |
| 05-29 14:55 | 207f2e89 | Training: replace deprecated torch_dtype with dtype | Compat |
| 05-29 14:51 | 93052596 | Training: enable multi-epoch loss telemetry and configurable step caps | Feature |
| 05-28 05:03 | 3502bf93 | Harden code search embedding failure handling | Reliability |
| 05-28 04:59 | 5793893d | Resolve PR review findings, add vector search whitepaper notes | Cleanup |
| 05-28 04:04 | ce2bdf4e | Fix: catch ArgumentException in slug normalization | Bug fix |
| 05-28 03:57 | 1d751e6c | Add code search batch embedding benchmarks | Perf |
| 05-28 02:55 | 4d414141 | Reduce semantic prewarm startup log noise | UX |
| 05-28 02:54 | 7f3a43b3 | Add semantic prewarm and code search timing telemetry | Observability |
| 05-27 23:04 | 961f69fa | Add configurable ONNX execution providers | Feature |
| 05-27 19:10 | 826d7324 | Fix code search cache invalidation | Bug fix |
| 05-27 19:07 | 27d5e812 | Harden code search indexing workflow | Reliability |
| 05-27 17:01 | 2840bd23 | Document memorysmith.home.arpa MCP alias setup | Docs |
| 05-27 13:43 | f8f5fccf | Close PR43 follow-up and backlog slice | Cleanup |
| 05-26 21:56 | 58f983f0 | Audit untracked architecture gaps | Docs |
| 05-26 20:46 | 3c756252 | Checkpoint UI sweep and repo updates | UX |
| 05-26 19:06 | 8cf4aaca | TSK-0130 rebalance pages narrow layout | UX |
2. New file inventory
Training subsystem (entirely new)
| File | Size | Purpose |
|---|---|---|
| MemorySmith.Training/harness.py | ~400 lines | Python training orchestrator: data loading, LoRA training, inference comparison, telemetry |
| MemorySmith.Training/synthetic/starter_sft.jsonl | 2.5 KB | Synthetic starter SFT examples |
| MemorySmith.Training/synthetic/starter_sft.expanded.jsonl | 6.6 KB | Expanded synthetic SFT examples |
| MemorySmith.App/Components/Pages/TrainingWorkbench.razor | ~800 lines | Blazor admin page at /training-workbench |
| MemorySmith.App/Services/TrainingOptions.cs | ~120 lines | Configuration block bound at MemorySmith:Training |
| MemorySmith.App/Services/Training/TrainingHarnessRunnerService.cs | ~350 lines | C# process manager: spawn, probe deps, timeout/kill |
| MemorySmith.App/Services/Training/TrainingPathResolver.cs | ~150 lines | Project-root and venv path resolution |
| MemorySmith.App/Services/Training/ChatTranscriptWriter.cs | ~200 lines | JSONL transcript persistence with redaction |
| MemorySmith.App/Services/Training/ChatTurnRecord.cs | ~80 lines | Turn record DTOs |
| MemorySmith.App/Services/Training/ChatFeedbackStore.cs | ~180 lines | SQLite-backed thumbs feedback |
| Data/Training/exports/ | directory | Export target (.gitkeep only) |
Changed files (substantive modifications)
| File | What changed |
|---|---|
| MemorySmith.App/Services/MemorySmithOptions.cs | Added TrainingOptions Training property |
| MemorySmith.App/Services/ChatToolCatalog.cs | 3 new tools: code_search_merge_shard, page_save, page_delete |
| MemorySmith.App/Services/ChatServices.cs | BuildOllamaRequestOptions() sends num_ctx; transcript writer integration |
| MemorySmith.Core/Docs/Prompts/wiki-chat-agent.modelfile | ChatML template migration, sampling parameter tuning |
| MemorySmith.App/Controllers/McpController.cs | Protocol version 2025-06-18 |
3. Training harness deep audit (harness.py)
3.1 Architecture
The harness is a single Python script that:
1. Reads request.json from a work directory (written by the C# side).
2. Loads chat examples from transcript JSONL files + synthetic starters.
3. Attempts real LoRA training via PEFT/transformers.
4. Falls back to "simulated" training if PEFT or CUDA fails.
5. Runs inference comparison (base vs fine-tuned) on eval prompts.
6. Writes status.json atomically via rename.
7. Emits JSON event lines on stdout for the C# wrapper to consume.
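Steps 6 and 7 can be sketched as follows. This is a minimal illustration, not the code in harness.py: the function names `write_status_atomic` and `emit_event` and the event field names are assumptions made for the example. It shows the temp-file-plus-rename pattern that keeps a reader from ever seeing a half-written status.json, and one-JSON-object-per-line output on stdout.

```python
import json
import os
import tempfile


def write_status_atomic(work_dir: str, status: dict) -> str:
    """Write status.json via a temp file plus rename so readers never see a partial file."""
    final_path = os.path.join(work_dir, "status.json")
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=work_dir, prefix="status.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(status, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites an existing file; the rename is atomic on POSIX.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def emit_event(kind: str, **fields) -> str:
    """Print one JSON object per line on stdout for the parent process to parse."""
    line = json.dumps({"event": kind, **fields}, sort_keys=True)
    print(line, flush=True)
    return line
```

Flushing after every event matters here because the C# wrapper reads stdout incrementally; buffered output would make progress appear in bursts.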
3.2 Training data format — THE CRITICAL BUG
Finding TRAIN-001 (see § 9). The to_training_text() function formats messages as:
system
<content>
user
<content>
assistant
<content>
But the Modelfile uses ChatML:
<|im_start|>system
<content><|im_end|>
<|im_start|>user
<content><|im_end|>
<|im_start|>assistant
This means every training run teaches the model to produce outputs wrapped in a format it will never be prompted with at inference time. The model learns to emit text after assistant\n when at inference time it will be asked to emit text after <|im_start|>assistant\n. The mismatch causes:
- Degraded instruction following (model "forgets" the ChatML boundary tokens).
- Increased hallucination rate (model has weaker boundary understanding).
- Tool-call discipline regression (the JSON envelope boundary is blurred by the wrong template).
This is the single most impactful fix available. Remediation: use tokenizer.apply_chat_template() to format training data, or manually construct ChatML sequences matching the Modelfile's TEMPLATE directive.
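The manual-construction option could look roughly like the sketch below. The function name `to_chatml_text` and the message dictionary shape (`role` and `content` keys) are assumptions for illustration; the real harness may carry messages in a different structure. Each turn is rendered the way the Modelfile TEMPLATE renders it, including the closing `<|im_end|>` on the assistant turn so the model also learns where to stop.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def to_chatml_text(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML turns, each closed by <|im_end|> and a newline."""
    parts = []
    for message in messages:
        role = message["role"]
        content = message["content"]
        parts.append(f"{IM_START}{role}\n{content}{IM_END}\n")
    return "".join(parts)
```

Where the tokenizer ships a chat template, `tokenizer.apply_chat_template()` is the less error-prone route, since it tracks the model's own template rather than a hand-maintained copy.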
3.3 LoRA configuration
| Parameter | Value | Assessment |
|---|---|---|
| rank | 8 | Conservative for 4B model. Rank 16 would give more capacity without significant VRAM cost. |
| alpha | 16 | Alpha/rank ratio = 2. Standard. |
| dropout | 0.05 | Fine for >100 examples; negligible effect under 50. |
| target_modules | q_proj, k_proj, v_proj, o_proj | Standard attention targets. Missing MLP projections (gate_proj, up_proj, down_proj) which would improve quality for tool-call and formatting behaviors. |
| bias | "none" | Correct. |
| task_type | "CAUSAL_LM" | Correct. |
3.4 CUDA and VRAM handling
No pre-flight VRAM check. No gradient accumulation. Model is loaded in bf16 (or float32 fallback), moved to CUDA with .to(training_device). For Qwen3.5-4B:
- bf16 weights: ~8 GB
- LoRA adapters (r=8): ~30 MB
- Optimizer state (AdamW): ~120 MB
- Activations at seq_len=1024: ~2-3 GB with no gradient checkpointing
Total: ~10-11 GB minimum. This will not fit on the 8 GB RTX 5060 without QLoRA (4-bit base). The code does not use BitsAndBytesConfig or Unsloth's 4-bit loading. The current harness will OOM on the target hardware.
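The budget above can be turned into a small pre-flight check. The sketch below only restates the audit's arithmetic; the function names, the 2.5 GB midpoint for activations, and the 0.5 GB headroom are assumptions for illustration. In the harness, the free-memory figure would plausibly come from `torch.cuda.mem_get_info()`, which returns free and total bytes.

```python
def estimate_training_vram_gb(
    weights_gb: float = 8.0,          # bf16 weights, per the audit
    lora_adapters_gb: float = 0.03,   # ~30 MB at r=8
    optimizer_state_gb: float = 0.12, # ~120 MB AdamW state
    activations_gb: float = 2.5,      # midpoint of the 2-3 GB range (assumption)
) -> float:
    """Sum the component estimates into one VRAM requirement in GB."""
    return weights_gb + lora_adapters_gb + optimizer_state_gb + activations_gb


def fits_on_gpu(required_gb: float, free_gb: float, headroom_gb: float = 0.5) -> bool:
    """Return True when the requirement plus a safety margin fits in free VRAM."""
    return required_gb + headroom_gb <= free_gb
```

With the defaults, the estimate lands inside the ~10-11 GB range quoted above, so the check fails for an 8 GB card and the run can be refused before the model is loaded.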
3.5 Hyperparameter clamping
The harness clamps:
- epochs to [1, 3]
- sequence_length to [128, 1024]
- learning_rate to [1e-6, 5e-3]
- max_train_steps to [1, 256]
These are safe ranges. The sequence length cap at 1024 is conservative — it prevents VRAM blowout but limits the model's ability to learn long-context behaviors. Consider raising to 2048 once QLoRA is implemented.
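For reference, the clamping behavior described above amounts to something like this sketch. The ranges are the ones listed in this section; the key names, the `CLAMP_RANGES` table, and the function names are illustrative assumptions rather than the harness's actual identifiers.

```python
from typing import Dict, Tuple

# Ranges as listed in the audit; key names are illustrative.
CLAMP_RANGES: Dict[str, Tuple[float, float]] = {
    "epochs": (1, 3),
    "sequence_length": (128, 1024),
    "learning_rate": (1e-6, 5e-3),
    "max_train_steps": (1, 256),
}


def clamp(value, low, high):
    """Pin value into the closed interval [low, high]."""
    return max(low, min(high, value))


def clamp_hyperparameters(requested: dict) -> dict:
    """Return a copy of the request with known keys clamped; other keys pass through."""
    clamped = dict(requested)
    for key, (low, high) in CLAMP_RANGES.items():
        if key in clamped:
            clamped[key] = clamp(clamped[key], low, high)
    return clamped
```

Keeping the ranges in one table makes the later change (raising the sequence-length cap to 2048) a one-line edit.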
3.6 Eval gate
The minimum data threshold is records >= 2. With one synthetic starter and one logged transcript, training "passes" on 2 examples. This is meaningless — the model learns nothing useful from 2 examples but the harness reports success.
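A stricter gate might look like the following sketch. The record minimum of 10 follows the TRAIN-010 remediation; the unique-token threshold of 200 and the whitespace tokenization are placeholder assumptions, and `passes_data_gate` is a hypothetical name. A real gate would more likely count tokenizer tokens.

```python
from typing import Iterable, Tuple


def passes_data_gate(
    texts: Iterable[str],
    min_records: int = 10,
    min_unique_tokens: int = 200,
) -> Tuple[bool, str]:
    """Reject datasets that are too small or too repetitive to train on."""
    records = [t for t in texts if t and t.strip()]
    if len(records) < min_records:
        return False, f"only {len(records)} records, need {min_records}"
    # Whitespace tokens are a crude stand-in for tokenizer tokens.
    unique_tokens = {tok for t in records for tok in t.lower().split()}
    if len(unique_tokens) < min_unique_tokens:
        return False, f"only {len(unique_tokens)} unique tokens, need {min_unique_tokens}"
    return True, "ok"
```

Returning a reason string alongside the boolean lets the harness report why a run was refused instead of silently falling back.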
3.7 Model ID resolution
The resolve_model_id() function maps Ollama tags to HuggingFace paths:
- qwen3.5 → Qwen/Qwen3.5-4B (hardcoded)
- qwen3 → Qwen/Qwen3-4B (hardcoded)
- Everything else → passed through verbatim
Combined with trust_remote_code=True, this is a remote code execution vector (see TRAIN-002).
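One way to close the pass-through branch is an explicit allowlist, sketched below. The two mappings are the ones listed above; the function name `resolve_model_id_strict` and the assumption that Ollama tags have the form `family:variant` are illustrative, since the audit does not show how resolve_model_id() parses tags.

```python
# Mappings taken from the audit; anything else is rejected instead of passed through.
ALLOWED_MODELS = {
    "qwen3.5": "Qwen/Qwen3.5-4B",
    "qwen3": "Qwen/Qwen3-4B",
}


def resolve_model_id_strict(ollama_tag: str) -> str:
    """Map an Ollama tag to an allowlisted HuggingFace ID or raise ValueError."""
    # Assumes tags look like "family:variant"; only the family part is matched.
    family = ollama_tag.strip().lower().split(":", 1)[0]
    try:
        return ALLOWED_MODELS[family]
    except KeyError:
        raise ValueError(f"model tag {ollama_tag!r} is not on the training allowlist") from None
```

With no verbatim pass-through, a tampered settings file can no longer steer from_pretrained() at an arbitrary repository, which limits the damage even if trust_remote_code is re-enabled later.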
3.8 HuggingFace token handling
Recent commits (09c81607, b14a5396, 2e54e207, b313fb59) wire the HF_TOKEN environment variable from TrainingOptions.HuggingFaceTokenEnvironmentVariable. The token is:
- Read from the OS environment by the C# side.
- Passed into the Python process environment.
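- Represented in persisted run status artifacts as "HF auth context" (b313fb59).
This last point is a concern: if the persisted context includes the token value and status.json or events.jsonl is ever exposed (e.g., via the diagnostics endpoint), the HF token leaks. The commit message says "context", not "token", so the exact fields still need to be verified (see TRAIN-020). The existing TranscriptRedactionEnabled pattern should be extended to status artifacts.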
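A presence-only status field, with a last-line guard against accidental leakage, could be sketched like this. The field name `hf_token_present` follows the TRAIN-020 remediation; the function names and the guard itself are assumptions for illustration, not existing harness code.

```python
import json
import os
from typing import Mapping, Optional


def auth_status_fields(env: Optional[Mapping[str, str]] = None, var_name: str = "HF_TOKEN") -> dict:
    """Report only whether a token is configured, never its value."""
    source = os.environ if env is None else env
    return {"hf_token_present": bool(source.get(var_name))}


def serialize_status(status: dict, env: Optional[Mapping[str, str]] = None, var_name: str = "HF_TOKEN") -> str:
    """Serialize status with the presence flag; refuse to emit the token value."""
    source = os.environ if env is None else env
    text = json.dumps({**status, **auth_status_fields(source, var_name)}, sort_keys=True)
    token = source.get(var_name)
    # Defensive check: some other field may have captured the token by accident.
    if token and token in text:
        raise ValueError("refusing to write status: token value found in payload")
    return text
```

The guard is deliberately blunt: failing the write is preferable to persisting a credential that the artifact browser might later serve.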
4. Training workbench UI audit (TrainingWorkbench.razor)
4.1 Architecture
- Route: /training-workbench
- Auth: [Authorize(Policy = MemorySmithPolicies.CanAdminMemorySmith)] (admin-only). Correct.
- Pattern: Blazor Server with 2-second PeriodicTimer polling for run status.
- Features: Run list with status, start new run, dependency probe, export list, settings editor with import/export, artifact browser.
4.2 Positive findings
- Admin-only authorization is correctly applied.
- Settings import validates against known keys (unknown keys silently ignored, not injected).
- Artifact browsing constrains allowed paths via GetTrainingArtifactRoots().
- Process spawn uses UseShellExecute = false, so there is no shell injection.
- Status file reads handle parse errors gracefully (returns null, surfaces warning).
- CSRF is inherently mitigated by Blazor Server's SignalR transport.
4.3 Concerns
- Polling overhead (TRAIN-015): 2-second poll fires continuously. Each poll re-scans up to 40 run directories. When idle, this burns I/O for no value.
- CancellationToken.None in StartRunInternalAsync (TRAIN-008): Navigation away doesn't cancel the launch. The training run itself is fire-and-forget by design, but the launch sequence (dependency probe, request.json write) should respect a scoped token.
- No way to cancel a running job from the UI: The harness has no cancellation protocol. The C# side can kill the process, but there's no "Cancel run" button wired up.
- Process argument construction inconsistency (TRAIN-007): Probe uses safe ArgumentList API; run uses manual Quote() + string join. Both have UseShellExecute = false so the risk is low, but the inconsistency invites bugs.
4.4 UX observations
- No context-window picker in the chat UI. The design's § 1 dropdown (2K/4K/8K/16K/24K/32K/Custom) has not been implemented.
- No VRAM heuristic displayed before starting a run. The user has no idea if the run will fit on their GPU.
- No estimated training time displayed. The design's § 11 cost/time estimates are not surfaced.
- No model promotion UI. The workbench shows runs and their status but there is no "Promote to active" button wired through appsettings.json mutation.
- No eval score display in the run summary. The eval gate runs but results aren't surfaced in the UI.
- Training nav shortcut was removed (commit b7e76c60). Users must navigate directly to /training-workbench. Consider adding it to the admin sidebar.
5. MCP tool changes
5.1 New tools
memorysmith_code_search_merge_shard — Write risk, MCP-only, disabled by default.
- Purpose: Merge an external shard SQLite DB into the code search index.
- Concern: The shardPath parameter accepts an arbitrary filesystem path with no containment validation at the tool level. See TRAIN-006.
memorysmith_page_save — Write risk, MCP-only, disabled by default, available in Agent mode.
- Purpose: Create or update a wiki page.
- Positive: Proper slug validation, markdown content validation, minimumRole authorization check.
memorysmith_page_delete — Write risk, MCP-only, disabled by default.
- Purpose: Delete a wiki page by slug.
- Positive: Proper slug validation and authorization.
5.2 Protocol version
Still 2025-06-18 — forward-dated relative to the MCP spec (current spec: 2025-03-26). See TRAIN-009.
5.3 num_ctx fix confirmed
BuildOllamaRequestOptions() in ChatServices.cs now reads chatOptions.OllamaContextWindowTokens and includes "num_ctx" in the options dict when the value is set and > 0. Both streaming and non-streaming payloads receive the options. The Audit #5 display-only bug is closed.
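The fixed behavior can be pictured with the Python sketch below. It is a paraphrase of the described C# logic, not a translation of the actual method: the function names and the surrounding payload fields are assumptions, and only the rule "include num_ctx when the configured value is set and greater than zero" comes from the audit.

```python
from typing import List, Optional


def build_ollama_options(context_window_tokens: Optional[int]) -> dict:
    """Include num_ctx only when a positive context window is configured."""
    options = {}
    if context_window_tokens is not None and context_window_tokens > 0:
        options["num_ctx"] = context_window_tokens
    return options


def build_chat_payload(model: str, messages: List[dict], context_window_tokens: Optional[int], stream: bool) -> dict:
    """Build a chat request body; streaming and non-streaming share the same options."""
    payload = {"model": model, "messages": messages, "stream": stream}
    options = build_ollama_options(context_window_tokens)
    if options:
        payload["options"] = options
    return payload
```

Omitting the key entirely when the value is unset leaves Ollama free to apply the Modelfile or model default, which is why § 6.2 still matters for direct invocations.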
6. Modelfile and prompt changes
6.1 ChatML migration — complete
The Modelfile now uses:
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
This matches Qwen3.5's native ChatML. Path B (recommended in the design doc) is implemented.
6.2 Missing num_ctx in Modelfile
The Modelfile does not set PARAMETER num_ctx. While the API layer now sends num_ctx via BuildOllamaRequestOptions(), direct Modelfile invocations (e.g., ollama run) will use the model's default. Given the system prompt is ~4000+ tokens, a default of 2048 would truncate it. Recommend adding PARAMETER num_ctx 8192 at minimum.
6.3 Double repetition penalty
Both repeat_penalty 1.25 and presence_penalty 0.6 are set. Ollama applies both, which can cause coherence degradation on longer responses. Recommend using one or the other.
6.4 Sampling parameters
| Parameter | Value | Assessment |
|---|---|---|
| temperature | 0.4 | Conservative for factual wiki. Good. |
| top_p | 0.9 | Standard. |
| top_k | 40 | Reasonable. |
| repeat_penalty | 1.25 | Strong. See § 6.3. |
| presence_penalty | 0.6 | Combined with repeat_penalty, may be excessive. |
| frequency_penalty | 0.4 | Moderate. Stacks with the above. |
7. Configuration surface
7.1 TrainingOptions
New configuration class bound at MemorySmith:Training. All defaults are off/conservative:
| Option | Default | Assessment |
|---|---|---|
| ChatTranscriptEnabled | false | Safe default. |
| StoreChatContent | false | Safe default. |
| TranscriptRedactionEnabled | true | Good security posture. |
| FeedbackEnabled | false | Safe default. |
| PythonVenvPath | .venv | Relative path, resolved by TrainingPathResolver. |
| PythonHarnessScript | MemorySmith.Training/harness.py | Relative. |
| HuggingFaceTokenEnvironmentVariable | HF_TOKEN | Reads from OS env, not stored in config. |
| MaxRunMinutes | 360 | 6-hour cap. Reasonable. |
| PreferenceFormat | FilteredSft | Correct for v1. |
| ShadowEvalEnabled | false | Correct; would OOM on 8 GB. |
7.2 SecurityProfile gap (TRAIN-005)
No check prevents training from running in RemoteHardened profile. Spawning Python processes and downloading HuggingFace models should be disabled in hardened mode.
8. Chat transcript and feedback
8.1 ChatTranscriptWriter
Implemented per the scaffold design. Metadata-only default with optional content companion. TranscriptRedactionEnabled runs regex patterns for Bearer tokens, API keys, secrets, and connection strings.
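Redaction concern (TRAIN-013): The BearerPattern regex \bBearer\s+[A-Za-z0-9._\-]+ omits +, /, and =, which RFC 6750 permits in bearer tokens. JWTs use base64url (- and _, no padding) and are already matched, but opaque tokens encoded with standard base64 are redacted only up to the first omitted character, leaving the remainder in the transcript. The pattern should be \bBearer\s+[A-Za-z0-9._\-+/=]+.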
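The difference between the two character classes is easy to see on a token that contains +, / and =. The sketch below uses Python's `re` module as a stand-in for the .NET regex in ChatTranscriptWriter.cs; the sample header, the `[REDACTED]` placeholder, and the names `NARROW`, `WIDE`, and `redact` are invented for the example.

```python
import re

# Current pattern from the audit, and the expanded pattern it recommends.
NARROW = re.compile(r"\bBearer\s+[A-Za-z0-9._\-]+")
WIDE = re.compile(r"\bBearer\s+[A-Za-z0-9._\-+/=]+")


def redact(pattern: "re.Pattern[str]", text: str) -> str:
    """Replace every match of the bearer pattern with a fixed placeholder."""
    return pattern.sub("Bearer [REDACTED]", text)


sample = "Authorization: Bearer abc+def/ghi=="
narrow_result = redact(NARROW, sample)  # stops at the first '+'
wide_result = redact(WIDE, sample)      # consumes the whole token
```

With the narrow class, the match ends at the first `+`, so the tail of the token survives redaction and is written to the transcript.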
8.2 ChatFeedbackStore
SQLite-backed upsert with _initialized flag. Double-checked locking pattern with non-volatile bool (TRAIN-014 — low severity).
9. Severity-tagged findings
Critical
TRAIN-001 | Critical | Training/inference chat template mismatch
- File: MemorySmith.Training/harness.py, to_training_text() (~line 230)
- Description: Training data formatted as bare <role>\ncontent but inference uses ChatML (<|im_start|>role\ncontent<|im_end|>). Every LoRA fine-tune under the current code degrades the model's instruction-following because it learns the wrong template boundaries.
- Impact: Trained models will produce worse output than the base model on ChatML-prompted turns. Tool-call discipline, citation formatting, and mode-switching all regress.
- Remediation: Replace to_training_text() with a function that constructs proper ChatML sequences. Use tokenizer.apply_chat_template() if available, or manually build <|im_start|>role\ncontent<|im_end|> strings matching the Modelfile TEMPLATE.
- Confidence: 0.98
TRAIN-002 | Critical | trust_remote_code=True with user-controllable model ID
- File: MemorySmith.Training/harness.py, train_lora() (~line 243), infer_lora() (~line 283)
- Description: AutoModelForCausalLM.from_pretrained() and AutoTokenizer.from_pretrained() called with trust_remote_code=True. The model ID derives from admin-configurable settings. A compromised admin or settings file can point at a malicious HuggingFace repo → arbitrary Python code execution in the harness process.
- Impact: Remote code execution. The Python process runs with the same privileges as the MemorySmith app. On Windows service deployments, this may be SYSTEM.
- Remediation: Remove trust_remote_code=True (Qwen3.5 works without it in transformers >= 4.45). Maintain an allowlist of known-safe model IDs. As defense-in-depth, run the harness in a restricted user context.
- Confidence: 0.95
High
TRAIN-003 | High | No CUDA OOM handling or VRAM budget check
- File: MemorySmith.Training/harness.py, train_lora()
- Description: Qwen3.5-4B in bf16 = ~8 GB weights alone. With LoRA + optimizer + activations at seq_len=1024, total is ~10-11 GB. No QLoRA (4-bit) loading, no gradient checkpointing, no BitsAndBytesConfig. Will OOM on RTX 5060 (8 GB).
- Remediation: Add 4-bit QLoRA via BitsAndBytesConfig(load_in_4bit=True) or use Unsloth's FastLanguageModel.from_pretrained(load_in_4bit=True). Add torch.cuda.mem_get_info() pre-flight check. Add gradient checkpointing.
- Confidence: 0.95
TRAIN-004 | High | No requirements.txt for Python training deps
- File: MemorySmith.Training/ (missing)
- Description: dtype parameter needs transformers >= 4.45. No version pinning anywhere. Dependency probe checks existence, not version.
- Remediation: Add requirements-training.txt with minimum: torch>=2.1, transformers>=4.45, peft>=0.12, datasets>=2.19, bitsandbytes>=0.43.
- Confidence: 0.99
TRAIN-005 | High | No SecurityProfile gating for training
- File: MemorySmith.App/Services/Training/TrainingHarnessRunnerService.cs
- Description: Training spawns Python processes and downloads HuggingFace models regardless of security profile. Should be disabled in RemoteHardened.
- Remediation: Check MemorySmithOptions.SecurityProfile in StartRunAsync(). Return error if RemoteHardened.
- Confidence: 0.90
TRAIN-006 | High | memorysmith_code_search_merge_shard path traversal
- File: MemorySmith.App/Services/ChatToolCatalog.cs, memorysmith_code_search_merge_shard handler
- Description: shardPath parameter passes an arbitrary filesystem path to MergeShardAsync() with no containment check. An MCP caller with write permission could read/corrupt any SQLite file on the filesystem.
- Remediation: Validate shardPath is within an allowed directory (e.g., the code search index root). Return error for paths outside the allowed root.
- Confidence: 0.90
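The containment check in the TRAIN-006 remediation can be sketched as below, using Python's pathlib for illustration even though the handler itself is C#. The function name `resolve_contained_path` and the idea of passing the allowed root as a parameter are assumptions; the index root's actual location is not stated in the audit.

```python
from pathlib import Path


def resolve_contained_path(candidate: str, allowed_root: str) -> Path:
    """Resolve candidate against allowed_root and reject anything outside it."""
    root = Path(allowed_root).resolve()
    # Joining with an absolute candidate discards root, so absolute paths are checked too.
    # resolve() collapses ".." segments and follows symlinks for existing components.
    resolved = (root / candidate).resolve()
    if root not in resolved.parents:
        raise ValueError(f"path {candidate!r} escapes allowed root {str(root)!r}")
    return resolved
```

Comparing resolved paths rather than raw strings is the important part: a prefix check on the unresolved string would accept `root/../elsewhere`.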
TRAIN-020 | High | HF token leaks into status artifacts
- File: MemorySmith.Training/harness.py, commit b313fb59
- Description: The "persist HF auth context in run status artifacts" commit writes HF token presence (and potentially the token value) into status.json or events.jsonl. If these artifacts are ever served (diagnostics endpoint, admin page artifact browser), the token leaks.
- Remediation: Store only a boolean "hf_token_present" flag, never the token value. Extend the redaction pattern to status artifact writes.
- Confidence: 0.85 (need to verify exactly what is persisted — the commit message says "context" not "token", but the boundary is unclear)
Medium
TRAIN-007 | Medium | Process argument construction inconsistency
- File: TrainingHarnessRunnerService.cs, RunHarnessAsync()
- Description: Probe uses safe ArgumentList; run uses manual Quote() + string join. Inconsistent.
- Remediation: Use ArgumentList consistently.
TRAIN-008 | Medium | No cancellation support in Python harness
- File: MemorySmith.Training/harness.py
- Description: No signal handler for SIGTERM/SIGINT. No cancel.flag polling. C# kill-process is the only stop mechanism.
- Remediation: Add signal.signal(signal.SIGTERM, handler). Check a cancellation flag per training step.
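A cooperative cancellation loop along these lines might look like the sketch below. All names (`cancel_requested`, `request_cancel`, `install_cancel_handlers`, `run_training_steps`) are hypothetical. One caveat worth stating: a hard process kill from the C# side cannot be caught by a Python signal handler, so a polled flag file such as the cancel.flag mentioned above may be the more portable trigger on Windows.

```python
import signal
import threading

# Set by the signal handler, polled by the training loop.
cancel_requested = threading.Event()


def request_cancel(signum, frame):
    """Signal handler: record the request and let the loop stop at a safe point."""
    cancel_requested.set()


def install_cancel_handlers():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGINT, request_cancel)
    signal.signal(signal.SIGTERM, request_cancel)


def run_training_steps(total_steps: int, train_one_step) -> int:
    """Run up to total_steps, checking the cancel flag before each step."""
    completed = 0
    for step in range(total_steps):
        if cancel_requested.is_set():
            break
        train_one_step(step)
        completed += 1
    return completed
```

Checking the flag once per step bounds the stop latency to one step and gives the harness a chance to write a final status.json marked as cancelled.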
TRAIN-009 | Medium | Forward-dated MCP protocol version
- File: McpController.cs, BuildInitializeResult()
- Description: 2025-06-18 is not an official MCP spec version. May break spec-validating clients.
- Remediation: Use 2025-03-26 or a clearly-custom version string.
TRAIN-010 | Medium | Eval gate too permissive (records >= 2)
- File: MemorySmith.Training/harness.py, run()
- Description: 2 examples is meaningless. One synthetic + one real = passes gate.
- Remediation: Raise to >= 10. Add unique token count check.
TRAIN-011 | Medium | Double repetition penalty in Modelfile
- File: wiki-chat-agent.modelfile
- Description: repeat_penalty 1.25 + presence_penalty 0.6 double-penalize repetition.
- Remediation: Use one or the other.
TRAIN-012 | Medium | No num_ctx in Modelfile
- File: wiki-chat-agent.modelfile
- Description: Direct ollama run uses default context. System prompt alone is ~4000 tokens.
- Remediation: Add PARAMETER num_ctx 8192 minimum.
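TRAIN-013 | Medium | Transcript redaction regex misses base64 characters in Bearer tokens
- File: ChatTranscriptWriter.cs
- Description: Bearer pattern excludes +, /, and =, which RFC 6750 permits in bearer tokens. JWTs use base64url and are matched, but tokens encoded with standard base64 are only partially redacted.
- Remediation: Expand to [A-Za-z0-9._\-+/=]+.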
TRAIN-021 | Medium | Missing MLP target modules in LoRA config
- File: MemorySmith.Training/harness.py, LoRA config
- Description: Only targets attention projections (q_proj, k_proj, v_proj, o_proj). Missing gate_proj, up_proj, down_proj MLP projections. For tool-call and formatting tasks, MLP layers carry significant formatting behavior. Including them would improve formatting discipline at modest VRAM cost.
- Remediation: Add "gate_proj", "up_proj", "down_proj" to target_modules when VRAM allows.
Low
TRAIN-014 | Low | Non-volatile _initialized flag in ChatFeedbackStore
- File: ChatFeedbackStore.cs, EnsureSchemaAsync()
- Description: Double-checked locking with non-volatile bool. Safe on x86 but technically a data race.
- Remediation: Mark volatile or use Volatile.Read().
TRAIN-015 | Low | Polling fires every 2s even when idle
- File: TrainingWorkbench.razor, PollRunsAsync()
- Description: Continuous 2-second poll with no idle backoff.
- Remediation: 10s when idle, 2s when active run detected.
TRAIN-016 | Low | Hardcoded fallback training example
- File: harness.py, load_chat_examples() fallback
- Description: Fallback reveals internal architecture details.
- Remediation: Require minimum real examples; fail explicitly.
Info
TRAIN-017 | Info | dtype parameter compatibility
- File: harness.py, train_lora(), infer_lora()
- Description: dtype requires transformers >= 4.45. No version check.
- Remediation: Runtime check or requirements.txt.
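The runtime-check option could be as small as the sketch below, which uses only the standard library. The helper names are hypothetical, and the version parser is intentionally simple (leading numeric components only); a production check would more likely use the `packaging` library if it is available in the venv.

```python
from importlib import metadata
from typing import Optional


def parse_version(text: str) -> tuple:
    """Extract leading numeric components, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependency(package: str, minimum: str) -> Optional[str]:
    """Return an error message, or None when the installed version is sufficient."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed (need >= {minimum})"
    if not meets_minimum(installed, minimum):
        return f"{package} {installed} is older than required {minimum}"
    return None
```

A call such as `check_dependency("transformers", "4.45")` at harness startup would turn a confusing keyword-argument failure deep in from_pretrained() into a clear message.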
TRAIN-018 | Info | Inference comparison parameters inconsistency
- File: harness.py, infer_lora()
- Description: do_sample=False with top_p=0.9 and temperature=0.7. Sampling params ignored in greedy mode.
- Remediation: Remove top_p/temperature when do_sample=False.
TRAIN-019 | Info | TrainingPathResolver walks up to the filesystem root
- File: TrainingPathResolver.cs, EnumerateCandidateBaseDirectories()
- Description: Walks up to filesystem root. Could resolve to unexpected locations.
- Remediation: Limit upward walk to 3 levels or the repository root.
10. UX recommendations
10.1 Immediate wins (ship with the branch)
- Add num_ctx to the Modelfile: prevents system-prompt truncation on ollama run invocations. One line.
- Surface eval scores in run summaries: the eval gate runs but results aren't shown in the workbench.
- Add a "Cancel run" button: even if it's just process kill, users need a stop mechanism.
- Add training nav link to admin sidebar: the shortcut was removed; users can't discover the page.
10.2 Medium-term (next sprint)
- Context-window dropdown (design supplement § 1): users have no way to adjust context without editing config.
- VRAM heuristic pre-flight (design supplement § 2): display estimated memory before starting a run.
- Estimated training time: based on dataset size and hardware profile.
- Model promotion button: one-click swap of ActiveModelTag through the UI.
- Training data quality dashboard: show topic coverage, example count per category, avg token length.
10.3 Longer-term
- SignalR for live progress: replace 2-second polling with push.
- Regenerate button on assistant turns: enables DPO v2 pipeline.
- Per-conversation context override: let users adjust context per chat.
- Memory-type chips: visual taxonomy indicator on memory renders.
11. Cross-references to prior audits
| Prior finding | Current status |
|---|---|
| Audit #5 Clipboard-paste external fetch (memorysmith.js:813-832) | Still open — untouched by this branch |
| Audit #5 ChatReferenceLinkPolicy event handler bypass | Still open |
| Audit #5 Mermaid innerHTML XSS | Still open |
| Audit #5 OllamaContextWindowTokens display-only bug | CLOSED — fixed by BuildOllamaRequestOptions() |
| Audit #5 Zero server-side chat logging | CLOSED — ChatTranscriptWriter implemented |
| Audit #5 Zero feedback mechanism | CLOSED — ChatFeedbackStore implemented |
| Audit #5 23 configurability gaps | 3 addressed by TrainingOptions toggles; 20 still open |
| Audit #4 Code search findings | Partially addressed — batch embedding benchmarks, cache invalidation fix, embedding failure hardening, ONNX execution providers |
12. Assumptions and confidence
| Section | Confidence | Notes |
|---|---|---|
| TRAIN-001 (template mismatch) | 0.98 | Verified by reading to_training_text() and Modelfile TEMPLATE |
| TRAIN-002 (trust_remote_code) | 0.95 | Verified from from_pretrained() calls in harness.py |
| TRAIN-003 (VRAM math) | 0.90 | Based on published bf16 sizes; actual may vary ±15% |
| TRAIN-006 (shard path) | 0.90 | Need to verify MergeShardAsync implementation |
| TRAIN-020 (HF token leak) | 0.85 | Need to inspect exact fields persisted in status |
| Overall branch health | Medium-high | The training subsystem works end-to-end but has 2 critical bugs blocking production use |
13. Recommended priority
P0: Block release
- TRAIN-001: Fix to_training_text() to produce ChatML. Every training run until this is fixed makes the model worse.
- TRAIN-002: Remove trust_remote_code=True. One-line fix, removes RCE surface.
P1: Fix before first real training run
- TRAIN-003: Add QLoRA / 4-bit loading. Current code OOMs on target hardware.
- TRAIN-004: Add requirements-training.txt.
- TRAIN-006: Add path containment to merge_shard.
P2: Fix before beta
- TRAIN-005: SecurityProfile gating.
- TRAIN-010: Raise eval gate minimum.
- TRAIN-012: Add num_ctx to Modelfile.
- TRAIN-011: Remove double repetition penalty.
P3: Polish
- TRAIN-008: Add cancellation support.
- TRAIN-013: Expand redaction patterns.
- UX recommendations from § 10.