Local Fine-Tune Getting Started Guide

This guide walks through setting up and running MemorySmith's local fine-tuning pipeline from scratch.

Prerequisites

MemorySmith running locally with admin access
Python 3.11+ installed
An NVIDIA GPU with 8+ GB VRAM (optional; simulated mode works without a GPU)

Step 1: Set Up the Python Environment

# From the repo root
python -m venv .venv
.venv/Scripts/Activate.ps1  # Windows
# or: source .venv/bin/activate  # Linux/macOS

pip install torch transformers datasets trl peft
# Optional for faster training:
pip install unsloth

Or use the automated script:

./Scripts/Test-FinetuneHarnessPrereqs.ps1
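
To confirm the environment by hand, the check can be approximated in a few lines of Python. This is a minimal sketch, not the logic of Test-FinetuneHarnessPrereqs.ps1: the helper names find_missing and cuda_available are hypothetical, and the package list is simply the one from the pip install line above.

```python
import importlib.util

# Core packages named in the pip install line above.
CORE_DEPS = ["torch", "transformers", "datasets", "trl", "peft"]


def find_missing(packages):
    """Return the packages that cannot be found in the active environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def cuda_available():
    """Report CUDA availability, or False when torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script still runs without torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    missing = find_missing(CORE_DEPS)
    print("Missing core deps:", ", ".join(missing) if missing else "none")
    print("CUDA available:", cuda_available())
```

Run it inside the activated virtual environment. If CUDA is reported as unavailable, expect the harness to fall back to simulated mode.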

Step 2: Enable Transcript Capture

Enable the following options in the /admin settings, or set them in appsettings.LocalOverrides.json:

{
  "MemorySmith": {
    "Training": {
      "ChatTranscriptEnabled": true,
      "StoreChatContent": true,
      "FeedbackEnabled": true,
      "TranscriptRedactionEnabled": true
    }
  }
}
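
If the settings are edited in the file rather than in /admin, a quick check can confirm that all four flags are on. The sketch below assumes the overrides file is plain JSON without comments and uses the section layout shown above; disabled_flags is a hypothetical helper, not part of MemorySmith.

```python
import json
from pathlib import Path

# Flags from the JSON fragment above.
REQUIRED_FLAGS = [
    "ChatTranscriptEnabled",
    "StoreChatContent",
    "FeedbackEnabled",
    "TranscriptRedactionEnabled",
]


def disabled_flags(settings_path):
    """Return the training flags that are absent or not set to true."""
    config = json.loads(Path(settings_path).read_text(encoding="utf-8"))
    training = config.get("MemorySmith", {}).get("Training", {})
    return [flag for flag in REQUIRED_FLAGS if training.get(flag) is not True]
```

Calling disabled_flags("appsettings.LocalOverrides.json") from the directory containing the file returns an empty list when every flag is enabled.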

Step 3: Generate Training Data

Use the chat normally. Every conversation turn generates:

- Metadata ({date}.jsonl): provider, model, timing, tool calls, content hash
- Content ({date}.content.jsonl): full user/assistant messages (if StoreChatContent=true)

Rate good responses with thumbs up, bad ones with thumbs down.
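
To see how much data has accumulated, the two file types can be counted per date. This is a rough sketch that relies only on the file naming described above. The transcript directory location is not specified in this guide, so it is passed in as a parameter, and the record fields are deliberately not inspected because their names are not documented here.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(transcript_dir):
    """Count metadata and content records per date in a transcript directory."""
    counts = {}
    for path in sorted(Path(transcript_dir).glob("*.jsonl")):
        is_content = path.name.endswith(".content.jsonl")
        # Strip the suffix to recover the {date} part of the file name.
        date = path.name[: -len(".content.jsonl")] if is_content else path.stem
        entry = counts.setdefault(date, {"metadata": 0, "content": 0})
        entry["content" if is_content else "metadata"] += sum(1 for _ in read_jsonl(path))
    return counts
```

A date with metadata records but zero content records suggests StoreChatContent was off when those turns were captured.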

Step 4: Run a Dry Run

Visit /training-workbench and click Start dry run. This validates:

- Python environment and dependencies
- CUDA availability
- Training data format
- Export pipeline

Review the output in the run history section.
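
As an illustration of what a training data format check can look like, here is a small validator. It assumes the exported examples use a chat-style layout with a "messages" list of role/content objects, which is a common convention for supervised fine-tuning but is not confirmed by this guide. The function name and the rules are hypothetical and may differ from what the dry run enforces.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_example(example):
    """Return a list of problems found in one chat-style training example."""
    problems = []
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not an object")
            continue
        if message.get("role") not in VALID_ROLES:
            problems.append(f"message {index} has an unknown role")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {index} has empty content")
    # A training example normally ends with the response to be learned.
    if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems
```

An empty list means the example passed every check in this sketch.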

Step 5: Run Real Training

Click Start training run. The harness will:

1. Export training data from transcripts
2. Load the base model (default: Qwen/Qwen3.5-4B)
3. Apply QLoRA with 4-bit quantization
4. Train for up to 200 steps with a cosine LR schedule
5. Save the LoRA adapter to the work directory

Expected time: 15-25 minutes for ~375 examples on an 8 GB GPU.
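
To make the cosine LR schedule concrete, the sketch below computes the learning rate at a given step using linear warmup followed by cosine decay to zero. Only the 200-step limit comes from this guide. The peak learning rate and warmup length are assumed values chosen for illustration, and the harness may use different ones.

```python
import math


def cosine_lr(step, total_steps=200, peak_lr=2e-4, warmup_steps=10):
    """Learning rate at a given step: linear warmup, then cosine decay to zero.

    peak_lr and warmup_steps are illustrative assumptions, not harness defaults.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


if __name__ == "__main__":
    for step in (0, 10, 50, 105, 150, 200):
        print(f"step {step:3d}: lr = {cosine_lr(step):.2e}")
```

The rate climbs to its peak at the end of warmup, falls to half the peak midway through the decay phase, and reaches zero at the final step.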

Step 6: Use the Fine-Tuned Model

After training, the LoRA adapter is saved in runs/{run-id}/. To use it with Ollama:

1. Create a Modelfile pointing to the adapter
2. Build the Ollama model: ollama create athena -f Modelfile
3. Set the model in MemorySmith admin settings
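
A Modelfile for an adapter needs a FROM line naming the base model and an ADAPTER line pointing to the adapter directory or file. The sketch below only renders that text. The base model tag is an assumption you must supply yourself: it should correspond to the model the adapter was trained from. Whether Ollama can import the adapter directly depends on the model architecture and the adapter format.

```python
def render_modelfile(base_model, adapter_path):
    """Build Modelfile text that layers a LoRA adapter on a base model."""
    return f"FROM {base_model}\nADAPTER {adapter_path}\n"


if __name__ == "__main__":
    # Both arguments are placeholders: substitute your own base model tag
    # and the path to the adapter inside runs/{run-id}/.
    print(render_modelfile("your-base-model-tag", "./runs/{run-id}"))
```

Save the output as Modelfile, then run the ollama create command from the directory that contains it. A relative ADAPTER path is resolved against the Modelfile's location.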

Troubleshooting

Issue	Solution
"Simulated mode" warning	Install CUDA-compatible PyTorch and ensure GPU is detected
"Missing core deps"	Install: `pip install torch transformers datasets trl peft`
Training timeout	Increase `Training:MaxRunMinutes` (default 360)
Low-quality results	Add more training data, increase epochs, adjust learning rate
OOM during training	Reduce batch size or confirm gradient checkpointing is enabled (on by default)
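
Colon-delimited setting names such as Training:MaxRunMinutes map to nested JSON objects. The sketch below expands such a path into the nested form. It assumes the setting lives under the same MemorySmith section as the Training flags in Step 2, which this guide does not state explicitly, and the value 480 is only an example.

```python
import json


def set_by_path(config, key_path, value):
    """Set a value in a nested dict using a colon-delimited key path."""
    node = config
    *parents, leaf = key_path.split(":")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


# Assumed full path; the "MemorySmith" prefix mirrors the Step 2 settings.
overrides = set_by_path({}, "MemorySmith:Training:MaxRunMinutes", 480)
print(json.dumps(overrides, indent=2))
```

The printed JSON can be merged into appsettings.LocalOverrides.json alongside the existing Training flags.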