Code Search

MemorySmith includes a dedicated code-search subsystem that indexes the repository's code and provides hybrid semantic and lexical search over code chunks.

Architecture

The code search pipeline has three stages:

Indexing: Files are split into chunks (40-line windows with an 8-line overlap by default). Each chunk is embedded with the configured ONNX model (E5-base-v2 by default), and the chunks and their embeddings are stored in a SQLite database at Data/Graph/code-search/code-search.db.
Search: Queries go through a hybrid scoring pipeline that fuses vector similarity (cosine) with lexical token matching, using configurable weights (HybridVectorWeight=0.75 and HybridLexicalWeight=0.25 by default). A SQL-based prefilter reduces the candidate set before vector scoring, and results are balanced across documents (MaxResultsPerDocument=2 by default).
Presentation: Each result includes the document path, line range, score, match reason, and a syntax-highlighted snippet. Results are available through the MCP tool memorysmith_code_search, the chat tool, and the /code-search Blazor UI.
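
The windowing used in the indexing stage can be illustrated with a short sketch. This is not MemorySmith's implementation: the function name `chunk_lines` and the handling of the final, shorter window are assumptions made for illustration; only the default sizes (40 lines, 8 lines of overlap) come from the description above.

```python
def chunk_lines(lines, chunk_line_count=40, overlap_line_count=8):
    """Split a file's lines into overlapping windows.

    Returns (start_line, end_line, window) tuples with 1-based, inclusive
    line numbers. How the final, shorter window is handled is an
    assumption of this sketch.
    """
    if not 0 <= overlap_line_count < chunk_line_count:
        raise ValueError("overlap must be non-negative and smaller than the chunk size")
    step = chunk_line_count - overlap_line_count  # 32 with the defaults
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + chunk_line_count, len(lines))
        chunks.append((start + 1, end, lines[start:end]))
        if end == len(lines):
            break  # the last window reached the end of the file
        start += step
    return chunks


if __name__ == "__main__":
    source = [f"line {n}" for n in range(1, 101)]  # a 100-line file
    for first, last, _ in chunk_lines(source):
        print(first, last)
```

With the defaults, a 100-line file yields windows covering lines 1-40, 33-72, and 65-100, so adjacent chunks share 8 lines and a match near a boundary still appears with some surrounding context.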

Features

Hybrid scoring: configurable vector and lexical weights with saturation normalization
Document-balanced results: prevents one file from monopolizing the top-K
Resumable builds: a crashed build resumes from its last checkpoint
Shard merging: external index shards can be merged into the main index
Staleness cooldown: rapid successive queries do not each trigger a rebuild check
Target weighting: test files and docs are down-weighted for implementation queries
Identifier splitting: camelCase and snake_case query tokens are expanded into their component words
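
Three of the items above (identifier splitting, hybrid scoring, and document balancing) can be sketched together. This is a minimal illustration under stated assumptions, not MemorySmith's code: every function name is hypothetical, the `hits / (hits + k)` curve is only one plausible reading of "saturation normalization", and the tokenizer is deliberately simple. Only the default weights (0.75 and 0.25) and the per-document cap of 2 come from this document.

```python
import re
from collections import defaultdict

# Acronym runs (HTTP), capitalized or lowercase words (Server, get), digit runs.
_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def expand_tokens(text):
    """Lowercased tokens plus their camelCase / snake_case parts."""
    tokens = set()
    for raw in re.findall(r"[A-Za-z0-9_]+", text):
        tokens.add(raw.lower())
        tokens.update(part.lower() for part in _PART.findall(raw))
    return tokens


def lexical_hits(query, chunk_text):
    """Number of expanded query tokens that also occur in the chunk."""
    return len(expand_tokens(query) & expand_tokens(chunk_text))


def saturate(hits, k=2.0):
    """Map a raw hit count into [0, 1) with diminishing returns (assumed curve)."""
    return hits / (hits + k)


def hybrid_score(vector_similarity, hits, vector_weight=0.75, lexical_weight=0.25):
    """Weighted sum of cosine similarity and the saturated lexical score."""
    return vector_weight * vector_similarity + lexical_weight * saturate(hits)


def balance(results, top_k=10, max_results_per_document=2):
    """Keep the best-scoring (path, score) pairs, capping results per document."""
    kept, per_document = [], defaultdict(int)
    for path, score in sorted(results, key=lambda r: r[1], reverse=True):
        if len(kept) >= top_k:
            break
        if per_document[path] < max_results_per_document:
            kept.append((path, score))
            per_document[path] += 1
    return kept
```

In this sketch the query `getUserName` expands to `getusername`, `get`, `user`, and `name`, so it can match a chunk that only mentions `user_name` or `GetUser`. The cap in `balance` is what lets a lower-scoring chunk from a second file displace the third hit from a file that already has two results.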

Configuration

All code search settings live under MemorySmith:CodeSearch in appsettings.json. Key settings:

| Setting | Default | Description |
| --- | --- | --- |
| `Enabled` | `true` | Master switch for code search |
| `EmbeddingBatchSize` | `8` | Chunks per embedding batch (on GPU, use 32-128) |
| `VectorCandidatePrefilterEnabled` | `true` | Enable the SQL prefilter before vector scoring |
| `HybridVectorWeight` | `0.75` | Weight of vector similarity in the hybrid score |
| `HybridLexicalWeight` | `0.25` | Weight of the lexical match in the hybrid score |
| `MaxResultsPerDocument` | `2` | Maximum number of results from a single file |
| `ChunkLineCount` | `40` | Lines per chunk |
| `ChunkOverlapLineCount` | `8` | Lines of overlap between adjacent chunks |
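
In .NET-style configuration, a colon-separated path such as MemorySmith:CodeSearch corresponds to nested JSON objects. The sketch below shows what the relevant part of appsettings.json would presumably look like with the defaults from the table, and resolves the section path in Python purely for illustration. The helper `get_section` is hypothetical, and the real file is likely to contain other settings as well.

```python
import json

# Assumed layout of the relevant appsettings.json fragment, using the
# defaults listed in the table above.
APPSETTINGS = """
{
  "MemorySmith": {
    "CodeSearch": {
      "Enabled": true,
      "EmbeddingBatchSize": 8,
      "VectorCandidatePrefilterEnabled": true,
      "HybridVectorWeight": 0.75,
      "HybridLexicalWeight": 0.25,
      "MaxResultsPerDocument": 2,
      "ChunkLineCount": 40,
      "ChunkOverlapLineCount": 8
    }
  }
}
"""


def get_section(config, path):
    """Walk a colon-separated section path through nested JSON objects."""
    node = config
    for key in path.split(":"):
        node = node[key]
    return node


code_search = get_section(json.loads(APPSETTINGS), "MemorySmith:CodeSearch")

if __name__ == "__main__":
    for name, value in code_search.items():
        print(f"{name} = {value}")
```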

MCP Tools

| Tool | Description |
| --- | --- |
| `memorysmith_code_search` | Search indexed code with the vector + lexical hybrid pipeline |
| `memorysmith_code_search_status` | Get index status and build progress |
| `memorysmith_code_search_merge_shard` | Merge an external index shard (requires Write permission) |
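
For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request for `memorysmith_code_search`. The argument name `query` is an assumption for illustration; the tool's actual input schema is not documented here and should be taken from the server's `tools/list` response.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" is a hypothetical argument name; check the tool's published schema.
request = build_tool_call(
    1,
    "memorysmith_code_search",
    {"query": "where are code chunks embedded"},
)

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```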

Model Setup

Use Scripts/Install-CodeSearchModel.ps1 to download and prepare embedding models. For details, see the Code Search Model Export Workflow guide.