Code Search Benchmark Breakdown: CPU Fallback vs CUDA (2026-05-28)

This page records measured code-search indexing benchmarks from local MemorySmith runs. It compares CPU fallback directly against CUDA and interprets the results with reliability in mind.

Scope And Caveats

Source Artifacts

Primary comparison set:

CUDA batch sweep set:

CPU Fallback vs CUDA: Main Comparison (225 Files / 2,388 Chunks)

| Scenario | Elapsed (ms) | Build Duration (ms) | Files/s | Chunks/s | Provider Status |
|---|---|---|---|---|---|
| CPU fallback cold (ForceRebuild=true) | 418,824 | 417,000 | 0.54 | 5.73 | Requested CUDA unavailable; fell back to CPU |
| CPU fallback warm (ForceRebuild=false) | 771 | 0 | n/a | n/a | Requested CUDA unavailable; fell back to CPU |
| CUDA cold (ForceRebuild=true) | 525,224 | 524,000 | 0.43 | 4.56 | ONNX provider available via Cuda (0) |
| CUDA warm first run (ForceRebuild=false) | 53,849 | 0 | n/a | n/a | ONNX provider available via Cuda (0) |
| CUDA warm post-cold (ForceRebuild=false) | 865 | 0 | n/a | n/a | ONNX provider available via Cuda (0) |

Derived Comparison
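
The ratios implied by the main comparison table can be recomputed directly from its Elapsed (ms) column. The following is a minimal sketch and not part of the MemorySmith tooling. The dictionary keys are shorthand invented here for the five table rows (`cuda_warm_steady` stands for the "CUDA warm post-cold" row), and no other columns are used.

```python
# Elapsed times in milliseconds, copied from the main comparison table.
# Key names are illustrative shorthand, not identifiers from MemorySmith.
ELAPSED_MS = {
    "cpu_cold": 418_824,
    "cpu_warm": 771,
    "cuda_cold": 525_224,
    "cuda_warm_first": 53_849,
    "cuda_warm_steady": 865,
}


def ratio(numerator_key, denominator_key):
    """Return elapsed(numerator) / elapsed(denominator)."""
    return ELAPSED_MS[numerator_key] / ELAPSED_MS[denominator_key]


cold_ratio = ratio("cuda_cold", "cpu_cold")
warm_ratio = ratio("cuda_warm_steady", "cpu_warm")
first_vs_steady = ratio("cuda_warm_first", "cuda_warm_steady")

print(f"CUDA cold / CPU cold:            {cold_ratio:.2f}x")
print(f"CUDA warm steady / CPU warm:     {warm_ratio:.2f}x")
print(f"CUDA warm first / CUDA steady:   {first_vs_steady:.1f}x")
```

On these numbers the CUDA cold rebuild took roughly 1.25 times as long as the CPU-fallback cold rebuild, and the first CUDA warm run took roughly 62 times as long as the post-cold warm run.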

CUDA Batch Sweep (228 Files / 2,417 Chunks)

| CUDA Batch Size | Elapsed (ms) | Build Duration (ms) | Embedding Calls | Embedded Chunks | Avg Embedding ms/call |
|---|---|---|---|---|---|
| 1 | 116,992 | 116,000 | 2,417 | 2,417 | 46.834 |
| 2 | 125,565 | 125,000 | 1,273 | 2,417 | 96.159 |
| 4 | 140,630 | 139,000 | 705 | 2,417 | 195.363 |
| 8 | 147,988 | 146,000 | 433 | 2,417 | 335.215 |
| 16 | 159,699 | 159,000 | 300 | 2,417 | 523.653 |

Batch Sweep Interpretation
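
One way to read the sweep table is to convert each row into an estimated total embedding time and a per-chunk cost. The sketch below does that arithmetic. It assumes that "Avg Embedding ms/call" is a plain mean over all embedding calls, so that calls multiplied by the average approximates total embedding time. That reading is an assumption, and the field names in the result dictionaries are invented for illustration.

```python
# Rows copied from the CUDA batch sweep table:
# (batch_size, build_duration_ms, embedding_calls, embedded_chunks, avg_ms_per_call)
SWEEP = [
    (1, 116_000, 2_417, 2_417, 46.834),
    (2, 125_000, 1_273, 2_417, 96.159),
    (4, 139_000, 705, 2_417, 195.363),
    (8, 146_000, 433, 2_417, 335.215),
    (16, 159_000, 300, 2_417, 523.653),
]


def summarize(row):
    """Derive per-chunk and share-of-build figures for one sweep row."""
    batch_size, build_ms, calls, chunks, avg_ms = row
    embed_ms_total = calls * avg_ms  # assumes avg_ms is a mean over all calls
    return {
        "batch_size": batch_size,
        "chunks_per_call": chunks / calls,
        "embed_ms_total": embed_ms_total,
        "embed_ms_per_chunk": embed_ms_total / chunks,
        "embed_share_of_build": embed_ms_total / build_ms,
    }


summaries = [summarize(row) for row in SWEEP]
for s in summaries:
    print(
        f"batch={s['batch_size']:>2}  "
        f"chunks/call={s['chunks_per_call']:.2f}  "
        f"ms/chunk={s['embed_ms_per_chunk']:.1f}  "
        f"share={s['embed_share_of_build']:.1%}"
    )
```

Under this estimate, per-chunk embedding time rises from about 46.8 ms at batch size 1 to about 65.0 ms at batch size 16, and estimated embedding time accounts for more than 97% of build duration in every row.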

Timing-Breakdown Highlights

From the batch sweep timing fields:

This supports the working hypothesis that pipeline-level embedding overhead (tokenization, padding, tensor assembly, and per-call orchestration) is currently the decisive factor, not SQLite writes.
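
Of the overhead components named in that hypothesis, padding is the easiest to illustrate in isolation. The sketch below is not a measurement from these runs. It uses a made-up list of chunk token lengths and a hypothetical helper, `padded_token_fraction`, to show how padding every batch to its longest member can waste more tokens as the batch size grows when chunk lengths are mixed.

```python
def padded_token_fraction(lengths, batch_size):
    """Fraction of tokens that are padding when each consecutive batch
    of `batch_size` items is padded to its longest member."""
    if not lengths:
        return 0.0
    real_tokens = sum(lengths)
    padded_tokens = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        padded_tokens += max(batch) * len(batch)
    return (padded_tokens - real_tokens) / padded_tokens


# Hypothetical token counts for eight chunks, in corpus order (not real data).
HYPOTHETICAL_LENGTHS = [32, 480, 64, 512, 48, 300, 96, 510]

for size in (1, 2, 4, 8):
    fraction = padded_token_fraction(HYPOTHETICAL_LENGTHS, size)
    print(f"batch_size={size}: {fraction:.1%} of processed tokens are padding")
```

With batch size 1 there is no padding at all, which is one plausible reason an unbatched run can stay competitive. Whether padding actually dominates on this host would need to be confirmed from the timing fields.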

Operational Conclusions

  1. Keep EmbeddingBatchSize=1 as the current default for code-search rebuilds on this host profile.
  2. Keep CUDA optional and preserve CPU fallback as a reliability baseline.
  3. Treat first CUDA warm runs separately from steady-state warm measurements.
  4. Prioritize better batching strategy research (token-budget, length-aware grouping, adaptive batching) before changing defaults again.
  5. Continue benchmarking with strict run labeling (cold, warm-first, warm-steady, corpus size) to avoid mixed conclusions.
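
As a starting point for the batching research in item 4, the following sketch shows one possible form of length-aware, token-budget batching. It is an illustration under stated assumptions, not the MemorySmith implementation. The function name `token_budget_batches`, the `max_padded_tokens` parameter, and the sample lengths are all hypothetical, and token lengths are assumed to be known before embedding.

```python
def token_budget_batches(lengths, max_padded_tokens):
    """Group item indices into batches whose padded size fits a budget.

    Items are sorted by length so neighbours are similar in size. A batch
    is closed when adding the next item would push
    (items in batch * longest item) past `max_padded_tokens`.
    An item longer than the budget still gets a batch of its own.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = []
    current = []
    for i in order:
        # Ascending order means lengths[i] would be the longest in the batch.
        if current and (len(current) + 1) * lengths[i] > max_padded_tokens:
            batches.append(current)
            current = []
        current.append(i)
    if current:
        batches.append(current)
    return batches


# Hypothetical token counts for eight chunks (not real data).
sample_lengths = [32, 480, 64, 512, 48, 300, 96, 510]
batches = token_budget_batches(sample_lengths, max_padded_tokens=1024)
for batch in batches:
    longest = max(sample_lengths[i] for i in batch)
    print(f"indices={batch}  padded_tokens={longest * len(batch)}")
```

Short chunks end up sharing a batch while long chunks are embedded in small groups, which bounds the padded size of every call. Because the batches hold original indices, a caller can write each embedding back to its chunk's original position after the calls complete.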

Cross-Reference