Code Search Benchmark: E5 vs Nomic Embed Text v1.5 (2026-05-28)
This page tracks the direct A/B benchmark and relevance comparison between the current E5 baseline and nomic-embed-text-v1.5 for MemorySmith code search.
Executive Summary
- Rebuild throughput: E5 remains faster (1,055,795 ms) than Nomic (1,310,318 ms), with Nomic about 24% slower on the same 161-file / 1746-chunk corpus.
- Query latency: Nomic is faster on all six benchmark prompts in this run (about 11% to 77% lower average latency).
- Relevance suite implementation: a fixed suite (Scripts/code-search-relevance-suite.json) and a scorer (Scripts/Measure-CodeSearchRelevance.ps1) have been added.
- Relevance suite current result: both E5 and Nomic pass 8/8 after forcing model-consistent index ownership before E5 scoring.
- Latency under relevance-suite methodology still favors Nomic, while E5 retains the faster rebuild profile.
Scope
- Repository: MemorySmith
- Targets: MemorySmith.App, MemorySmith.Core, MemorySmith.Storage, MemorySmith.Tests, Scripts
- Benchmark harness: Scripts/Warm-CodeSearchIndex.ps1, Scripts/Measure-CodeSearchQueries.ps1
- Query baseline file (E5): artifacts/browser-validation/code-search-query-baseline-e5-v2-20260528-clean.json
- Rebuild baseline file (E5): artifacts/browser-validation/code-search-index-summary-e5-v2-20260528.json
- Query baseline file (Nomic): artifacts/browser-validation/code-search-query-baseline-nomic-v1-5-20260528.json
- Rebuild baseline file (Nomic): artifacts/browser-validation/code-search-index-summary-nomic-v1-5-20260528.json
- Relevance suite file (E5 corrected): artifacts/browser-validation/code-search-relevance-e5-v3-20260528.json
- Relevance suite file (Nomic): artifacts/browser-validation/code-search-relevance-nomic-v1-5-20260528.json
Current Status
- E5 and Nomic rebuild/query artifacts are captured on the same corpus size (161 files / 1746 chunks).
- The earlier E5 query file code-search-query-baseline-e5-v2-20260528.json is retained for history. It was captured while indexing was still active (1742 chunks), so it is not used for the final A/B interpretation.
E5 Baseline Snapshot
Rebuild (cold)
| Model | Files | Chunks | Elapsed (ms) | Build (ms) | Files/s | Chunks/s | Avg Embedding ms/call |
|---|---|---|---|---|---|---|---|
| E5 (e5-base-v2.onnx) | 161 | 1746 | 1055795 | 1053000 | 0.15 | 1.66 | 3341.042 |
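The throughput columns in the rebuild row above can be re-derived from the counts and timings. The sketch below is only a sanity check, not part of the harness. It assumes the rates are computed from Build (ms) rather than Elapsed (ms), because that is the input that reproduces the reported 1.66 chunks/s. The function name `throughput_per_second` is hypothetical.

```python
def throughput_per_second(count: int, duration_ms: float) -> float:
    """Items processed per second, given a count and a duration in milliseconds."""
    return count / (duration_ms / 1000.0)


# Values copied from the E5 cold rebuild row.
files, chunks, build_ms = 161, 1746, 1_053_000

# Assumption: the table's rates use Build (ms), not Elapsed (ms).
print(round(throughput_per_second(files, build_ms), 2))   # 0.15
print(round(throughput_per_second(chunks, build_ms), 2))  # 1.66
```

The same function applied to the Nomic row (1746 chunks, 1,306,000 ms build) gives 1.34 chunks/s, matching that table.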
Query Baseline (warm)
The first sample in each query set is consistently higher than subsequent runs, so median and min values better represent steady-state behavior on this host.
| Query Name | Avg ms | Median ms | Min ms | Max ms | Top Document |
|---|---|---|---|---|---|
| semantic provider path | 592.500 | 1527.675 | 115.037 | 1527.675 | MemorySmith.App/Components/Pages/Admin.razor |
| query telemetry | 261.165 | 554.907 | 108.028 | 554.907 | MemorySmith.Tests/CodeSearchServiceTests.cs |
| vector prefilter | 258.636 | 543.455 | 82.549 | 543.455 | MemorySmith.Tests/CodeSearchServiceTests.cs |
| benchmark harness | 191.579 | 506.250 | 33.445 | 506.250 | MemorySmith.Tests/CodeSearchServiceTests.cs |
| page validation | 297.627 | 695.855 | 49.263 | 695.855 | MemorySmith.App/Components/Pages/Pages.razor |
| semantic provider tests | 279.794 | 698.804 | 34.710 | 698.804 | MemorySmith.Tests/SemanticEmbeddingPathTests.cs |
Nomic Snapshot
Rebuild (cold)
| Model | Files | Chunks | Elapsed (ms) | Build (ms) | Files/s | Chunks/s | Avg Embedding ms/call |
|---|---|---|---|---|---|---|---|
| Nomic (nomic-embed-text-v1.5.onnx) | 161 | 1746 | 1310318 | 1306000 | 0.12 | 1.34 | 4152.744 |
Query Baseline (warm)
| Query Name | Avg ms | Median ms | Min ms | Max ms | Top Document |
|---|---|---|---|---|---|
| semantic provider path | 134.940 | 277.313 | 17.376 | 277.313 | MemorySmith.App/Services/SemanticEmbeddingSearchService.cs |
| query telemetry | 84.586 | 131.603 | 22.733 | 131.603 | MemorySmith.App/Services/CodeSearchService.cs |
| vector prefilter | 144.456 | 347.477 | 16.294 | 347.477 | MemorySmith.App/Services/CodeSearchService.cs |
| benchmark harness | 170.376 | 367.349 | 18.833 | 367.349 | MemorySmith.Tests/SearchBenchmarkTests.cs |
| page validation | 104.484 | 270.826 | 20.495 | 270.826 | MemorySmith.App/Services/PageService.cs |
| semantic provider tests | 126.504 | 224.954 | 32.620 | 224.954 | MemorySmith.App/Services/SemanticEmbeddingSearchService.cs |
Side-By-Side Summary
Rebuild Cost
- Nomic rebuild elapsed: 1,310,318 ms vs E5 1,055,795 ms (Nomic 24.11% slower).
- Nomic avg embedding call time: 4,152.744 ms vs E5 3,341.042 ms (Nomic 24.29% slower).
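The "slower" percentages above are relative increases over the E5 baseline. A minimal sketch of that arithmetic follows. The helper name `percent_slower` is illustrative and is not taken from the benchmark scripts.

```python
def percent_slower(candidate_ms: float, baseline_ms: float) -> float:
    """Relative increase of candidate over baseline, in percent."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


# Rebuild elapsed time: Nomic vs E5.
print(round(percent_slower(1_310_318, 1_055_795), 2))  # 24.11

# Average embedding call time: Nomic vs E5.
print(round(percent_slower(4152.744, 3341.042), 2))    # 24.29
```

The direction of the comparison matters. Measured against Nomic instead, E5's rebuild takes about 19% less time (1 - 1,055,795 / 1,310,318), so "24% slower" and "19% faster" describe the same gap.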
Query Latency
| Query Name | E5 Avg ms | Nomic Avg ms | Nomic Faster % |
|---|---|---|---|
| semantic provider path | 592.500 | 134.940 | 77.23 |
| query telemetry | 261.165 | 84.586 | 67.61 |
| vector prefilter | 258.636 | 144.456 | 44.15 |
| benchmark harness | 191.579 | 170.376 | 11.07 |
| page validation | 297.627 | 104.484 | 64.89 |
| semantic provider tests | 279.794 | 126.504 | 54.79 |
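The "Nomic Faster %" column is the relative reduction in average latency against E5. The sketch below recomputes it from the two Avg ms columns. The dictionary layout and the name `percent_faster` are assumptions made for illustration, not the structure of the JSON artifacts.

```python
def percent_faster(candidate_ms: float, baseline_ms: float) -> float:
    """Relative latency reduction of candidate versus baseline, in percent."""
    return (1.0 - candidate_ms / baseline_ms) * 100.0


# Average latencies (ms) copied from the query latency table: (E5, Nomic).
AVG_MS = {
    "semantic provider path": (592.500, 134.940),
    "query telemetry": (261.165, 84.586),
    "vector prefilter": (258.636, 144.456),
    "benchmark harness": (191.579, 170.376),
    "page validation": (297.627, 104.484),
    "semantic provider tests": (279.794, 126.504),
}

for name, (e5_ms, nomic_ms) in AVG_MS.items():
    print(f"{name}: {percent_faster(nomic_ms, e5_ms):.2f}%")
```

Because these figures are based on averages, a single slow first sample can shift them noticeably.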
Top-Result Drift (Quick Read)
- Nomic top hits for the implementation-focused telemetry and prefilter queries are MemorySmith.App/Services/CodeSearchService.cs.
- E5 clean-run top hits for those same queries skewed toward test files (MemorySmith.Tests/CodeSearchServiceTests.cs) under the current weighting behavior.
- This suggests Nomic may be better aligned with implementation intent on these sample queries, but a larger fixed-query spot-check set is still recommended before switching the default.
Relevance Suite Results (Completed)
The fixed relevance suite now exists in Scripts/code-search-relevance-suite.json and is executed by Scripts/Measure-CodeSearchRelevance.ps1.
Pass/fail totals:
- E5 corrected run (code-search-relevance-e5-v3-20260528.json): 8/8 passed (100%).
- Nomic run (code-search-relevance-nomic-v1-5-20260528.json): 8/8 passed (100%).
Case-Level Readout
| Case | E5 | Nomic | E5 Warm Avg ms (no first) | Nomic Warm Avg ms (no first) |
|---|---|---|---|---|
| implementation-telemetry | pass | pass | 2251.307 | 13.331 |
| implementation-prefilter | pass | pass | 1931.827 | 66.092 |
| page-validation-intent | pass | pass | 2146.240 | 97.564 |
| semantic-provider-path | pass | pass | 2211.436 | 14.801 |
| semantic-provider-tests | pass | pass | 2113.255 | 41.070 |
| benchmark-harness-intent | pass | pass | 2139.772 | 12.637 |
| ranking-weight-implementation | pass | pass | 2041.100 | 48.266 |
| nunit-regression-tests | pass | pass | 2393.986 | 45.714 |
Interpretation:
- The first E5 relevance sample includes forced rebuild work (~607,846 ms) by design; warm-only averages above exclude that first sample.
- With fair indexing and fixed expectations, both models meet relevance intent constraints on this suite.
- Nomic remains materially faster under this relevance-suite query workload.
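The "(no first)" warm averages drop the first sample of each case before averaging. The sketch below shows one way to do that. The sample values are hypothetical placeholders, not taken from the relevance artifacts, and `warm_average` is an illustrative name rather than a function in Scripts/Measure-CodeSearchRelevance.ps1.

```python
from statistics import mean


def warm_average(samples_ms: list[float]) -> float:
    """Mean of every sample except the first, which carries cold-start or rebuild cost."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to drop the first one")
    return mean(samples_ms[1:])


# Hypothetical samples for one case: a large first run followed by warm runs.
samples = [607_846.0, 2_300.0, 2_200.0, 2_250.0]

print(mean(samples))          # dominated by the first sample
print(warm_average(samples))  # 2250.0
```

Dropping the first sample keeps a one-time rebuild from masking steady-state query cost, at the price of hiding the cold-start behavior a first query would see.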
Relevance Spot-Check Plan
- Expand the fixed suite from 8 to 20+ cases covering page routing, auth/admin, chat tools, and task workflows.
- Add expected-in-top-3 constraints (not only top-1) for ambiguous intents.
- Version the suite by date/model family so trend comparisons stay reproducible.
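An expected-in-top-3 constraint could be scored roughly as follows. This is a sketch under stated assumptions: the schema of Scripts/code-search-relevance-suite.json is not shown on this page, so the function name `passes_top_k`, its parameters, and the example ranking are all hypothetical. The ranking is illustrative and is not a recorded result.

```python
def passes_top_k(ranked_paths: list[str], expected_paths: set[str], k: int = 3) -> bool:
    """True if any expected path appears among the first k ranked results."""
    return any(path in expected_paths for path in ranked_paths[:k])


# Illustrative ranking for an implementation-focused query (hypothetical).
ranked = [
    "MemorySmith.Tests/CodeSearchServiceTests.cs",
    "MemorySmith.App/Services/CodeSearchService.cs",
    "MemorySmith.App/Services/PageService.cs",
]
expected = {"MemorySmith.App/Services/CodeSearchService.cs"}

print(passes_top_k(ranked, expected, k=1))  # False: the top hit is a test file
print(passes_top_k(ranked, expected, k=3))  # True: the implementation file is ranked second
```

A top-3 check tolerates the test-versus-implementation ordering drift noted earlier while still failing when the intended file is missing from the leading results.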
Recommendation
- Keep E5 as the default rebuild path for now because rebuild throughput is materially better (about 24% higher chunks/s on this corpus; equivalently, the Nomic rebuild takes about 24% longer).
- Continue Nomic as an optional query-optimized candidate because query latency was lower across all measured prompts in this run.
- Keep the new fixed relevance suite in CI and extend it to 20-30 queries with expected top-3 paths by intent category.
- Prioritize separating semantic and code-search embedding configuration seams so model experimentation does not couple both surfaces.