Retrieval strategies — by metric
Five aedln0d retrieval strategies measured against a fixed SSMFS eval set. exact counts a query as a hit when the expected URL substring lands in the top-K hits; higher is better. domain counts a hit when at least one SSM-domain page lands in the top-K. kw_cov measures how well the retrieved chunks cover the expected section keywords. Latency (p50_ms, p95_ms) is the end-to-end /search round-trip. cand (candidates) is the number of chunks considered before the final top-K filter, a proxy for compute spent.
Data: shell/landing/dash-retrieval-latest.json, rewritten by scripts/kgrag-nested-bench.py (Makefile target kgrag-bench) on every run.
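As a rough illustration of the three quality metrics described above, here is a minimal per-query sketch. It is not taken from scripts/kgrag-nested-bench.py: the `Hit` shape, the function names, the hostname-based domain check, the case-insensitive keyword matching, and the `regulator.example` URLs are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Hit:
    """Hypothetical shape of one retrieved chunk."""
    url: str
    text: str


def exact_hit(hits, expected_url_substring):
    """True if the expected URL substring appears in any top-K hit URL."""
    return any(expected_url_substring in h.url for h in hits)


def domain_hit(hits, domain):
    """True if at least one hit is hosted on `domain` or one of its subdomains."""
    for h in hits:
        host = urlparse(h.url).hostname or ""
        if host == domain or host.endswith("." + domain):
            return True
    return False


def kw_cov(hits, keywords):
    """Fraction of expected keywords found (case-insensitively) in the hit text."""
    if not keywords:
        return 0.0
    blob = " ".join(h.text for h in hits).lower()
    return sum(kw.lower() in blob for kw in keywords) / len(keywords)


# Hypothetical top-K result for one eval query.
hits = [
    Hit("https://www.regulator.example/rules/doc-1",
        "Radiation protection requirements for licensed activities"),
    Hit("https://other.example/news", "Unrelated news item"),
]

print(exact_hit(hits, "/rules/doc-1"))
print(domain_hit(hits, "regulator.example"))
print(kw_cov(hits, ["radiation", "licensed", "dose"]))
```

Averaging each value over all eval queries would give one number per strategy, which is presumably what the table columns report.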
| strategy | exact | domain | kw_cov | p50_ms | p95_ms | cand | exact trend |
|---|---|---|---|---|---|---|---|
| loading… | | | | | | | |
- flat: the current default; hybrid search with no client-side filter.
- scoped_nation and scoped_regulator: pass server-side filters.
- client_nested: over-fetch k×3 and bias toward the question's regulator domain (the strategy now live on the booster /ask/lenses path for SSM-shaped queries).
- two_stage_rerank: BM25 first, then dense-rerank the top-K; consistently worst on this eval set because the BM25 stage drops semantic matches before the dense stage can rescue them.
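The client_nested idea (over-fetch, then bias toward the regulator domain on the client) can be sketched as follows. This is a hedged illustration, not the code behind /ask/lenses: the `search` callable, the additive `boost`, the `(url, score)` tuple shape, and the example URLs are assumptions.

```python
from urllib.parse import urlparse


def client_nested(search, query, k, regulator_domain, boost=0.2, overfetch=3):
    """Over-fetch k * overfetch hits, boost on-domain hits, keep the top k.

    `search(query, limit)` is a hypothetical callable returning a list of
    (url, score) tuples where a higher score is better.
    """
    candidates = search(query, k * overfetch)

    def biased(item):
        url, score = item
        host = urlparse(url).hostname or ""
        on_domain = host == regulator_domain or host.endswith("." + regulator_domain)
        # Additive boost is an assumption; the source only says "bias toward".
        return score + (boost if on_domain else 0.0)

    return sorted(candidates, key=biased, reverse=True)[:k]


def fake_search(query, limit):
    """Stand-in for the real /search call, returning a fixed ranked list."""
    corpus = [
        ("https://blog.example/a", 0.90),
        ("https://blog.example/b", 0.85),
        ("https://www.regulator.example/rule-1", 0.80),
        ("https://blog.example/c", 0.75),
        ("https://regulator.example/rule-2", 0.72),
        ("https://blog.example/d", 0.65),
    ]
    return corpus[:limit]


top = client_nested(fake_search, "q", k=2, regulator_domain="regulator.example")
for url, score in top:
    print(url, score)
```

In this toy data a plain top-2 fetch would return only the two blog pages; both regulator pages sit below the cutoff and only become reachable because six candidates are fetched before the bias is applied.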