Evaluation Dashboard
BEIR Benchmark — Full pipeline (Dense + BM25 + RRF + Cross-Encoder)
{% if datasets %}
{% for d in datasets %}

{% if d.name == "scifact" %}🔬{% else %}🏥{% endif %} {{ d.name | title }} — {{ d.queries }} queries

NDCG@10
{{ "%.4f" | format(d.ndcg) }}
MRR
{{ "%.4f" | format(d.mrr) }}
MAP@100
{{ "%.4f" | format(d.map) }}
Recall@100
{{ "%.4f" | format(d.recall) }}
P@10
{{ "%.4f" | format(d.precision) }}
{% endfor %}
{% for d in datasets %}
Ablation Table — {{ d.name | title }}
Mode NDCG@10 MAP@100 MRR Recall@100 P@10
{% for mode_name, m in d.modes.items() %}
{{ mode_name }} {{ "%.4f" | format(m.get("NDCG@10", 0)) }} {{ "%.4f" | format(m.get("MAP@100", 0)) }} {{ "%.4f" | format(m.get("MRR", 0)) }} {{ "%.4f" | format(m.get("Recall@100", 0)) }} {{ "%.4f" | format(m.get("P@10", 0)) }}
{% endfor %}
{% endfor %} {% else %}

No evaluation results found.

Run: python -m evaluation.run_eval --datasets scifact nfcorpus --mode all

{% endif %}