<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>FreshStack Leaderboard</title>
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600&family=Outfit:wght@400;500;700&display=swap" rel="stylesheet">
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/all.min.css">
  <link rel="stylesheet" href="./style.css">
  <script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
</head>
<body>
  <div class="bg-blobs">
    <div class="blob blob-1"></div>
    <div class="blob blob-2"></div>
  </div>

  <header>
    <h1>FreshStack Leaderboard</h1>
    <p class="subtitle">Realistic Retrieval Benchmarking on Technical Documentation</p>
    <p class="intro">
      FreshStack is a holistic framework for building realistic and challenging RAG benchmarks from community-asked questions and answers in niche, fast-growing domains. FreshStack evaluates retrieval models on five domains: <b>LangChain</b>, <b>Yolo v7 &amp; v8</b>, <b>Laravel 10 &amp; 11</b>,
      <b>Angular 16, 17 &amp; 18</b>, and <b>Godot4</b>. The reported metrics are <b>alpha-nDCG@10</b>, <b>Coverage@20</b>, and <b>Recall@50</b>.
    </p>

    <div class="top-actions">
      <a href="https://openreview.net/forum?id=54TTgXlS2U" target="_blank" class="action-btn"><i class="fa-solid fa-file-lines"></i> Paper</a>
      <button class="action-btn" id="toggle-metrics"><i class="fa-solid fa-chart-line"></i> Metric Details</button>
      <a href="https://github.com/fresh-stack/freshstack" target="_blank" class="action-btn"><i class="fa-brands fa-github"></i> Code</a>
      <a href="https://huggingface.co/freshstack" target="_blank" class="action-btn"><i class="fa-solid fa-database"></i> Dataset</a>
      <a href="https://fresh-stack.github.io/" target="_blank" class="action-btn"><i class="fa-solid fa-house"></i> Project Home</a>
      <button class="action-btn" id="toggle-submit"><i class="fa-solid fa-paper-plane"></i> Submit Here</button>
    </div>

    <div id="metrics-panel" class="panel hidden">
      <p><b>alpha-nDCG@10 (α@10)</b>: a diversity-aware variant of nDCG@10 that penalizes redundant documents, i.e., documents whose nuggets were already supported by higher-ranked documents. Each nugget's gain is discounted geometrically, by a factor of (1 − α) for every higher-ranked document that already supported it. Read more in <a href="https://dl.acm.org/doi/abs/10.1145/1390334.1390446" target="_blank">[Clarke et al. 2008]</a>.</p>
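      <p>A minimal Python sketch of α-nDCG@k, assuming binary nugget judgments, a hypothetical <code>doc_nuggets</code> map from each judged document ID to the set of nuggets it supports, and α = 0.5 as an illustrative default (not necessarily the value used on this leaderboard). The ideal ranking is approximated greedily, a common choice because the exact ideal is NP-hard to compute. This is not the official FreshStack evaluation code.</p>
      <pre><code class="language-python">import math


def _alpha_dcg(nugget_sets, alpha, k):
    """alpha-DCG@k over a ranked list of nugget sets (one set per document)."""
    seen = {}  # nugget -> number of higher-ranked documents that already supported it
    score = 0.0
    for rank, nuggets in enumerate(nugget_sets[:k], start=1):
        # Each nugget's gain is discounted by (1 - alpha) per earlier occurrence.
        gain = sum((1 - alpha) ** seen.get(n, 0) for n in nuggets)
        score += gain / math.log2(rank + 1)
        for n in nuggets:
            seen[n] = seen.get(n, 0) + 1
    return score


def _greedy_ideal(doc_nuggets, alpha, k):
    """Greedily pick the document with the largest discounted gain at each rank."""
    remaining = list(doc_nuggets.values())
    seen = {}
    ideal = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda ns: sum((1 - alpha) ** seen.get(n, 0) for n in ns))
        remaining.remove(best)
        ideal.append(best)
        for n in best:
            seen[n] = seen.get(n, 0) + 1
    return ideal


def alpha_ndcg_at_k(ranking, doc_nuggets, alpha=0.5, k=10):
    """Normalize the run's alpha-DCG@k by the (greedy) ideal alpha-DCG@k."""
    ranked = [doc_nuggets.get(doc, set()) for doc in ranking]
    ideal = _alpha_dcg(_greedy_ideal(doc_nuggets, alpha, k), alpha, k)
    return _alpha_dcg(ranked, alpha, k) / ideal if ideal else 0.0</code></pre>
      <p>With <code>doc_nuggets = {"d1": {"n1", "n2"}, "d2": {"n1"}, "d3": {"n3"}}</code>, the ranking <code>["d1", "d3", "d2"]</code> scores 1.0, while <code>["d2", "d1", "d3"]</code> scores about 0.85 because nugget n1 is already covered when d1 appears.</p>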
      <p><b>Coverage@20 (C@20)</b>: the fraction of a query's unique nuggets supported by at least one of the top-20 retrieved documents. Defined in our <a href="https://openreview.net/forum?id=54TTgXlS2U" target="_blank">[paper]</a>.</p>
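      <p>A minimal Python sketch of Coverage@k for a single query, assuming a hypothetical <code>doc_nuggets</code> map (document ID to supported nuggets) and a hypothetical set <code>query_nuggets</code> holding all nuggets of the query. How per-query scores are aggregated (presumably a mean over queries) is an assumption here.</p>
      <pre><code class="language-python">def coverage_at_k(ranking, doc_nuggets, query_nuggets, k=20):
    """Fraction of the query's nuggets supported by at least one top-k document."""
    if not query_nuggets:
        return 0.0
    covered = set()
    for doc in ranking[:k]:
        covered.update(doc_nuggets.get(doc, set()))
    # Each nugget counts once, no matter how many documents support it.
    return len(covered.intersection(query_nuggets)) / len(query_nuggets)</code></pre>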
      <p><b>Recall@50 (R@50)</b>: a traditional retrieval metric measuring the fraction of relevant documents retrieved within the top-50 results.</p>
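      <p>A minimal Python sketch of Recall@k for a single query, assuming a hypothetical set <code>relevant</code> of document IDs judged relevant. Unlike Coverage@20, it counts documents rather than nuggets, so two relevant documents supporting the same nugget both add to the score.</p>
      <pre><code class="language-python">def recall_at_k(ranking, relevant, k=50):
    """Fraction of the query's relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]).intersection(relevant)) / len(relevant)</code></pre>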
    </div>

    <div id="submit-panel" class="panel hidden">
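      <p>Submit your results by adding a new entry to <code>leaderboard_data.json</code> in the format below and opening a pull request. Delete the <code>//</code> comments first: they only explain each field, and JSON does not allow comments.</p>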
      <p><a href="https://github.com/fresh-stack/fresh-stack.github.io/blob/master/leaderboard_data.json" target="_blank">Open leaderboard_data.json</a></p>
      <textarea readonly rows="14">{
  "info": {
    "name": "Your Model Name", // try to follow the format of other models
    "size": "600M", // in millions (<1B) or billions (7B)
    "type": "open_source", // open_source, proprietary
    "date": "2026-04-07", // date of model release
    "link": "https://model-or-paper-link" // link to model or documentation
  },
  "datasets": {
    "langchain": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
    "yolo": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
    "laravel": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
    "angular": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
    "godot": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
    "average": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000}
  }
}</textarea>
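      <p>As an alternative to hand-editing the template, a minimal Python sketch that assembles an entry and prints it as plain JSON. It assumes the domain and metric keys shown in the template, and that <code>"average"</code> is the unweighted mean of the five per-domain scores rounded to three decimals; both are assumptions, so check them against existing entries. The <code>build_entry</code> helper is hypothetical.</p>
      <pre><code class="language-python">import json

DOMAINS = ["langchain", "yolo", "laravel", "angular", "godot"]
METRICS = ["alpha_ndcg_10", "coverage_20", "recall_50"]


def build_entry(info, scores):
    """Assemble a leaderboard entry; `scores` maps each domain to {metric: score}."""
    datasets = {d: {m: round(scores[d][m], 3) for m in METRICS} for d in DOMAINS}
    # Assumption: "average" is the unweighted mean of the raw per-domain scores.
    datasets["average"] = {
        m: round(sum(scores[d][m] for d in DOMAINS) / len(DOMAINS), 3) for m in METRICS
    }
    return {"info": info, "datasets": datasets}


if __name__ == "__main__":
    info = {
        "name": "Your Model Name",
        "size": "600M",
        "type": "open_source",
        "date": "2026-04-07",
        "link": "https://model-or-paper-link",
    }
    scores = {d: {m: 0.0 for m in METRICS} for d in DOMAINS}  # replace with real scores
    # json.dumps writes plain JSON, without the explanatory comments of the template.
    print(json.dumps(build_entry(info, scores), indent=2))</code></pre>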
    </div>
  </header>

  <main>
    <div class="controls">
      <input type="text" id="search" placeholder="Search retriever..." />
      <div class="types">
        <label><input type="checkbox" class="type-filter" value="open_source" checked> Open Source</label>
        <label><input type="checkbox" class="type-filter" value="proprietary" checked> Proprietary</label>
        <label><input type="checkbox" class="type-filter" value="upper_baseline"> Oracle</label>
      </div>
    </div>

    <div class="table-outer">
      <div class="table-wrap">
        <table id="leaderboard-table">
          <thead>
            <tr id="header-row-top"></tr>
            <tr id="header-row-sub"></tr>
          </thead>
          <tbody id="body-row"></tbody>
        </table>
      </div>
    </div>

    <section class="plots">
      <h3>FreshStack Metrics vs. Model Parameters</h3>
      <p class="plot-sub">Average scores across the five domains vs. model parameter count; points are colored by model family.</p>
      <div id="plot-avg-alpha10" class="plot-box"></div>
      <div id="plot-avg-c20" class="plot-box"></div>
      <div id="plot-avg-r50" class="plot-box"></div>
    </section>

    <section class="plots">
      <h3>FreshStack Metrics vs. Model Release Date</h3>
      <p class="plot-sub">Average scores across the five domains vs. model release date; points are colored by model family.</p>
      <div id="plot-date-avg-alpha10" class="plot-box"></div>
      <div id="plot-date-avg-c20" class="plot-box"></div>
      <div id="plot-date-avg-r50" class="plot-box"></div>
    </section>

    <section class="citation">
      <div class="citation-head">
        <h3>Cite FreshStack</h3>
        <button id="copy-citation-btn"><i class="fa-regular fa-copy"></i> Copy</button>
      </div>
      <pre id="citation-text">@inproceedings{
  thakur2025freshstack,
  title={FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents},
  author={Nandan Thakur and Jimmy Lin and Sam Havens and Michael Carbin and Omar Khattab and Andrew Drozdov},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2025},
  url={https://openreview.net/forum?id=54TTgXlS2U}
}</pre>
    </section>
  </main>

  <script type="module" src="./main.js"></script>
</body>
</html>