| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta name="description" content="Atla Selene Mini: A General Purpose Evaluation Model"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <title>Atla Selene Mini: A General Purpose Evaluation Model</title> |
|
|
| <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet"> |
| <link rel="stylesheet" href="./static/css/bulma.min.css"> |
| <link rel="stylesheet" href="./static/css/bulma-carousel.min.css"> |
| <link rel="stylesheet" href="./static/css/bulma-slider.min.css"> |
| <link rel="stylesheet" href="./static/css/fontawesome.all.min.css"> |
| <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css"> |
| <link rel="stylesheet" href="./static/css/index.css"> |
| <link rel="icon" href="./static/images/favicon.svg"> |
|
|
| <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script> |
| <script defer src="./static/js/fontawesome.all.min.js"></script> |
| <script src="./static/js/bulma-carousel.min.js"></script> |
| <script src="./static/js/bulma-slider.min.js"></script> |
| <script src="./static/js/index.js"></script> |
|
|
| <style> |
| @keyframes rainbow-shimmer { |
| 0% { background-position: 0% 50%; } |
| 50% { background-position: 100% 50%; } |
| 100% { background-position: 0% 50%; } |
| } |
| |
| .rainbow-button { |
| background: linear-gradient(45deg, #ff0000, #ff7f00, #ffff00, #00ff00, #0000ff, #8b00ff); |
| background-size: 300% 300%; |
| color: white !important; |
| font-weight: bold; |
| animation: rainbow-shimmer 5s ease infinite; |
| transition: all 0.3s ease; |
| } |
| |
| .rainbow-button:hover { |
| transform: scale(1.05); |
| box-shadow: 0 0 10px rgba(0,0,0,0.2); |
| } |
| </style> |
| </head> |
| <body> |
|
|
| <section class="hero"> |
| <div class="hero-body"> |
| <div class="container is-max-desktop"> |
| <div class="columns is-centered"> |
| <div class="column has-text-centered"> |
| <h1 class="title is-1 publication-title">Atla Selene Mini:<br>A General Purpose Evaluation Model</h1> |
| <div class="is-size-5 publication-authors"> |
| <span class="author-block"> |
| <a href="https://huggingface.co/inwaves" target="_blank">Andrei Alexandru</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/NinaCalvi" target="_blank">Antonia Calvi</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/HennersBro98" target="_blank">Henry Broomfield</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/jacksongolden" target="_blank">Jackson Golden</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/kaikaidai" target="_blank">Kyle Dai</a><sup>1</sup>,</span> |
| </div> |
| <div class="is-size-5 publication-authors"> |
| <span class="author-block"> |
| <a href="https://huggingface.co/mathias-atla" target="_blank">Mathias Leys</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/MauriceBurg" target="_blank">Maurice Burger</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/mbartolo" target="_blank">Max Bartolo</a><sup>2,3</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/RomanEngeler1805" target="_blank">Roman Engeler</a><sup>1</sup>,</span> |
| </div> |
| <div class="is-size-5 publication-authors"> |
| <span class="author-block"> |
| <a href="https://huggingface.co/spisupat" target="_blank">Sashank Pisupati</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/tobydrane" target="_blank">Toby Drane</a><sup>1</sup>,</span> |
| <span class="author-block"> |
| <a href="https://huggingface.co/youngsunpark" target="_blank">Young Sun Park</a><sup>1</sup></span> |
| </div> |
|
|
| <div class="is-size-5 publication-authors"> |
| <span class="author-block"><sup>1</sup>atla,</span> |
| <span class="author-block"><sup>2</sup>University College London,</span> |
| <span class="author-block"><sup>3</sup>Cohere</span> |
| </div> |
|
|
| <div class="column has-text-centered"> |
| <div class="publication-links"> |
| |
| <span class="link-block"> |
| <a href="https://arxiv.org/pdf/2501.17195v1" target="_blank" |
| class="external-link button is-normal is-rounded is-dark"> |
| <span class="icon"> |
| <i class="fas fa-file-pdf"></i> |
| </span> |
| <span>arXiv</span> |
| </a> |
| </span> |
| |
| <span class="link-block"> |
| <a href="https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B" target="_blank" |
| class="external-link button is-normal is-rounded is-dark"> |
| <span class="icon"> |
| 🤗 |
| </span> |
| <span>HuggingFace</span> |
| </a> |
| </span> |
| |
| <span class="link-block"> |
| <a href="https://github.com/atla-ai/selene-mini" target="_blank" |
| class="external-link button is-normal is-rounded is-dark"> |
| <span class="icon"> |
| <i class="fab fa-github"></i> |
| </span> |
| <span>Cookbooks</span> |
| </a> |
| </span> |
| |
| <span class="link-block"> |
| <a href="https://ollama.com/atla/selene-mini" target="_blank" |
| class="external-link button is-normal is-rounded is-dark"> |
| <span class="icon"> |
| <i class="fas fa-code"></i> |
| </span> |
| <span>Ollama</span> |
| </a> |
| </span> |
| </div> |
|
|
| |
| <div class="publication-links" style="margin-top: 1rem;"> |
| <span class="link-block"> |
| <a href="https://www.atla-ai.com" target="_blank" |
| class="external-link button is-normal is-rounded rainbow-button"> |
| <span>Learn more</span> |
| </a> |
| </span> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| </section> |
|
|
| <section class="section" style="padding-top: 0;"> |
| <div class="container is-max-desktop"> |
| |
| <div class="columns is-centered has-text-centered"> |
| <div class="column is-2"> |
| <div style="max-width: 200px; margin: 0 auto;"> |
| <img src="figs/atla-logo.png" alt="Atla Logo" style="width: 100%; height: auto;"> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="columns is-centered has-text-centered"> |
| <div class="column is-four-fifths"> |
| <h2 class="title is-3">Abstract</h2> |
| <div class="content has-text-justified"> |
| <p> |
| We introduce Atla Selene Mini, a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini is a general-purpose evaluator that outperforms the best SLMJs and GPT-4o-mini on overall performance across 11 out-of-distribution benchmarks, spanning absolute scoring, classification, and pairwise preference tasks. It is the highest-scoring 8B generative model on RewardBench, surpassing strong baselines like GPT-4o and specialized judges. |
| </p> |
| <p> |
| To achieve this, we develop a principled data curation strategy that augments public datasets with synthetically generated critiques and ensures high quality through filtering and dataset ablations. We train our model on a combined direct preference optimization (DPO) and supervised fine-tuning (SFT) loss, and produce a highly promptable evaluator that excels in real-world scenarios. |
| </p> |
| <p> |
| Selene Mini shows dramatically improved zero-shot agreement with human expert evaluations on financial and medical industry datasets. It is also robust to variations in prompt format. Preliminary results indicate that Selene Mini is the top-ranking evaluator in a live, community-driven <a href="https://huggingface.co/blog/arena-atla" target="_blank">Judge Arena</a>. We release the model weights on <a href="https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B" target="_blank">HuggingFace</a> and <a href="https://ollama.com/atla/selene-mini" target="_blank">Ollama</a> to encourage widespread community adoption. |
| </p> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="columns is-centered"> |
| <div class="column is-four-fifths"> |
| <div class="content has-text-centered"> |
| <video controls width="800" autoplay loop muted> |
| <source src="figs/demo.mp4" type="video/mp4"> |
| Your browser does not support the video element. |
| </video> |
| <p class="subtitle"> |
| Demo of Atla Selene Mini on our playground |
| </p> |
| </div> |
| </div> |
| </div> |
|
|
| |
| <div class="columns is-centered"> |
| <div class="column is-four-fifths"> |
| <h2 class="title is-3 has-text-centered">Key Results</h2> |
| <div class="content has-text-justified"> |
| <div class="columns is-centered has-text-centered"> |
| Read the <a href="https://arxiv.org/pdf/2501.17195v1" target="_blank">full technical report</a>. |
| </div> |
| <figure class="image"> |
| <img src="figs/Fig1.png" alt="Performance comparison"> |
| <figcaption> |
| <b>Figure 1:</b> Atla Selene Mini outperforms current state-of-the-art SLMJs: a) Overall task-average performance, comparing Atla Selene Mini (black) with the best and most widely used SLMJs. b) Breakdown of performance by task type and benchmark. |
| </figcaption> |
| </figure> |
|
|
| <figure class="image"> |
| <img src="figs/Fig2.png" alt="Data curation strategy"> |
| <figcaption> |
| <b>Figure 2:</b> Data curation strategy: The process of transforming a candidate dataset (left) into the final training mix (right). Yellow boxes indicate filtering steps; purple boxes indicate synthetic generation of chosen and rejected pairs for preference optimization. |
| </figcaption> |
| </figure> |
|
|
| <figure class="image"> |
| <img src="figs/Fig3.png" alt="Real-world evaluation"> |
| <figcaption> |
| <b>Figure 3:</b> Real-world evaluation: a) Performance on domain-specific industry benchmarks. b) Performance on RewardBench with different prompt formats. c) Performance measured by Elo scores in Judge Arena. |
| </figcaption> |
| </figure> |
|
|
| <div class="columns is-centered has-text-centered"> |
| We’re now using everything we’ve learned to power the next frontier: <a href="https://calendly.com/atla-team/agents-demo" target="_blank">agent evaluation</a>. |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| </section> |
|
|
| <footer class="footer"> |
| <div class="container"> |
| <div class="content has-text-centered"> |
| <p> |
| © 2025 Atla AI |
| </p> |
| </div> |
| </div> |
| </footer> |
|
|
| </body> |
| </html> |