---
base_model:
- EuroBERT/EuroBERT-210m
datasets:
- hgissbkh/BERTJudge-Dataset
language:
- en
library_name: transformers
pipeline_tag: text-classification
---
# BERTJudge
BERT-as-a-Judge is a family of encoder-based models designed for efficient, reference-based evaluation of LLM outputs. Moving beyond rigid lexical extraction and matching, these models evaluate semantic correctness, accommodating variations in phrasing and formatting while using only a fraction of the computational resources required by LLM-as-a-Judge approaches.
## Model Summary
- **Paper:** [BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation](https://huggingface.co/papers/2604.09497)
- **Code:** [https://github.com/artefactory/BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge)
- **Model Type:** Encoder-based Judge (EuroBERT-210m backbone)
- **Language:** English
## Intended Use
BERTJudge models are sequence classifiers that output a sigmoid score between 0 and 1 reflecting the correctness of a candidate answer. For inference, we suggest using the [BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge) package.
### Installation
```zsh
git clone https://github.com/artefactory/BERT-as-a-Judge.git
cd BERT-as-a-Judge
pip install -e .
```
### Usage
Example:
```python
from bert_judge.judges import BERTJudge

# 1) Initialize the judge
judge = BERTJudge(
    model_path="artefactory/BERTJudge",
    trust_remote_code=True,
    dtype="bfloat16",
)

# 2) Define one question, one reference, and several candidate answers
question = "What is the capital of France?"
reference = "Paris"
candidates = [
    "Paris.",
    "The capital of France is Paris.",
    "I'm hesitating between Paris and London. I would say Paris.",
    "London.",
    "The capital of France is London.",
    "I'm hesitating between Paris and London. I would say London.",
]

# 3) Predict scores (one score per candidate)
scores = judge.predict(
    questions=[question] * len(candidates),
    references=[reference] * len(candidates),
    candidates=candidates,
    batch_size=1,
)
print(scores)
```
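The returned scores lie between 0 and 1; if you need binary correct/incorrect judgments, you can threshold them. A minimal sketch, assuming a 0.5 cutoff (the threshold value and the `binarize` helper are illustrative, not part of the package):

```python
def binarize(scores, threshold=0.5):
    """Map sigmoid correctness scores to 0/1 labels.

    The 0.5 threshold is an illustrative default, not a value
    prescribed by the BERT-as-a-Judge package; tune it on a
    validation set if your application needs calibrated decisions.
    """
    return [1 if s >= threshold else 0 for s in scores]

# Example with made-up scores for six candidates:
example_scores = [0.97, 0.98, 0.91, 0.03, 0.02, 0.08]
print(binarize(example_scores))  # → [1, 1, 1, 0, 0, 0]
```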
## Naming Convention Breakdown
Models follow a standardized naming structure: `BERTJudge-<Candidate_Format>-<Input_Structure>-<Additional_Info>`.
* **Candidate Format:**
* `Free`: Trained on unconstrained model generations.
* `Formatted`: Trained on outputs that adhere to specific structural constraints. For optimized evaluation under the formatted setup, candidate outputs should ideally conclude with `"Final answer: <final_answer>"` (see the paper for details).
* **Input Structure:**
* `QCR`: The input sequence consists of [Question, Candidate, Reference].
* `CR`: The input sequence consists only of [Candidate, Reference].
* **Additional Info:**
* `OOD`: Indicates evaluation of Out-of-Distribution performance (where specific generative models were withheld during training).
* `100k/200k/500k`: Denotes the total number of training steps (the default regime is 1 million).
**Note: For optimal evaluation performance, we recommend using `BERTJudge-Free-QCR`, available as `artefactory/BERTJudge`.**
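When targeting a `Formatted` variant, candidates should end with the `"Final answer: <final_answer>"` marker described above. A minimal helper to append it, assuming the candidate's reasoning and final answer are available separately (the `format_candidate` name is ours, not part of the package):

```python
def format_candidate(reasoning: str, final_answer: str) -> str:
    """Append the 'Final answer: ...' marker expected by the
    Formatted BERTJudge variants. Helper name is illustrative."""
    return f"{reasoning.rstrip()}\nFinal answer: {final_answer}"

print(format_candidate(
    "The capital of France is Paris.",
    "Paris",
))
```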
## Citation
If you find this model useful for your research, please consider citing:
```bibtex
@article{gisserotboukhlef2026bertasajudgerobustalternativelexical,
title={BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation},
author={Gisserot-Boukhlef, Hippolyte and Boizard, Nicolas and Malherbe, Emmanuel and Hudelot, C{\'e}line and Colombo, Pierre},
year={2026},
eprint={2604.09497},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.09497}
}
```