nielsr HF Staff committed on
Commit b61b108 · verified · 1 Parent(s): be487aa

Add pipeline tag, library name and improve model card

Hi, I'm Niels from the Hugging Face community science team.

I've opened this PR to improve the model card for **BERTJudge-Formatted-CR**:
- Added the `text-classification` pipeline tag and `transformers` library name to ensure the model is correctly categorized and the "Use in Transformers" button works.
- Added `license: apache-2.0` metadata.
- Improved the Markdown content by linking the [original paper](https://huggingface.co/papers/2604.09497) and the [GitHub repository](https://github.com/artefactory/BERT-as-a-Judge).
- Added a usage section based on the provided GitHub documentation to help researchers get started with the `bert-judge` library.
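
As a rough illustration of the metadata changes listed above, the sketch below parses a model card's YAML front matter and checks the three fields this PR adds. The `parse_front_matter` helper and the `CARD` sample are hypothetical and only cover flat scalars and lists; a real check would more likely rely on a proper YAML parser.

```python
def parse_front_matter(text: str) -> dict:
    """Parse simple YAML front matter (flat scalars and flat lists only).

    Hypothetical helper for illustration; not part of any library.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter of the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # A key with no inline value is assumed to start a list
            meta[current_key] = value if value else []
    return meta


# Sample front matter mirroring the metadata described in this PR
CARD = """---
base_model:
- EuroBERT/EuroBERT-210m
datasets:
- hgissbkh/BERTJudge-Dataset
language:
- en
library_name: transformers
pipeline_tag: text-classification
license: apache-2.0
---

# BERTJudge-Formatted-CR
"""

meta = parse_front_matter(CARD)
for field in ("library_name", "pipeline_tag", "license"):
    print(f"{field}: {meta.get(field)}")
```

The `library_name` and `pipeline_tag` fields are the ones the first bullet ties to model categorization and the "Use in Transformers" button.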

Please let me know if you'd like me to adjust anything!

Files changed (1)
  1. README.md +21 -18
README.md CHANGED
@@ -1,17 +1,23 @@
  ---
+ base_model:
+ - EuroBERT/EuroBERT-210m
  datasets:
  - hgissbkh/BERTJudge-Dataset
  language:
  - en
- base_model:
- - EuroBERT/EuroBERT-210m
+ library_name: transformers
+ pipeline_tag: text-classification
+ license: apache-2.0
  ---
+
  # BERTJudge-Formatted-CR

  BERT-as-a-Judge is a family of encoder-based models designed for efficient, reference-based evaluation of LLM outputs. Moving beyond rigid lexical extraction and matching, these models evaluate semantic correctness, accommodating variations in phrasing and formatting while using only a fraction of the computational resources required by LLM-as-a-Judge approaches.

+ This specific variant, **BERTJudge-Formatted-CR**, is optimized for evaluating candidate answers that adhere to specific structural constraints (formatted) and utilizes the **[Candidate, Reference]** input structure.
+
  ## Model Summary
- - **Paper:** [BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation](https://arxiv.org/abs/2604.09497)
+ - **Paper:** [BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation](https://huggingface.co/papers/2604.09497)
  - **Code:** [https://github.com/artefactory/BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge)
  - **Model Type:** Encoder-based Judge (EuroBERT-210m backbone)
  - **Language:** English
@@ -22,7 +28,7 @@ BERTJudge models are designed as sequence classifiers that output a sigmoid scor

  ### Installation

- ```zsh
+ ```bash
  git clone https://github.com/artefactory/BERT-as-a-Judge.git
  cd BERT-as-a-Judge
  pip install -e .
@@ -30,33 +36,30 @@ pip install -e .

  ### Usage

- Example:
+ Example using the `bert_judge` library:

  ```python
  from bert_judge.judges import BERTJudge

  # 1) Initialize the judge
  judge = BERTJudge(
-     model_path="artefactory/BERTJudge",
+     model_path="hgissbkh/BERTJudge-Formatted-CR",
      trust_remote_code=True,
      dtype="bfloat16",
  )

- # 2) Define one question, one reference, and several candidate answers
- question = "What is the capital of France?"
+ # 2) Define a reference and several candidate answers
+ # Note: For CR models, the question is not used in the sequence
  reference = "Paris"
  candidates = [
      "Paris.",
      "The capital of France is Paris.",
-     "I'm hesitating between Paris and London. I would say Paris.",
      "London.",
-     "The capital of France is London.",
-     "I'm hesitating between Paris and London. I would say London.",
  ]

  # 3) Predict scores (one score per candidate)
  scores = judge.predict(
-     questions=[question] * len(candidates),
+     questions=[""] * len(candidates),
      references=[reference] * len(candidates),
      candidates=candidates,
      batch_size=1,
@@ -71,24 +74,24 @@ Models follow a standardized naming structure: `BERTJudge-<Candidate_Format>-<In

  * **Candidate Format:**
      * `Free`: Trained on unconstrained model generations.
-     * `Formatted`: Trained on outputs that adhere to specific structural constraints. For optimized evaluation under the formatted setup, candidate outputs should ideally conclude with `"Final answer: <final_answer>"` (see the paper for details).
+     * `Formatted`: Trained on outputs that adhere to specific structural constraints (ideally concluding with `"Final answer: <final_answer>"`).
  * **Input Structure:**
      * `QCR`: The input sequence consists of [Question, Candidate, Reference].
      * `CR`: The input sequence consists only of [Candidate, Reference].
  * **Additional Info:**
-     * `OOD`: Indicates evaluation of Out-of-Distribution performance (where specific generative models were withheld during training).
-     * `100k/200k/500k`: Denotes the total training steps (default regime being 1 million).
+     * `OOD`: Indicates evaluation of Out-of-Distribution performance.
+     * `100k/200k/500k`: Denotes the total training steps (default is 1 million).

- **Note: For optimal evaluation performance, we recommend using `BERTJudge-Free-QCR`, available as `artefactory/BERTJudge`.**
+ **Note: For optimal general evaluation performance, the authors recommend using `BERTJudge-Free-QCR`, available as `artefactory/BERTJudge`.**

  ## Citation

  If you find this model useful for your research, please consider citing:

- ```
+ ```bibtex
  @article{gisserotboukhlef2026bertasajudgerobustalternativelexical,
        title={BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation},
-       author={Gisserot-Boukhlef, Hippolyte and Boizard, Nicolas and Malherbe, Emmanuel and Hudelot, C{\'e}line and Colombo, Pierre},
+       author={Gisserot-Boukhlef, Hippolyte and Boizard, Nicolas and Malherbe, Emmanuel and Hudelot, C{\\'e}line and Colombo, Pierre},
        year={2026},
        eprint={2604.09497},
        archivePrefix={arXiv},