Add pipeline tag, library name and improve model card

Hi, I'm Niels from the Hugging Face community science team.

I've opened this PR to improve the model card for **BERTJudge-Formatted-CR**:
- Added the `text-classification` pipeline tag and `transformers` library name to ensure the model is correctly categorized and the "Use in Transformers" button works.
- Added `license: apache-2.0` metadata.
- Switched the paper link from arXiv to its [Hugging Face paper page](https://huggingface.co/papers/2604.09497); the existing [GitHub repository](https://github.com/artefactory/BERT-as-a-Judge) link is unchanged.
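- Adapted the usage section to this checkpoint, following the GitHub documentation for the `bert-judge` library: the example now loads `hgissbkh/BERTJudge-Formatted-CR` and passes empty questions, since CR models do not use the question.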
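
As a quick sanity check on the new metadata, the sketch below parses a copy of the updated front matter and prints the three added fields. It uses only the standard library. `parse_front_matter` is a hypothetical helper written for this illustration: it handles only the simple `key: value` and `- item` subset of YAML used in this card and is not part of `transformers` or `huggingface_hub`.

```python
# Copy of the front matter proposed in this PR (see the diff below).
README_HEAD = """\
---
base_model:
- EuroBERT/EuroBERT-210m
datasets:
- hgissbkh/BERTJudge-Dataset
language:
- en
library_name: transformers
pipeline_tag: text-classification
license: apache-2.0
---
# BERTJudge-Formatted-CR
"""


def parse_front_matter(text: str) -> dict:
    """Parse a minimal YAML subset: scalar `key: value` pairs and `- item` lists.

    Hypothetical helper for illustration only; not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if stripped.startswith("- "):
            # List item: attach it to the most recent key if that key holds a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means list items follow on the next lines.
            meta[current_key] = value if value else []
    return meta


metadata = parse_front_matter(README_HEAD)
for key in ("pipeline_tag", "library_name", "license"):
    print(f"{key}: {metadata.get(key)}")
```

For a real model card, a full YAML parser would be the safer choice; this sketch only confirms that the three fields are present with the values listed above.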

Please let me know if you'd like me to adjust anything!

Files changed (1)

README.md +21 -18

@@ -1,17 +1,23 @@
 ---
 datasets:
 - hgissbkh/BERTJudge-Dataset
 language:
 - en
-base_model:
-- EuroBERT/EuroBERT-210m
 ---
 # BERTJudge-Formatted-CR
 BERT-as-a-Judge is a family of encoder-based models designed for efficient, reference-based evaluation of LLM outputs. Moving beyond rigid lexical extraction and matching, these models evaluate semantic correctness, accommodating variations in phrasing and formatting while using only a fraction of the computational resources required by LLM-as-a-Judge approaches.
 ## Model Summary
-- **Paper:** [BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation](https://arxiv.org/abs/2604.09497)
 - **Code:** [https://github.com/artefactory/BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge)
 - **Model Type:** Encoder-based Judge (EuroBERT-210m backbone)
 - **Language:** English
@@ -22,7 +28,7 @@ BERTJudge models are designed as sequence classifiers that output a sigmoid scor
 ### Installation
-```zsh
 git clone https://github.com/artefactory/BERT-as-a-Judge.git
 cd BERT-as-a-Judge
 pip install -e .
@@ -30,33 +36,30 @@ pip install -e .
 ### Usage
-Example:
 ```python
 from bert_judge.judges import BERTJudge
 # 1) Initialize the judge
 judge = BERTJudge(
-    model_path="artefactory/BERTJudge",
     trust_remote_code=True,
     dtype="bfloat16",
 )
-# 2) Define one question, one reference, and several candidate answers
-question = "What is the capital of France?"
 reference = "Paris"
 candidates = [
     "Paris.",
     "The capital of France is Paris.",
-    "I'm hesitating between Paris and London. I would say Paris.",
     "London.",
-    "The capital of France is London.",
-    "I'm hesitating between Paris and London. I would say London.",
 ]
 # 3) Predict scores (one score per candidate)
 scores = judge.predict(
-    questions=[question] * len(candidates),
     references=[reference] * len(candidates),
     candidates=candidates,
     batch_size=1,
@@ -71,24 +74,24 @@ Models follow a standardized naming structure: `BERTJudge-<Candidate_Format>-<In
 * **Candidate Format:**
   * `Free`: Trained on unconstrained model generations.
-  * `Formatted`: Trained on outputs that adhere to specific structural constraints. For optimized evaluation under the formatted setup, candidate outputs should ideally conclude with `"Final answer: <final_answer>"` (see the paper for details).
 * **Input Structure:**
   * `QCR`: The input sequence consists of [Question, Candidate, Reference].
   * `CR`: The input sequence consists only of [Candidate, Reference].
 * **Additional Info:**
-  * `OOD`: Indicates evaluation of Out-of-Distribution performance (where specific generative models were withheld during training).
-  * `100k/200k/500k`: Denotes the total training steps (default regime being 1 million).
-**Note: For optimal evaluation performance, we recommend using `BERTJudge-Free-QCR`, available as `artefactory/BERTJudge`.**
 ## Citation
 If you find this model useful for your research, please consider citing:
-```
 @article{gisserotboukhlef2026bertasajudgerobustalternativelexical,
   title={BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation},
-  author={Gisserot-Boukhlef, Hippolyte and Boizard, Nicolas and Malherbe, Emmanuel and Hudelot, C{\'e}line and Colombo, Pierre},
   year={2026},
   eprint={2604.09497},
   archivePrefix={arXiv},

 ---
+base_model:
+- EuroBERT/EuroBERT-210m
 datasets:
 - hgissbkh/BERTJudge-Dataset
 language:
 - en
+library_name: transformers
+pipeline_tag: text-classification
+license: apache-2.0
 ---
 # BERTJudge-Formatted-CR
 BERT-as-a-Judge is a family of encoder-based models designed for efficient, reference-based evaluation of LLM outputs. Moving beyond rigid lexical extraction and matching, these models evaluate semantic correctness, accommodating variations in phrasing and formatting while using only a fraction of the computational resources required by LLM-as-a-Judge approaches.
+This specific variant, **BERTJudge-Formatted-CR**, is optimized for evaluating candidate answers that adhere to specific structural constraints (formatted) and utilizes the **[Candidate, Reference]** input structure.
 ## Model Summary
+- **Paper:** [BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation](https://huggingface.co/papers/2604.09497)
 - **Code:** [https://github.com/artefactory/BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge)
 - **Model Type:** Encoder-based Judge (EuroBERT-210m backbone)
 - **Language:** English
 ### Installation
+```bash
 git clone https://github.com/artefactory/BERT-as-a-Judge.git
 cd BERT-as-a-Judge
 pip install -e .
 ### Usage
+Example using the `bert_judge` library:
 ```python
 from bert_judge.judges import BERTJudge
 # 1) Initialize the judge
 judge = BERTJudge(
+    model_path="hgissbkh/BERTJudge-Formatted-CR",
     trust_remote_code=True,
     dtype="bfloat16",
 )
+# 2) Define a reference and several candidate answers
+# Note: For CR models, the question is not used in the sequence
 reference = "Paris"
 candidates = [
     "Paris.",
     "The capital of France is Paris.",
     "London.",
 ]
 # 3) Predict scores (one score per candidate)
 scores = judge.predict(
+    questions=[""] * len(candidates),
     references=[reference] * len(candidates),
     candidates=candidates,
     batch_size=1,
 * **Candidate Format:**
   * `Free`: Trained on unconstrained model generations.
+  * `Formatted`: Trained on outputs that adhere to specific structural constraints (ideally concluding with `"Final answer: <final_answer>"`).
 * **Input Structure:**
   * `QCR`: The input sequence consists of [Question, Candidate, Reference].
   * `CR`: The input sequence consists only of [Candidate, Reference].
 * **Additional Info:**
+  * `OOD`: Indicates evaluation of Out-of-Distribution performance.
+  * `100k/200k/500k`: Denotes the total training steps (default is 1 million).
+**Note: For optimal general evaluation performance, the authors recommend using `BERTJudge-Free-QCR`, available as `artefactory/BERTJudge`.**
 ## Citation
 If you find this model useful for your research, please consider citing:
+```bibtex
 @article{gisserotboukhlef2026bertasajudgerobustalternativelexical,
   title={BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation},
+  author={Gisserot-Boukhlef, Hippolyte and Boizard, Nicolas and Malherbe, Emmanuel and Hudelot, C{\'e}line and Colombo, Pierre},
   year={2026},
   eprint={2604.09497},
   archivePrefix={arXiv},