artefactory/BERTJudge

@@ -6,7 +6,7 @@ language:
 base_model:
 - EuroBERT/EuroBERT-210m
 ---
-# BERTJudge-Free-QCR
+# BERTJudge
 BERT-as-a-Judge is a family of encoder-based models designed for efficient, reference-based evaluation of LLM outputs. Moving beyond rigid lexical extraction and matching, these models evaluate semantic correctness, accommodating variations in phrasing and formatting while using only a fraction of the computational resources required by LLM-as-a-Judge approaches.
@@ -16,23 +16,9 @@ BERT-as-a-Judge is a family of encoder-based models designed for efficient, refe
 - **Model Type:** Encoder-based Judge (EuroBERT-210m backbone)
 - **Language:** English
-## Naming Convention Breakdown
-Models follow a standardized naming structure: `BERTJudge-<Candidate_Format>-<Input_Structure>-<Additional_Info>`.
-* **Candidate Format:**
-  * `Free`: Trained on unconstrained model generations.
-  * `Formatted`: Trained on outputs that adhere to specific structural constraints. For optimized evaluation under the formatted setup, candidate outputs should ideally conclude with `"Final answer: <final_answer>"` (see the paper for details).
-* **Input Structure:**
-  * `QCR`: The input sequence consists of [Question, Candidate, Reference].
-  * `CR`: The input sequence consists only of [Candidate, Reference].
-* **Additional Info:**
-  * `OOD`: Indicates evaluation of Out-of-Distribution performance (where specific generative models were withheld during training).
-  * `100k/200k/500k`: Denotes the total training steps (default regime being 1 million).
 ## Intended Use
-These models are designed as sequence classifiers that output a sigmoid score indicating answer correctness. For inference, we suggest using the [BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge) package. In most scenarios, we specifically recommend **BERTJudge-Free-QCR** for its superior and more robust evaluation performance.
+BERTJudge models are designed as sequence classifiers that output a sigmoid score reflecting answer correctness. For inference, we suggest using the [BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge) package.
 ### Installation
@@ -51,7 +37,7 @@ from bert_judge.judges import BERTJudge
 # 1) Initialize the judge
 judge = BERTJudge(
-    model_path="artefactory/BERTJudge-Free-QCR",
+    model_path="artefactory/BERTJudge",
     trust_remote_code=True,
     dtype="bfloat16",
 )
@@ -79,6 +65,22 @@ scores = judge.predict(
 print(scores)
 ```
+## Naming Convention Breakdown
+Models follow a standardized naming structure: `BERTJudge-<Candidate_Format>-<Input_Structure>-<Additional_Info>`.
+* **Candidate Format:**
+  * `Free`: Trained on unconstrained model generations.
+  * `Formatted`: Trained on outputs that adhere to specific structural constraints. For optimized evaluation under the formatted setup, candidate outputs should ideally conclude with `"Final answer: <final_answer>"` (see the paper for details).
+* **Input Structure:**
+  * `QCR`: The input sequence consists of [Question, Candidate, Reference].
+  * `CR`: The input sequence consists only of [Candidate, Reference].
+* **Additional Info:**
+  * `OOD`: Indicates evaluation of Out-of-Distribution performance (where specific generative models were withheld during training).
+  * `100k/200k/500k`: Denotes the total training steps (default regime being 1 million).
+**Note: For optimal evaluation performance, we recommend using `BERTJudge-Free-QCR`, available as `artefactory/BERTJudge`.**
 ## Citation
 If you find this model useful for your research, please consider citing:
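This commit adds a naming convention that is regular enough to parse mechanically. That can help when sweeping over several checkpoints. The sketch below is a hypothetical helper: `parse_judge_name` and `JudgeVariant` are illustrative names and are not part of the `bert_judge` package.

It rests on two points:
* It assumes the additional-info tokens (`OOD`, `100k`/`200k`/`500k`) appear as further hyphen-separated suffixes. The card does not spell out how they combine.
* It maps the bare `artefactory/BERTJudge` name to `BERTJudge-Free-QCR`, as the note in the diff states.

Names used in the tests are built from the convention and are not a list of published checkpoints.

```python
from dataclasses import dataclass

# Token values listed in the naming breakdown of the model card.
CANDIDATE_FORMATS = {"Free", "Formatted"}
INPUT_STRUCTURES = {"QCR", "CR"}
STEP_SUFFIXES = {"100k": 100_000, "200k": 200_000, "500k": 500_000}
DEFAULT_STEPS = 1_000_000  # default training regime per the model card

# Per the note in the diff, the bare repo name points to the recommended variant.
ALIASES = {"BERTJudge": "BERTJudge-Free-QCR"}


@dataclass(frozen=True)
class JudgeVariant:
    candidate_format: str  # "Free" or "Formatted"
    input_structure: str   # "QCR" or "CR"
    ood: bool              # True for OOD variants (some generators withheld in training)
    training_steps: int


def parse_judge_name(name: str) -> JudgeVariant:
    """Decode BERTJudge-<Candidate_Format>-<Input_Structure>-<Additional_Info> (hypothetical helper)."""
    name = name.split("/")[-1]      # drop a Hub namespace such as "artefactory/"
    name = ALIASES.get(name, name)  # resolve the bare "BERTJudge" alias
    parts = name.split("-")
    if len(parts) < 3 or parts[0] != "BERTJudge":
        raise ValueError(f"not a BERTJudge variant name: {name!r}")
    fmt, structure, extras = parts[1], parts[2], parts[3:]
    if fmt not in CANDIDATE_FORMATS:
        raise ValueError(f"unknown candidate format: {fmt!r}")
    if structure not in INPUT_STRUCTURES:
        raise ValueError(f"unknown input structure: {structure!r}")
    ood, steps = False, DEFAULT_STEPS
    # Assumption: additional-info tokens are further hyphen-separated suffixes.
    for token in extras:
        if token == "OOD":
            ood = True
        elif token in STEP_SUFFIXES:
            steps = STEP_SUFFIXES[token]
        else:
            raise ValueError(f"unknown additional info: {token!r}")
    return JudgeVariant(fmt, structure, ood, steps)
```

For example, `parse_judge_name("artefactory/BERTJudge")` resolves to the `Free`/`QCR` variant at the default 1-million-step regime. Unknown tokens raise `ValueError` rather than being skipped, so a misspelled checkpoint name fails loudly instead of silently falling back to the defaults.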
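The updated Intended Use section describes BERTJudge models as sequence classifiers that output a sigmoid score for answer correctness. Turning those scores into a benchmark accuracy requires a cutoff, which the model card does not specify.

The sketch below rests on these assumptions:
* It uses a 0.5 threshold.
* It assumes `judge.predict` returns one float per candidate. The diff does not show its return type.
* `sigmoid`, `to_verdicts`, and `judged_accuracy` are hypothetical helper names.
* `sigmoid` is only relevant if you work with raw logits, for example from a custom forward pass.

```python
import math
from typing import List, Sequence


def sigmoid(logit: float) -> float:
    """Map a raw classifier logit to a score between 0 and 1."""
    # Branch on sign so math.exp never receives a large positive argument (avoids OverflowError).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def to_verdicts(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label each score as correct (True) or incorrect (False).

    The 0.5 default is an assumption; the model card does not state a cutoff.
    """
    return [score >= threshold for score in scores]


def judged_accuracy(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of candidates the judge labels as correct."""
    if not scores:
        raise ValueError("no scores to aggregate")
    verdicts = to_verdicts(scores, threshold)
    return sum(verdicts) / len(verdicts)
```

Keeping the threshold as a parameter makes it easy to calibrate against a small human-labeled sample instead of relying on the 0.5 default.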
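For the `Formatted` variants, the card advises that candidate outputs conclude with `"Final answer: <final_answer>"`. A minimal pre-processing check along those lines might look like the following.

The sketch makes these assumptions:
* `extract_final_answer` and `nonconforming_indices` are hypothetical helpers.
* The marker is case-sensitive.
* The answer must sit on the final line.

These are assumptions, not rules taken from the paper.

```python
import re
from typing import List, Optional

# Assumed pattern: the candidate's last line is "Final answer: <final_answer>".
# Case-sensitive; the answer must sit on the same line as the marker.
_FINAL_ANSWER_RE = re.compile(r"Final answer:[ \t]*(\S[^\n]*?)\s*\Z")


def extract_final_answer(candidate: str) -> Optional[str]:
    """Return <final_answer> if the candidate concludes with the marker, else None."""
    match = _FINAL_ANSWER_RE.search(candidate)
    return match.group(1) if match else None


def nonconforming_indices(candidates: List[str]) -> List[int]:
    """Indices of candidates that do not end with 'Final answer: ...'."""
    return [i for i, c in enumerate(candidates) if extract_final_answer(c) is None]
```

Flagged candidates could be regenerated. Alternatively, they could be scored with a `Free` variant, which the card describes as trained on unconstrained generations.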