Create README.md
# BERT-as-a-Judge: A Robust Alternative for LLM Evaluation

BERT-as-a-Judge is a family of encoder-based models designed for efficient, reference-based evaluation of LLM outputs. By moving beyond rigid lexical matching (such as Exact Match or ROUGE), these models assess **semantic correctness**, accepting variations in phrasing and formatting at a fraction of the computational cost of LLM-as-a-Judge approaches.
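As a quick, model-free illustration of the difference: a strict exact-match check rejects a correct paraphrase that a semantic judge is meant to accept (the full judge usage example appears under "How to Use" below).

```python
# Model-free illustration: exact match rejects a correct paraphrase.
reference = "Paris"
candidate = "The capital city is Paris."

exact_match = candidate.strip().lower() == reference.strip().lower()
print("Exact Match verdict:", "Correct" if exact_match else "Incorrect")  # Incorrect
# A semantic judge such as BERTJudge-Free-QCR is trained to accept this candidate;
# see the "How to Use" section below.
```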
## Model Summary

- **Paper:** [BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation](URL_TO_PAPER)
- **Model Type:** Encoder-based Judge (EuroBERT-210m backbone)
- **Language:** English

---
## Model Variations & Collection Overview
The models are named using the convention `BERTJudge-<Output_Guidelines>-<Input_Format>-<Additional_Info>`; a small helper for assembling these names is sketched after the model selection table below.

### Naming Convention Breakdown

* **Output Guidelines:**
  * `Free`: Trained on unconstrained model outputs.
  * `Formatted`: Trained on outputs constrained by specific formatting instructions (e.g., "Conclude with Answer: [X]").
* **Input Format:**
  * `QCR`: Input contains [Question, Candidate, Reference].
  * `CR`: Input contains only [Candidate, Reference].
* **Additional Info:**
  * `OOD`: Evaluates out-of-distribution performance (certain generative models are excluded from training).
  * `100k`/`200k`/`500k`: Number of training steps (the default is 1 million).
### Model Selection Table
| Model Name | Input Format | Guidelines | Training Steps | OOD Tested |
| :--- | :---: | :---: | :---: | :---: |
| **BERTJudge-Free-QCR** | QCR | Free | 1M | No |
| **BERTJudge-Formatted-QCR** | QCR | Formatted | 1M | No |
| **BERTJudge-Free-CR** | CR | Free | 1M | No |
| **BERTJudge-Free-QCR-OOD** | QCR | Free | 1M | **Yes** |
| **BERTJudge-Free-QCR-100k** | QCR | Free | 100k | No |
| **BERTJudge-Free-QCR-200k** | QCR | Free | 200k | No |
| **BERTJudge-Free-QCR-500k** | QCR | Free | 500k | No |

---
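As noted under the naming convention, the minimal sketch below shows how a Hub repo id for these variants could be assembled from the naming components. The helper function and the `hgissbkh/` namespace are assumptions taken from the usage example further down, not part of the released code.

```python
# Hypothetical helper (not part of the released code): assemble a BERTJudge repo id
# from the naming components described above. The "hgissbkh/" namespace is assumed
# from the usage example below.
def bertjudge_repo_id(guidelines: str, input_format: str, extra: str = "") -> str:
    assert guidelines in {"Free", "Formatted"}
    assert input_format in {"QCR", "CR"}
    parts = ["BERTJudge", guidelines, input_format]
    if extra:  # e.g. "OOD", "100k", "200k", "500k"
        parts.append(extra)
    return "hgissbkh/" + "-".join(parts)

print(bertjudge_repo_id("Free", "QCR"))         # hgissbkh/BERTJudge-Free-QCR
print(bertjudge_repo_id("Free", "QCR", "OOD"))  # hgissbkh/BERTJudge-Free-QCR-OOD
```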
## Intended Use
### How to Use

These models are used as sequence classifiers that output a binary label (0 for incorrect, 1 for correct).
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hgissbkh/BERTJudge-Free-QCR"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

question = "What is the capital of France?"
reference = "Paris"
candidate = "The capital city is Paris."

# Construct the input based on the model type (QCR)
input_text = f"Question: {question} Reference: {reference} Candidate: {candidate}"
inputs = tokenizer(input_text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    prediction = torch.argmax(logits, dim=-1)

print("Correct" if prediction.item() == 1 else "Incorrect")
```
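The `CR` variants take only the candidate and the reference. The sketch below mirrors the QCR example; the exact CR input template and the `hgissbkh/` namespace for this repo id are assumptions rather than details confirmed by the paper. It also shows how a soft correctness probability can be read off the logits.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Assumed repo id (same namespace as above) and assumed CR input template (no question).
model_name = "hgissbkh/BERTJudge-Free-CR"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

reference = "Paris"
candidate = "The capital city is Paris."
input_text = f"Reference: {reference} Candidate: {candidate}"

inputs = tokenizer(input_text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Soft score: probability assigned to the "correct" class (index 1)
p_correct = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(correct) = {p_correct:.3f}")
```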