Update README.md

Models follow a standardized naming structure: `BERTJudge-<Candidate_Format>-<Instruction_Format>`.
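As an illustration of how the two format fields slot into a model id, here is a minimal sketch; `Free` and `QCR` are the values used by the checkpoint recommended below, and other field values are not enumerated here:

```python
# Construct a model id following the naming scheme
# BERTJudge-<Candidate_Format>-<Instruction_Format>.
# "Free" and "QCR" are the values used by the recommended checkpoint;
# treat them as example field values.
candidate_format = "Free"
instruction_format = "QCR"
model_id = f"BERTJudge-{candidate_format}-{instruction_format}"
print(model_id)  # BERTJudge-Free-QCR
```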
(Previous revision: the README demonstrated direct inference with `transformers` (`AutoTokenizer`, `AutoModelForSequenceClassification`) and `torch`, formatting each input as `f"<|question|>{question}<|candidate|>{candidate}<|reference|>{reference}"` before tokenization.)

## Intended Use

These models are sequence classifiers that output a sigmoid score indicating answer correctness. For inference, we recommend using the [BERT-as-a-Judge](https://github.com/artefactory/BERT-as-a-Judge) package. In general settings, we further recommend **BERTJudge-Free-QCR**, as it provides the strongest and most robust evaluation performance.

### Installation

```zsh
git clone https://github.com/artefactory/BERT-as-a-Judge.git
cd BERT-as-a-Judge
pip install -e .
```

### Usage

Example:

```python
from bert_judge.judges import BERTJudge

# 1) Initialize the judge
judge = BERTJudge(
    model_path="hgissbkh/BERTJudge-Free-QCR",
    trust_remote_code=True,
    dtype="bfloat16",
)

# 2) Define one question, one reference, and several candidate answers
question = "What is the capital of France?"
reference = "Paris"
candidates = [
    "Paris.",
    "The capital of France is Paris.",
    "I'm hesitating between Paris and London. I would say Paris.",
    "London.",
    "The capital of France is London.",
    "I'm hesitating between Paris and London. I would say London.",
]

# 3) Predict scores (one score per candidate)
scores = judge.predict(
    questions=[question] * len(candidates),
    references=[reference] * len(candidates),
    candidates=candidates,
    batch_size=1,
)

print(scores)
```
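Since the judge returns one sigmoid score in [0, 1] per candidate, one simple way to consume them is to threshold at 0.5 for binary verdicts and average the verdicts into an accuracy figure. A minimal sketch with made-up scores (illustrative values, not actual model output):

```python
# Illustrative scores, one per candidate above (NOT real model output).
scores = [0.98, 0.95, 0.80, 0.04, 0.02, 0.10]

# Threshold the sigmoid scores at 0.5 to obtain binary correctness verdicts.
verdicts = [s >= 0.5 for s in scores]

# Fraction of candidates judged correct.
accuracy = sum(verdicts) / len(verdicts)
print(verdicts)   # [True, True, True, False, False, False]
print(accuracy)   # 0.5
```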

## Citation