Update README.md
README.md
CHANGED
@@ -10,10 +10,9 @@ pipeline_tag: image-text-to-text
 # TruthfulJudge
 
 TruthfulJudge is a reliable evaluation pipeline designed to mitigate the pitfalls of AI-as-judge setups. Our methodology emphasizes in-depth human involvement to prevent feedback loops of hallucinated errors, ensuring faithful assessment of multimodal model truthfulness. Our specialized judge model, TruthfulJudge, is well calibrated (ECE = 0.11), self-consistent, and shows high inter-annotator agreement (Cohen's κ = 0.79), achieving 88.4% judge accuracy.
+This model is a pairwise critique-label judge trained to decide which of two responses to an open-ended question from the TruthfulVQA dataset is preferred.
 
-
-
-## Installation
+## Dependencies
 
 ```bash
 pip install vllm transformers torch pillow