---
datasets:
- lytang/LLM-AggreFact
language:
- en
pipeline_tag: text-classification
---

# Model Summary

This is a fact-checking model from the work ([GitHub Repo](https://github.com/Liyan06/MiniCheck)):

📃 **MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents** ([link](https://arxiv.org/pdf/2404.10774.pdf))

The model is based on Flan-T5-Large and predicts a binary label: 1 for supported and 0 for unsupported.
The model makes predictions at the *sentence level*: it takes a document and a sentence as input and determines
whether the sentence is supported by the document: **MiniCheck-Model(document, claim) -> {0, 1}**


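In practice, grounding a multi-sentence LLM response reduces to one such call per sentence. A minimal sketch of this decomposition, assuming a naive regex sentence splitter and an all-sentences-supported aggregation policy (both are illustrative choices, not part of the released model):

```python
import re
from typing import Callable

def response_is_grounded(document: str, response: str,
                         check_sentence: Callable[[str, str], int]) -> bool:
    """Check every sentence of `response` against `document`.

    `check_sentence` stands in for any MiniCheck-style scorer that maps
    (document, claim) -> {0, 1}. The regex splitter and the requirement
    that *all* sentences be supported are illustrative assumptions.
    """
    # Split on whitespace that follows sentence-final punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    return all(check_sentence(document, s) == 1 for s in sentences)
```

Any of the MiniCheck variants below can play the role of `check_sentence`.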
MiniCheck-Flan-T5-Large is fine-tuned from `google/flan-t5-large` ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416.pdf))
on a combination of 35K training examples:
- 21K ANLI data ([Nie et al., 2020](https://aclanthology.org/2020.acl-main.441.pdf))
- 14K synthetic data generated from scratch in a structured way (more details in the paper).


### Model Variants
We also provide two other MiniCheck model variants:
- [lytang/MiniCheck-RoBERTa-Large](https://huggingface.co/lytang/MiniCheck-RoBERTa-Large)
- [lytang/MiniCheck-DeBERTa-v3-Large](https://huggingface.co/lytang/MiniCheck-DeBERTa-v3-Large)


### Model Performance
The performance of these models is evaluated on our newly collected benchmark, [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact),
which combines 10 recent human-annotated datasets on fact-checking and grounding LLM generations. Our most capable model, MiniCheck-Flan-T5-Large, outperforms all
existing specialized fact-checkers of a similar scale by a large margin and is on par with GPT-4. See full results in our work.

# Model Usage Demo
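A minimal sketch of querying the checkpoint with Hugging Face `transformers` follows. The prompt template in `format_input` and the generation settings are assumptions for illustration only; the canonical usage lives in the [GitHub Repo](https://github.com/Liyan06/MiniCheck), which should be treated as the authoritative entry point.

```python
# Hedged sketch: loading lytang/MiniCheck-Flan-T5-Large via transformers.
# ASSUMPTION: the input template below is a guess -- consult the MiniCheck
# GitHub repo for the exact format the checkpoint was trained with.

MODEL_NAME = "lytang/MiniCheck-Flan-T5-Large"

def format_input(document: str, claim: str) -> str:
    # Hypothetical prompt template; verify against the official repo.
    return f"predict: {document}\n\nclaim: {claim}"

def to_label(decoded: str) -> int:
    # The model emits a binary label: "1" = supported, "0" = unsupported.
    return 1 if decoded.strip() == "1" else 0

def check(document: str, claim: str) -> int:
    # Import deferred so the pure helpers above work without transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    inputs = tokenizer(format_input(document, claim),
                       return_tensors="pt", truncation=True, max_length=2048)
    out = model.generate(**inputs, max_new_tokens=2)
    return to_label(tokenizer.decode(out[0], skip_special_tokens=True))

if __name__ == "__main__":
    doc = ("A group of students gather in the school library "
           "to study for their upcoming final exams.")
    print(check(doc, "The students are preparing for an examination."))
```

Note that loading the full checkpoint downloads several GB of weights; for batched scoring and the exact preprocessing, use the `minicheck` package from the GitHub repo instead of this sketch.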