metrics:
- bleurt
- bleu
- bertscore
pipeline_tag: sentence-similarity
---

# AlignScoreCS

AlignScoreCS is a multi-task multilingual model developed to assess factual consistency of context-claim pairs across various Natural Language Understanding (NLU) tasks,
including Summarization, Question Answering (QA), Semantic Textual Similarity (STS), Paraphrase, Fact Verification (FV), and Natural Language Inference (NLI).
It is fine-tuned on a large multi-task dataset of 7 million documents covering these NLU tasks in both Czech and English.
Thanks to its multilingual pre-training, it can potentially be used in other languages as well.
The architecture can process tasks with regression, binary classification, or ternary classification heads, although for evaluation we recommend the AlignScore function.

This work is influenced by its English counterpart, [AlignScore: Evaluating Factual Consistency with a Unified Alignment Function](https://arxiv.org/abs/2305.16739).
However, we employed homogeneous batches instead of heterogeneous ones during training, and used three distinct architectures sharing a single encoder.
This setup allows each architecture to be used independently with its own classification head.
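To illustrate the batching difference: with homogeneous batching, every batch is drawn from a single task, so each optimization step updates only one classification head alongside the shared encoder. A minimal sketch, assuming each example carries a `task` field (the field name and `homogeneous_batches` helper are illustrative, not part of the released code):

```python
from collections import defaultdict

def homogeneous_batches(examples, batch_size):
    """Group examples by task so every batch contains a single task only;
    heterogeneous batching would instead mix tasks within one batch."""
    by_task = defaultdict(list)
    for ex in examples:
        by_task[ex["task"]].append(ex)
    batches = []
    for task_examples in by_task.values():
        for i in range(0, len(task_examples), batch_size):
            batches.append(task_examples[i:i + batch_size])
    return batches
```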

## Evaluation

As in the AlignScore paper, we use the AlignScore function, which chunks the context into segments of roughly 350 tokens and splits the claim into sentences.
Each context chunk is then evaluated against each claim sentence, and the pairwise scores are aggregated into a single consistency score.
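The chunk-and-aggregate procedure can be sketched in plain Python. This is an illustrative approximation only: `align_prob` is a hypothetical stand-in for the model's alignment probability, and whitespace/regex splitting stands in for the real tokenizer and sentence splitter. Following the original AlignScore formulation, each claim sentence takes the maximum score over context chunks, and the per-sentence scores are then averaged:

```python
import re

def chunk_context(context, max_tokens=350):
    """Greedily pack whole sentences into chunks of at most ~max_tokens
    whitespace tokens (an approximation of the ~350-token chunking)."""
    sentences = re.split(r"(?<=[.!?])\s+", context.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def align_score(context, claim, align_prob):
    """Score every (chunk, claim-sentence) pair with align_prob, take the
    max over chunks per sentence, then average over claim sentences."""
    chunks = chunk_context(context)
    sentences = re.split(r"(?<=[.!?])\s+", claim.strip())
    per_sentence = [max(align_prob(ch, s) for ch in chunks) for s in sentences]
    return sum(per_sentence) / len(per_sentence)
```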

The AlignScoreCS model is built on three XLM-RoBERTa architectures sharing one encoder.

It is a multi-task multilingual model for assessing facticity in various NLU tasks in Czech and English, following the initial AlignScore paper (https://arxiv.org/abs/2305.16739).
We trained the model from the [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) checkpoint, with three linear layers on the shared encoder for regression,
binary classification, and ternary classification.
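Structurally, this means one encoder whose output feeds three independent heads, so the text is encoded once per forward pass regardless of which head is used. The sketch below is purely schematic, with toy stand-ins in place of the actual xlm-roberta-large encoder and trained linear layers:

```python
class SharedEncoderModel:
    """Schematic of one shared encoder feeding three task heads:
    regression, binary classification, ternary classification."""
    def __init__(self, encoder, heads):
        self.encoder = encoder  # text -> feature vector
        self.heads = heads      # head name -> (feature vector -> output)

    def forward(self, text, head):
        features = self.encoder(text)  # encoded once, shared by all heads
        return self.heads[head](features)

# Toy stand-ins: a real setup would use xlm-roberta-large embeddings and
# learned linear layers instead of these placeholder callables.
encoder = lambda text: [float(len(text))]
model = SharedEncoderModel(encoder, {
    "regression": lambda f: f[0],                  # unbounded score
    "binary": lambda f: int(f[0] > 10),            # 2-way label
    "ternary": lambda f: min(2, int(f[0] // 10)),  # 3-way label
})
```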