Update README.md

README.md CHANGED

@@ -4,6 +4,8 @@ language:
 - en
 metrics:
 - bertscore
+- bleu
+- rouge
 base_model:
 - facebook/bart-base
 ---
@@ -120,16 +122,25 @@ This model does not explicitly disaggregate results by demographic group, signer
 
 #### Metrics
 
-- **Primary metric**: BERTScore (F1)
+- **Primary metric**: BERTScore (F1), BLEU, and ROUGE
 - **Model selection**: Best checkpoint based on highest validation BERTScore-F1
-- BERTScore is
+- BERTScore is used to evaluate semantic alignment, while BLEU and ROUGE provide additional insight into surface-level n-gram overlap. All metrics were evaluated using the same held-out set of 500 gloss-reference pairs.
 
 ### Results
 
-After 2 epochs of training, the model achieved:
+After 2 epochs of training, the model achieved the following on the 500-pair evaluation set:
 
-- **BERTScore-F1**: 0.83
-
+- **BERTScore-F1**: 0.83
+- **BLEU Scores**:
+  - BLEU-1: 0.7063
+  - BLEU-2: 0.6175
+  - BLEU-3: 0.5479
+  - BLEU-4: 0.4821
+- **ROUGE Scores**:
+  - ROUGE-1: 0.7587
+  - ROUGE-2: 0.5874
+  - ROUGE-L: 0.7312
+- Qualitative inspection shows that most model outputs are fluent and contextually accurate. Common errors include omission of function words and minor verb tense mismatches.
 
 
 #### Summary
@@ -142,4 +153,4 @@ This model demonstrates strong potential for gloss-to-English translation, with
 
 ## Model Card Contact
 
-- rrrr66254@gmail.com
+- rrrr66254@gmail.com
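For context on the surface-overlap metrics this diff adds, the core quantities behind BLEU-n (clipped n-gram precision) and ROUGE-L (longest-common-subsequence recall) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the evaluation code behind the reported numbers; the example sentence pair is hypothetical, and full BLEU additionally applies a brevity penalty and a geometric mean over n.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_n_precision(hypothesis, reference, n):
    """Clipped n-gram precision, the core quantity behind BLEU-n."""
    hyp = ngram_counts(hypothesis.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Clip each hypothesis n-gram count by its count in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / total

def rouge_l_recall(hypothesis, reference):
    """ROUGE-L recall: LCS length divided by reference length."""
    h, r = hypothesis.split(), reference.split()
    # Classic O(len(h) * len(r)) longest-common-subsequence DP.
    dp = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i, hw in enumerate(h):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if hw == rw else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1] / len(r) if r else 0.0

# Hypothetical output/reference pair (not from the model card's test set)
hyp = "the boy is reading a book"
ref = "the boy reads a book"
print(bleu_n_precision(hyp, ref, 1))  # 4 of 6 unigrams match -> 0.666...
print(bleu_n_precision(hyp, ref, 2))  # 2 of 5 bigrams match -> 0.4
print(rouge_l_recall(hyp, ref))       # LCS "the boy a book" -> 4/5 = 0.8
```

In practice, model cards like this one typically report scores from a standard implementation (e.g. the Hugging Face `evaluate` library) rather than hand-rolled functions; the sketch above is only meant to show what "surface-level n-gram overlap" measures.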