Upload README.md
README.md
CHANGED
@@ -93,8 +93,7 @@ The following datasets were **not used** during training or distillation. All re
 | **HANS** | Heuristic / syntactic bias test | Zero-shot |
 | **SciTail** | Science-domain entailment | Evaluated in binary setting |
 | **XNLI (English)** | Cross-lingual NLI test | Zero-shot on English split |
-
-| **MedNLI** | Clinical NLI | Not used (access restricted) |
+
 
 ---
 
@@ -201,6 +200,21 @@ SciTail originally has entailment vs neutral classes. For evaluation, the model
 
 This demonstrates strong cross-domain and cross-benchmark generalization, even without explicit multilingual or XNLI-specific training.
 
+## Results
+
+| Task | Dataset | Split | Accuracy | Macro-F1 |
+|------|---------|--------|-----------|-----------|
+| Natural Language Inference | MNLI (matched) | validation | 90.47% | 90.42% |
+| Natural Language Inference | MNLI (mismatched) | validation | 90.12% | 90.07% |
+| Natural Language Inference | SNLI | test | ~88–89% | ~88–89% |
+| Adversarial NLI | ANLI R1 | test_r1 | 73.60% | 73.61% |
+| Adversarial NLI | ANLI R2 | test_r2 | 57.70% | 57.60% |
+| Adversarial NLI | ANLI R3 | test_r3 | 53.67% | 53.68% |
+| Zero-shot | RTE (GLUE) | validation | 86.28% | 86.20% |
+| Zero-shot | HANS | validation | 77.74% | 76.60% |
+| Zero-shot (binary) | SciTail | dev | 78.83% | 78.81% |
+| Zero-shot | XNLI (English) | test | 90.92% | 90.94% |
+
 ---
 
 ## ⚡ Efficiency
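The added Results table reports Accuracy and Macro-F1, and evaluates SciTail in a binary setting. The sketch below shows one way such numbers could be computed from lists of gold and predicted labels. It is not the author's evaluation script. The README sentence describing the SciTail mapping is truncated in this diff, so the `to_binary` helper (a hypothetical name) assumes that any non-entailment prediction is folded into `neutral`; the label strings are likewise assumptions.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # gold g was missed
    scores = []
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def to_binary(label):
    """ASSUMPTION: collapse a 3-way NLI label to SciTail's two classes
    by treating everything that is not 'entailment' as 'neutral'."""
    return "entailment" if label == "entailment" else "neutral"


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the README.
    gold = ["entailment", "neutral", "neutral", "entailment"]
    pred_3way = ["entailment", "contradiction", "entailment", "neutral"]
    pred = [to_binary(p) for p in pred_3way]
    print(f"accuracy = {accuracy(gold, pred):.2%}")
    print(f"macro-F1 = {macro_f1(gold, pred):.2%}")
```

Macro-F1 averages the per-class F1 scores with equal weight, which explains why it can sit slightly below accuracy when errors concentrate in one class, as in the HANS row of the table.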