samerzaher80 committed
Commit 4610f03 · verified · 1 Parent(s): 439e750

Upload README.md

Files changed (1)
  1. README.md +16 -2
README.md CHANGED
@@ -93,8 +93,7 @@ The following datasets were **not used** during training or distillation. All re
  | **HANS** | Heuristic / syntactic bias test | Zero-shot |
  | **SciTail** | Science-domain entailment | Evaluated in binary setting |
  | **XNLI (English)** | Cross-lingual NLI test | Zero-shot on English split |
- | **FEVER** | Fact verification | Zero-shot probing |
- | **MedNLI** | Clinical NLI | Not used (access restricted) |
+
 
  ---
 
@@ -201,6 +200,21 @@ SciTail originally has entailment vs neutral classes. For evaluation, the model
 
  This demonstrates strong cross-domain and cross-benchmark generalization, even without explicit multilingual or XNLI-specific training.
 
+ ## Results
+
+ | Task | Dataset | Split | Accuracy | Macro-F1 |
+ |------|---------|--------|-----------|-----------|
+ | Natural Language Inference | MNLI (matched) | validation | 90.47% | 90.42% |
+ | Natural Language Inference | MNLI (mismatched) | validation | 90.12% | 90.07% |
+ | Natural Language Inference | SNLI | test | ~88–89% | ~88–89% |
+ | Adversarial NLI | ANLI R1 | test_r1 | 73.60% | 73.61% |
+ | Adversarial NLI | ANLI R2 | test_r2 | 57.70% | 57.60% |
+ | Adversarial NLI | ANLI R3 | test_r3 | 53.67% | 53.68% |
+ | Zero-shot | RTE (GLUE) | validation | 86.28% | 86.20% |
+ | Zero-shot | HANS | validation | 77.74% | 76.60% |
+ | Zero-shot (binary) | SciTail | dev | 78.83% | 78.81% |
+ | Zero-shot | XNLI (English) | test | 90.92% | 90.94% |
+
  ---
 
  ## ⚡ Efficiency